Select Features

class select_features.AutoFeatureSelector(problem_type: str = 'classification', model: Any | None = None, param_distributions: Dict[str, Any] | None = None, cv: int = 5, n_iter: int = 50, scoring: str | None = None, random_state: int = 42, search_type: str = 'grid')

A transformer that automatically selects the best feature selection method and optimizes its parameters.
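The internals of AutoFeatureSelector are not documented here. As a rough illustration of the idea only, assuming scikit-learn is installed, the sketch below uses GridSearchCV over a Pipeline to choose between two candidate methods (an ANOVA-based SelectKBest and RFE) and their parameters, scoring each candidate with a downstream model under cross-validation. The candidate list, the grids, and the downstream LogisticRegression are arbitrary choices for the example, not the class's actual search space.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data: 20 features, 5 of them informative.
X_arr, y_arr = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=42
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])
y = pd.Series(y_arr)

# A selection step followed by a model that scores each candidate selection.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Each dict is one candidate method together with its own parameter grid.
param_grid = [
    {
        "select": [SelectKBest(score_func=f_classif)],
        "select__k": [3, 5, 10],
    },
    {
        "select": [RFE(LogisticRegression(max_iter=1000))],
        "select__n_features_to_select": [3, 5],
    },
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

# The winning selector, refit on all the data, exposes the chosen columns.
best_selector = search.best_estimator_.named_steps["select"]
selected = X.columns[best_selector.get_support()].tolist()
print(search.best_params_)
print(selected)
```

A randomized search (RandomizedSearchCV with n_iter) would be the natural alternative when the grid is large; whether search_type and n_iter map to that is an assumption.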

fit(X: DataFrame, y: Series | None = None) → AutoFeatureSelector

Fits the feature selector to the data, automatically selecting the best method and parameters.

Parameters:
  • X (pd.DataFrame) – The input feature matrix.

  • y (pd.Series, optional) – The target variable. Required for supervised feature selection methods.

Returns:

Returns self.

Return type:

AutoFeatureSelector

fit_transform(X: DataFrame, y: Series | None = None) → DataFrame

Fits the feature selector and transforms the input data to contain only the selected features.

Parameters:
  • X (pd.DataFrame) – The input feature matrix.

  • y (pd.Series, optional) – The target variable. Required for supervised feature selection methods.

Returns:

The transformed feature matrix containing only the selected features.

Return type:

pd.DataFrame

get_feature_names_out(input_features: List[str] | None = None) → List[str]

Get output feature names for transformation.

Parameters:

input_features (List[str], optional) – Input feature names. If None, the names are taken from the columns of the DataFrame passed to fit.

Returns:

The list of selected feature names.

Return type:

List[str]
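The relationship between the support mask and the output names can be sketched in plain Python. The helper feature_names_out below is hypothetical: it assumes the selector stores a boolean mask and the column names seen during fit, and that any input_features passed in must have one name per input column.

```python
from typing import List, Optional, Sequence

def feature_names_out(
    support: Sequence[bool],
    fitted_columns: Sequence[str],
    input_features: Optional[Sequence[str]] = None,
) -> List[str]:
    # Fall back to the column names seen during fit when none are passed.
    names = list(fitted_columns) if input_features is None else list(input_features)
    if len(names) != len(support):
        raise ValueError(
            f"expected {len(support)} input feature names, got {len(names)}"
        )
    # Keep only the names whose mask entry is True, preserving input order.
    return [name for name, keep in zip(names, support) if keep]

support = [True, False, True, False]
print(feature_names_out(support, ["age", "income", "tenure", "zip"]))  # ['age', 'tenure']
```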

get_support(indices: bool = False) → ndarray | List[int]

Get a mask, or integer index, of the features selected.

Parameters:

indices (bool) – If True, return an array of the integer indices of the selected features. If False (the default), return a boolean mask.

Returns:

The mask of selected features, or array of indices.

Return type:

Union[np.ndarray, List[int]]
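The two return forms carry the same information. The hypothetical get_support function below, a sketch assuming the selector stores a boolean mask, shows how the index form can be derived with NumPy and that both forms pick the same DataFrame columns.

```python
import numpy as np
import pandas as pd

def get_support(mask, indices: bool = False) -> np.ndarray:
    mask = np.asarray(mask, dtype=bool)
    # flatnonzero returns the positions of True entries in ascending order.
    return np.flatnonzero(mask) if indices else mask

X = pd.DataFrame(np.arange(10).reshape(2, 5), columns=list("abcde"))
mask = get_support([True, False, True, True, False])
idx = get_support([True, False, True, True, False], indices=True)
print(idx)  # [0 2 3]

# The mask selects by boolean position, the indices by integer position.
by_mask = X.loc[:, mask]
by_index = X.iloc[:, idx]
print(list(by_mask.columns))  # ['a', 'c', 'd']
print(by_mask.equals(by_index))  # True
```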

transform(X: DataFrame) → DataFrame

Transforms the input data to contain only the selected features.

Parameters:

X (pd.DataFrame) – The input feature matrix.

Returns:

The transformed feature matrix containing only the selected features.

Return type:

pd.DataFrame
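At transform time only the columns chosen during fit are kept. The sketch below uses a hypothetical transform helper and assumes selection is done by column name, so the column order of new data does not matter and missing columns raise an error; whether the classes select by name or by position is not stated in this reference.

```python
from typing import List

import pandas as pd

def transform(X: pd.DataFrame, selected: List[str]) -> pd.DataFrame:
    # Fail loudly if a column chosen at fit time is absent from the new data.
    missing = [col for col in selected if col not in X.columns]
    if missing:
        raise KeyError(f"columns seen during fit are missing: {missing}")
    # Select by name, so the column order of X does not matter.
    return X.loc[:, selected]

selected = ["age", "tenure"]  # hypothetical names chosen during fit
new_data = pd.DataFrame({
    "tenure": [3, 7],
    "zip": ["10001", "94105"],
    "age": [34, 51],
})
print(transform(new_data, selected))
```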

class select_features.FeatureSelector(method: str = 'kbest_anova', k: int = 10, threshold: float = 0.0, model: Any | None = None, estimator: Any | None = None, scoring: str | None = None, alpha: float = 1.0, corr_threshold: float = 0.9, problem_type: str = 'classification', **kwargs: Any)

A transformer that selects a subset of features from a dataset using statistical tests or model-based methods, chosen with the method parameter. Supported techniques include chi-squared tests, ANOVA F-tests, mutual information, recursive feature elimination (RFE), Lasso (L1) regularization, and correlation-based elimination.
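The corr_threshold parameter (default 0.9) presumably governs correlation-based elimination. One common way to implement that, sketched below with a hypothetical drop_correlated helper, is to drop the later column of any pair whose absolute Pearson correlation exceeds the threshold. Whether FeatureSelector compares with > or >=, and which column of a correlated pair it keeps, is not documented; treat this as an illustration of the technique.

```python
from typing import List

import numpy as np
import pandas as pd

def drop_correlated(X: pd.DataFrame, corr_threshold: float = 0.9) -> List[str]:
    corr = X.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if it is too correlated with any earlier column.
    to_drop = {col for col in upper.columns if (upper[col] > corr_threshold).any()}
    return [col for col in X.columns if col not in to_drop]

rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = pd.DataFrame({
    "a": a,
    "a_scaled": 2 * a + 1,  # perfectly correlated with "a"
    "b": rng.normal(size=100),
})
print(drop_correlated(X))  # ['a', 'b']
```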

fit(X: DataFrame, y: Series | None = None) → FeatureSelector

Fits the feature selector to the data using the method chosen at construction.

Parameters:
  • X (pd.DataFrame) – The input feature matrix.

  • y (pd.Series, optional) – The target variable. Required for supervised feature selection methods.

Returns:

Returns self.

Return type:

FeatureSelector
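For the default configuration (method='kbest_anova', k=10), fitting plausibly amounts to scoring each column with an ANOVA F-test and keeping the k highest-scoring ones. The hypothetical fit_kbest_anova helper below sketches that with scikit-learn's SelectKBest; switching to f_regression when problem_type is not 'classification' is an assumption, not documented behavior.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, f_regression

def fit_kbest_anova(X: pd.DataFrame, y: pd.Series, k: int = 10,
                    problem_type: str = "classification"):
    # ANOVA F-test for class labels; univariate regression F-test otherwise.
    score_func = f_classif if problem_type == "classification" else f_regression
    # Cap k at the number of columns so small inputs still work.
    selector = SelectKBest(score_func=score_func, k=min(k, X.shape[1]))
    selector.fit(X, y)
    selected = X.columns[selector.get_support()].tolist()
    return selected, selector.scores_

X_arr, y_arr = make_classification(
    n_samples=300, n_features=15, n_informative=4, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(15)])
y = pd.Series(y_arr)

selected, scores = fit_kbest_anova(X, y, k=10)
print(selected)
```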

fit_transform(X: DataFrame, y: Series | None = None) → DataFrame

Fits the feature selector and transforms the input data to contain only the selected features.

Parameters:
  • X (pd.DataFrame) – The input feature matrix.

  • y (pd.Series, optional) – The target variable. Required for supervised feature selection methods.

Returns:

The transformed feature matrix containing only the selected features.

Return type:

pd.DataFrame

get_feature_names_out(input_features: List[str] | None = None) → List[str]

Get output feature names for transformation.

Parameters:

input_features (List[str], optional) – Input feature names. If None, the names are taken from the columns of the DataFrame passed to fit.

Returns:

The list of selected feature names.

Return type:

List[str]

get_support(indices: bool = False) → ndarray | List[int]

Get a mask, or integer index, of the features selected.

Parameters:

indices (bool) – If True, return an array of the integer indices of the selected features. If False (the default), return a boolean mask.

Returns:

The mask of selected features, or array of indices.

Return type:

Union[np.ndarray, List[int]]

transform(X: DataFrame) → DataFrame

Transforms the input data to contain only the selected features.

Parameters:

X (pd.DataFrame) – The input feature matrix.

Returns:

The transformed feature matrix containing only the selected features.

Return type:

pd.DataFrame