Select Features
- class select_features.AutoFeatureSelector(problem_type: str = 'classification', model: Any | None = None, param_distributions: Dict[str, Any] | None = None, cv: int = 5, n_iter: int = 50, scoring: str | None = None, random_state: int = 42, search_type: str = 'grid')
A transformer that automatically selects the best feature selection method and optimizes its parameters.
- fit(X: DataFrame, y: Series | None = None) → AutoFeatureSelector
Fits the feature selector to the data, automatically selecting the best method and parameters.
- Parameters:
X (pd.DataFrame) – The input feature matrix.
y (pd.Series, optional) – The target variable. Required for supervised feature selection methods.
- Returns:
The fitted selector instance (self).
- Return type:
AutoFeatureSelector
- fit_transform(X: DataFrame, y: Series | None = None) → DataFrame
Fits the feature selector and transforms the input data to contain only the selected features.
- Parameters:
X (pd.DataFrame) – The input feature matrix.
y (pd.Series, optional) – The target variable. Required for supervised feature selection methods.
- Returns:
The transformed feature matrix containing only the selected features.
- Return type:
pd.DataFrame
- get_feature_names_out(input_features: List[str] | None = None) → List[str]
Get output feature names for transformation.
- Parameters:
input_features (List[str], optional) – Input feature names. If None, feature names are taken from the DataFrame columns.
- Returns:
The list of selected feature names.
- Return type:
List[str]
- get_support(indices: bool = False) → ndarray | List[int]
Get a mask, or integer index, of the features selected.
- Parameters:
indices (bool) – If True, the return value will be an array of indices of the selected features. If False, the return value will be a boolean mask.
- Returns:
The mask of selected features, or array of indices.
- Return type:
Union[np.ndarray, List[int]]
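To illustrate how the search_type, n_iter, and random_state arguments of AutoFeatureSelector could interact, here is a minimal standard-library sketch. It does not reproduce the library's implementation: the helper candidate_settings, the dict-of-lists shape assumed for param_distributions, and the 'random' value for search_type are all assumptions made for illustration.

```python
import itertools
import random
from typing import Any, Dict, List


def candidate_settings(
    param_distributions: Dict[str, List[Any]],
    search_type: str = "grid",
    n_iter: int = 50,
    random_state: int = 42,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: list the parameter settings a search would try."""
    names = sorted(param_distributions)
    # Full Cartesian product of every candidate value.
    grid = [
        dict(zip(names, combo))
        for combo in itertools.product(*(param_distributions[n] for n in names))
    ]
    if search_type == "grid":
        # Exhaustive search: n_iter is ignored.
        return grid
    if search_type == "random":  # assumed value name, not confirmed by the docs
        # Seeded sampling without replacement, capped at the grid size.
        rng = random.Random(random_state)
        return rng.sample(grid, min(n_iter, len(grid)))
    raise ValueError(f"unknown search_type: {search_type!r}")


# Parameter names borrowed from the FeatureSelector signature.
space = {"k": [5, 10, 20], "corr_threshold": [0.8, 0.9]}
grid_settings = candidate_settings(space, search_type="grid")
random_settings = candidate_settings(space, search_type="random", n_iter=4)
```

In a real search, each candidate setting would presumably be scored with cv-fold cross-validation under the chosen scoring metric, and the best-scoring one kept. Fixing random_state makes the sampled subset reproducible between runs.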
- class select_features.FeatureSelector(method: str = 'kbest_anova', k: int = 10, threshold: float = 0.0, model: Any | None = None, estimator: Any | None = None, scoring: str | None = None, alpha: float = 1.0, corr_threshold: float = 0.9, problem_type: str = 'classification', **kwargs: Any)
A transformer for selecting important features from datasets using various statistical tests and model-based methods. This class provides several techniques, including chi-squared tests, ANOVA F-tests, mutual information, recursive feature elimination (RFE), Lasso (L1) regularization, and correlation-based elimination.
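As an illustration of the correlation-based elimination mentioned above, the following standard-library sketch drops any feature whose absolute Pearson correlation with an already kept feature exceeds corr_threshold. The helpers pearson and drop_correlated are hypothetical, and the rule for which member of a correlated pair survives (here, the one that appears first) is an assumption; the library may decide differently.

```python
from math import sqrt
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def drop_correlated(
    columns: Dict[str, List[float]], corr_threshold: float = 0.9
) -> List[str]:
    """Hypothetical helper: keep a column only if it is not too correlated
    with any column kept before it."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= corr_threshold for k in kept):
            kept.append(name)
    return kept


data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],  # exact multiple of "a"
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(data, corr_threshold=0.9)
```

Here "b" is removed because it is perfectly correlated with "a", while "c" is only weakly correlated with "a" and stays.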
- fit(X: DataFrame, y: Series | None = None) → FeatureSelector
Fits the feature selector to the data.
- Parameters:
X (pd.DataFrame) – The input feature matrix.
y (pd.Series, optional) – The target variable. Required for supervised feature selection methods.
- Returns:
The fitted selector instance (self).
- Return type:
FeatureSelector
- fit_transform(X: DataFrame, y: Series | None = None) → DataFrame
Fits the feature selector and transforms the input data to contain only the selected features.
- Parameters:
X (pd.DataFrame) – The input feature matrix.
y (pd.Series, optional) – The target variable. Required for supervised feature selection methods.
- Returns:
The transformed feature matrix containing only the selected features.
- Return type:
pd.DataFrame
- get_feature_names_out(input_features: List[str] | None = None) → List[str]
Get output feature names for transformation.
- Parameters:
input_features (List[str], optional) – Input feature names. If None, feature names are taken from the DataFrame columns.
- Returns:
The list of selected feature names.
- Return type:
List[str]
- get_support(indices: bool = False) → ndarray | List[int]
Get a mask, or integer index, of the features selected.
- Parameters:
indices (bool) – If True, the return value will be an array of indices of the selected features. If False, the return value will be a boolean mask.
- Returns:
The mask of selected features, or array of indices.
- Return type:
Union[np.ndarray, List[int]]
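The two return forms of get_support, and their relationship to get_feature_names_out, can be sketched with plain Python lists. The helpers mask_to_indices and selected_names are hypothetical stand-ins written for this illustration; they are not part of select_features, and the column names are made up.

```python
from typing import List


def mask_to_indices(mask: List[bool]) -> List[int]:
    """Convert a boolean support mask into the positions of the selected features."""
    return [i for i, keep in enumerate(mask) if keep]


def selected_names(input_features: List[str], mask: List[bool]) -> List[str]:
    """Return the names of the features whose mask entry is True."""
    return [name for name, keep in zip(input_features, mask) if keep]


columns = ["age", "income", "height", "weight"]  # illustrative column names
mask = [True, False, False, True]  # shape of the result with indices=False

indices = mask_to_indices(mask)  # shape of the result with indices=True
names = selected_names(columns, mask)  # analogous to get_feature_names_out
```

The mask always has one entry per input feature, whereas the index form has one entry per selected feature, so the mask is the safer form for filtering columns positionally.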