Solve Regression
- class solve_regression.RegressionSolver(models: Dict[str, BaseEstimator] | None = None, random_state: int = 42)
A comprehensive class for solving regression problems using various machine learning models. Includes methods for data preprocessing, model training, evaluation, hyperparameter tuning, cross-validation, and model persistence.
- auto_select_best_model(X_train: DataFrame, y_train: Series, cv: int = 5, scoring: str = 'neg_mean_squared_error') -> Tuple[str, float]
Selects the best model by cross-validated score. If a hyperparameter-tuned version of a model is available, that version is used in place of the untuned one.
- Parameters:
X_train (pd.DataFrame) – Training features.
y_train (pd.Series) – Training target.
cv (int) – Number of cross-validation folds (default: 5).
scoring (str) – Scoring metric for evaluation.
- Returns:
The name of the best-performing model and its cross-validated score.
- Return type:
Tuple[str, float]
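The selection step described above can be sketched with scikit-learn's `cross_val_score`. This is a minimal illustration, not the class's actual implementation: the helper `select_best_model`, the candidate dictionary, and the synthetic data are all hypothetical, and the lookup of tuned models is omitted.

```python
from typing import Dict, Tuple

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def select_best_model(
    models: Dict[str, BaseEstimator],
    X_train: pd.DataFrame,
    y_train: pd.Series,
    cv: int = 5,
    scoring: str = "neg_mean_squared_error",
) -> Tuple[str, float]:
    """Hypothetical helper: return the name and mean CV score of the best model."""
    mean_scores = {
        name: float(
            cross_val_score(model, X_train, y_train, cv=cv, scoring=scoring).mean()
        )
        for name, model in models.items()
    }
    # scikit-learn scorers follow "higher is better", so negated errors
    # closer to zero win.
    best_name = max(mean_scores, key=mean_scores.get)
    return best_name, mean_scores[best_name]


# Synthetic data whose target is an exact linear function of the features.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
y = pd.Series(2.0 * X["a"] - X["b"] + 0.5 * X["c"])

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=2, random_state=42),
}
best_name, best_score = select_best_model(candidates, X, y)
```

Because the default scoring is a negated error, the returned score is at most zero, and a value closer to zero indicates a better fit.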
- evaluate_model(model: BaseEstimator, X_test: DataFrame, y_test: Series) -> Dict[str, Any]
Evaluates the regression model on test data.
- Parameters:
model (BaseEstimator) – The trained model.
X_test (pd.DataFrame) – Testing features.
y_test (pd.Series) – Testing target.
- Returns:
A dictionary containing evaluation metrics.
- Return type:
Dict[str, Any]
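The reference does not list which metrics the dictionary contains. The sketch below shows one plausible shape, assuming common regression metrics from `sklearn.metrics`; the helper name `evaluate_regressor` and the keys `mse`, `rmse`, `mae`, and `r2` are illustrative assumptions.

```python
from typing import Any, Dict

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate_regressor(
    model: BaseEstimator, X_test: pd.DataFrame, y_test: pd.Series
) -> Dict[str, Any]:
    """Hypothetical helper: compute common regression metrics on held-out data."""
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),
        "mae": float(mean_absolute_error(y_test, y_pred)),
        "r2": float(r2_score(y_test, y_pred)),
    }


# Noise-free linear data: fit on the first four rows, evaluate on the last two.
X = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]})
y = pd.Series([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
model = LinearRegression().fit(X.iloc[:4], y.iloc[:4])
metrics = evaluate_regressor(model, X.iloc[4:], y.iloc[4:])
```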
- hyperparameter_tuning(model_name: str, X_train: DataFrame, y_train: Series, param_grid: Dict[str, List[Any]] | None = None, cv: int = 5, search_type: str = 'grid', n_iter: int = 50, scoring: str = 'neg_mean_squared_error') -> None
Performs hyperparameter tuning using GridSearchCV or RandomizedSearchCV for one or all models and stores the best models.
- Parameters:
model_name (str) – The name of the model to tune. If 'all', tunes all models in self.models.
X_train (pd.DataFrame) – Training features.
y_train (pd.Series) – Training target.
param_grid (Optional[Dict[str, List[Any]]]) – Parameter grid for hyperparameter tuning. If None, a default grid is used.
cv (int) – Number of cross-validation folds.
search_type (str) – Type of search: 'grid' (GridSearchCV) or 'random' (RandomizedSearchCV).
n_iter (int) – Number of parameter settings sampled by RandomizedSearchCV (default: 50).
scoring (str) – Scoring metric for evaluation.
- Returns:
The best models are stored in self.tuned_models.
- Return type:
None
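The switch between the two search strategies might look like the following sketch. The helper `tune`, the `tuned_models` dictionary keys, and the Ridge parameter grid are hypothetical; only `GridSearchCV` and `RandomizedSearchCV` are named by the reference above.

```python
from typing import Any, Dict, List

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


def tune(
    model: BaseEstimator,
    param_grid: Dict[str, List[Any]],
    X_train: pd.DataFrame,
    y_train: pd.Series,
    cv: int = 5,
    search_type: str = "grid",
    n_iter: int = 50,
    scoring: str = "neg_mean_squared_error",
    random_state: int = 42,
) -> BaseEstimator:
    """Hypothetical helper: return the best estimator found by the chosen search."""
    if search_type == "grid":
        # Exhaustive search over every combination in the grid.
        search = GridSearchCV(model, param_grid, cv=cv, scoring=scoring)
    elif search_type == "random":
        # Samples n_iter parameter settings from the grid.
        search = RandomizedSearchCV(
            model,
            param_grid,
            n_iter=n_iter,
            cv=cv,
            scoring=scoring,
            random_state=random_state,
        )
    else:
        raise ValueError(f"Unknown search_type: {search_type!r}")
    search.fit(X_train, y_train)
    return search.best_estimator_


rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(60, 2)), columns=["a", "b"])
y = pd.Series(3.0 * X["a"] + rng.normal(scale=0.1, size=60))

grid = {"alpha": [0.01, 1.0, 100.0]}
tuned_models = {
    "ridge_grid": tune(Ridge(), grid, X, y, search_type="grid"),
    "ridge_random": tune(Ridge(), grid, X, y, search_type="random", n_iter=2),
}
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added parameter; random search caps the cost at `n_iter` fits per fold, which is usually the better trade-off for large grids.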
- load_model(filename: str) -> BaseEstimator
Loads a trained model from disk.
- Parameters:
filename (str) – The path and filename to load the model from.
- Returns:
The loaded model.
- Return type:
BaseEstimator
- plot_feature_importance(model: BaseEstimator, feature_names: List[str]) -> None
Plots feature importance for models that support it.
- Parameters:
model (BaseEstimator) – The trained model.
feature_names (List[str]) – List of feature names.
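The reference does not say which models count as supporting feature importance. A reasonable assumption is that the model exposes scikit-learn's `feature_importances_` attribute, as tree-based estimators do. The sketch below computes the ranked values such a plot would display; the helper `ranked_importances` is hypothetical, and the plotting call itself is omitted.

```python
from typing import List, Optional

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.ensemble import RandomForestRegressor


def ranked_importances(
    model: BaseEstimator, feature_names: List[str]
) -> Optional[pd.Series]:
    """Hypothetical helper: importances sorted high to low, or None if unsupported."""
    if not hasattr(model, "feature_importances_"):
        return None
    values = np.asarray(model.feature_importances_)
    return pd.Series(values, index=feature_names).sort_values(ascending=False)


# The target depends only on the "signal" column.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(200, 2)), columns=["signal", "noise"])
y = 5.0 * X["signal"] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=20, random_state=42).fit(X, y)
ranking = ranked_importances(forest, list(X.columns))
```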
- plot_learning_curve(model: BaseEstimator, X_train: DataFrame, y_train: Series, cv: int = 5, scoring: str = 'neg_mean_squared_error') -> None
Plots the learning curve of the model.
- Parameters:
model (BaseEstimator) – The model to plot learning curve for.
X_train (pd.DataFrame) – Feature matrix.
y_train (pd.Series) – Target variable.
cv (int) – Number of cross-validation folds.
scoring (str) – Scoring metric.
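The data behind a learning curve can be computed with scikit-learn's `learning_curve` function. The sketch below is an assumption about how such a plot could be fed, not the method's actual code; the training-size fractions and the synthetic data are illustrative, and the plotting step is left out.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(7)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
y = pd.Series(X["a"] - 2.0 * X["b"] + rng.normal(scale=0.5, size=100))

# Each row of the score arrays corresponds to one training-set size,
# each column to one cross-validation fold.
train_sizes, train_scores, val_scores = learning_curve(
    LinearRegression(),
    X,
    y,
    cv=5,
    scoring="neg_mean_squared_error",
    train_sizes=[0.25, 0.5, 0.75, 1.0],
)

# Negate the scores to recover mean squared error per training-set size.
mean_train_mse = -train_scores.mean(axis=1)
mean_val_mse = -val_scores.mean(axis=1)
```

Plotting `mean_train_mse` and `mean_val_mse` against `train_sizes` gives the usual picture: a large gap between the two curves suggests overfitting, while two high, converged curves suggest underfitting.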
- plot_residual_distribution(model: BaseEstimator, X_test: DataFrame, y_test: Series) -> None
Plots the distribution of residuals (prediction errors).
- Parameters:
model (BaseEstimator) – The trained model.
X_test (pd.DataFrame) – Testing features.
y_test (pd.Series) – Testing target.
- plot_residuals(model: BaseEstimator, X_test: DataFrame, y_test: Series) -> None
Plots residuals of the regression model.
- Parameters:
model (BaseEstimator) – The trained model.
X_test (pd.DataFrame) – Testing features.
y_test (pd.Series) – Testing target.
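Both residual plots start from the same quantity. The sketch below assumes the common convention that a residual is the observed value minus the predicted value; the reference does not state the sign convention, and the helper `compute_residuals` is hypothetical. The histogram counts stand in for the distribution plot, and the plotting calls are omitted.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression


def compute_residuals(
    model: BaseEstimator, X_test: pd.DataFrame, y_test: pd.Series
) -> pd.Series:
    """Hypothetical helper: residual = observed - predicted (assumed convention)."""
    return y_test - model.predict(X_test)


# Linear trend with an alternating +1 / -1 perturbation.
X = pd.DataFrame({"x": np.arange(10.0)})
y = pd.Series(2.0 * X["x"] + np.tile([1.0, -1.0], 5))

model = LinearRegression().fit(X, y)
residuals = compute_residuals(model, X, y)

# Binned counts: the data a residual-distribution histogram would show.
counts, bin_edges = np.histogram(residuals, bins=5)
```

For a well-specified model, residuals scatter around zero with no visible trend against the predictions, and their histogram is roughly symmetric.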
- save_model(model: BaseEstimator, filename: str) -> None
Saves the trained model to disk.
- Parameters:
model (BaseEstimator) – The trained model.
filename (str) – The path and filename to save the model.
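The reference does not name the serialization format used by save_model and load_model. The round trip below assumes `joblib`, which is commonly used to persist scikit-learn estimators; the file name is illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "linear_model.joblib")
    joblib.dump(model, path)       # assumed equivalent of save_model
    restored = joblib.load(path)   # assumed equivalent of load_model

same_predictions = bool(np.allclose(model.predict(X), restored.predict(X)))
```

Files of this kind are pickle-based, so they should only be loaded from trusted sources, ideally with the same scikit-learn version that wrote them.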
- split_data(X: DataFrame, y: Series, test_size: float = 0.2, random_state: int | None = None) -> Tuple[DataFrame, DataFrame, Series, Series]
Splits the data into training and testing sets.
- Parameters:
X (pd.DataFrame) – Feature matrix.
y (pd.Series) – Target variable.
test_size (float) – Proportion of the dataset to include in the test split.
random_state (Optional[int]) – Random seed.
- Returns:
Training and testing sets for features and target.
- Return type:
Tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series]
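The signature and return order match scikit-learn's `train_test_split`, so a split of this kind can be sketched as follows. That the method wraps this function is an assumption; the column names and data are illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

X = pd.DataFrame({"f1": range(10), "f2": range(10, 20)})
y = pd.Series(range(10), name="target")

# Returned in the order: features (train, test), then target (train, test).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Fixing `random_state` makes the split reproducible across runs, which matters when comparing models on the same held-out rows.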
- train_model(model_name: str, X_train: DataFrame, y_train: Series, use_pipeline: bool = False) -> BaseEstimator
Trains a given regression model.
- Parameters:
model_name (str) – The name of the model to train.
X_train (pd.DataFrame) – Training features.
y_train (pd.Series) – Training target.
use_pipeline (bool) – Whether to use a pipeline with scaling.
- Returns:
The trained model.
- Return type:
BaseEstimator
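The `use_pipeline` option can be sketched as wrapping the estimator in a scikit-learn pipeline. The reference does not name the scaler, so `StandardScaler` is an assumption here, as is the helper `train`, which takes an estimator directly instead of a model name.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, clone
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler


def train(
    model: BaseEstimator,
    X_train: pd.DataFrame,
    y_train: pd.Series,
    use_pipeline: bool = False,
) -> BaseEstimator:
    """Hypothetical helper: fit a fresh copy of the model, optionally behind a scaler."""
    estimator = clone(model)  # leave the caller's instance unfitted
    if use_pipeline:
        estimator = make_pipeline(StandardScaler(), estimator)
    estimator.fit(X_train, y_train)
    return estimator


# Two features on very different scales.
rng = np.random.default_rng(3)
X = pd.DataFrame(
    {
        "small": rng.normal(size=50),
        "large": rng.normal(scale=1000.0, size=50),
    }
)
y = pd.Series(2.0 * X["small"] + 0.001 * X["large"] + rng.normal(scale=0.1, size=50))

plain = train(LinearRegression(), X, y)
piped = train(LinearRegression(), X, y, use_pipeline=True)
```

Ordinary least squares gives the same predictions with or without scaling; scaling matters for regularized, distance-based, or gradient-based models, where feature magnitude affects the fit.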