Solve Regression

class solve_regression.RegressionSolver(models: Dict[str, BaseEstimator] | None = None, random_state: int = 42)

A class for solving regression problems with a set of machine learning models. It provides methods for splitting data, training (optionally with feature scaling), evaluation, hyperparameter tuning, cross-validated model selection, diagnostic plotting, and model persistence.

auto_select_best_model(X_train: DataFrame, y_train: Series, cv: int = 5, scoring: str = 'neg_mean_squared_error') → Tuple[str, float]

Selects the best model by cross-validated score. For each model, the hyperparameter-tuned version is used instead of the original if one is available.

Parameters:

X_train (pd.DataFrame) – Training features.
y_train (pd.Series) – Training target.
cv (int) – Number of cross-validation folds (default: 5).
scoring (str) – scikit-learn scoring string used for evaluation (default: 'neg_mean_squared_error').

Returns:

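The name of the best-performing model and its cross-validated score. scikit-learn scorers follow a "higher is better" convention, so with the default scoring the score is a negative MSE and values closer to zero are better.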

Return type:

Tuple[str, float]
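The selection logic can be sketched with scikit-learn's `cross_val_score`. This is a minimal sketch, not the class's actual code: `select_best_model` is a hypothetical helper, and it assumes tuned estimators live in a dict keyed by the same names as the untuned ones (mirroring `self.tuned_models` and `self.models`) and that the reported score is the mean over folds.

```python
from typing import Dict, Optional, Tuple

import numpy as np
from sklearn.base import BaseEstimator
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def select_best_model(
    models: Dict[str, BaseEstimator],
    tuned_models: Dict[str, BaseEstimator],
    X,
    y,
    cv: int = 5,
    scoring: str = "neg_mean_squared_error",
) -> Tuple[Optional[str], float]:
    """Hypothetical helper: return the name and mean CV score of the best model."""
    best_name: Optional[str] = None
    best_score = -np.inf
    for name, model in models.items():
        # Prefer the tuned estimator when one exists under the same name (assumption).
        candidate = tuned_models.get(name, model)
        # scikit-learn scorers follow "higher is better", so MSE comes back negated.
        score = float(cross_val_score(candidate, X, y, cv=cv, scoring=scoring).mean())
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "tree": DecisionTreeRegressor(random_state=42),
}
best_name, best_score = select_best_model(models, tuned_models={}, X=X, y=y)
print(best_name, best_score)
```

Comparing with `>` works for every built-in scorer because scikit-learn negates error metrics, so the same code handles both error-based and R²-based scoring.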

evaluate_model(model: BaseEstimator, X_test: DataFrame, y_test: Series) → Dict[str, Any]

Evaluates the regression model on test data.

Parameters:

model (BaseEstimator) – The trained model.
X_test (pd.DataFrame) – Testing features.
y_test (pd.Series) – Testing target.

Returns:

A dictionary containing evaluation metrics.

Return type:

Dict[str, Any]
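The exact metrics returned are not listed here. As an illustration only, the sketch below assumes a typical set (MSE, RMSE, MAE, R²) computed with `sklearn.metrics`; the helper name `evaluate` and the dictionary keys are hypothetical.

```python
from typing import Any, Dict

import numpy as np
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(model: BaseEstimator, X_test, y_test) -> Dict[str, Any]:
    """Hypothetical helper: common regression metrics (keys are illustrative)."""
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),  # same units as the target
        "mae": float(mean_absolute_error(y_test, y_pred)),
        "r2": float(r2_score(y_test, y_pred)),
    }


# A noiseless line y = 3x + 1, so a linear model fits it exactly.
# For brevity this evaluates on the fitting data; in practice pass the held-out split.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 1.0
metrics = evaluate(LinearRegression().fit(X, y), X, y)
print(metrics)
```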

hyperparameter_tuning(model_name: str, X_train: DataFrame, y_train: Series, param_grid: Dict[str, List[Any]] | None = None, cv: int = 5, search_type: str = 'grid', n_iter: int = 50, scoring: str = 'neg_mean_squared_error') → None

Performs hyperparameter tuning using GridSearchCV or RandomizedSearchCV for one or all models and stores the best models.

Parameters:

model_name (str) – The name of the model to tune. If ‘all’, tunes all models in self.models.
X_train (pd.DataFrame) – Training features.
y_train (pd.Series) – Training target.
param_grid (Optional[Dict[str, List[Any]]]) – Parameter grid for hyperparameter tuning. If None, a default grid is used.
cv (int) – Number of cross-validation folds.
search_type (str) – Type of search (‘grid’ or ‘random’).
n_iter (int) – Number of parameter settings sampled by RandomizedSearchCV.
scoring (str) – Scoring metric for evaluation.

Returns:

Nothing. The best estimators are stored in self.tuned_models.

Return type:

None
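A minimal sketch of how the two search types differ, using scikit-learn's `GridSearchCV` and `RandomizedSearchCV` directly. The helper `tune`, the plain dict standing in for `self.tuned_models`, and the seeding with a fixed `random_state` are assumptions for illustration.

```python
from typing import Any, Dict, List

from sklearn.base import BaseEstimator
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


def tune(
    model: BaseEstimator,
    param_grid: Dict[str, List[Any]],
    X,
    y,
    cv: int = 5,
    search_type: str = "grid",
    n_iter: int = 50,
    scoring: str = "neg_mean_squared_error",
    random_state: int = 42,
) -> BaseEstimator:
    """Hypothetical helper: return the best estimator from a grid or randomized search."""
    if search_type == "grid":
        # Exhaustively evaluates every combination in param_grid.
        search = GridSearchCV(model, param_grid, cv=cv, scoring=scoring)
    elif search_type == "random":
        # Samples n_iter combinations; seeded so results are reproducible.
        search = RandomizedSearchCV(
            model, param_grid, n_iter=n_iter, cv=cv,
            scoring=scoring, random_state=random_state,
        )
    else:
        raise ValueError(f"search_type must be 'grid' or 'random', got {search_type!r}")
    search.fit(X, y)
    # With refit=True (the default), best_estimator_ is refit on all of X, y.
    return search.best_estimator_


X, y = make_regression(n_samples=150, n_features=4, noise=5.0, random_state=0)
grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
tuned_models = {
    "ridge_grid": tune(Ridge(), grid, X, y, search_type="grid"),
    "ridge_random": tune(Ridge(), grid, X, y, search_type="random", n_iter=3),
}
print({name: est.alpha for name, est in tuned_models.items()})
```

Grid search costs one fit per combination per fold; randomized search caps that at `n_iter` combinations, which matters once the grid has several parameters.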

load_model(filename: str) → BaseEstimator

Loads a trained model from disk.

Parameters:

filename (str) – The path and filename to load the model from.

Returns:

The loaded model.

Return type:

BaseEstimator

plot_feature_importance(model: BaseEstimator, feature_names: List[str]) → None

Plots feature importance for models that support it.

Parameters:

model (BaseEstimator) – The trained model.
feature_names (List[str]) – List of feature names.
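Which models "support" feature importance is not spelled out above. The sketch below assumes the common scikit-learn convention: use `feature_importances_` when present (tree ensembles) and fall back to absolute `coef_` values (linear models). `get_feature_importance` is a hypothetical helper that returns the data a bar chart would show, without plotting.

```python
from typing import List

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def get_feature_importance(model: BaseEstimator, feature_names: List[str]) -> pd.Series:
    """Hypothetical helper: importances indexed by feature name, largest first."""
    if hasattr(model, "feature_importances_"):
        # Tree ensembles: impurity-based importances that sum to 1.
        values = model.feature_importances_
    elif hasattr(model, "coef_"):
        # Linear models: absolute coefficients (comparable only if features share a scale).
        values = np.abs(np.ravel(model.coef_))
    else:
        raise ValueError(f"{type(model).__name__} does not expose feature importances")
    return pd.Series(values, index=feature_names).sort_values(ascending=False)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 5.0 * X[:, 0]  # only the first column carries signal
names = ["signal", "noise"]

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
linear = LinearRegression().fit(X, y)
forest_importance = get_feature_importance(forest, names)
linear_importance = get_feature_importance(linear, names)
print(forest_importance)
```

With matplotlib installed, `forest_importance.plot.barh()` draws the chart. If the model was trained inside a scaling pipeline, the attributes live on the final step, so pass `pipeline[-1]` instead of the pipeline itself.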

plot_learning_curve(model: BaseEstimator, X_train: DataFrame, y_train: Series, cv: int = 5, scoring: str = 'neg_mean_squared_error') → None

Plots the learning curve of the model.

Parameters:

model (BaseEstimator) – The model to plot the learning curve for.
X_train (pd.DataFrame) – Feature matrix.
y_train (pd.Series) – Target variable.
cv (int) – Number of cross-validation folds.
scoring (str) – Scoring metric.

plot_residual_distribution(model: BaseEstimator, X_test: DataFrame, y_test: Series) → None

Plots the distribution of residuals (prediction errors).

Parameters:

model (BaseEstimator) – The trained model.
X_test (pd.DataFrame) – Testing features.
y_test (pd.Series) – Testing target.

plot_residuals(model: BaseEstimator, X_test: DataFrame, y_test: Series) → None

Plots residuals of the regression model.

Parameters:

model (BaseEstimator) – The trained model.
X_test (pd.DataFrame) – Testing features.
y_test (pd.Series) – Testing target.
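Both residual plots start from the same quantity, observed minus predicted. The sketch below computes the data each plot would use: (prediction, residual) pairs for a residuals-versus-fitted scatter and histogram counts for the distribution. It assumes residuals are defined as `y_test - y_pred` and uses NumPy only, so no plotting backend is needed.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=3, noise=2.0, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
model = LinearRegression().fit(X_train, y_train)

y_pred = model.predict(X_test)
# Residual = observed - predicted; positive values mean the model under-predicted.
residuals = y_test - y_pred

# Residuals vs. predictions: a patternless band around zero suggests a good fit,
# while a funnel or curve hints at heteroscedasticity or a missing term.
pairs = np.column_stack([y_pred, residuals])

# Distribution of residuals: bin counts as a histogram would show them.
counts, bin_edges = np.histogram(residuals, bins=20)
print(f"mean residual {residuals.mean():.3f}, std {residuals.std():.3f}")
```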

save_model(model: BaseEstimator, filename: str) → None

Saves the trained model to disk.

Parameters:

model (BaseEstimator) – The trained model.
filename (str) – The path and filename to save the model to.
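The serialization format used by save_model and load_model is not stated. A common choice for scikit-learn estimators is `joblib`; assuming that, a save-then-load round trip looks like this (the temporary path and file name are illustrative).

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# A model fit on y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "linear_regression.joblib")
    joblib.dump(model, path)        # persist the fitted estimator
    restored = joblib.load(path)    # read it back into a new object
    prediction = restored.predict(np.array([[4.0]]))

print(prediction)
```

`joblib` builds on pickle, so only load model files from trusted sources, and load them with the same scikit-learn version that saved them.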

split_data(X: DataFrame, y: Series, test_size: float = 0.2, random_state: int | None = None) → Tuple[DataFrame, DataFrame, Series, Series]

Splits the data into training and testing sets.

Parameters:

X (pd.DataFrame) – Feature matrix.
y (pd.Series) – Target variable.
test_size (float) – Proportion of the dataset to include in the test split.
random_state (Optional[int]) – Random seed.

Returns:

The training and testing sets as (X_train, X_test, y_train, y_test).

Return type:

Tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series]
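The documented return type matches the order of scikit-learn's `train_test_split`. Assuming split_data wraps that function (an assumption, not stated above), the split behaves as follows on a toy DataFrame; pandas inputs come back as pandas objects with aligned indices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

X = pd.DataFrame({"x1": range(10), "x2": range(10, 20)})
y = pd.Series(range(10), name="target")

# Order matches the documented return type: (X_train, X_test, y_train, y_test).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_test.index.tolist())
```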

train_model(model_name: str, X_train: DataFrame, y_train: Series, use_pipeline: bool = False) → BaseEstimator

Trains the regression model registered under model_name.

Parameters:

model_name (str) – The name of the model to train.
X_train (pd.DataFrame) – Training features.
y_train (pd.Series) – Training target.
use_pipeline (bool) – Whether to wrap the model in a pipeline that scales the features before fitting.

Returns:

The trained model.

Return type:

BaseEstimator
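A minimal sketch of the use_pipeline option, assuming the scaling step is scikit-learn's `StandardScaler` placed in a `Pipeline` ahead of the model. The helper `fit_model` and the use of `clone` (so the stored template estimator stays unfitted) are illustrative choices, not the documented implementation.

```python
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_model(model: BaseEstimator, X_train, y_train, use_pipeline: bool = False) -> BaseEstimator:
    """Hypothetical helper: fit a fresh copy of model, optionally scaling features first."""
    estimator = clone(model)  # unfitted copy; the template stays untouched
    if use_pipeline:
        # The scaler learns mean/std on X_train only, then feeds scaled data to the model.
        estimator = Pipeline([("scaler", StandardScaler()), ("model", estimator)])
    return estimator.fit(X_train, y_train)


X, y = make_regression(n_samples=200, n_features=4, noise=1.0, random_state=0)
plain = fit_model(SVR(), X, y)
scaled = fit_model(SVR(), X, y, use_pipeline=True)
print(type(plain).__name__, type(scaled).__name__)
```

Scaling matters most for distance- and margin-based models such as SVR or k-nearest neighbors; tree-based models are largely insensitive to it. Keeping the scaler inside the pipeline also means cross-validation refits it on each training fold, which avoids leaking test-fold statistics.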