Handle Outliers

class handle_outliers.DBSCANOutlierDetector(eps: float = 0.5, min_samples: int = 5, **kwargs: Any)

Detects outliers using the DBSCAN method.

fit(X: DataFrame, y: Series | None = None) → DBSCANOutlierDetector

Fits the DBSCAN model.

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (not used).

Returns:

Fitted detector.

Return type:

DBSCANOutlierDetector

get_outliers() → Series

Returns a boolean Series indicating outliers.

Returns:

Boolean Series indicating True for outliers.

Return type:

pd.Series

transform(X: DataFrame) → DataFrame

Removes outliers from the dataset.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

DataFrame with outliers removed.

Return type:

pd.DataFrame
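
Example (illustrative sketch): the module's internals are not shown in this reference, so the snippet below calls scikit-learn's DBSCAN directly and treats rows labeled as noise (-1) as outliers. That DBSCANOutlierDetector works this way is an assumption; the names X, is_outlier and X_clean are made up for the illustration and are not part of handle_outliers.

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Two tight clusters of five points each, plus one isolated point (row 10).
X = pd.DataFrame({
    "a": [0.0, 0.1, 0.2, 0.1, 0.0, 5.0, 5.1, 5.2, 5.1, 5.0, 20.0],
    "b": [0.0, 0.1, 0.0, 0.2, 0.1, 5.0, 5.1, 5.0, 5.2, 5.1, 20.0],
})

# DBSCAN assigns the label -1 to points that belong to no dense region.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Boolean Series, True for rows flagged as outliers (assumed convention).
is_outlier = pd.Series(labels == -1, index=X.index)

# Dropping the flagged rows mirrors what transform is described as doing.
X_clean = X.loc[~is_outlier]
```

Both eps and min_samples are scale dependent, so features measured in very different units usually need scaling before this kind of detection gives sensible results.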

class handle_outliers.IQRBasedOutlierDetector(factor: float = 1.5)

Detects outliers using the Interquartile Range (IQR) method.

fit(X: DataFrame, y: Series | None = None) → IQRBasedOutlierDetector

Calculates IQR for the dataset.

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (not used).

Returns:

Fitted detector.

Return type:

IQRBasedOutlierDetector

get_outliers() → DataFrame

Returns a boolean DataFrame indicating outliers.

Returns:

Boolean DataFrame indicating True for outliers.

Return type:

pd.DataFrame

transform(X: DataFrame) → DataFrame

Removes outliers from the dataset.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

DataFrame with outliers removed.

Return type:

pd.DataFrame
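
Example (illustrative sketch): the IQR rule can be written in a few lines of pandas. The snippet assumes the usual fences Q1 - factor * IQR and Q3 + factor * IQR, and it assumes that a row is dropped when any of its cells is flagged; neither detail is confirmed by this reference, and the variable names are illustrative only.

```python
import pandas as pd

X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0],
    "b": [10.0, 11.0, 12.0, 13.0, 14.0],
})
factor = 1.5

# Per-column quartiles and interquartile range.
q1 = X.quantile(0.25)
q3 = X.quantile(0.75)
iqr = q3 - q1

# Fences outside which a value counts as an outlier.
lower = q1 - factor * iqr
upper = q3 + factor * iqr

# Element-wise boolean DataFrame, True where a cell lies outside the fences.
is_outlier = (X < lower) | (X > upper)

# Assumed removal policy: drop every row with at least one flagged cell.
X_clean = X.loc[~is_outlier.any(axis=1)]
```

Here column a has fences of -1 and 7, so only the value 100.0 is flagged, and the removal step drops that single row.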

class handle_outliers.IsolationForestOutlierDetector(contamination: float = 0.1, random_state: int | None = None, **kwargs: Any)

Detects outliers using the Isolation Forest method.

fit(X: DataFrame, y: Series | None = None) → IsolationForestOutlierDetector

Fits the Isolation Forest model.

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (not used).

Returns:

Fitted detector.

Return type:

IsolationForestOutlierDetector

get_outliers() → Series

Returns a boolean Series indicating outliers.

Returns:

Boolean Series indicating True for outliers.

Return type:

pd.Series

transform(X: DataFrame) → DataFrame

Removes outliers from the dataset.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

DataFrame with outliers removed.

Return type:

pd.DataFrame
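
Example (illustrative sketch): the snippet below uses scikit-learn's IsolationForest directly, on the assumption that IsolationForestOutlierDetector wraps it and flags the rows predicted as -1. The synthetic data and the variable names are illustrative, not part of handle_outliers.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# 100 points from a standard normal distribution, with row 0 replaced
# by an extreme point far from the rest.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(0.0, 1.0, size=(100, 2)), columns=["a", "b"])
X.loc[0] = [15.0, 15.0]

# contamination is the expected share of outliers; it sets the score threshold.
model = IsolationForest(contamination=0.1, random_state=0)
labels = model.fit_predict(X)  # -1 for outliers, 1 for inliers

# Boolean Series, True for rows flagged as outliers (assumed convention).
is_outlier = pd.Series(labels == -1, index=X.index)
X_clean = X.loc[~is_outlier]
```

Because contamination fixes the share of rows that get flagged, roughly 10% of the rows are marked here even though only one point is truly extreme. Setting random_state makes the result reproducible.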

class handle_outliers.OutlierCapper(method: str = 'iqr', factor: float = 1.5)

Caps outliers by setting values beyond a threshold to a maximum or minimum value.

fit(X: DataFrame, y: Series | None = None) → OutlierCapper

Calculates the bounds for capping outliers.

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (not used).

Returns:

Fitted transformer.

Return type:

OutlierCapper

transform(X: DataFrame) → DataFrame

Caps outliers in the data.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

DataFrame with outliers capped.

Return type:

pd.DataFrame
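
Example (illustrative sketch): capping keeps every row and clips extreme values to a bound instead of removing them. The snippet assumes that the default method 'iqr' uses the fences Q1 - factor * IQR and Q3 + factor * IQR; other values of method are not documented here and are not covered. All variable names are illustrative.

```python
import pandas as pd

X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0]})
factor = 1.5

# "fit" step: compute per-column capping bounds from the IQR fences.
q1 = X.quantile(0.25)
q3 = X.quantile(0.75)
iqr = q3 - q1
lower = q1 - factor * iqr
upper = q3 + factor * iqr

# "transform" step: clip each column to its own bounds (axis=1 aligns
# the bound Series with the DataFrame's columns).
X_capped = X.clip(lower=lower, upper=upper, axis=1)
```

The value 100.0 is replaced by the upper bound 7.0, while the other values and the row count are unchanged.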

class handle_outliers.RobustScalerTransformer(with_centering: bool = True, with_scaling: bool = True, quantile_range: Tuple[float, float] = (25.0, 75.0), copy: bool = True, unit_variance: bool = False)

Scales data using the RobustScaler method, which is less sensitive to outliers.

fit(X: DataFrame, y: Series | None = None) → RobustScalerTransformer

Fits the RobustScaler to the data.

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (not used).

Returns:

Fitted transformer.

Return type:

RobustScalerTransformer

transform(X: DataFrame) → DataFrame

Transforms the data using the RobustScaler.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

Scaled DataFrame.

Return type:

pd.DataFrame
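
Example (illustrative sketch): with centering and scaling both enabled, robust scaling subtracts the per-column median and divides by the range between the two quantiles in quantile_range. The snippet reproduces that arithmetic in plain pandas. Whether RobustScalerTransformer delegates to scikit-learn's RobustScaler is an assumption based on the matching parameter names, and the variable names are illustrative.

```python
import pandas as pd

X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0]})
quantile_range = (25.0, 75.0)  # percentiles, as in the constructor default

# Centering statistic: per-column median (not the mean).
median = X.median()

# Scaling statistic: distance between the upper and lower quantile.
q_low = X.quantile(quantile_range[0] / 100.0)
q_high = X.quantile(quantile_range[1] / 100.0)
scale = q_high - q_low

X_scaled = (X - median) / scale
```

The median (3.0) and the quantile range (2.0) are barely affected by the value 100.0, so the four ordinary values land in [-1, 0.5]. The extreme value is still present after scaling (48.5); robust scaling reduces its influence on the statistics but does not remove or cap it.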

class handle_outliers.Winsorizer(limits: Tuple[float, float] = (0.05, 0.05))

Applies Winsorization to limit extreme values in the data.

fit(X: DataFrame, y: Series | None = None) → Winsorizer

Fits the Winsorizer (no action needed).

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (not used).

Returns:

Fitted transformer.

Return type:

Winsorizer

transform(X: DataFrame) → DataFrame

Applies Winsorization to the data.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

Winsorized DataFrame.

Return type:

pd.DataFrame
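
Example (illustrative sketch): winsorization replaces the lowest and highest fraction of values in each column with the nearest value that is kept. The helper below, winsorize_column, is hypothetical and follows the common order-statistic definition, with limits read as (lower fraction, upper fraction). Whether Winsorizer computes its cut points in exactly this way is an assumption.

```python
import numpy as np
import pandas as pd

def winsorize_column(col: pd.Series, limits=(0.05, 0.05)) -> pd.Series:
    """Clip the lowest/highest fraction of values to the nearest kept value."""
    values = np.sort(col.to_numpy())
    n = len(values)
    k_low = int(limits[0] * n)    # number of values to replace at the bottom
    k_high = int(limits[1] * n)   # number of values to replace at the top
    low_value = values[k_low]             # smallest value that is kept
    high_value = values[n - k_high - 1]   # largest value that is kept
    return col.clip(lower=low_value, upper=high_value)

# 20 values: one extreme low, one extreme high, and 1..18 in between.
X = pd.DataFrame({"a": [-50.0] + [float(v) for v in range(1, 19)] + [500.0]})

# Apply the helper column by column; the row count does not change.
X_wins = X.apply(winsorize_column)
```

With 20 rows and limits of 5% on each side, one value is replaced at each end: -50.0 becomes 1.0 and 500.0 becomes 18.0.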

class handle_outliers.ZScoreOutlierDetector(threshold: float = 3.0)

Detects outliers using the Z-Score method.

fit(X: DataFrame, y: Series | None = None) → ZScoreOutlierDetector

Calculates Z-scores for the dataset.

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (not used).

Returns:

Fitted detector.

Return type:

ZScoreOutlierDetector

get_outliers() → DataFrame

Returns a boolean DataFrame indicating outliers.

Returns:

Boolean DataFrame indicating True for outliers.

Return type:

pd.DataFrame

transform(X: DataFrame) → DataFrame

Removes outliers from the dataset.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

DataFrame with outliers removed.

Return type:

pd.DataFrame
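
Example (illustrative sketch): a value's Z-score is its distance from the column mean in units of the column's standard deviation, and a value is flagged when the absolute score exceeds threshold. The snippet uses the pandas default sample standard deviation (ddof=1) and drops any row with a flagged cell. Both choices are assumptions about ZScoreOutlierDetector, and the variable names are illustrative.

```python
import pandas as pd

# Column "a" has 19 ordinary values and one extreme value in the last row.
X = pd.DataFrame({
    "a": [10.0] * 19 + [100.0],
    "b": [float(i) for i in range(20)],
})
threshold = 3.0

# Per-column Z-scores; pandas' std() uses ddof=1 by default.
z = (X - X.mean()) / X.std()

# Element-wise boolean DataFrame, True where |z| exceeds the threshold.
is_outlier = z.abs() > threshold

# Assumed removal policy: drop every row with at least one flagged cell.
X_clean = X.loc[~is_outlier.any(axis=1)]
```

A single extreme value inflates both the mean and the standard deviation, so in small samples the maximum attainable Z-score is limited and a threshold of 3.0 may flag nothing. The IQR rule is less sensitive to this effect.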