Group Features

class group_features.GroupByFeatureGenerator(group_by: str | List[str], aggregations: Dict[str, List[str]], suffix: str | None = None, as_index: bool = False)

A transformer that generates aggregated features by grouping data based on categorical or time-based features. It supports grouping by single or multiple categories, time-based aggregation, rolling aggregation, and percentile calculation.

fit(X: DataFrame, y: Series | None = None) → GroupByFeatureGenerator

Fits the transformer by performing group-by aggregations.

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (ignored).

Returns:

Returns self.

Return type:

GroupByFeatureGenerator

transform(X: DataFrame) → DataFrame

Merges the aggregated features back into the original DataFrame.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

DataFrame with new aggregated features.

Return type:

pd.DataFrame
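
The fit/transform split described above (aggregate per group, then merge the result back onto the rows) can be sketched in plain pandas. This is a minimal illustration, not the class's actual implementation: the helper names `fit_group_aggregates` and `merge_group_aggregates` are hypothetical, and the `<column>_<statistic>` naming and the left join are assumptions.

```python
import pandas as pd


def fit_group_aggregates(X, group_by, aggregations, suffix=None):
    """Compute one row of statistics per group (hypothetical helper)."""
    keys = [group_by] if isinstance(group_by, str) else list(group_by)
    table = X.groupby(keys).agg(aggregations)
    # A dict of lists yields (column, statistic) MultiIndex columns; flatten them.
    # The "<column>_<statistic>" naming is an assumed convention.
    table.columns = [
        f"{col}_{stat}" + (f"_{suffix}" if suffix else "")
        for col, stat in table.columns
    ]
    return table.reset_index()


def merge_group_aggregates(X, table, group_by):
    """Attach the stored statistics to each row of X (hypothetical helper)."""
    keys = [group_by] if isinstance(group_by, str) else list(group_by)
    # A left join keeps every input row; groups unseen during fitting get NaN.
    return X.merge(table, on=keys, how="left")


train = pd.DataFrame({"store": ["a", "a", "b"], "amount": [10.0, 20.0, 5.0]})
table = fit_group_aggregates(train, "store", {"amount": ["mean", "sum"]})

new = pd.DataFrame({"store": ["a", "b", "c"]})
result = merge_group_aggregates(new, table, "store")
```

In this sketch, store "c" never appeared in the fitting data, so its aggregated columns are NaN after the merge. How the real transformer treats unseen groups is not stated here.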

class group_features.PercentileCalculator(group_by: str | List[str], column: str, percentiles: List[float], suffix: str | None = None)

A transformer that calculates specified percentiles for grouped data.

fit(X: DataFrame, y: Series | None = None) → PercentileCalculator

Fits the transformer by calculating the percentiles.

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (ignored).

Returns:

Returns self.

Return type:

PercentileCalculator

transform(X: DataFrame) → DataFrame

Merges the calculated percentiles back into the original DataFrame.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

DataFrame with percentile features added.

Return type:

pd.DataFrame
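
As a rough picture of what per-group percentiles look like, the following pandas sketch computes quantiles for each group and joins them back. It assumes `percentiles` are fractions in [0, 1] (the reference above does not say whether 0.25 or 25 is expected), and the helper names and the `<column>_p<NN>` column naming are hypothetical.

```python
import pandas as pd


def fit_group_percentiles(X, group_by, column, percentiles):
    """Compute the requested quantiles of `column` per group (hypothetical helper)."""
    keys = [group_by] if isinstance(group_by, str) else list(group_by)
    # With a list of quantiles, the result is a Series whose last index level
    # holds the quantile value; unstack turns that level into columns.
    table = X.groupby(keys)[column].quantile(percentiles).unstack(level=-1)
    # Assumed naming: 0.25 -> "<column>_p25".
    table.columns = [f"{column}_p{int(round(q * 100))}" for q in table.columns]
    return table.reset_index()


def merge_group_percentiles(X, table, group_by):
    """Attach the per-group percentile columns to each row (hypothetical helper)."""
    keys = [group_by] if isinstance(group_by, str) else list(group_by)
    return X.merge(table, on=keys, how="left")


scores = pd.DataFrame(
    {
        "team": ["a", "a", "a", "a", "a", "b", "b"],
        "score": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0],
    }
)
table = fit_group_percentiles(scores, "team", "score", [0.25, 0.5, 0.75])
result = merge_group_percentiles(scores, table, "team")
```

pandas interpolates linearly between observations by default, so a two-row group such as team "b" gets a median halfway between its two values.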

class group_features.RollingAggregator(columns: str | List[str], window: int, statistics: List[str], group_by: str | List[str] | None = None, min_periods: int = 1, center: bool = False, suffix: str | None = None)

A transformer that applies rolling window aggregations on numerical columns.

fit(X: DataFrame, y: Series | None = None) → RollingAggregator

Does nothing; this transformer requires no fitting.

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (ignored).

Returns:

Returns self.

Return type:

RollingAggregator

transform(X: DataFrame) → DataFrame

Applies rolling aggregations to the specified columns.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

DataFrame with rolling aggregation features added.

Return type:

pd.DataFrame
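
The sketch below shows one way rolling statistics over a fixed row window can be computed in pandas, optionally restarting the window inside each group. It is an illustration under assumptions rather than the class's code: `add_rolling_features` is a hypothetical helper, the `<column>_rolling_<statistic>_<window>` naming is invented for the example, and rows are assumed to be already sorted in the intended order.

```python
import pandas as pd


def add_rolling_features(X, columns, window, statistics,
                         group_by=None, min_periods=1, center=False):
    """Append rolling-window statistics as new columns (hypothetical helper)."""
    cols = [columns] if isinstance(columns, str) else list(columns)
    out = X.copy()
    for col in cols:
        for stat in statistics:
            def roll(series, stat=stat):
                # Rolling window over consecutive rows; `stat` is a pandas
                # aggregation name such as "mean", "sum", "min" or "max".
                return series.rolling(
                    window, min_periods=min_periods, center=center
                ).agg(stat)

            if group_by is None:
                rolled = roll(X[col])
            else:
                # transform keeps the original row alignment, and the window
                # never crosses a group boundary.
                rolled = X.groupby(group_by)[col].transform(roll)
            out[f"{col}_rolling_{stat}_{window}"] = rolled  # assumed naming
    return out


readings = pd.DataFrame(
    {"id": ["a", "a", "a", "b", "b"], "x": [1.0, 2.0, 3.0, 10.0, 20.0]}
)
rolled = add_rolling_features(readings, "x", window=2, statistics=["mean"], group_by="id")
```

With `min_periods=1`, the first row of each group already produces a value (the row itself), which is why no NaN appears at group starts in this example.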

class group_features.TimeBasedAggregator(datetime_column: str, aggregations: Dict[str, str | List[str]], rule: str, suffix: str | None = None)

A transformer that performs time-based aggregation using resampling rules.

fit(X: DataFrame, y: Series | None = None) → TimeBasedAggregator

Fits the transformer by resampling and aggregating the data.

Parameters:
  • X (pd.DataFrame) – Input DataFrame.

  • y (pd.Series, optional) – Target variable (ignored).

Returns:

Returns self.

Return type:

TimeBasedAggregator

transform(X: DataFrame) → DataFrame

Merges the resampled and aggregated features back into the original DataFrame.

Parameters:

X (pd.DataFrame) – Input DataFrame.

Returns:

DataFrame with new time-based aggregated features.

Return type:

pd.DataFrame
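
To make the resample-then-merge idea concrete, here is a minimal pandas sketch. It is not the class's implementation. It assumes `rule` is a fixed frequency such as "D", so that flooring a timestamp gives the label of its resampling bin. The helper names, the temporary `_bucket` column and the `<column>_<statistic>` naming are hypothetical.

```python
import pandas as pd


def fit_time_aggregates(X, datetime_column, aggregations, rule, suffix=None):
    """Resample by `rule` and aggregate each time bin (hypothetical helper)."""
    # Normalise "mean" to ["mean"] so the result always has
    # (column, statistic) MultiIndex columns.
    spec = {c: [f] if isinstance(f, str) else list(f) for c, f in aggregations.items()}
    table = X.set_index(datetime_column).resample(rule).agg(spec)
    table.columns = [
        f"{col}_{stat}" + (f"_{suffix}" if suffix else "")
        for col, stat in table.columns
    ]
    return table


def merge_time_aggregates(X, table, datetime_column, rule):
    """Attach each row's time-bin statistics (hypothetical helper)."""
    lookup = table.reset_index().rename(columns={datetime_column: "_bucket"})
    # Assumption: for a fixed frequency, floor() reproduces the bin label.
    keyed = X.assign(_bucket=X[datetime_column].dt.floor(rule))
    return keyed.merge(lookup, on="_bucket", how="left").drop(columns="_bucket")


events = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2024-01-01 08:00", "2024-01-01 20:00", "2024-01-02 09:00"]
        ),
        "value": [1.0, 3.0, 10.0],
    }
)
table = fit_time_aggregates(events, "ts", {"value": ["mean", "sum"]}, rule="D")
result = merge_time_aggregates(events, table, "ts", rule="D")
```

Calendar-based rules such as month-end label their bins differently, so the flooring step in this sketch would need to be replaced (for example by converting timestamps to periods) before it could handle them.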