Operate DataFrame

class operate_dataframe.DataFrameOperator

A class that provides various DataFrame operations such as merging, concatenation, splitting, and other utility functions for DataFrame manipulation.

static apply_function(df: DataFrame, columns: List[str], func: Callable, element_wise: bool = True) -> DataFrame

Apply a custom function to specified columns.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • columns (List[str]) – List of column names to apply the function to.

  • func (Callable) – The function to apply.

  • element_wise (bool, optional) – If True, apply function element-wise. If False, apply column-wise. Defaults to True.

Returns:

A DataFrame with the function applied to the specified columns.

Return type:

pd.DataFrame
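Example: the difference between the two modes can be sketched in plain pandas. This does not call DataFrameOperator; it only assumes that element-wise means "one cell at a time" and column-wise means "one whole column (a Series) at a time". The helpers `double` and `center` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "c": ["x", "y", "z"]})
columns = ["a", "b"]


def double(value):
    # Hypothetical element-wise function: receives a single cell value.
    return value * 2


def center(series):
    # Hypothetical column-wise function: receives an entire pd.Series.
    return series - series.mean()


# Element-wise: map the function over every cell of each selected column.
elementwise = df.copy()
for col in columns:
    elementwise[col] = elementwise[col].map(double)

# Column-wise: pass each selected column to the function as a whole.
columnwise = df.copy()
for col in columns:
    columnwise[col] = center(columnwise[col])

print(elementwise)
print(columnwise)
```

Columns that are not listed (here `c`) are left unchanged in both cases.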

static change_column_types(df: DataFrame, columns_types: Dict[str, str | type]) -> DataFrame

Change the data types of specified columns.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • columns_types (Dict[str, Union[str, type]]) – A dictionary mapping column names to target data types.

Returns:

A DataFrame with the specified column types changed.

Return type:

pd.DataFrame
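Example: a mapping of this shape is what pandas' own `DataFrame.astype` accepts, so the conversion can be sketched as follows. Whether the method delegates to `astype` is an assumption; the data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": ["1", "2", "3"], "price": [1, 2, 3]})

# Target types may be given as Python types or as dtype strings.
columns_types = {"id": int, "price": "float64"}

converted = df.astype(columns_types)
print(converted.dtypes)
```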

static concat_dataframes(dfs: List[DataFrame], axis: int = 0, join: str = 'outer', ignore_index: bool = False, keys: List | None = None, levels: List | None = None, names: List[str] | None = None, verify_integrity: bool = False, sort: bool = False, copy: bool = True) -> DataFrame

Concatenate DataFrames along a particular axis.

Parameters:
  • dfs (List[pd.DataFrame]) – List of DataFrames to concatenate.

  • axis (int, optional) – The axis to concatenate along (0 for index, 1 for columns). Defaults to 0.

  • join (str, optional) – How to handle indexes on other axes (‘inner’, ‘outer’). Defaults to ‘outer’.

  • ignore_index (bool, optional) – If True, do not use the index values along the concatenation axis. Defaults to False.

  • keys (List, optional) – Sequence of keys to use to construct a hierarchical index. Defaults to None.

  • levels (List, optional) – Specific levels to use for the hierarchical index. Defaults to None.

  • names (List[str], optional) – Names for the levels in the resulting hierarchical index. Defaults to None.

  • verify_integrity (bool, optional) – Check whether the new concatenated axis contains duplicates. Defaults to False.

  • sort (bool, optional) – Sort non-concatenation axis if not aligned. Defaults to False.

  • copy (bool, optional) – If False, do not copy data unnecessarily. Defaults to True.

Returns:

The concatenated DataFrame.

Return type:

pd.DataFrame
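Example: the parameters mirror those of `pandas.concat`, so the effect of `join` and `ignore_index` can be illustrated with that function directly. This is a sketch under the assumption that the method forwards its arguments to `pandas.concat`.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "c": [7, 8]})

# join="outer" keeps the union of columns and fills the gaps with NaN.
stacked = pd.concat([df1, df2], axis=0, join="outer", ignore_index=True)

# join="inner" keeps only the columns present in every input.
common = pd.concat([df1, df2], axis=0, join="inner", ignore_index=True)

print(stacked)
print(common)
```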

static drop_columns(df: DataFrame, columns: List[str | int]) -> DataFrame

Drop specified columns from the DataFrame by name or index position.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • columns (List[Union[str, int]]) – List of column names or index positions to drop.

Returns:

A DataFrame with the specified columns dropped.

Return type:

pd.DataFrame
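Example: one way to support a mix of names and positions is to translate integer entries into column labels before dropping. How the method actually resolves integers is not documented here, so treat this as an assumption.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
columns = ["a", 2]  # the name "a" and the column at position 2 ("c")

# Assumed resolution rule: an int is a position, anything else is a label.
names = [df.columns[c] if isinstance(c, int) else c for c in columns]

result = df.drop(columns=names)
print(result)
```

Note that this rule is ambiguous for a DataFrame whose column labels are themselves integers.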

static drop_duplicates(df: DataFrame, subset: List[str] | None = None, keep: str = 'first', inplace: bool = False, ignore_index: bool = False) -> DataFrame

Remove duplicate rows from the DataFrame.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • subset (List[str], optional) – Columns to consider when identifying duplicates. Defaults to None, in which case all columns are considered.

  • keep (str, optional) – Which duplicate to keep (‘first’, ‘last’, False). Defaults to ‘first’.

  • inplace (bool, optional) – If True, perform operation in-place. Defaults to False.

  • ignore_index (bool, optional) – If True, reset index after dropping duplicates. Defaults to False.

Returns:

The DataFrame with duplicates removed.

Return type:

pd.DataFrame
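Example: the three `keep` settings behave as follows in pandas' `DataFrame.drop_duplicates`, which this method's parameters mirror. The sketch calls pandas directly and assumes the method behaves the same way.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

# Keep the first row of each duplicate group.
first = df.drop_duplicates(subset=["key"], keep="first", ignore_index=True)

# Keep the last row of each duplicate group.
last = df.drop_duplicates(subset=["key"], keep="last", ignore_index=True)

# keep=False drops every row that has a duplicate.
unique_only = df.drop_duplicates(subset=["key"], keep=False, ignore_index=True)

print(first, last, unique_only, sep="\n\n")
```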

static fill_missing(df: DataFrame, value: float | Dict[str, float | str] | None = 0, columns: List[str] | None = None, method: str | None = None, axis: int | None = None, limit: int | None = None) -> DataFrame

Fill missing values in the DataFrame.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • value (Union[float, Dict[str, Union[float, str]]], optional) – Value used to fill missing entries: either a single scalar, or a dictionary mapping column names to fill values. Defaults to 0.

  • columns (List[str], optional) – Specific columns to fill missing values in.

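  • method (str, optional) – Method used to fill missing entries: ‘backfill’ or ‘bfill’ propagates the next valid value backward, ‘pad’ or ‘ffill’ propagates the last valid value forward. Defaults to None.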

  • axis (int, optional) – Axis along which to fill missing values.

  • limit (int, optional) – Maximum number of consecutive NaNs to fill.

Returns:

A DataFrame with missing values filled.

Return type:

pd.DataFrame
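Example: the filling strategies described above can be sketched with plain pandas. The sketch does not call DataFrameOperator. It uses `DataFrame.ffill` for forward filling rather than a `method` argument, and restricting the fill to selected columns is shown as one possible reading of the `columns` parameter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 5.0, np.nan]})

# A scalar fills every missing entry.
constant = df.fillna(0)

# A dictionary gives each column its own fill value.
per_column = df.fillna({"x": -1.0, "y": 99.0})

# Forward fill: propagate the last valid value, at most one step.
forward = df.ffill(limit=1)

# Fill only selected columns (assumed meaning of `columns`).
partial = df.copy()
partial[["x"]] = partial[["x"]].fillna(0)

print(constant, per_column, forward, partial, sep="\n\n")
```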

static filter_rows(df: DataFrame, condition: str) -> DataFrame

Filter rows in the DataFrame based on a given condition.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • condition (str) – The condition to filter rows by (e.g., “age > 30”).

Returns:

A new DataFrame with filtered rows.

Return type:

pd.DataFrame
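Example: a string condition such as “age > 30” has the form accepted by pandas' `DataFrame.query`. The sketch below assumes the method evaluates the condition that way; the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Al", "Bo", "Cy"], "age": [25, 35, 45]})

# Column names are referenced directly inside the condition string.
adults_over_30 = df.query("age > 30")

print(adults_over_30)
```

The original DataFrame is left untouched; the filtered rows keep their original index labels.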

static groupby(df: DataFrame, by: str | List[str], agg_funcs: str | List[str] | Dict[str, str | List[str]]) -> DataFrame

Perform a group-by operation and apply aggregation functions.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • by (Union[str, List[str]]) – Column(s) to group by.

  • agg_funcs (Union[str, List[str], Dict[str, Union[str, List[str]]]]) – Aggregation function(s).

Returns:

A DataFrame with grouped and aggregated data.

Return type:

pd.DataFrame
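Example: the accepted shapes of `agg_funcs` match what pandas' `GroupBy.agg` takes. The sketch below uses pandas directly; whether the method also resets the index of the result is not specified here, so the sketch leaves the group keys in the index.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["a", "a", "b"], "score": [1, 3, 10], "minutes": [10, 20, 30]}
)

# A single function name is applied to every non-key column.
totals = df.groupby("team").agg("sum")

# A dictionary picks one or more functions per column.
detailed = df.groupby("team").agg({"score": ["min", "max"], "minutes": "mean"})

print(totals)
print(detailed)
```

With a list of functions for a column, the result has two-level column labels such as `("score", "max")`.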

static merge_dataframes(df1: DataFrame, df2: DataFrame, on: str | List[str] | None = None, how: str = 'inner', left_on: str | List[str] | None = None, right_on: str | List[str] | None = None, left_index: bool = False, right_index: bool = False, suffixes: Tuple[str, str] = ('_x', '_y'), indicator: bool = False, validate: str | None = None) -> DataFrame

Merge two DataFrames using database-style joins.

Parameters:
  • df1 (pd.DataFrame) – The first DataFrame.

  • df2 (pd.DataFrame) – The second DataFrame.

  • on (Union[str, List[str], None], optional) – Column or index level names to join on.

  • how (str, optional) – Type of merge to be performed (‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’). Defaults to ‘inner’.

  • left_on (Union[str, List[str], None], optional) – Column(s) from the left DataFrame to use as keys.

  • right_on (Union[str, List[str], None], optional) – Column(s) from the right DataFrame to use as keys.

  • left_index (bool, optional) – Use index from the left DataFrame as join key. Defaults to False.

  • right_index (bool, optional) – Use index from the right DataFrame as join key. Defaults to False.

  • suffixes (Tuple[str, str], optional) – Suffixes to apply to overlapping column names. Defaults to (‘_x’, ‘_y’).

  • indicator (bool, optional) – Adds a column ‘_merge’ with merge information. Defaults to False.

  • validate (str, optional) – If given, check that the merge is of the specified type (for example ‘one_to_one’, ‘one_to_many’, ‘many_to_one’ or ‘many_to_many’). Defaults to None.

Returns:

A merged DataFrame.

Return type:

pd.DataFrame
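Example: the parameters follow `pandas.merge`, so the effect of `how`, `suffixes` and `indicator` can be illustrated with that function. This sketch assumes the method passes its arguments through unchanged.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "x": ["B", "C", "D"]})

# Inner join: only ids present in both inputs survive. The overlapping
# column "x" is disambiguated with the default suffixes.
inner = pd.merge(left, right, on="id", how="inner", suffixes=("_x", "_y"))

# Outer join with indicator=True: the "_merge" column records where
# each row came from.
outer = pd.merge(left, right, on="id", how="outer", indicator=True)
origin = dict(zip(outer["id"], outer["_merge"].astype(str)))

print(inner)
print(outer)
```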

static pivot_table(df: DataFrame, values: str | List[str] | None = None, index: str | List[str] | None = None, columns: str | List[str] | None = None, aggfunc: str | List[str] | Dict[str, str | List[str]] = 'mean', fill_value: Any | None = None, margins: bool = False, dropna: bool = True, margins_name: str = 'All', observed: bool = False, sort: bool = True) -> DataFrame

Create a spreadsheet-style pivot table as a DataFrame.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • values (Union[str, List[str]], optional) – Column(s) to aggregate.

  • index (Union[str, List[str]], optional) – Keys to group by on the pivot table index.

  • columns (Union[str, List[str]], optional) – Keys to group by on the pivot table columns.

  • aggfunc (Union[str, List[str], Dict[str, Union[str, List[str]]]], optional) – Aggregation function(s). Defaults to ‘mean’.

  • fill_value (Any, optional) – Value to replace missing values with.

  • margins (bool, optional) – If True, add a row and a column holding subtotals and the grand total. Defaults to False.

  • dropna (bool, optional) – Do not include columns whose entries are all NaN. Defaults to True.

  • margins_name (str, optional) – Name of the row/column that will contain the totals. Defaults to ‘All’.

  • observed (bool, optional) – Applies only if any of the groupers are categoricals. If True, show only observed category values; if False, show all category values. Defaults to False.

  • sort (bool, optional) – Sort group keys. Defaults to True.

Returns:

The pivot table.

Return type:

pd.DataFrame
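Example: a small pivot with totals, written against `pandas.pivot_table`, whose parameters this method mirrors. The sketch assumes the arguments are forwarded as-is; the sales data is made up.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["N", "N", "S", "S"],
        "product": ["p", "q", "p", "q"],
        "sales": [1, 2, 3, 4],
    }
)

# Rows are regions, columns are products, cells hold summed sales.
# margins=True appends an "All" row and column with the totals.
table = pd.pivot_table(
    df,
    values="sales",
    index="region",
    columns="product",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="All",
)

print(table)
```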

static rename_columns(df: DataFrame, columns_dict: Dict[str, str]) -> DataFrame

Rename columns in the DataFrame based on a given dictionary.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • columns_dict (Dict[str, str]) – A dictionary mapping old column names to new ones.

Returns:

A DataFrame with renamed columns.

Return type:

pd.DataFrame
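Example: an old-to-new mapping of this kind is what pandas' `DataFrame.rename` expects for its `columns` argument. The sketch assumes the method behaves like that call.

```python
import pandas as pd

df = pd.DataFrame({"fname": ["Ann"], "lname": ["Lee"], "age": [30]})
columns_dict = {"fname": "first_name", "lname": "last_name"}

# Columns missing from the mapping keep their current names.
renamed = df.rename(columns=columns_dict)

print(renamed.columns.tolist())
```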

static sample_dataframe(df: DataFrame, n: int | None = None, frac: float | None = None, replace: bool = False, weights: str | Series | None = None, random_state: int | None = None, axis: int = 0) -> DataFrame

Return a random sample of rows (axis=0) or columns (axis=1) from the DataFrame.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • n (int, optional) – Number of items from axis to return.

  • frac (float, optional) – Fraction of axis items to return.

  • replace (bool, optional) – Sample with or without replacement. Defaults to False.

  • weights (Union[str, pd.Series], optional) – Weights for sampling.

  • random_state (int, optional) – Seed for the random number generator.

  • axis (int, optional) – Axis to sample. Defaults to 0.

Returns:

A random sample of the DataFrame.

Return type:

pd.DataFrame
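Example: sampling by count and by fraction, shown with pandas' `DataFrame.sample`, which takes the same parameters. This is a sketch that assumes the method delegates to it; which rows are drawn depends on the seed and the pandas/NumPy versions, so no specific output is claimed.

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})

# Draw a fixed number of rows; random_state makes the draw repeatable.
by_count = df.sample(n=3, random_state=0)

# Draw a fraction of the rows instead (n and frac are alternatives).
by_fraction = df.sample(frac=0.5, random_state=0)

# The same seed reproduces the same sample.
repeat = df.sample(n=3, random_state=0)

print(by_count)
print(by_fraction)
```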

static sort_values(df: DataFrame, by: str | List[str], ascending: bool | List[bool] = True, inplace: bool = False, na_position: str = 'last') -> DataFrame

Sort the DataFrame by specified column(s).

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • by (Union[str, List[str]]) – Column name(s) to sort by.

  • ascending (Union[bool, List[bool]], optional) – Sort ascending vs. descending. Defaults to True.

  • inplace (bool, optional) – If True, perform operation in-place. Defaults to False.

  • na_position (str, optional) – ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end. Defaults to ‘last’.

Returns:

The sorted DataFrame.

Return type:

pd.DataFrame
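Example: multi-key sorting with a per-key direction, and the placement of NaNs, illustrated with pandas' `DataFrame.sort_values`. The sketch assumes the method forwards its arguments to that call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"group": ["b", "a", "a"], "value": [1.0, 3.0, 2.0], "w": [np.nan, 1.0, 0.0]}
)

# Sort by group ascending, then by value descending within each group.
multi_key = df.sort_values(by=["group", "value"], ascending=[True, False])

# Put missing values at the top instead of the bottom.
nan_first = df.sort_values(by="w", na_position="first")

print(multi_key)
print(nan_first)
```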

static split_by_missing_values(df: DataFrame) -> Tuple[DataFrame, DataFrame]

Split the DataFrame column-wise into two DataFrames: one with the columns that contain missing values and one with the columns that contain none.

Parameters:

df (pd.DataFrame) – The input DataFrame.

Returns:

A tuple containing:
  • DataFrame with the columns that contain missing values.

  • DataFrame with the columns that do not have any missing values.

Return type:

Tuple[pd.DataFrame, pd.DataFrame]
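Example: the column-wise split can be sketched in plain pandas with a boolean mask over the columns. This is one possible implementation, not necessarily the one used by the method.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [np.nan, 1.0], "c": ["x", None]})

# True for every column that has at least one missing value.
has_missing = df.isna().any()

with_missing = df.loc[:, has_missing]
without_missing = df.loc[:, ~has_missing]

print(with_missing.columns.tolist())
print(without_missing.columns.tolist())
```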

static split_dataframe(df: DataFrame, columns: List[str]) -> Tuple[DataFrame, DataFrame]

Split a DataFrame into two DataFrames based on specified columns.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • columns (List[str]) – List of column names to separate.

Returns:

A tuple containing:
  • DataFrame with the specified columns.

  • DataFrame without the specified columns.

Return type:

Tuple[pd.DataFrame, pd.DataFrame]
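Example: separating a set of columns from the rest, sketched with plain pandas selection and `DataFrame.drop`. The column names are illustrative, and the method's own implementation may differ.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "feature": [0.5, 0.7], "label": [0, 1]})
columns = ["label"]

# First part: only the requested columns.
selected = df[columns]

# Second part: everything else.
remainder = df.drop(columns=columns)

print(selected.columns.tolist())
print(remainder.columns.tolist())
```

A typical use is separating target columns from feature columns before modelling.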