Describer

Describer(config)

Generate descriptive statistics for datasets.

Parameters

  • config (dict): Descriptive statistics configuration
  • method (str): Operation name
    • 'default': Use default set of statistics methods
  • describe (list): List of statistics methods to apply
    • See supported methods table below
    • For percentile, use dictionary format: {'percentile': k}
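
The describe list format above can be sketched in plain Python. This is only an illustration of the documented layout: `build_describe` is a hypothetical helper, not part of PETsARD, and the percentile value 0.95 is an arbitrary choice.

```python
# Hypothetical helper (not a PETsARD function): builds a `describe` list
# with plain method names plus one {'percentile': k} dict per percentile.
def build_describe(methods, percentiles=()):
    describe = list(methods)
    describe.extend({"percentile": k} for k in percentiles)
    return describe


# Configuration dictionary using the keys named in the parameter list.
config = {
    "method": "default",
    "describe": build_describe(["mean", "median", "std"], percentiles=[0.95]),
}

print(config["describe"])
```

Keeping each percentile in its own dictionary lets the list carry the parameter `k` next to the method name, while parameter-free methods stay as simple strings.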

Examples

import pandas as pd
from petsard import Describer


# Using default descriptive methods
desc = Describer(method='default')

# Using custom descriptive methods
desc = Describer(
    method='default',
    # Percentile uses the dictionary format described under Parameters
    describe=['mean', 'median', 'std', {'percentile': 0.95}],
)

# Analysis (df is the pandas DataFrame to describe)
desc.create()
desc_result: dict[str, pd.DataFrame] = desc.eval({'data': df})

# Get results
global_stats: pd.DataFrame = desc_result.get('global')      # Global statistics
column_stats: pd.DataFrame = desc_result.get('columnwise')  # Column-wise statistics
pairwise_stats: pd.DataFrame = desc_result.get('pairwise')  # Pairwise statistics

Methods

create()

Initialize the describer.

Parameters

None

Returns

None

eval()

Perform descriptive statistical analysis.

Parameters

  • data (dict): Data to analyze
    • Format: {'data': pd.DataFrame}

Returns

(dict[str, pd.DataFrame]), varies by module:

  • 'global': Single-row DataFrame with description results for the overall dataset
  • 'columnwise': Column-level description results, one row per column
  • 'pairwise': Column-pair description results, one row per pair of columns
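
Because the returned keys vary, calling code may want to check which levels are present before using them. The sketch below uses a hand-built stand-in for the result dictionary with made-up values. `summarize` is a hypothetical helper, and the idea that a level is simply absent when it was not computed is an assumption, not documented behavior.

```python
import pandas as pd

# Stand-in for the dictionary returned by eval(); values are made up.
desc_result = {
    "global": pd.DataFrame({"row_count": [100], "col_count": [5]}),
    "columnwise": pd.DataFrame({"mean": [38.75, 47.75]}, index=["age", "income"]),
    # "pairwise" is left out on purpose (assumed case: level not computed).
}


def summarize(result):
    """Map each expected level to its DataFrame shape, or None if absent."""
    levels = ("global", "columnwise", "pairwise")
    return {
        level: (None if result.get(level) is None else result[level].shape)
        for level in levels
    }


shapes = summarize(desc_result)
print(shapes)
```

Using `dict.get` mirrors the example earlier in this page and avoids a `KeyError` when a level is missing.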

Appendix: Supported Methods

Overview

Descriptive statistics are divided into three levels:

  • Global analysis: Calculate overall dataset properties (e.g., row count)
  • Column analysis: Calculate statistics for each column (e.g., mean, standard deviation)
  • Pairwise analysis: Calculate relationships between columns (e.g., correlation)
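
The three levels can be illustrated with plain pandas on a toy dataset. This sketch does not call PETsARD and is not its implementation. The column names and values are made up, and the pairwise result is shown as a correlation matrix rather than one row per column pair.

```python
import pandas as pd

# Toy data; values are made up for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [30, 42, 58, 61],
})

# Global level: a single row describing the whole dataset.
global_stats = pd.DataFrame({"row_count": [len(df)], "col_count": [df.shape[1]]})

# Column level: one row per column.
column_stats = pd.DataFrame({"mean": df.mean(), "std": df.std()})

# Pairwise level: one value per pair of columns (matrix form here).
pairwise_corr = df.corr()

print(global_stats)
print(column_stats)
print(pairwise_corr)
```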

Supported Methods

Level     Method                   Parameter           Description
Global    DescriberRowCount        'row_count'         Calculate number of rows
Global    DescriberColumnCount     'col_count'         Calculate number of columns
Global    DescriberGlobalNA        'global_na_count'   Calculate number of rows containing NA
Column    DescriberMean            'mean'              Calculate mean
Column    DescriberMedian          'median'            Calculate median
Column    DescriberStd             'std'               Calculate standard deviation
Column    DescriberVar             'var'               Calculate variance
Column    DescriberMin             'min'               Calculate minimum
Column    DescriberMax             'max'               Calculate maximum
Column    DescriberKurtosis        'kurtosis'          Calculate kurtosis
Column    DescriberSkew            'skew'              Calculate skewness
Column    DescriberQ1              'q1'                Calculate first quartile
Column    DescriberQ3              'q3'                Calculate third quartile
Column    DescriberIQR             'iqr'               Calculate interquartile range
Column    DescriberRange           'range'             Calculate range
Column    DescriberPercentile      'percentile'        Calculate custom percentile
Column    DescriberColNA           'col_na_count'      Calculate NA count per column
Column    DescriberNUnique         'nunique'           Calculate number of unique values
Pairwise  DescriberCov             'cov'               Calculate covariance
Pairwise  DescriberCorr            'corr'              Calculate correlation
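
For readers less familiar with some of the column-level statistics, the sketch below computes a few of them with plain pandas. It illustrates the usual definitions only. How PETsARD computes them internally (for example, quantile interpolation or NA handling) is not stated here, so treat the pandas defaults as an assumption.

```python
import pandas as pd

# Toy column with one missing value; values are made up.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, None])

q1 = s.quantile(0.25)            # first quartile (NA values are skipped)
q3 = s.quantile(0.75)            # third quartile
iqr = q3 - q1                    # interquartile range
value_range = s.max() - s.min()  # range
p95 = s.quantile(0.95)           # custom percentile, k = 0.95
na_count = int(s.isna().sum())   # NA count for the column
n_unique = s.nunique()           # unique values, NA excluded by default

print(q1, q3, iqr, value_range, p95, na_count, n_unique)
```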