Loader

Loader(
    filepath=None,
    method=None,
    column_types=None,
    header_names=None,
    na_values=None
)

Module for loading tabular data.

Parameters

  • filepath (str, optional): Path to the dataset file. Cannot be combined with method.
    • Default: None
    • To load a benchmark dataset, use the form benchmark://{dataset_name}
  • method (str, optional): Loading method. Cannot be combined with filepath.
    • Default: None
    • Values: ‘default’: loads PETsARD’s default dataset, ‘adult-income’
  • column_types (dict, optional): Column type definitions
    • Default: None
    • Format: {type: [colname]}
    • Available types (case-insensitive):
      • ‘category’: Categorical columns
      • ‘datetime’: Datetime columns
  • header_names (list, optional): Column names for data without headers
    • Default: None
  • na_values (str | list | dict, optional): Values to be recognized as NA/NaN
    • Default: None
    • If str or list: Apply to all columns
    • If dict: Apply per-column with format {colname: na_values}
    • Example: {'workclass': '?', 'age': [-1]}
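The parameter rules above can be sketched in plain Python. This is a minimal illustration, not PETsARD's implementation. The helper names resolve_source, normalize_column_types and expand_na_values are hypothetical. Treating method='default' as equivalent to benchmark://adult-income, and rejecting a call with neither argument, are assumptions.

```python
from typing import Optional, Union

BENCHMARK_PREFIX = "benchmark://"
ALLOWED_TYPES = {"category", "datetime"}  # types listed under column_types

NAValues = Union[str, list, dict, None]


def resolve_source(filepath: Optional[str] = None, method: Optional[str] = None) -> dict:
    """Hypothetical helper: enforce filepath/method exclusivity and detect benchmarks."""
    if filepath is not None and method is not None:
        raise ValueError("filepath and method cannot be used together")
    if method is not None:
        if method.lower() != "default":
            raise ValueError(f"unsupported method: {method!r}")
        # Assumption: 'default' behaves like the adult-income benchmark.
        filepath = BENCHMARK_PREFIX + "adult-income"
    if filepath is None:
        # Assumption: something must be specified to load.
        raise ValueError("either filepath or method is required")
    if filepath.lower().startswith(BENCHMARK_PREFIX):
        return {"benchmark": True, "benchmark_name": filepath[len(BENCHMARK_PREFIX):]}
    return {"benchmark": False, "filepath": filepath}


def normalize_column_types(column_types: Optional[dict]) -> dict:
    """Lower-case the type keys (types are case-insensitive) and reject unknown ones."""
    normalized = {}
    for col_type, columns in (column_types or {}).items():
        key = col_type.lower()
        if key not in ALLOWED_TYPES:
            raise ValueError(f"unknown column type: {col_type!r}")
        normalized[key] = list(columns)
    return normalized


def expand_na_values(na_values: NAValues, columns: list) -> dict:
    """Turn str/list/dict na_values into a per-column list of NA markers."""
    if na_values is None:
        return {}
    if isinstance(na_values, dict):
        # dict: per-column markers; wrap single values in a list.
        return {col: v if isinstance(v, list) else [v] for col, v in na_values.items()}
    # str or list: the same markers apply to every column.
    markers = [na_values] if isinstance(na_values, str) else list(na_values)
    return {col: list(markers) for col in columns}


print(resolve_source("benchmark://adult-income"))
# {'benchmark': True, 'benchmark_name': 'adult-income'}
print(expand_na_values({"workclass": "?", "age": [-1]}, ["workclass", "age"]))
# {'workclass': ['?'], 'age': [-1]}
```

Expanding str and list values into an explicit per-column mapping means later steps only have to handle the dict form.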

Examples

from petsard import Loader


# Basic usage
loader = Loader('data.csv')
data, metadata = loader.load()

# Using a benchmark dataset
loader = Loader('benchmark://adult-income')
data, metadata = loader.load()

Methods

load()

Read and load the data.

Parameters

None.

Returns

  • data (pd.DataFrame): Loaded DataFrame
  • metadata (Metadata): Dataset metadata

loader = Loader('data.csv')
data, metadata = loader.load()  # returns the DataFrame and its metadata
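To make the (data, metadata) return contract and the header_names / na_values behaviour concrete, here is a stdlib-only toy stand-in. It is not PETsARD code: toy_load is a hypothetical name, rows are plain dicts instead of a pd.DataFrame, the metadata is a simple dict rather than a Metadata object, and only str/list na_values are handled for brevity.

```python
import csv
import io
from typing import Optional, Union


def toy_load(text: str, header_names: Optional[list] = None,
             na_values: Union[str, list, None] = None):
    """Hypothetical stand-in for Loader.load(): returns (rows, metadata)."""
    # csv yields strings, so compare NA markers as strings (e.g. -1 -> "-1").
    if isinstance(na_values, str):
        markers = {na_values}
    elif na_values is not None:
        markers = {str(v) for v in na_values}
    else:
        markers = set()
    reader = csv.reader(io.StringIO(text))
    # Without header_names, the first row is treated as the header.
    columns = header_names if header_names is not None else next(reader)
    rows = []
    for record in reader:
        # Replace NA markers with None, the stdlib analogue of NaN here.
        rows.append({col: (None if val in markers else val)
                     for col, val in zip(columns, record)})
    metadata = {
        "columns": list(columns),
        "row_count": len(rows),
        "na_count": sum(v is None for row in rows for v in row.values()),
    }
    return rows, metadata


data, metadata = toy_load("Private,39\n?,50\n",
                          header_names=["workclass", "age"], na_values="?")
print(metadata)  # {'columns': ['workclass', 'age'], 'row_count': 2, 'na_count': 1}
```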

Attributes

  • config (LoaderConfig): Configuration dictionary containing:
    • filepath (str): Local data file path
    • method (str): Loading method
    • file_ext (str): File extension
    • benchmark (bool): Whether a benchmark dataset is used
    • dtypes (dict): Column data types
    • column_types (dict): User-defined column types
    • header_names (list): Column headers
    • na_values (str | list | dict): NA value definitions
    • For benchmark datasets only:
      • filepath_raw (str): Original input filepath
      • benchmark_name (str): Benchmark dataset name
      • benchmark_filename (str): Benchmark dataset filename
      • benchmark_access (str): Benchmark dataset access type
      • benchmark_region_name (str): AWS region name
      • benchmark_bucket_name (str): AWS S3 bucket name
      • benchmark_sha256 (str): SHA-256 hash of the benchmark dataset
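A recorded SHA-256 value is typically used to confirm that a downloaded benchmark file is intact. The sketch below shows such a check with the standard library; the function name verify_sha256, and the assumption that Loader compares the downloaded file against benchmark_sha256 this way, are illustrative and not taken from PETsARD's documented API.

```python
import hashlib
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str, chunk_size: int = 1 << 16) -> bool:
    """Hypothetical check: hash the file in chunks and compare with the recorded digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # Hex digests are case-insensitive; normalize before comparing.
    return digest.hexdigest() == expected_hex.strip().lower()
```

Hashing in chunks keeps memory use constant regardless of dataset size.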