Loader
Loader(
    filepath=None,
    method=None,
    column_types=None,
    header_names=None,
    na_values=None
)
Module for loading tabular data.
Parameters
filepath (str, optional): Path to the dataset file. Cannot be used with method
- Default: None
- If using a benchmark dataset, format as benchmark://{dataset_name}
method (str, optional): Loading method. Cannot be used with filepath
- Default: None
- Values: ‘default’: loads PETsARD’s default dataset ‘adult-income’
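The interplay between filepath, method, and the benchmark:// prefix can be sketched as a small resolver. This is an illustration only, not PETsARD's implementation: `resolve_source` is a hypothetical helper, and both the "one of the two is required" rule and the mapping of method='default' to 'benchmark://adult-income' are assumptions drawn from the descriptions above.

```python
BENCHMARK_PREFIX = "benchmark://"


def resolve_source(filepath=None, method=None):
    """Hypothetical helper: return (source, benchmark_name or None)."""
    # The two arguments are documented as mutually exclusive.
    if filepath is not None and method is not None:
        raise ValueError("filepath and method cannot be used together")
    # Assumption: at least one of them must be provided.
    if filepath is None and method is None:
        raise ValueError("either filepath or method is required")

    if method is not None:
        if method != "default":
            raise ValueError(f"unsupported method: {method!r}")
        # Assumption: 'default' points at the 'adult-income' benchmark.
        filepath = BENCHMARK_PREFIX + "adult-income"

    # A benchmark source carries the dataset name after the prefix.
    if filepath.startswith(BENCHMARK_PREFIX):
        return filepath, filepath[len(BENCHMARK_PREFIX):]
    return filepath, None
```

With this sketch, `resolve_source(filepath='data.csv')` yields a plain local path, while `resolve_source(method='default')` yields the benchmark name 'adult-income'.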
column_types (dict, optional): Column type definitions
- Default: None
- Format: {type: [colname]}
- Available types (case-insensitive):
  - ‘category’: Categorical columns
  - ‘datetime’: Datetime columns
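The {type: [colname]} format and the case-insensitive type names can be illustrated with a small normalizer. `normalize_column_types` is a hypothetical helper written for this example and is not part of PETsARD; the column names are made up. It only shows how such a dict could be validated and flattened.

```python
ALLOWED_TYPES = {"category", "datetime"}


def normalize_column_types(column_types):
    """Hypothetical helper: lowercase type names and map each column to its type."""
    col_to_type = {}
    for type_name, colnames in (column_types or {}).items():
        key = type_name.lower()  # type names are case-insensitive
        if key not in ALLOWED_TYPES:
            raise ValueError(f"unsupported column type: {type_name!r}")
        for col in colnames:
            col_to_type[col] = key
    return col_to_type


# Illustrative input: mixed-case type names, hypothetical column names.
column_types = {"Category": ["workclass", "education"], "DATETIME": ["signup_date"]}
col_to_type = normalize_column_types(column_types)
print(col_to_type)
```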
header_names (list, optional): Column names for data without headers
- Default: None
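When a file has no header row, the column names have to be supplied separately. The sketch below shows the same idea with Python's standard csv module, where fieldnames plays the role of header_names. The sample rows and column names are made up, and this does not describe how Loader reads files internally.

```python
import csv
import io

# Headerless sample data (illustrative values).
raw = "39,State-gov,Bachelors\n50,Private,HS-grad\n"
header_names = ["age", "workclass", "education"]

# Passing fieldnames makes DictReader treat the first line as data, not as a header.
rows = list(csv.DictReader(io.StringIO(raw), fieldnames=header_names))
print(rows[0]["workclass"])
```

Without the supplied names, the first data row would be consumed as the header and lost from the data.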
na_values (str | list | dict, optional): Values to be recognized as NA/NaN
- Default: None
- If str or list: Apply to all columns
- If dict: Apply per-column with format {colname: na_values}
- Example: {'workclass': '?', 'age': [-1]}
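The difference between the str/list form (applied to all columns) and the dict form (applied per column) can be sketched in plain Python. `build_na_lookup` and `apply_na` are hypothetical helpers for illustration, not PETsARD functions, and the rows are made-up sample data.

```python
def build_na_lookup(na_values, columns):
    """Hypothetical helper: expand na_values into {colname: set of NA markers}."""
    def as_set(value):
        # A single string is one marker; a list holds several.
        return {value} if isinstance(value, str) else set(value)

    if na_values is None:
        return {col: set() for col in columns}
    if isinstance(na_values, dict):
        # Per-column form: columns not listed get no extra markers.
        return {col: as_set(na_values.get(col, [])) for col in columns}
    # str or list form: the same markers apply to every column.
    shared = as_set(na_values)
    return {col: set(shared) for col in columns}


def apply_na(rows, na_values):
    """Replace recognized NA markers with None in a list of row dicts."""
    if not rows:
        return []
    lookup = build_na_lookup(na_values, rows[0].keys())
    return [
        {col: None if val in lookup[col] else val for col, val in row.items()}
        for row in rows
    ]


rows = [
    {"workclass": "?", "age": 39},
    {"workclass": "Private", "age": -1},
]
# Dict form: '?' is NA only in 'workclass', -1 is NA only in 'age'.
cleaned = apply_na(rows, {"workclass": "?", "age": [-1]})
```

Passing a plain string such as '?' instead of the dict would mark '?' as missing in every column and leave the -1 in 'age' untouched.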
Examples
from petsard import Loader
# Basic usage
loader = Loader('data.csv')
data, metadata = loader.load()
# Using benchmark dataset
loader = Loader('benchmark://adult-income')
data, metadata = loader.load()
Methods
load()
Read and load the data.
Parameters
None.
Return
data (pd.DataFrame): Loaded DataFrame
metadata (Metadata): Dataset metadata
loader = Loader('data.csv')
data, metadata = loader.load()  # returns the loaded DataFrame and its metadata
Attributes
config (LoaderConfig): Configuration dictionary containing:
- filepath (str): Local data file path
- method (str): Loading method
- file_ext (str): File extension
- benchmark (bool): Whether using benchmark dataset
- dtypes (dict): Column data types
- column_types (dict): User-defined column types
- header_names (list): Column headers
- na_values (str | list | dict): NA value definitions
- For benchmark datasets only:
  - filepath_raw (str): Original input filepath
  - benchmark_name (str): Benchmark dataset name
  - benchmark_filename (str): Benchmark dataset filename
  - benchmark_access (str): Benchmark dataset access type
  - benchmark_region_name (str): Amazon region name
  - benchmark_bucket_name (str): Amazon bucket name
  - benchmark_sha256 (str): SHA-256 value of benchmark dataset
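A stored SHA-256 value is commonly used to confirm that a downloaded file is intact. The sketch below shows such a check with hashlib. `sha256_of_file` and `verify_file` are hypothetical names, the file contents are made up, and whether Loader performs exactly this comparison is an assumption.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_sha256):
    """Hypothetical check: compare a file's digest with an expected hex value."""
    return sha256_of_file(path) == expected_sha256.lower()


# Demo with a temporary file standing in for a downloaded dataset.
content = b"age,workclass\n39,State-gov\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    demo_path = tmp.name

expected = hashlib.sha256(content).hexdigest()
is_valid = verify_file(demo_path, expected)
is_tampered_valid = verify_file(demo_path, "0" * 64)
os.remove(demo_path)
```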