Splitter
Splitter(
method=None,
num_samples=1,
train_split_ratio=0.8,
random_state=None
)
For experimental purposes, splits data into training and validation sets. Designed to support privacy evaluation tasks like Anonymeter, where multiple splits can reduce bias in synthetic data assessment. For imbalanced datasets, larger num_samples
is recommended.
Parameters
method
(str, optional): Loading method for existing split data- Default: None
- Values: ‘custom_data’ - load split data from filepath
num_samples
(int, optional): Number of times to resample the data- Default: 1
train_split_ratio
(float, optional): Ratio of data for training set- Default: 0.8
- Must be between 0 and 1
random_state
(int | float | str, optional): Seed for reproducibility- Default: None
Examples
from petsard import Splitter
# Basic usage
split = Splitter(num_samples=5, train_split_ratio=0.8)
split.split(data=df)
train_df = split.data[1]['train'] # first split's training set
Methods
split()
split.split(data, exclude_index=None, metadata=None)
Perform data splitting.
Parameters
data
(pd.DataFrame, optional): Dataset to be split- Not required if
method='custom_data'
- Not required if
exclude_index
(list[int], optional): List of indices to exclude from sampling- Default: None
metadata
(Metadata, optional): Metadata object of the dataset- Default: None
Returns
None.
- Split results are stored in:
Splitter.data
: Dictionary containing all split resultsSplitter.metadata
: Dataset metadata
- Access split data via:
Splitter.data[sample_num]['train']
: Training set for specific sampleSplitter.data[sample_num]['validation']
: Validation set for specific sample
Attributes
data
: Split datasets dictionary- Format:
{sample_num: {'train': pd.DataFrame, 'validation': pd.DataFrame}}
- Format:
metadata
: Dataset metadata (Metadata object)config
: Configuration dictionary containing:- If
method=None
:num_samples
(int): Resample timestrain_split_ratio
(float): Split ratiorandom_state
(int | float | str): Random seed
- If
method='custom_data'
:method
(str): Loading methodfilepath
(dict): Data file paths- Additional Loader configurations
- If