Get Started
Installation
Below we demonstrate the native Python environment setup. For better dependency management, however, we recommend the following tools:
- pyenv - Python version management
- poetry / uv - Package management
Native Python Setup
Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate   # Linux/Mac
# or
venv\Scripts\activate      # Windows
```
Upgrade pip:
```bash
python -m pip install --upgrade pip
```
Install required packages:
```bash
pip install -r requirements.txt
```
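To confirm the environment is set up correctly, you can try importing the package. This is a minimal sketch; it assumes PETsARD is installed in the active environment and importable as `petsard`, and that a `__version__` attribute may or may not be present depending on the release:

```python
# Quick sanity check: import PETsARD in the freshly created environment.
# Falls back to a plain import confirmation if __version__ is not defined.
import petsard

print(getattr(petsard, "__version__", "petsard imported successfully"))
```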
Quick Start
PETsARD is a privacy-enhancing data synthesis and evaluation framework. To start using PETsARD:
Create a minimal YAML configuration file:
```yaml
# config.yaml
Loader:
  demo:
    method: 'default'  # Uses Adult Income dataset
Synthesizer:
  demo:
    method: 'default'  # Uses SDV Gaussian Copula
Reporter:
  output:
    method: 'save_data'
    output: 'result'
    source: 'Synthesizer'
```
Run with two lines of code:
```python
from petsard import Executor

executor = Executor(config='config.yaml')
executor.run()
```
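After the run, you can also inspect the results programmatically. The sketch below assumes a `get_result()` accessor on `Executor`; the exact method name and return shape may differ between releases, so verify it against the API documentation:

```python
from petsard import Executor

executor = Executor(config='config.yaml')
executor.run()

# Assumed accessor: verify get_result() and its return type against the
# API documentation for your installed version.
results = executor.get_result()
print(results)
```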
Framework Structure
PETsARD follows this workflow:
- Loader: Loads data from files or benchmark datasets
- Splitter: Splits data into training/validation sets (optional)
- Preprocessor: Prepares data for synthesis (e.g., encoding categorical values)
- Synthesizer: Creates privacy-enhanced synthetic data
- Postprocessor: Formats synthetic data back to original structure
- Evaluator: Measures synthesis quality and privacy metrics
- Describer: Generates dataset statistics and insights
- Reporter: Saves results and generates reports
Basic Configuration
Here’s a simple example that demonstrates the complete workflow of PETsARD. This configuration will:
- Load the Adult Income demo dataset
- Automatically determine data types and apply appropriate preprocessing
- Generate synthetic data using SDV’s Gaussian Copula method
- Evaluate basic quality metrics and privacy measures using SDMetrics
- Save both the synthetic data and an evaluation report
```yaml
Loader:
  demo:
    method: 'default'
Preprocessor:
  demo:
    method: 'default'
Synthesizer:
  demo:
    method: 'default'
Postprocessor:
  demo:
    method: 'default'
Evaluator:
  demo:
    method: 'default'
Reporter:
  save_data:
    method: 'save_data'
    output: 'demo_result'
    source: 'Postprocessor'
  save_report:
    method: 'save_report'
    output: 'demo_report'
    eval: 'demo'
    granularity: 'global'
```
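To run this configuration, save it to a file and execute it just as in the Quick Start. The sketch below assumes the file is saved as `demo_config.yaml`, a name chosen here for illustration:

```python
# Run the full demo pipeline defined in the YAML above.
# 'demo_config.yaml' is an illustrative name; point it at wherever you
# saved the configuration.
from petsard import Executor

executor = Executor(config='demo_config.yaml')
executor.run()
# The two Reporter steps save the synthetic data and the evaluation report
# under the 'demo_result' and 'demo_report' output names configured above.
```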
Next Steps
- Check the Tutorial section for detailed examples
- Visit the API Documentation for complete module references
- Explore benchmark datasets for testing
- Review example configurations in the GitHub repository