External Synthesis with Default Evaluation
External synthesis with default evaluation. Enabling users to evaluate synthetic data from external solutions.
Click the below button to run this example in Colab:
---
Splitter:
custom:
method: 'custom_data'
filepath:
ori: 'benchmark/adult-income_ori.csv'
control: 'benchmark/adult-income_control.csv'
Synthesizer:
custom:
method: 'custom_data'
filepath: 'benchmark/adult-income_syn.csv'
Evaluator:
demo-diagnostic:
method: 'sdmetrics-diagnosticreport'
demo-quality:
method: 'sdmetrics-qualityreport'
demo-singlingout:
method: 'anonymeter-singlingout'
demo-linkability:
method: 'anonymeter-linkability'
aux_cols:
-
- 'age'
- 'marital-status'
- 'relationship'
- 'gender'
-
- 'workclass'
- 'educational-num'
- 'occupation'
- 'income'
demo-inference:
method: 'anonymeter-inference'
secret: 'income'
demo-classification:
method: 'mlutility-classification'
target: 'income'
Reporter:
save_report_global:
method: 'save_report'
granularity: 'global'
...
External Data Preparation Overview
Pre-synthesized data evaluation requires attention to three key components:
- Training Set - used for synthetic data generation
- Testing Set - for privacy risk evaluation
- Synthetic Data - based only on the training set
Note: Using both training and testing data for synthesis would affect the accuracy of privacy evaluation.
External Data Requirements
Splitter
:
method: 'custom_data'
: For pre-split datasets provided externallyfilepath
: Points to original (ori
) and control (control
) datasets- Recommended ratio: 80% training, 20% testing unless specific reasons otherwise
Synthesizer
:
method: 'custom_data'
: For externally generated synthetic datafilepath
: Points to pre-synthesized dataset- Must be generated using only the training portion of data
Evaluator
:
- Ensures fair comparison between different synthetic data solutions