Use Cases

When developing privacy-preserving data synthesis workflows, you may encounter special requirements. The following scenarios show how to handle them. Each topic provides complete examples that you can run and test directly through Colab links.

Data Understanding:

Data Insights: Data Description

  • Understand your data before synthesis
  • Analyze data characteristics at different granularities
  • Covers global, column-wise, and pairwise statistics
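
To make the three granularities concrete, here is a minimal sketch in plain Python. It does not use PETsARD's own API; the toy table, the `pearson` helper, and the chosen statistics are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Hypothetical toy table: column name -> list of numeric values.
table = {
    "age": [25, 32, 47, 51],
    "income": [30, 42, 60, 68],
    "children": [2, 0, 1, 3],
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Global: one summary for the whole table.
global_stats = {"n_rows": len(table["age"]), "n_cols": len(table)}

# Column-wise: one summary per column.
column_stats = {
    name: {"mean": mean(vals), "std": pstdev(vals)}
    for name, vals in table.items()
}

# Pairwise: one summary per pair of columns.
pairwise_stats = {
    (a, b): pearson(table[a], table[b])
    for a, b in combinations(table, 2)
}

print(global_stats)
print(column_stats["age"])
print(pairwise_stats[("age", "income")])
```

Each level answers a different question: the global view describes the table as a whole, the column-wise view describes individual distributions, and the pairwise view describes relationships between columns.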

Data Generating:

  • If the synthesis results are not satisfactory, you can:
    • Try different synthesis algorithms
    • Adjust model parameters (if any)
    • Perform more detailed data preprocessing

Data Quality Enhancement: Data Preprocessing

  • Systematically address various data quality issues
  • Provides multiple methods for handling missing values, encoding, and outliers
  • Includes uniform encoding, standardization, and discretization techniques
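
As a rough illustration of what such preprocessing steps do, the sketch below chains missing-value imputation, outlier clipping, and standardization in plain Python. The function names, the median fill strategy, and the fixed clipping bounds are assumptions for this example and are not PETsARD's implementation.

```python
from statistics import mean, median, pstdev


def impute_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, lower, upper):
    """Clamp every value into the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical column with one missing value and one extreme value.
raw = [10, None, 12, 11, 500]

filled = impute_missing(raw)             # None -> median of 10, 11, 12, 500
clipped = clip_outliers(filled, 0, 100)  # bounds chosen by hand for the example
scaled = standardize(clipped)

print(filled)
print(clipped)
print(scaled)
```

The order matters in this sketch: clipping before standardization keeps a single extreme value from dominating the mean and standard deviation used for scaling.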

Synthesis Method Selection: Comparing Synthesizers

  • Compare effects of different synthesis algorithms
  • Use multiple algorithms in a single experiment
  • Includes Gaussian Copula, CTGAN, and TVAE

Custom Synthesis: Custom Synthesis

  • Create your own synthesis methods
  • Integrate into PETsARD’s synthesis workflow

Data Plausibility: Data Constraining

  • Ensure synthetic data complies with real business rules
  • Provides constraints for field values, field combinations, and null values
  • Includes numeric range limits, category relationships, and null handling strategies
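
The three kinds of constraint can be pictured as row-level predicates applied to synthetic output. The following is a minimal sketch in plain Python, not PETsARD's constraint syntax; the rows, field names, and rules are hypothetical.

```python
# Hypothetical synthetic rows; field names and values are made up.
rows = [
    {"age": 34, "education": "PhD", "income": 52000},
    {"age": -3, "education": "HS", "income": 18000},   # breaks the range rule
    {"age": 15, "education": "PhD", "income": 0},      # breaks the combination rule
    {"age": 41, "education": "HS", "income": None},    # breaks the null rule
]


def age_in_range(row):
    """Field value constraint: age must lie in a plausible numeric range."""
    return row["age"] is not None and 0 <= row["age"] <= 120


def education_matches_age(row):
    """Field combination constraint: a PhD holder under 20 is treated as implausible."""
    return not (row["education"] == "PhD" and row["age"] < 20)


def income_present(row):
    """Null constraint: income must not be missing."""
    return row["income"] is not None


rules = [age_in_range, education_matches_age, income_present]

# Keep only rows that satisfy every rule.
valid = [row for row in rows if all(rule(row) for rule in rules)]

print(valid)
```

Dropping violating rows is only one possible strategy; the null rule, for instance, could instead fill or flag missing values, depending on the business requirement.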

Data Evaluating:

Machine Learning-based Data Utility: ML Utility

  • Evaluate synthetic data utility for classification, regression, and clustering
  • Uses dual-source control group evaluation by default for fair comparison
  • Supports multiple experimental designs for different use cases

Custom Evaluation: Custom Evaluation

  • Create your own evaluation methods
  • Implement assessments at different granularities
  • Integrate into PETsARD’s evaluation workflow

Workflow Improvement:

Workflow Validation: Benchmark Datasets

  • Test your synthesis workflow on benchmark data
  • Verify synthesis parameter settings
  • Provides reliable reference standards