In our project, "Synthetic Data Generation in Medical Applications," we explored generating synthetic datasets to address the challenges of privacy and accessibility in healthcare data. We used the following datasets:

Type | Dataset Name |
---|---|
Tabular | Heart Disease |
Tabular | Breast Cancer |
Time series | Biosignals for estimating mental concentration |
Time series | Diabetes |
Images | KneeXrayOA-simple |
Images | ChestXRay Pneumonia |

Our methodology included:
- For tabular data:
  - Advanced methods such as GANs and VAEs (CTGANSynthesizer, TVAESynthesizer, CopulaGANSynthesizer, and medGAN)
  - Statistical methods (SMOTE, GaussianCopulaSynthesizer)
- For time series:
  - PARSynthesizer
- For images:
  - WGAN
  - WGAN-GP
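
To make one of the statistical methods above concrete, here is a minimal sketch of the interpolation step at the core of SMOTE, written in plain Python. It is an illustration under stated assumptions, not the code used in this project (which may well have relied on a library implementation). The function name `smote_sample`, its parameters, and the example values are hypothetical and are not taken from any of the datasets listed above.

```python
import math
import random


def smote_sample(minority, k=3, n_new=5, seed=0):
    """Create n_new synthetic rows by SMOTE-style interpolation.

    minority: list of equal-length numeric feature vectors, all from
    the minority class. Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority row as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining rows by Euclidean distance to the base point.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate: the new point lies on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up two-feature rows, purely for demonstration.
minority_rows = [[63.0, 233.0], [67.0, 286.0], [62.0, 268.0], [58.0, 340.0]]
new_rows = smote_sample(minority_rows, k=2, n_new=3)
print(new_rows)
```

Because every synthetic row lies on a line segment between two real minority rows, SMOTE stays inside the region already covered by the data. This makes it simple and cheap, but it also means the synthetic rows are closely tied to real records, which matters when privacy is one of the goals.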