I am trying to reproduce the distribution metrics established using SUPPORT, as stated on page 13 of the SynthVAE report.
I have downloaded available code and checked that my libraries are identical to those given in the requirements.txt file. I am using Python version 3.8.0.
I ran the following commands (one for each pre-processing method) in Command Prompt on Windows:
python scratch_vae_expts.py --pre_proc_method GMM
and:
python scratch_vae_expts.py --pre_proc_method Standard
It wasn't clear to me which pre-processing method was used in the report. In both cases, however, the distribution metrics I computed for the VAE model differ from those stated in the PDF. Could you suggest how to fix this? I have not modified the available code in any way. Perhaps the issue is due to seeding?
Thank you in advance.
Hi Joe, as part of our work using SynthVAE in the synthetic data pipeline we found similar reproducibility issues. We found that any metrics that use sklearn components are not reproducible and cannot be made so without changing the sdv code.
The reason is that setting the numpy random seed does not reach the sklearn random_state when sklearn is imported from another file. As a result, any metric that uses an sklearn component with a random_state argument will not be reproducible.
Metrics such as GMLogLikelihood and the detection metrics (e.g. logistic regression, support vector machine) all use sklearn, so they will be affected by this.
Although no fix is available at the moment, I wanted to add the above to give more info around the likely cause of this.
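To illustrate what random_state controls, here is a minimal sketch that uses sklearn directly rather than the sdv metrics. It assumes you construct the estimator yourself, which is not the case inside sdv, so it is not a fix for the issue above. The helper name gm_log_likelihood, the toy data, and the seed values are made up for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gm_log_likelihood(data, seed):
    """Fit a Gaussian mixture with an explicit random_state and
    return the average log-likelihood of the data under the fit."""
    # Passing random_state pins the estimator's own initialisation,
    # independently of any global numpy seed set elsewhere.
    model = GaussianMixture(n_components=2, random_state=seed)
    model.fit(data)
    return model.score(data)


# Toy data: two well-separated clusters (hypothetical, for illustration only).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=-3.0, size=(200, 2)),
    rng.normal(loc=3.0, size=(200, 2)),
])

first = gm_log_likelihood(data, seed=42)
second = gm_log_likelihood(data, seed=42)

# With the same explicit random_state, the two fits should agree.
print(np.isclose(first, second))
```

When the estimator is created inside a library without a random_state argument being passed through, the caller has no equivalent handle, which matches the behaviour described above.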