Database Summary Validation #62

Open
NickKramer87 opened this issue Oct 19, 2023 · 1 comment

NickKramer87 commented Oct 19, 2023

As a database generator, I want to be certain that the synthetic database I am generating is comparable to real-world data so that I can have confidence in the accuracy of the software that will be written using it.

Requirements:

  1. Task "Automated Summary Generation" must be completed first.

Proposed Subtasks:

  1. Create a comparison tool for the summary statistics from step 30 and the real-world summary statistics from HCAI.
  2. Determine an acceptable similarity percentage for the synthetic database summary.
  3. Test the output of the database creation tool to ensure that it meets this threshold.
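A minimal sketch of what the comparison in subtask 1 could look like, assuming both summaries can be loaded as plain mappings from a category name to a non-negative number (a count or a proportion). The function names and the per-category scoring rule are illustrative assumptions, not something decided in this issue.

```python
from math import sqrt


def percent_similarity(synthetic, real):
    """Average per-category similarity, as a percentage (hypothetical metric).

    Each shared category scores 1 - |s - r| / max(|s|, |r|), clamped at 0,
    so identical values score 1 and a value paired with zero scores 0.
    """
    keys = sorted(set(synthetic) & set(real))
    if not keys:
        raise ValueError("no shared categories to compare")
    scores = []
    for key in keys:
        s, r = synthetic[key], real[key]
        denom = max(abs(s), abs(r))
        score = 1.0 if denom == 0 else 1.0 - abs(s - r) / denom
        scores.append(max(0.0, score))
    return 100.0 * sum(scores) / len(scores)


def pearson_r(synthetic, real):
    """Pearson correlation coefficient over the shared categories."""
    keys = sorted(set(synthetic) & set(real))
    if len(keys) < 2:
        raise ValueError("need at least two shared categories")
    xs = [synthetic[k] for k in keys]
    ys = [real[k] for k in keys]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / sqrt(var_x * var_y)
```

Percent similarity is sensitive to the size of each difference, while the correlation coefficient only measures whether the two summaries rise and fall together, so the two can disagree. Which one to use, and on which summary fields, is still an open choice.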

Acceptance Criteria:

  1. A tool that will compare the synthetic and real summaries and give a percent similarity or possibly a correlation coefficient if that is easier.
  2. A brief report specifying the threshold for deeming a dataset similar and the reasons behind that threshold.
  3. A report in which at least five different databases are tested with the tool and all five (or 95%, if a large number can be tested) pass the similarity threshold.
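One way the pass rule in criterion 3 could be checked, as a sketch only. It assumes each tested database has already been reduced to a single similarity score, and that "a large number" means 20 or more databases. Both `large_batch=20` and the function name are hypothetical, and the threshold itself is whatever the report for criterion 2 settles on.

```python
def meets_acceptance(scores, threshold, large_batch=20, required_rate=0.95):
    """Return True if the batch of similarity scores passes criterion 3.

    Fewer than `large_batch` databases: every score must reach the threshold.
    Otherwise: at least `required_rate` of the scores must reach it.
    `large_batch` is an assumed cutoff, not a value taken from the issue.
    """
    if len(scores) < 5:
        raise ValueError("at least five databases must be tested")
    passed = sum(1 for score in scores if score >= threshold)
    if len(scores) < large_batch:
        return passed == len(scores)
    return passed / len(scores) >= required_rate
```

For example, `meets_acceptance([96, 97, 98, 99, 95], threshold=95)` returns `True`, while a single score below the threshold in a batch of five makes it return `False`.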

rileeki commented Oct 31, 2023

Real-world summary statistics on inpatient discharges in California are available here: https://data.chhs.ca.gov/dataset/hospital-inpatient-characteristics-by-facility-pivot-profile
