Keck Hospital Test #64

Open
NickKramer87 opened this issue Oct 19, 2023 · 11 comments

@NickKramer87

NickKramer87 commented Oct 19, 2023

As a database generator, I want to validate the accuracy of the synthetic database by comparing the summary statistics of a database of patients that would go to Keck Hospital to the actual summary statistics from Keck Hospital to determine if significant changes are needed to the database generation program.

Requirements:

  1. The tasks "Automatic Summary Generation" and "Database Summary Validation" must be completed before this one starts.

Potential subtasks:

  1. A radius or list of zip codes for patients who are likely to go to Keck Hospital.
  2. A synthetic database of patients within the area from subtask 1.
  3. A passing grade according to the success criterion from subtask 36.

Acceptance Criteria:

  1. A tool that will compare the synthetic and real summaries and give a percent similarity or possibly a correlation coefficient if that is easier.
  2. A brief report specifying the threshold for deeming a dataset similar and the reasons behind that threshold.
  3. A report where at least five different databases are tested using the tool from task 3 wherein all five (or 95% if a large number is possible) of the datasets pass the similarity threshold.
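
A minimal sketch of what the comparison tool in criterion 1 could look like, assuming each summary is reduced to category shares that sum to 1 (for example, the share of discharges by sex or by payer category). The function names, the use of total variation distance for "percent similarity", and the sample numbers are all assumptions made for illustration; none of them come from the project or from Keck Hospital data.

```python
from math import sqrt


def percent_similarity(synthetic: dict, real: dict) -> float:
    """Return 100 * (1 - total variation distance) between two share summaries.

    Both inputs map a category to its share (0..1). Categories missing
    from one side are treated as having a share of 0.
    """
    categories = set(synthetic) | set(real)
    tvd = 0.5 * sum(abs(synthetic.get(c, 0.0) - real.get(c, 0.0)) for c in categories)
    return 100.0 * (1.0 - tvd)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def pass_rate(scores, threshold: float) -> float:
    """Fraction of datasets whose similarity score meets the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


if __name__ == "__main__":
    # Illustrative shares only, not real statistics.
    synthetic_sex = {"F": 0.50, "M": 0.50}
    real_sex = {"F": 0.55, "M": 0.45}
    print(round(percent_similarity(synthetic_sex, real_sex), 2))
```

Under these assumptions, criterion 3 would reduce to checking that `pass_rate(scores, threshold)` is 1.0 for five datasets, or at least 0.95 for a larger batch, with the threshold itself justified in the report from criterion 2.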
@TravisHaussler TravisHaussler self-assigned this Oct 30, 2023
@TravisHaussler
Contributor

It's possible to override the hospitals list to only include Keck and then set the location for the run to be somewhat close. We did a test run with Pasadena and it produced results.

@TravisHaussler
Contributor

TravisHaussler commented Nov 1, 2023

Sub task notes:

  • Test an out-of-location population to see what it generates
  • Add location as a possible command-line option
  • Compare running with a local hospital vs a far hospital
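
One way the "location as a command-line option" note could be sketched with Python's standard `argparse` module. The flag names (`--state`, `--city`, `--hospital`) and the `California` default are hypothetical and are not taken from the project's existing interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical location and hospital options."""
    parser = argparse.ArgumentParser(
        description="Generate a synthetic patient database for a given location."
    )
    # Assumed default; adjust to whatever the generator actually uses.
    parser.add_argument("--state", default="California",
                        help="state to generate patients in")
    parser.add_argument("--city", default=None,
                        help="optional city within the state, e.g. Pasadena")
    # May be given several times; None means "use the full hospitals list".
    parser.add_argument("--hospital", action="append", default=None,
                        help="restrict the hospitals list to this facility name")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.state, args.city, args.hospital)
```

With options like these, the local-versus-far comparison would be two runs that differ only in `--city` while keeping the same `--hospital` value.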

@rileeki
Contributor

rileeki commented Nov 1, 2023

Medicare did an analysis comparing Synthea's data to real Medicare claims data. This might be something of a model for the analysis. See pages 13-27 of this document.

@TravisHaussler TravisHaussler added this to the phase3 milestone Nov 7, 2023
@TravisHaussler
Contributor

https://github.com/orchid-initiative/synthetic-database-project/blob/main/csv_formatted_data_09-11-2023_134827.csv

Here is a run of Synthea with 500 male and 500 female patients in the Los Angeles area, with only Keck as a possible hospital.

@rileeki
Contributor

rileeki commented Nov 10, 2023

Thanks, @TravisHaussler! lol They all still live in Massachusetts somehow.

@rileeki
Contributor

rileeki commented Nov 10, 2023

Hm, the layout of this doesn't seem quite right and it looks like the diagnosis codes are still SNOMED. Could you upload the log and fixed-width output too? @TravisHaussler

@TravisHaussler
Contributor

TravisHaussler commented Nov 10, 2023 via email

@TravisHaussler
Contributor

TravisHaussler commented Nov 10, 2023

Here is what I see in the first several rows of that file (with the mass of extra blank diagnosis and procedure code fields taken out).
The diagnosis codes are ICD, while the procedure codes are still SNOMED, and all the addresses I see are in CA.

| Type of Care | Facility Identification Number | Facility Name | Date of Birth | Sex | Ethnicity | Race | Not in Use 1 | Admission Date | Point of Origin | Route of Admission | Type of Admission | Discharge Date | Principal Diagnosis | Present on Admission for Principal Diagnosis | Diagnosis 2 | Present on Admission 2 | Diagnosis 3 | Present on Admission 3 | Diagnosis Codes | Present on Admission | Principal Procedure Code | Principal Procedure Date | Procedure Code 2 | Procedure Date 2 | Procedure Codes | Procedure Dates | External Causes of Morbidity and Present on Admission | Patient SSN | Disposition of Patient | Total Charges | Abstract Record Number (Optional) | Prehospital Care & Resuscitation - DNR Order | Payer Category | Type of Coverage | Plan Code Number | Preferred Spoken Language | Patient Address - Address Number and Street Name | Patient Address - City | Patient Address - State | Patient Address - Zip Code | Patient Address - Country Code | Patient Address - Homeless Indicator |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19621021 | F | E2 | R5 | | 19980816 | 6 | 3 | 2 | 19980817 | Z9851 | N | | | | | ('Z9851',) | ('N',) | | | | | | | | 999713129 | 85 | 12088 | 1 | Y | 6 | | | | 140 Flatley Arcade Apt 56 | Los Angeles | California | 90036 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19621021 | F | E2 | R5 | | 19980816 | 1 | 3 | 3 | 19980817 | Z9851 | N | | | | | ('Z9851',) | ('N',) | | | | | | | | 999713129 | 84 | 12088 | 2 | N | 6 | | | | 140 Flatley Arcade Apt 56 | Los Angeles | California | 90036 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19621021 | F | E2 | R5 | | 19980816 | 6 | 3 | 4 | 19980817 | Z9851 | N | | | | | ('Z9851',) | ('N',) | | | | | | | | 999713129 | 85 | 12088 | 3 | N | 6 | | | | 140 Flatley Arcade Apt 56 | Los Angeles | California | 90036 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19621021 | F | E2 | R5 | | 19980816 | 8 | 3 | 5 | 19980817 | Z9851 | N | | | | | ('Z9851',) | ('N',) | | | | | | | | 999713129 | 50 | 12088 | 4 | Y | 6 | | | | 140 Flatley Arcade Apt 56 | Los Angeles | California | 90036 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19621021 | F | E2 | R5 | | 20180807 | 1 | 1 | 1 | 20180810 | S83519 | U | S83519 | Y | | | ('S83519', 'S83519') | ('U', 'Y') | 699253003 | 20180807 | 133899007 | 20180807 | ('699253003', '133899007') | ('20180807', '20180807') | | 999713129 | 86 | 29949 | 5 | Y | 6 | | | | 140 Flatley Arcade Apt 56 | Los Angeles | California | 90036 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19621021 | F | E2 | R5 | | 20180807 | 2 | 3 | 9 | 20180810 | S83519 | U | S83519 | Y | | | ('S83519', 'S83519') | ('U', 'Y') | 699253003 | 20180807 | 133899007 | 20180807 | ('699253003', '133899007') | ('20180807', '20180807') | | 999713129 | 21 | 29949 | 6 | N | 6 | | | | 140 Flatley Arcade Apt 56 | Los Angeles | California | 90036 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19621021 | F | E2 | R5 | | 20180807 | E | 1 | 1 | 20180810 | S83519 | U | S83519 | Y | | | ('S83519', 'S83519') | ('U', 'Y') | 699253003 | 20180807 | 133899007 | 20180807 | ('699253003', '133899007') | ('20180807', '20180807') | | 999713129 | 81 | 29949 | 7 | Y | 6 | | | | 140 Flatley Arcade Apt 56 | Los Angeles | California | 90036 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19621021 | F | E2 | R5 | | 20180807 | 5 | 3 | 3 | 20180810 | S83519 | U | S83519 | Y | | | ('S83519', 'S83519') | ('U', 'Y') | 699253003 | 20180807 | 133899007 | 20180807 | ('699253003', '133899007') | ('20180807', '20180807') | | 999713129 | 62 | 29949 | 8 | Y | 6 | | | | 140 Flatley Arcade Apt 56 | Los Angeles | California | 90036 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19971128 | F | E2 | R5 | | 20230308 | F | 3 | 2 | 20230309 | Z9851 | W | Z302 | W | Z9851 | N | ('Z9851', 'Z302', 'Z9851') | ('W', 'W', 'N') | 287664005 | 20230308 | 133899007 | 20230308 | ('287664005', '133899007') | ('20230308', '20230308') | | 999904365 | 91 | 16155 | 9 | N | 6 | | | | 707 Considine Way Apt 91 | Los Angeles | California | 90061 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19971128 | F | E2 | R5 | | 20230308 | 4 | 3 | 9 | 20230309 | Z9851 | W | Z302 | W | Z9851 | N | ('Z9851', 'Z302', 'Z9851') | ('W', 'W', 'N') | 287664005 | 20230308 | 133899007 | 20230308 | ('287664005', '133899007') | ('20230308', '20230308') | | 999904365 | 0 | 16155 | 10 | Y | 6 | | | | 707 Considine Way Apt 91 | Los Angeles | California | 90061 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19971128 | F | E2 | R5 | | 20230308 | 1 | 3 | 9 | 20230309 | Z9851 | W | Z302 | W | Z9851 | N | ('Z9851', 'Z302', 'Z9851') | ('W', 'W', 'N') | 287664005 | 20230308 | 133899007 | 20230308 | ('287664005', '133899007') | ('20230308', '20230308') | | 999904365 | 87 | 16155 | 11 | Y | 6 | | | | 707 Considine Way Apt 91 | Los Angeles | California | 90061 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19971128 | F | E2 | R5 | | 20230308 | 1 | 3 | 5 | 20230309 | Z9851 | W | Z302 | W | Z9851 | N | ('Z9851', 'Z302', 'Z9851') | ('W', 'W', 'N') | 287664005 | 20230308 | 133899007 | 20230308 | ('287664005', '133899007') | ('20230308', '20230308') | | 999904365 | 84 | 16155 | 12 | Y | 6 | | | | 707 Considine Way Apt 91 | Los Angeles | California | 90061 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19690208 | F | E2 | R5 | | 20040820 | 8 | 3 | 5 | 20040821 | Z9851 | Y | | | | | ('Z9851',) | ('Y',) | | | | | | | | 999925265 | 83 | 9185 | 13 | N | 8 | | | | 320 Bayer Crossing Suite 37 | Los Angeles | California | 90291 | US | |
| 1 | 10735 | KECK MEDICAL CENTER OF USC | 19690208 | F | E2 | R5 | | 20040820 | 6 | 3 | 4 | 20040821 | Z9851 | Y | | | | | ('
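
The two checks discussed here (which code system a field holds, and whether every address is in California) could be automated roughly as follows. This is a shape-based heuristic only, not a lookup against the real ICD-10-CM or SNOMED CT code sets, and the function names are hypothetical.

```python
import re
from collections import Counter

# Dotless ICD-10-CM shape: a letter, a digit, then 1-5 letters or digits (e.g. Z9851).
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z]{1,5}$")
# SNOMED CT concept identifiers are purely numeric, 6 to 18 digits (e.g. 699253003).
SNOMED_SHAPE = re.compile(r"^[0-9]{6,18}$")


def guess_code_system(code: str) -> str:
    """Guess a code system from the shape of the code alone."""
    code = code.strip().upper()
    if ICD10_SHAPE.match(code):
        return "ICD-10-like"
    if SNOMED_SHAPE.match(code):
        return "SNOMED-like"
    return "unknown"


def unexpected_states(states, expected: str = "California") -> dict:
    """Count address states that differ from the expected one."""
    return dict(Counter(s for s in states if s != expected))


if __name__ == "__main__":
    # Values copied from the rows above.
    for code in ("Z9851", "S83519", "699253003", "133899007"):
        print(code, guess_code_system(code))
    print(unexpected_states(["California", "California"]))
```

An empty result from `unexpected_states` on the state column, together with no "SNOMED-like" values in the diagnosis columns, would match what is described in this comment.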

@rileeki
Contributor

rileeki commented Nov 10, 2023

@TravisHaussler You are totally right. I'm not even sure what I was looking at... I'm sorry about that!

@rileeki rileeki self-assigned this Nov 17, 2023
@rileeki
Contributor

rileeki commented Nov 17, 2023

@TravisHaussler I'll pick this up for the next two weeks. I plan to slice and dice the data you provided and present a report at our next check-in comparing this dataset to the publicly available summary statistics.
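
As a rough idea of the slicing involved, the sketch below derives two simple summaries from the generated records: the share of each value in a categorical column, and age at admission computed from the `YYYYMMDD` date fields. The helper names are hypothetical, and which statistics are actually published for comparison is not assumed here.

```python
from collections import Counter
from datetime import date


def parse_yyyymmdd(text: str) -> date:
    """Parse a date written as YYYYMMDD, as in the Date of Birth column."""
    return date(int(text[:4]), int(text[4:6]), int(text[6:8]))


def age_at(birth: date, on: date) -> int:
    """Age in whole years on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def category_shares(values) -> dict:
    """Share of each distinct value, e.g. for Sex or Payer Category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


if __name__ == "__main__":
    # Dates taken from the first sample row posted earlier in this thread.
    print(age_at(parse_yyyymmdd("19621021"), parse_yyyymmdd("19980816")))
    print(category_shares(["F", "F", "M", "F"]))
```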

@TravisHaussler
Contributor

TravisHaussler commented Nov 17, 2023 via email

@rileeki rileeki removed their assignment Dec 15, 2023

3 participants