# Exploring Patient Treatment Pathways
## Required Packages
Here are the packages we will need, grouped by their primary use in this exploration:
- Interfacing with databases
    * [`DBInterface.jl`](https://github.com/JuliaDatabases/DBInterface.jl) - Database interface definitions for Julia
    * [`SQLite`](https://github.com/JuliaDatabases/SQLite.jl) - A Julia interface to the SQLite library
- Health analytics built specifically for working with OMOP CDM databases
    * [`OHDSICohortExpressions.jl`](https://github.com/MechanicalRabbit/OHDSICohortExpressions.jl) - Implementation of a conversion from the JSON cohort definitions used in the OHDSI ecosystem into an SQL transaction.
    * [`OMOPCDMCohortCreator.jl`](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl) - Create cohorts from databases utilizing the OMOP CDM
- General data analytics tools
    * [`DataFrames.jl`](https://github.com/JuliaData/DataFrames.jl) - In-memory tabular data in Julia
- Miscellaneous packages
    * [`HealthSampleData.jl`](https://github.com/JuliaHealth/HealthSampleData.jl) - Sample health data for a variety of health formats and use cases
    * [`Random`](https://docs.julialang.org/en/v1/stdlib/Random/#Random-Numbers) - Support for generating random numbers
## Interfacing with Synthetic OMOP CDM Database
To start, we need to download _Eunomia_, a synthetic patient database in the OMOP CDM format.
We can download it from the `HealthSampleData.jl` package by executing the next cell:
```julia
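import HealthSampleData:
    Eunomia
# Accept the data download automatically so the cell runs without prompting
ENV["DATADEPS_ALWAYS_ACCEPT"] = true
# Downloads Eunomia if needed and gives the path to the SQLite file
eunomia = Eunomia()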
```
Then, by passing the path to the SQLite database on our computer (returned by the `Eunomia()` command), we can create a connection to the database as follows:
```julia
import SQLite:
    DB

conn = DB(eunomia);
```
## Constructing an Initial Patient Cohort
Next, we can build a cohort of patients who match a phenotype definition of a disease.
Many such definitions have been built using the [ATLAS tool](https://atlas-demo.ohdsi.org), and `OHDSICohortExpressions.jl` can translate them into SQL statements.
We can read a given definition and pass it to `OHDSICohortExpressions.jl` as follows:
```julia
import OHDSICohortExpressions:
    translate,
    Model

cohort_expression = read("strep_throat.json", String)

#=
This defines where patients matching our disease
definition get put.
In this case, to the database schema called
"main" and the target table called "cohort"
=#
model = Model(cdm_version = v"5.3.1",
              cdm_schema = "main",
              vocabulary_schema = "main",
              results_schema = "main",
              target_schema = "main",
              target_table = "cohort");

#=
We execute our disease definition here against the
database and create a patient cohort
associated to the ID, 1.
=#
sql = translate(cohort_expression,
                dialect = :sqlite,
                model = model,
                cohort_definition_id = 1);
```
With the SQL prepared by `OHDSICohortExpressions.jl`, we can now construct the cohort of patients who match our definition of interest.
We can do this by executing each statement with `DBInterface.jl`:
```julia
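import DBInterface:
    execute

# The translated SQL holds several statements separated by ";".
# Splitting leaves a fragment after the last ";", which [1:end-1] drops.
for query in split(sql, ";")[1:end-1]
    execute(conn, query)
end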
```
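To see why the loop drops the last element, it may help to look at how `split` behaves on a small, made-up SQL string. The sketch below uses only Base Julia; `toy_sql`, `pieces`, and `statements` are illustrative names, and the string is an assumption rather than real output from `OHDSICohortExpressions.jl`.

```julia
# A made-up stand-in for translated SQL: two statements, each ending in ";"
toy_sql = "CREATE TABLE t (x INTEGER);\nINSERT INTO t VALUES (1);\n"

# Splitting on ";" leaves a final fragment that holds only whitespace
pieces = split(toy_sql, ";")

# Dropping blank fragments is an alternative to indexing with [1:end-1]
statements = filter(!isempty, strip.(pieces))
```

Either approach assumes that no `;` appears inside a string literal in the SQL, since a plain `split` cannot tell the difference.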
## Exploring the Patient Cohort with `OMOPCDMCohortCreator.jl`
### Finding Patients Belonging To Our Cohort
With the cohort created, we can now explore it using `OMOPCDMCohortCreator.jl` (I'll refer to this as `occ` from now on).
The next cell performs all the set-up that `occ` requires. You should see some informational messages, which indicate that the set-up succeeded:
```julia
import OMOPCDMCohortCreator as occ

# Defines what kind of database occ is connecting to
occ.GenerateDatabaseDetails(
    :sqlite,
    "main"
)

#=
Generates internal representation of what tables are
available for occ to operate upon
=#
occ.GenerateTables(conn)
```
Now we can pull our cohort that matches our disease definition of interest from the database's `cohort` table and store it within a `DataFrame` using `DataFrames.jl`:
```julia
import DataFrames as DF

query_results = execute(conn,
    """
    SELECT
        subject_id AS person_id
    FROM
        cohort
    WHERE
        cohort_definition_id = 1;
    """)

# C represents the cohort
C = DF.DataFrame(query_results)
```
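The same query pattern can be tried without Eunomia. The sketch below runs it against a throwaway in-memory SQLite database, assuming a simplified two-column `cohort` table (the real OMOP table has more columns). The names `toy_conn`, `toy_results`, and `C_toy`, and all the IDs, are made up for illustration.

```julia
import SQLite
import DBInterface
import DataFrames as DF

# In-memory database, used here only as a stand-in for Eunomia
toy_conn = SQLite.DB()

# Simplified cohort table holding two cohorts (definition IDs 1 and 2)
DBInterface.execute(toy_conn,
    "CREATE TABLE cohort (cohort_definition_id INTEGER, subject_id INTEGER)")
DBInterface.execute(toy_conn,
    "INSERT INTO cohort VALUES (1, 101), (1, 102), (2, 103)")

# Keep only cohort 1 and rename subject_id to person_id
toy_results = DBInterface.execute(toy_conn,
    """
    SELECT subject_id AS person_id
    FROM cohort
    WHERE cohort_definition_id = 1
    ORDER BY subject_id;
    """)

C_toy = DF.DataFrame(toy_results)
```

Renaming `subject_id` to `person_id` in the query means the resulting column matches the `person_id` key used when joining the characterization tables later on.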
### Initial Characterization of Our Cohort
Now we can choose which cofactors to explore within our population and iteratively build up the dataset we want.
In this case, we build a dataset that characterizes the cohort by race, gender, and age group:
```julia
C_race = occ.GetPatientRace(C.person_id, conn)
C_gender = occ.GetPatientGender(C.person_id, conn)
C_age_group = occ.GetPatientAgeGroup(C.person_id, conn)
```
We can group these factors together into one DataFrame:
```julia
C_characterized = DF.outerjoin(C_race,
                               C_gender,
                               C_age_group;
                               on = :person_id,
                               matchmissing = :equal)
```
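To illustrate what the outer join does when a patient is absent from one of the tables, here is a minimal sketch on hand-made data. The three toy tables, their values, and the age group labels are assumptions for illustration only; they are not taken from Eunomia or from the output of `occ`.

```julia
import DataFrames as DF

# Hand-made stand-ins for the three characterization tables
toy_race = DF.DataFrame(person_id = [1, 2, 3],
                        race_concept_id = [8527, 8516, 8527])
# Person 3 is deliberately missing from the gender table
toy_gender = DF.DataFrame(person_id = [1, 2],
                          gender_concept_id = [8507, 8532])
toy_age = DF.DataFrame(person_id = [1, 2, 3],
                       age_group = ["20 - 29", "30 - 39", "20 - 29"])

# An outer join keeps every person_id found in any of the tables
toy_joined = DF.outerjoin(toy_race,
                          toy_gender,
                          toy_age;
                          on = :person_id,
                          matchmissing = :equal)
```

Person 3 still appears in `toy_joined`, with `missing` in the `gender_concept_id` column, so no patient is silently dropped from the characterization.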
### Final Grouping and Removal of Personal Identifiers
Next, we can remove the personal identifier for each patient by dropping the `person_id` column from our dataset:
```julia
C_characterized = C_characterized[:, DF.Not(:person_id)]
```
With identifiers removed, a final grouping shows how many patients fall into each combination of the characteristics we explored:
```julia
C_groups = DF.groupby(C_characterized,
                      [:race_concept_id,
                       :gender_concept_id,
                       :age_group]
)

C_final = DF.combine(C_groups, DF.nrow => :count)
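
# --- Illustrative sketch on made-up data (not from Eunomia) ---
# The toy_* names and all values below are assumptions for illustration.
# Four de-identified patients, two of whom share every characteristic:
toy = DF.DataFrame(race_concept_id = [8527, 8527, 8516, 8527],
                   gender_concept_id = [8507, 8507, 8532, 8532],
                   age_group = ["20 - 29", "20 - 29", "30 - 39", "20 - 29"])

# One group per distinct combination of the three columns
toy_groups = DF.groupby(toy,
                        [:race_concept_id,
                         :gender_concept_id,
                         :age_group])

# nrow counts the rows in each group, so the counts sum to the 4 patients
toy_final = DF.combine(toy_groups, DF.nrow => :count)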
```