We should be able to convert between different ontological code vocabularies. #204

mmcdermott · 2024-10-11T17:01:22Z

Target: We should be able to take a MEDS dataset (with parent-code entries in the metadata) and run a script to map codes in one OMOP ontology space (e.g., ICD9) to another (e.g., ICD10) using standardized vocabulary mapping tables (e.g., the OHDSI vocabulary concept relationship tables).

This will entail two steps (how to split them into actual stages is yet to be determined):

1. Take the codes.parquet metadata file and use the vocabulary relationships to remap the parent code space into the target output space. Store the original code string and the updated code string in some pre-set format (omitting codes not covered by the vocabulary conversion step).
2. Leverage the updated codes.parquet, with its original and new code columns, to perform a one-to-many mapping from the original shards to shards where the codes have been remapped (again omitting codes not covered by the vocabulary conversion step).
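As a rough illustration of the two steps above, here is a minimal pure-Python sketch. It is not the MEDS_transforms implementation: the function names (`build_code_map`, `remap_events`), the row layout, and the toy codes (`SRC/...`, `TGT/...`) are all hypothetical, and plain lists of dicts stand in for the parquet files. A real stage would presumably do the same thing with dataframe joins over the shards.

```python
from collections import defaultdict


def build_code_map(metadata_rows, relationships):
    """Step 1: map each original code to its target-vocabulary codes.

    metadata_rows: dicts with "code" and "parent_codes" (hypothetical layout).
    relationships: (source_code, target_code) pairs from a vocabulary table.
    Codes whose parent codes have no mapping are omitted from the result.
    """
    targets_by_source = defaultdict(list)
    for source, target in relationships:
        targets_by_source[source].append(target)

    code_map = {}
    for row in metadata_rows:
        new_codes = []
        for parent in row["parent_codes"]:
            for target in targets_by_source.get(parent, []):
                if target not in new_codes:  # keep order, drop duplicates
                    new_codes.append(target)
        if new_codes:
            code_map[row["code"]] = new_codes
    return code_map


def remap_events(events, code_map):
    """Step 2: one-to-many remap of event rows; unmapped codes are dropped."""
    remapped = []
    for event in events:
        for new_code in code_map.get(event["code"], []):
            remapped.append({**event, "code": new_code})
    return remapped


# Toy inputs (made up for illustration only).
metadata = [
    {"code": "DX//A", "parent_codes": ["SRC/A"]},
    {"code": "DX//B", "parent_codes": ["SRC/B"]},
    {"code": "LAB//C", "parent_codes": []},
]
relationships = [("SRC/A", "TGT/1"), ("SRC/B", "TGT/2"), ("SRC/B", "TGT/3")]
events = [
    {"subject_id": 1, "time": "2020-01-01", "code": "DX//A"},
    {"subject_id": 1, "time": "2020-01-02", "code": "DX//B"},
    {"subject_id": 2, "time": "2020-01-03", "code": "LAB//C"},
]

code_map = build_code_map(metadata, relationships)
remapped_events = remap_events(events, code_map)
```

In this toy run, `DX//B` fans out into two rows (one per target code) and the `LAB//C` event disappears, which matches the "one-to-many, unmapped omitted" behaviour described above.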


mmcdermott · 2024-10-11T17:12:44Z

The most similar existing stage is vocabulary ID creation and assignment (tokenization), which follows a similar two-step pattern:

1. Map each code string to an integer vocab ID (all in codes.parquet): https://github.com/mmcdermott/MEDS_transforms/blob/main/src/MEDS_transforms/fit_vocabulary_indices.py
2. (Alongside other normalization steps) join codes to vocab IDs and convert via the metadata file: https://github.com/mmcdermott/MEDS_transforms/blob/main/src/MEDS_transforms/transforms/normalization.py
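For readers unfamiliar with that pattern, the fit-then-join shape can be sketched as follows. This is only an illustration of the general idea, not the code in the linked files: the function names, the lexicographic ordering, and the choice to start indices at 1 are assumptions made for this example.

```python
def fit_vocab(codes):
    """Fit step: assign each distinct code string an integer ID.

    Assumption for this sketch: IDs follow lexicographic order and start at 1,
    leaving 0 free (e.g., for padding or unknown codes).
    """
    return {code: idx for idx, code in enumerate(sorted(set(codes)), start=1)}


def apply_vocab(events, vocab):
    """Join step: replace each event's code with its ID; drop unknown codes."""
    return [{**e, "code": vocab[e["code"]]} for e in events if e["code"] in vocab]


vocab = fit_vocab(["B", "A", "B"])
tokenized = apply_vocab(
    [{"subject_id": 1, "code": "B"}, {"subject_id": 1, "code": "Z"}],
    vocab,
)
```

A code translation stage would have the same shape, except that the lookup table may hold several target codes per source code instead of a single integer.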

mmcdermott · 2024-10-11T17:31:30Z

Open question: How should we download, store, and access the OHDSI vocabulary remapping tables?
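One possible access pattern, sketched under several assumptions: the vocabulary tables are available locally as tab-delimited files with the OMOP column names `concept_id`, `vocabulary_id`, `concept_code` (concept table) and `concept_id_1`, `concept_id_2`, `relationship_id` (concept relationship table), and codes are rendered as `vocabulary_id/concept_code`. The function name, the code format, and the toy rows are hypothetical, and how the files get downloaded or stored is left open here, as in the question above.

```python
import csv
import io


def load_code_pairs(concept_file, relationship_file, relationship_id="Maps to"):
    """Return (source_code, target_code) pairs for one relationship type.

    Both arguments are open text files (or file-like objects) holding
    tab-delimited tables with a header row.
    """
    # concept_id -> "vocabulary_id/concept_code" (assumed code format).
    concepts = {
        row["concept_id"]: f'{row["vocabulary_id"]}/{row["concept_code"]}'
        for row in csv.DictReader(concept_file, delimiter="\t")
    }

    pairs = []
    for row in csv.DictReader(relationship_file, delimiter="\t"):
        if row["relationship_id"] != relationship_id:
            continue
        source = concepts.get(row["concept_id_1"])
        target = concepts.get(row["concept_id_2"])
        if source is not None and target is not None:
            pairs.append((source, target))
    return pairs


# Toy in-memory tables (made up; real files would be opened from disk).
CONCEPT_TSV = (
    "concept_id\tvocabulary_id\tconcept_code\n"
    "1\tSRC\tA\n"
    "2\tTGT\tX\n"
    "3\tTGT\tY\n"
)
CONCEPT_RELATIONSHIP_TSV = (
    "concept_id_1\tconcept_id_2\trelationship_id\n"
    "1\t2\tMaps to\n"
    "1\t3\tMaps to\n"
    "2\t1\tMapped from\n"
)

code_pairs = load_code_pairs(
    io.StringIO(CONCEPT_TSV), io.StringIO(CONCEPT_RELATIONSHIP_TSV)
)
```

The resulting pairs are exactly the kind of relationship list a remapping step could consume; which relationship types and vocabularies to include would still need to be decided.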

prenc · 2024-10-17T15:26:06Z

I wonder if the pipeline's first step should consider that there might already be a code/vocab_index column in codes.parquet and account for that. If we translate many-to-many (m:m), we will most likely introduce new words into the vocabulary that will require new vocabulary indices. Is it already assumed that vocabulary fitting will occur later in the pipeline?

And @mmcdermott, could you look at the file structure of what I have added so far to confirm that it complies with the framework?

mmcdermott · 2024-10-17T16:08:56Z

Thanks for the nudge, @prenc; I will try to take a look later today! Also, yes, we should assume that vocabulary fitting will occur later in the pipeline, so we do not need to worry about that at this stage.

prenc self-assigned this Oct 17, 2024

prenc linked a pull request Oct 17, 2024 that will close this issue

#204 Add code translations #206

Open
