Generating vocabulary-based codelists
a04_Vocab_based_codelists.Rmd
In this vignette we will produce codelists based on the OMOP CDM vocabulary tables. First, we will create medication codelists based on drug ingredients and ATC classifications. Second, we will create condition codelists based on ICD10 chapters and subchapters.
Medication codelists based on drug ingredients
The function getDrugIngredientCodes() can be used to generate medication codelists based on ingredient codes. Here, for example, we will create a codelist for a single ingredient, acetaminophen. We'll do this using the Eunomia example data.
By default the function returns a codelist. As Eunomia contains only a subset of the OMOP CDM vocabularies, we see only a few codes returned; we would get many more if working with the full set of vocabularies.
acetaminophen_codes <- getDrugIngredientCodes(
  cdm = cdm,
  name = "acetaminophen"
)
acetaminophen_codes
#>
#> ── 1 codelist ──────────────────────────────────────────────────────────────────
#>
#> - 161_acetaminophen (7 codes)
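acetaminophen_codes$acetaminophen
#> NULL
Note that this returns NULL because, as the printout above shows, the codelist is named 161_acetaminophen rather than acetaminophen.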
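An individual codelist can be pulled out by its full name or by position. The following is a minimal sketch that uses a plain named list as a stand-in for the returned object (an assumption for illustration; the real object carries an additional class). The seven concept IDs are the ones that appear in the outputs later in this section.

```r
# Stand-in for the object returned by getDrugIngredientCodes():
# a named list of integer concept IDs (assumption for illustration).
acetaminophen_codes <- list(
  "161_acetaminophen" = c(1125315L, 1127078L, 1127433L, 40229134L,
                          40231925L, 40162522L, 19133768L)
)

# `$` matches a name exactly or by its beginning, so the short name finds nothing
by_short_name <- acetaminophen_codes$acetaminophen

# Full name: backticks are needed because the name starts with a digit
by_full_name <- acetaminophen_codes$`161_acetaminophen`

# Position: avoids hard-coding the numeric prefix
by_position <- acetaminophen_codes[[1]]

# Pattern match on the element names
by_pattern <- acetaminophen_codes[[grep("acetaminophen", names(acetaminophen_codes))]]
```

Accessing by position or by pattern is convenient when the numeric prefix is not known in advance.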
Alternatively, instead of returning a codelist with only the concept IDs we could get them with details such as their name and domain.
acetaminophen_codes_with_details <- getDrugIngredientCodes(
  cdm = cdm,
  name = "acetaminophen",
  type = "codelist_with_details"
)
acetaminophen_codes_with_details
#>
#> ── 1 codelist with details ─────────────────────────────────────────────────────
#>
#> - 161_acetaminophen (7 codes)
acetaminophen_codes_with_details[[1]] |>
  glimpse()
#> Rows: 7
#> Columns: 5
#> $ concept_id <int> 1125315, 1127078, 1127433, 40229134, 40231925, 401625…
#> $ concept_name <chr> "Acetaminophen", "Acetaminophen 160 MG Oral Tablet", …
#> $ domain_id <chr> "Drug", "Drug", "Drug", "Drug", "Drug", "Drug", "Drug"
#> $ vocabulary_id <chr> "RxNorm", "RxNorm", "RxNorm", "RxNorm", "RxNorm", "Rx…
#> $ standard_concept <chr> "S", "S", "S", "S", "S", "S", "S"
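If only the concept IDs are needed later on, they can be pulled back out of each details table. A minimal sketch in base R, using a data frame stand-in that holds a subset of the rows and columns printed above (the stand-in is an assumption; the real object is a classed list of tibbles):

```r
# Stand-in for a codelist with details: a named list of data frames.
# Only three of the seven rows and four of the five columns are reproduced.
details <- list(
  "161_acetaminophen" = data.frame(
    concept_id = c(1125315L, 1127078L, 1127433L),
    domain_id = "Drug",
    vocabulary_id = "RxNorm",
    standard_concept = "S"
  )
)

# Keep only the concept IDs of each table, preserving the codelist names
plain <- lapply(details, function(x) x$concept_id)
```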
Instead of getting back all concepts for acetaminophen, we could require that only concepts associated with acetaminophen and at least one more drug ingredient (i.e. combination therapies) are returned.
acetaminophen_two_or_more_ingredients <- getDrugIngredientCodes(
  cdm = cdm,
  name = "acetaminophen",
  ingredientRange = c(2, Inf),
  type = "codelist_with_details"
)
acetaminophen_two_or_more_ingredients
#>
#> ── 1 codelist with details ─────────────────────────────────────────────────────
#>
#> - 161_acetaminophen (4 codes)
acetaminophen_two_or_more_ingredients[[1]] |>
  glimpse()
#> Rows: 4
#> Columns: 5
#> $ concept_id <int> 40229134, 40231925, 40162522, 19133768
#> $ concept_name <chr> "Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobro…
#> $ domain_id <chr> "Drug", "Drug", "Drug", "Drug"
#> $ vocabulary_id <chr> "RxNorm", "RxNorm", "RxNorm", "RxNorm"
#> $ standard_concept <chr> "S", "S", "S", "S"
Or we could instead only return concepts associated with acetaminophen and no other drug ingredient.
acetaminophen_one_ingredient <- getDrugIngredientCodes(
  cdm = cdm,
  name = "acetaminophen",
  ingredientRange = c(1, 1),
  type = "codelist_with_details"
)
acetaminophen_one_ingredient
#>
#> ── 1 codelist with details ─────────────────────────────────────────────────────
#>
#> - 161_acetaminophen (3 codes)
acetaminophen_one_ingredient[[1]] |>
  glimpse()
#> Rows: 3
#> Columns: 5
#> $ concept_id <int> 1125315, 1127078, 1127433
#> $ concept_name <chr> "Acetaminophen", "Acetaminophen 160 MG Oral Tablet", …
#> $ domain_id <chr> "Drug", "Drug", "Drug"
#> $ vocabulary_id <chr> "RxNorm", "RxNorm", "RxNorm"
#> $ standard_concept <chr> "S", "S", "S"
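The effect of ingredientRange in the two calls above can be pictured as an inclusive filter on the number of ingredients linked to each drug concept. The following is only an illustrative sketch in base R, not the package's implementation: the table `n_ingredients` and the helper `keep_in_range()` are hypothetical, and the counts for the combination products are assumed (the concept IDs are taken from the outputs above).

```r
# Hypothetical lookup: number of ingredients per drug concept.
# Counts for the four combination products are assumed for illustration.
n_ingredients <- data.frame(
  concept_id = c(1125315L, 1127078L, 1127433L, 40229134L,
                 40231925L, 40162522L, 19133768L),
  n = c(1L, 1L, 1L, 3L, 2L, 2L, 2L)
)

# Hypothetical helper: keep concepts whose ingredient count lies
# within the range, with both bounds inclusive
keep_in_range <- function(x, ingredientRange) {
  x$concept_id[x$n >= ingredientRange[1] & x$n <= ingredientRange[2]]
}

combination <- keep_in_range(n_ingredients, c(2, Inf))  # two or more ingredients
single <- keep_in_range(n_ingredients, c(1, 1))         # exactly one ingredient
```

With these assumed counts, `combination` holds the four combination products and `single` holds the three single-ingredient concepts, mirroring the split seen above.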
Lastly, we could also restrict to a particular dose form. Let’s see whether there are any acetaminophen concepts with an injection dose form.
acetaminophen_injections <- getDrugIngredientCodes(
  cdm = cdm,
  name = "acetaminophen",
  doseForm = "injection",
  type = "codelist_with_details"
)
#> Warning: No descendant codes found
acetaminophen_injections
#> list()
In this case we see that in Eunomia there is no concept for acetaminophen with an injection dose form.
The previous examples have focused on a single drug ingredient. We can, however, specify multiple ingredients, in which case we will get a codelist back for each.
acetaminophen_heparin_codes <- getDrugIngredientCodes(
  cdm = cdm,
  name = c("acetaminophen", "heparin")
)
acetaminophen_heparin_codes
#>
#> ── 2 codelists ─────────────────────────────────────────────────────────────────
#>
#> - 161_acetaminophen (7 codes)
#> - 5224_heparin (1 codes)
And if we don't specify an ingredient, we'll get a codelist for every drug ingredient in the vocabularies.
ingredient_codes <- getDrugIngredientCodes(cdm = cdm)
ingredient_codes
#>
#> ── 91 codelists ────────────────────────────────────────────────────────────────
#>
#> - 10318_tacrine (2 codes)
#> - 10582_levothyroxine (2 codes)
#> - 11170_verapamil (2 codes)
#> - 11248_vitamin_b_12 (2 codes)
#> - 11289_warfarin (2 codes)
#> - 11636_drospirenone (2 codes)
#> along with 85 more codelists
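When every ingredient is returned, a few codelists of interest can be picked out afterwards by matching on their names. A minimal sketch in base R, using a plain named list as a stand-in: the names come from the printout above, but the concept IDs are placeholders and not real vocabulary values.

```r
# Stand-in for the full set of ingredient codelists.
# Names are from the printout above; the IDs are placeholders.
ingredient_codes <- list(
  "10318_tacrine" = c(1L, 2L),
  "10582_levothyroxine" = c(3L, 4L),
  "11289_warfarin" = c(5L, 6L)
)

# Ingredients to keep, matched against the part of the name after the prefix
wanted <- c("warfarin", "tacrine")
pattern <- paste0("_(", paste(wanted, collapse = "|"), ")$")
selected <- ingredient_codes[grepl(pattern, names(ingredient_codes))]

# Pool the selected codelists into one vector of unique concept IDs
all_codes <- unique(unlist(selected, use.names = FALSE))
```

The selection keeps the order of the original list, so `names(selected)` here is "10318_tacrine" followed by "11289_warfarin".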
Medication codelists based on ATC classifications
Analogous to getDrugIngredientCodes(), getATCCodes() can be used to generate a codelist based on a particular ATC class. To show this, we'll use the mock vocabulary from CodelistGenerator.
cdm_mock <- mockVocabRef()
#> Warning: ! 5 column in person do not match expected column type:
#> • `person_id` is numeric but expected integer
#> • `gender_concept_id` is numeric but expected integer
#> • `year_of_birth` is numeric but expected integer
#> • `race_concept_id` is numeric but expected integer
#> • `ethnicity_concept_id` is numeric but expected integer
#> Warning: ! 3 column in observation_period do not match expected column type:
#> • `observation_period_id` is numeric but expected integer
#> • `person_id` is numeric but expected integer
#> • `period_type_concept_id` is numeric but expected integer
#> Warning: ! 8 column in cdm_source do not match expected column type:
#> • `cdm_source_abbreviation` is logical but expected character
#> • `cdm_holder` is logical but expected character
#> • `source_description` is logical but expected character
#> • `source_documentation_reference` is logical but expected character
#> • `cdm_etl_reference` is logical but expected character
#> • `source_release_date` is logical but expected date
#> • `cdm_release_date` is logical but expected date
#> • `vocabulary_version` is logical but expected character
#> Warning: ! 3 column in concept do not match expected column type:
#> • `valid_start_date` is logical but expected date
#> • `valid_end_date` is logical but expected date
#> • `invalid_reason` is logical but expected character
#> Warning: ! 1 column in vocabulary do not match expected column type:
#> • `vocabulary_concept_id` is numeric but expected integer
#> Warning: ! 3 column in concept_relationship do not match expected column type:
#> • `valid_start_date` is logical but expected date
#> • `valid_end_date` is logical but expected date
#> • `invalid_reason` is logical but expected character
#> Warning: ! 1 column in concept_synonym do not match expected column type:
#> • `language_concept_id` is logical but expected integer
#> Warning: ! 2 column in concept_ancestor do not match expected column type:
#> • `min_levels_of_separation` is numeric but expected integer
#> • `max_levels_of_separation` is numeric but expected integer
#> Warning: ! 7 column in drug_strength do not match expected column type:
#> • `amount_value` is logical but expected numeric
#> • `amount_unit_concept_id` is numeric but expected integer
#> • `numerator_unit_concept_id` is numeric but expected integer
#> • `denominator_unit_concept_id` is numeric but expected integer
#> • `box_size` is logical but expected integer
#> • `valid_start_date` is logical but expected date
#> • `valid_end_date` is logical but expected date
#> Warning: ! 8 column in achilles_analysis do not match expected column type:
#> • `analysis_id` is numeric but expected integer
#> • `analysis_name` is numeric but expected character
#> • `stratum_1_name` is logical but expected character
#> • `stratum_2_name` is logical but expected character
#> • `stratum_3_name` is logical but expected character
#> • `stratum_4_name` is logical but expected character
#> • `stratum_5_name` is logical but expected character
#> • `category` is logical but expected character
#> Warning: ! 7 column in achilles_results do not match expected column type:
#> • `analysis_id` is numeric but expected integer
#> • `stratum_1` is numeric but expected character
#> • `stratum_2` is logical but expected character
#> • `stratum_3` is logical but expected character
#> • `stratum_4` is logical but expected character
#> • `stratum_5` is logical but expected character
#> • `count_value` is numeric but expected integer
#> Warning: ! 16 column in achilles_results_dist do not match expected column type:
#> • `analysis_id` is numeric but expected integer
#> • `stratum_1` is logical but expected character
#> • `stratum_2` is logical but expected character
#> • `stratum_3` is logical but expected character
#> • `stratum_4` is logical but expected character
#> • `stratum_5` is logical but expected character
#> • `min_value` is logical but expected integer
#> • `max_value` is logical but expected integer
#> • `avg_value` is logical but expected numeric
#> • `stdev_value` is logical but expected numeric
#> • `median_value` is logical but expected numeric
#> • `p10_value` is logical but expected numeric
#> • `p25_value` is logical but expected numeric
#> • `p75_value` is logical but expected numeric
#> • `p90_value` is logical but expected numeric
#> • `count_value` is numeric but expected integer
#> Warning in validateCdmReference(cdm, soft = .softValidation): There are observation period end dates after the current date: 2024-11-11
#> ℹ The latest max observation period end date found is 2025-12-31
In this example, we will produce an ATC level 1 codelist based on Alimentary Tract and Metabolism Drugs.
atc_codelist <- getATCCodes(
  cdm = cdm_mock,
  level = "ATC 1st",
  name = "alimentary tract and metabolism"
)
atc_codelist
#>
#> ── 1 codelist ──────────────────────────────────────────────────────────────────
#>
#> - 1234_alimentary_tract_and_metabolism (2 codes)
Condition codelists using ICD10 chapters and subchapters
We can use getICD10StandardCodes() to generate condition codes based on ICD10 chapters and subchapters. As ICD10 is a non-standard vocabulary in the OMOP CDM, this function returns the standard concepts associated with these ICD10 chapters and subchapters, either directly via a mapping from them or indirectly as descendants of a concept that is mapped from them. It is important to note that getICD10StandardCodes() will only return results if the ICD10 codes are included in the vocabulary tables.
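The direct and indirect routes described above can be sketched with toy versions of two OMOP vocabulary tables. This is an illustration of the idea in base R, not the package's implementation: the column names follow the OMOP CDM, but every concept ID below is made up.

```r
# Toy stand-ins for OMOP vocabulary tables (all concept IDs are made up)
concept_relationship <- data.frame(
  concept_id_1 = c(100L, 101L),   # non-standard ICD10 concepts
  concept_id_2 = c(3L, 3L),       # standard concepts they map to
  relationship_id = "Maps to"
)
concept_ancestor <- data.frame(
  ancestor_concept_id = c(3L, 3L, 3L, 9L),
  descendant_concept_id = c(3L, 4L, 5L, 10L)  # a concept is its own descendant
)

# ICD10 concepts belonging to the chapter or subchapter of interest
icd10_ids <- c(100L, 101L)

# Direct route: standard concepts mapped from the ICD10 concepts
mapped <- unique(concept_relationship$concept_id_2[
  concept_relationship$relationship_id == "Maps to" &
    concept_relationship$concept_id_1 %in% icd10_ids
])

# Indirect route: descendants of the mapped standard concepts
standard_codes <- sort(unique(concept_ancestor$descendant_concept_id[
  concept_ancestor$ancestor_concept_id %in% mapped
]))
```

In this toy setup both ICD10 concepts map to concept 3, whose descendants are 3, 4 and 5, so `standard_codes` contains those three IDs and leaves out the unrelated concept 10.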
For this example, we will try to generate a codelist for arthropathies.
arthropathy_codes <- getICD10StandardCodes(
  cdm = cdm_mock,
  name = "arthropathies"
)
#> Getting non-standard ICD10 concepts
#> Mapping from non-standard to standard concepts
#> Getting descendant concepts
arthropathy_codes
#>
#> ── 1 codelist ──────────────────────────────────────────────────────────────────
#>
#> - arthropathies (3 codes)
arthropathy_codes$arthropathies
#> [1] 3 4 5
As with the above functions, we could return concepts with their details rather than as a codelist.
arthropathy_codes <- getICD10StandardCodes(
  cdm = cdm_mock,
  name = "arthropathies",
  type = "codelist_with_details"
)
#> Getting non-standard ICD10 concepts
#> Mapping from non-standard to standard concepts
#> Getting descendant concepts
arthropathy_codes
#>
#> ── 1 codelist with details ─────────────────────────────────────────────────────
#>
#> - 1234_arthropathies (3 codes)
arthropathy_codes[[1]]
#> # A tibble: 3 × 6
#> name concept_id concept_name domain_id vocabulary_id concept_code
#> <chr> <int> <chr> <chr> <chr> <chr>
#> 1 1234_arthropathi… 3 Arthritis Condition SNOMED 1234
#> 2 1234_arthropathi… 4 Osteoarthri… Condition SNOMED 1234
#> 3 1234_arthropathi… 5 Osteoarthri… Condition SNOMED 1234