Compare, subset or stratify codelists • CodelistGenerator

Introduction: Generate codelist subsets, exploring codelist utility functions

This vignette introduces a set of functions designed to manipulate and explore codelists within an OMOP CDM. Specifically, we will learn how to:

Subset a codelist to keep only codes meeting a certain criteria.
Stratify a codelist based on attributes like dose unit or route of administration.
Compare two codelists to identify shared and unique concepts.

First of all, we will load the required packages and connect to a mock database.

library(DBI)
library(duckdb)
library(dplyr)
library(CDMConnector)
library(CodelistGenerator)

# Connect to the database and create the cdm object
con <- dbConnect(duckdb(), 
                      eunomiaDir("synpuf-1k", "5.3"))
cdm <- cdmFromCon(con = con, 
                  cdmName = "Eunomia Synpuf",
                  cdmSchema   = "main",
                  writeSchema = "main", 
                  achillesSchema = "main")

We will start by generating a codelist for acetaminophen using getDrugIngredientCodes()

acetaminophen <- getDrugIngredientCodes(cdm,
                                        name = "acetaminophen",
                                        nameStyle = "{concept_name}",
                                        type = "codelist")

Subsetting a Codelist

Subsetting a codelist will allow us to reduce a codelist to only those concepts that meet certain conditions.

Subset to Codes in Use

This function keeps only those codes observed in the database with at least a specified frequency (minimumCount) and in the table specified (table). Note that this function depends on ACHILLES tables being available in your CDM object.

acetaminophen_in_use <- subsetToCodesInUse(x = acetaminophen, 
                                           cdm, 
                                           minimumCount = 0,
                                           table = "drug_exposure")
acetaminophen_in_use # Only the first 5 concepts will be shown
#> 
#> ── 1 codelist ──────────────────────────────────────────────────────────────────
#> 
#> - acetaminophen (228 codes)

Subset by Domain

We will now subset to those concepts that have domain = "Drug". Remember that, to see the domains available in the cdm, you can use getDomains(cdm).

acetaminophen_drug <- subsetOnDomain(acetaminophen_in_use, cdm, domain = "Drug")

acetaminophen_drug
#> 
#> ── 1 codelist ──────────────────────────────────────────────────────────────────
#> 
#> - acetaminophen (228 codes)

We can use the negate argument to exclude concepts with a certain domain:

acetaminophen_no_drug <- subsetOnDomain(acetaminophen_in_use, cdm, domain = "Drug", negate = TRUE)

acetaminophen_no_drug
#> 
#> ── 0 codelists ─────────────────────────────────────────────────────────────────

Subset on Dose Unit

We will now filter to only include concepts with specified dose units. Remember that you can use getDoseUnit(cdm) to explore the dose units available in your cdm.

acetaminophen_mg_unit <- subsetOnDoseUnit(acetaminophen_drug, cdm, c("milligram", "unit"))
acetaminophen_mg_unit
#> 
#> ── 1 codelist ──────────────────────────────────────────────────────────────────
#> 
#> - acetaminophen (228 codes)

As before, we can use argument negate = TRUE to exclude instead.

Subset on route category

We will now subset to those concepts that do not have an “unclassified_route” or “transmucosal_rectal”:

acetaminophen_route <- subsetOnRouteCategory(acetaminophen_mg_unit, 
                                             cdm, c("transmucosal_rectal","unclassified_route"), 
                                             negate = TRUE)
acetaminophen_route
#> 
#> ── 1 codelist ──────────────────────────────────────────────────────────────────
#> 
#> - acetaminophen (221 codes)

Stratify codelist

Instead of filtering, stratification allows us to split a codelist into subgroups based on defined vocabulary properties.

Stratify by Dose Unit

acetaminophen_doses <- stratifyByDoseUnit(acetaminophen, cdm, keepOriginal = TRUE)

acetaminophen_doses
#> 
#> ── 4 codelists ─────────────────────────────────────────────────────────────────
#> 
#> - acetaminophen (23935 codes)
#> - acetaminophen_milligram (22256 codes)
#> - acetaminophen_unit (1 codes)
#> - acetaminophen_unkown_dose_unit (1679 codes)

Stratify by Route Category

acetaminophen_routes <- stratifyByRouteCategory(acetaminophen, cdm)

acetaminophen_routes
#> 
#> ── 6 codelists ─────────────────────────────────────────────────────────────────
#> 
#> - acetaminophen_inhalable (3 codes)
#> - acetaminophen_injectable (689 codes)
#> - acetaminophen_oral (17219 codes)
#> - acetaminophen_topical (6 codes)
#> - acetaminophen_transmucosal_rectal (1459 codes)
#> - acetaminophen_unclassified_route (4559 codes)

Compare codelists

Now we will compare two codelists to identify overlapping and unique codes.

acetaminophen <- getDrugIngredientCodes(cdm, 
                                           name = "acetaminophen", 
                                           nameStyle = "{concept_name}",
                                           type = "codelist_with_details")
hydrocodone <- getDrugIngredientCodes(cdm, 
                                      name = "hydrocodone", 
                                      doseUnit = "milligram", 
                                      nameStyle = "{concept_name}",
                                      type = "codelist_with_details")

Compare the two sets:

comparison <- compareCodelists(acetaminophen$acetaminophen, hydrocodone$hydrocodone)

comparison |> glimpse()
#> Rows: 25,469
#> Columns: 5
#> $ concept_id   <int> 1124009, 1125315, 1125320, 1125321, 1125357, 1125358, 112…
#> $ concept_name <chr> "acetaminophen 635 MG / phenyltoloxamine citrate 55 MG Or…
#> $ codelist_1   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ codelist_2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ codelist     <chr> "Only codelist 1", "Only codelist 1", "Only codelist 1", …

comparison |> filter(codelist == "Both")
#> # A tibble: 253 × 5
#>    concept_id concept_name                        codelist_1 codelist_2 codelist
#>         <int> <chr>                                    <dbl>      <dbl> <chr>   
#>  1    1129026 acetaminophen 500 MG / hydrocodone…          1          1 Both    
#>  2   40002683 acetaminophen / hydrocodone Oral C…          1          1 Both    
#>  3   40002684 acetaminophen / hydrocodone Oral C…          1          1 Both    
#>  4   40002685 acetaminophen / hydrocodone Oral C…          1          1 Both    
#>  5   40002686 acetaminophen / hydrocodone Oral C…          1          1 Both    
#>  6   40002687 acetaminophen / hydrocodone Oral C…          1          1 Both    
#>  7   40002688 acetaminophen / hydrocodone Oral C…          1          1 Both    
#>  8   40002689 acetaminophen / hydrocodone Oral C…          1          1 Both    
#>  9   40002690 acetaminophen / hydrocodone Oral C…          1          1 Both    
#> 10   40002691 acetaminophen / hydrocodone Oral C…          1          1 Both    
#> # ℹ 243 more rows