Introduction to CodelistGenerator
Creating a code list for dementia
In this example we generate a candidate code list for dementia, looking only for codes in the Condition domain. First, we load the required libraries.
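The library-loading chunk is not shown here. A minimal sketch, assuming the set of packages implied by the functions called in the rest of this article (the list is inferred from those calls, not copied from the original chunk):

```r
# Assumed set of packages, inferred from the calls used in this article.
library(DBI)               # dbConnect()
library(dplyr)             # tbl(), sql(), filter(), select(), glimpse(), %>%
library(CDMConnector)      # cdm_from_con()
library(CodelistGenerator) # getVocabVersion(), getCandidateCodes(), ...
```

The connection example also calls `RPostgres::Postgres()`, so the RPostgres package would need to be installed, and querying database tables through dplyr relies on dbplyr being available.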
Connect to the OMOP CDM vocabularies
CodelistGenerator works with a cdm_reference to the vocabulary tables of the OMOP CDM, created using the CDMConnector package.
# example with postgres database connection details
db <- DBI::dbConnect(RPostgres::Postgres(),
  dbname = Sys.getenv("server"),
  port = Sys.getenv("port"),
  host = Sys.getenv("host"),
  user = Sys.getenv("user"),
  password = Sys.getenv("password")
)
# create cdm reference
cdm <- CDMConnector::cdm_from_con(
  con = db,
  cdm_schema = Sys.getenv("vocabulary_schema")
)
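The connection details above are read from environment variables, and `Sys.getenv()` returns an empty string for any variable that is not set, which can surface later as a confusing connection error. A small sketch for checking this up front; `findUnsetVars()` is a hypothetical helper written for this example and is not part of DBI or CDMConnector:

```r
# Hypothetical helper: names of environment variables that are unset or empty.
findUnsetVars <- function(varNames) {
  values <- Sys.getenv(varNames, unset = "")
  varNames[unname(values) == ""]
}

# Variable names taken from the connection code above.
unsetVars <- findUnsetVars(
  c("server", "port", "host", "user", "password", "vocabulary_schema")
)
if (length(unsetVars) > 0) {
  message("Unset environment variables: ", paste(unsetVars, collapse = ", "))
}
```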
Check version of the vocabularies
Note that the results from CodelistGenerator are specific to a particular version of the OMOP CDM vocabularies. We can check which vocabulary version is being used as follows:
getVocabVersion(cdm = cdm)
#> [1] "vocabVersion"
A code list from “Dementia” (4182210) and its descendants
The simplest approach to identifying potential codes is to take a high-level code and include all its descendants.
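# schema holding the vocabulary tables (assumed to be the same
# environment variable used for the cdm reference above)
vocabularyDatabaseSchema <- Sys.getenv("vocabulary_schema")
codesFromDescendants <- tbl(
  db,
  sql(paste0(
    "SELECT * FROM ",
    vocabularyDatabaseSchema,
    ".concept_ancestor"
  ))
) %>%
  # concept IDs are integers, so compare against an integer
  filter(ancestor_concept_id == 4182210L) %>%
  select("descendant_concept_id") %>%
  rename("concept_id" = "descendant_concept_id") %>%
  # add concept details for each descendant
  left_join(
    tbl(db, sql(paste0(
      "SELECT * FROM ",
      vocabularyDatabaseSchema,
      ".concept"
    ))),
    by = "concept_id"
  ) %>%
  select(
    "concept_id", "concept_name",
    "domain_id", "vocabulary_id"
  ) %>%
  collect()
codesFromDescendants %>%
  glimpse()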
#> Rows: 151
#> Columns: 4
#> $ concept_id <int> 35610098, 4043241, 4139421, 37116466, 4046089, 44782559,…
#> $ concept_name <chr> "Predominantly cortical dementia", "Familial Alzheimer's…
#> $ domain_id <chr> "Condition", "Condition", "Condition", "Condition", "Con…
#> $ vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOME…
This appears to pick up most of the relevant codes. However, the approach misses codes that are not descendants of 4182210. For example, “Wandering due to dementia” (37312577; https://athena.ohdsi.org/search-terms/terms/37312577) and “Anxiety due to dementia” (37312031; https://athena.ohdsi.org/search-terms/terms/37312031) are not picked up.
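The gap between a hierarchy-based list and a keyword-based list can be sketched on a toy vocabulary in base R. The tables, IDs and hierarchy below are invented for illustration (only the concept names echo the examples above); they are not real OMOP content:

```r
# Toy stand-ins for the OMOP concept and concept_ancestor tables.
# IDs and relationships are invented for illustration only.
concept <- data.frame(
  concept_id = c(1L, 2L, 3L, 4L),
  concept_name = c(
    "Dementia", "Vascular dementia",
    "Alzheimer's disease", "Wandering due to dementia"
  ),
  stringsAsFactors = FALSE
)
conceptAncestor <- data.frame(
  ancestor_concept_id   = c(1L, 1L, 1L),
  descendant_concept_id = c(1L, 2L, 3L)
)

# Hierarchy-based list: the top-level concept and its descendants.
descendantIds <- conceptAncestor$descendant_concept_id[
  conceptAncestor$ancestor_concept_id == 1L
]

# Keyword-based list: any concept whose name contains "dementia".
keywordIds <- concept$concept_id[
  grepl("dementia", concept$concept_name, ignore.case = TRUE)
]

# Concepts that match the keyword but sit outside the hierarchy.
missedIds <- setdiff(keywordIds, descendantIds)
concept[concept$concept_id %in% missedIds, ]
```

In this toy setup only concept 4 matches the keyword without being a descendant of concept 1, which is the situation described for “Wandering due to dementia”.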
Generating a candidate code list using CodelistGenerator
To try to capture all such terms, we can use CodelistGenerator.
First, let’s do a simple search for the single keyword “dementia”, including descendants of the identified codes.
dementiaCodes1 <- getCandidateCodes(
  cdm = cdm,
  keywords = "dementia",
  domains = "Condition",
  includeDescendants = TRUE
)
dementiaCodes1 %>%
  glimpse()
#> Rows: 187
#> Columns: 6
#> $ concept_id <int> 374326, 374888, 375791, 376085, 376094, 376095, 37694…
#> $ found_from <chr> "From initial search", "From initial search", "From i…
#> $ concept_name <chr> "Arteriosclerotic dementia with depression", "Dementi…
#> $ domain_id <chr> "Condition", "Condition", "Condition", "Condition", "…
#> $ vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SN…
#> $ standard_concept <chr> "standard", "standard", "standard", "standard", "stan…
Comparing code lists
What is the difference between this code list and the one from 4182210 and its descendants?
codeComparison <- compareCodelists(
  codesFromDescendants,
  dementiaCodes1
)
codeComparison %>%
  group_by(codelist) %>%
  tally()
#> # A tibble: 2 × 2
#> codelist n
#> <chr> <int>
#> 1 Both 151
#> 2 Only codelist 2 36
What are these extra codes picked up by CodelistGenerator?
codeComparison %>%
  filter(codelist == "Only codelist 2") %>%
  glimpse()
#> Rows: 36
#> Columns: 3
#> $ concept_id <int> 4041685, 4043378, 4044415, 4046091, 4092747, 4187091, 425…
#> $ concept_name <chr> "Amyotrophic lateral sclerosis with dementia", "Frontotem…
#> $ codelist <chr> "Only codelist 2", "Only codelist 2", "Only codelist 2", …
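Conceptually, the comparison labels each concept by which list it appears in. A base R sketch on invented IDs; it only mirrors the labels seen in the output above and is not how `compareCodelists()` is implemented:

```r
# Two toy code lists (invented concept IDs).
codelist1 <- c(101L, 102L, 103L)
codelist2 <- c(101L, 102L, 103L, 104L, 105L)

# Label every concept by the list(s) it appears in.
allIds <- union(codelist1, codelist2)
label <- ifelse(
  allIds %in% codelist1 & allIds %in% codelist2, "Both",
  ifelse(allIds %in% codelist1, "Only codelist 1", "Only codelist 2")
)
toyComparison <- data.frame(
  concept_id = allIds,
  codelist = label,
  stringsAsFactors = FALSE
)

# Count the concepts in each category.
table(toyComparison$codelist)
```

As in the dementia example, the first toy list is a subset of the second, so no concept is labelled "Only codelist 1".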
Review mappings from non-standard vocabularies
Perhaps we want to see which ICD10CM codes map to our candidate code list. We can get these by running:
icdMappings <- getMappings(
  cdm = cdm,
  candidateCodelist = dementiaCodes1,
  nonStandardVocabularies = "ICD10CM"
)
icdMappings %>%
  glimpse()
#> Rows: 191
#> Columns: 7
#> $ standard_concept_id <int> 372610, 374341, 374888, 374888, 374888, 374…
#> $ standard_concept_name <chr> "Postconcussion syndrome", "Huntington's ch…
#> $ standard_vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SN…
#> $ non_standard_concept_id <int> 45571706, 35207314, 1568088, 1568089, 37402…
#> $ non_standard_concept_name <chr> "Postconcussional syndrome", "Huntington's …
#> $ non_standard_concept_code <chr> "F07.81", "G10", "F02", "F02.8", "F02.811",…
#> $ non_standard_vocabulary_id <chr> "ICD10CM", "ICD10CM", "ICD10CM", "ICD10CM",…
Mappings from Read codes can be retrieved in the same way:
readMappings <- getMappings(
  cdm = cdm,
  candidateCodelist = dementiaCodes1,
  nonStandardVocabularies = "Read"
)
readMappings %>%
  glimpse()
#> Rows: 93
#> Columns: 7
#> $ standard_concept_id <int> 372610, 372610, 372610, 372610, 372610, 372…
#> $ standard_concept_name <chr> "Postconcussion syndrome", "Postconcussion …
#> $ standard_vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SN…
#> $ non_standard_concept_id <int> 45446542, 45446553, 45453190, 45459905, 455…
#> $ non_standard_concept_name <chr> "Post-concussion syndrome", "[X]Post-trauma…
#> $ non_standard_concept_code <chr> "E2A2.00", "Eu06212", "E2A2.11", "E2A2.12",…
#> $ non_standard_vocabulary_id <chr> "READ", "READ", "READ", "READ", "READ", "RE…
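In the OMOP vocabularies, non-standard codes are linked to standard concepts through "Maps to" rows in the concept_relationship table. A toy base R sketch of that kind of lookup follows; every row is invented for illustration, and this is an assumption about the general idea rather than a description of how `getMappings()` is implemented:

```r
# Toy stand-ins for OMOP tables; all rows are invented for illustration.
concept <- data.frame(
  concept_id = c(1L, 2L, 11L, 12L, 13L),
  concept_name = c("Standard A", "Standard B", "Source A1", "Source A2", "Source B1"),
  vocabulary_id = c("SNOMED", "SNOMED", "ICD10CM", "Read", "ICD10CM"),
  stringsAsFactors = FALSE
)
conceptRelationship <- data.frame(
  concept_id_1 = c(11L, 12L, 13L),    # non-standard (source) concept
  concept_id_2 = c(1L, 1L, 2L),       # standard concept it maps to
  relationship_id = "Maps to",
  stringsAsFactors = FALSE
)

# Standard concepts in a toy candidate code list.
candidateIds <- c(1L)

# Keep "Maps to" rows that point at a candidate standard concept.
maps <- conceptRelationship[
  conceptRelationship$relationship_id == "Maps to" &
    conceptRelationship$concept_id_2 %in% candidateIds,
]

# Attach the source concept details, then keep one non-standard vocabulary.
toyMappings <- merge(maps, concept, by.x = "concept_id_1", by.y = "concept_id")
toyMappings <- toyMappings[toyMappings$vocabulary_id == "ICD10CM", ]
toyMappings
```

Changing the vocabulary filter to "Read" would return the other source concept that maps to the same standard concept, which parallels running the lookup once per non-standard vocabulary as above.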