CodelistGenerator options: examples with a mock vocabulary
a03_Options_for_CodelistGenerator.Rmd
Mock vocabulary database
Let's say we have a mock vocabulary database with these hypothetical concepts and relationships.
Search for exact keyword match
To find “Musculoskeletal disorder” we can search for it directly:
codes <- getCandidateCodes(
cdm = cdm,
keywords = "Musculoskeletal disorder",
domains = "Condition",
includeDescendants = FALSE
)
#> Limiting to domains of interest
#> Getting concepts to include
#> Search completed. Finishing up.
#> ✔ 1 candidate concept identified
#>
#> Time taken: 0 minutes and 0 seconds
codes |>
glimpse()
#> Rows: 1
#> Columns: 6
#> $ concept_id <int> 1
#> $ found_from <chr> "From initial search"
#> $ concept_name <chr> "Musculoskeletal disorder"
#> $ domain_id <chr> "Condition"
#> $ vocabulary_id <chr> "SNOMED"
#> $ standard_concept <chr> "S"
Note that we would also identify it from a partial match:
codes <- getCandidateCodes(
cdm = cdm,
keywords = "Musculoskeletal",
domains = "Condition",
includeDescendants = FALSE
)
#> Limiting to domains of interest
#> Getting concepts to include
#> Search completed. Finishing up.
#> ✔ 1 candidate concept identified
#>
#> Time taken: 0 minutes and 0 seconds
codes |>
glimpse()
#> Rows: 1
#> Columns: 6
#> $ concept_id <int> 1
#> $ found_from <chr> "From initial search"
#> $ concept_name <chr> "Musculoskeletal disorder"
#> $ domain_id <chr> "Condition"
#> $ vocabulary_id <chr> "SNOMED"
#> $ standard_concept <chr> "S"
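The two searches above return the same concept because the keyword only needs to appear somewhere in the concept name. The base R sketch below illustrates that idea as a case-insensitive substring match on a few names from this mock vocabulary. It is an illustration only, not the package's implementation, and `matchKeyword()` is a hypothetical helper.

```r
# A few concept names that appear in the outputs of this vignette
conceptNames <- c("Musculoskeletal disorder", "Osteoarthrosis", "Arthritis")

# Hypothetical helper: case-insensitive substring match of one keyword
matchKeyword <- function(keyword, names) {
  names[grepl(tolower(keyword), tolower(names), fixed = TRUE)]
}

exactMatch <- matchKeyword("Musculoskeletal disorder", conceptNames)
partialMatch <- matchKeyword("Musculoskeletal", conceptNames)
```

Both `exactMatch` and `partialMatch` contain only “Musculoskeletal disorder”.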
Add descendants
To include descendants of an identified code, we can set includeDescendants to TRUE:
getCandidateCodes(
cdm = cdm,
keywords = "Musculoskeletal disorder",
domains = "Condition",
includeDescendants = TRUE
) |>
glimpse()
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 5 candidate concepts identified
#>
#> Time taken: 0 minutes and 0 seconds
#> Rows: 5
#> Columns: 6
#> $ concept_id <int> 1, 2, 3, 4, 5
#> $ found_from <chr> "From initial search", "From descendants", "From desc…
#> $ concept_name <chr> "Musculoskeletal disorder", "Osteoarthrosis", "Arthri…
#> $ domain_id <chr> "Condition", "Condition", "Condition", "Condition", "…
#> $ vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOMED"
#> $ standard_concept <chr> "S", "S", "S", "S", "S"
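Adding descendants amounts to walking down the concept hierarchy from the matched concept. The sketch below shows one way to do this in base R, assuming a toy set of parent-to-child links. The links are hypothetical, chosen only to be consistent with the outputs shown in this vignette, and `getDescendants()` is a hypothetical helper rather than a package function.

```r
# Hypothetical parent -> child links for a toy hierarchy
# (in an OMOP vocabulary this information lives in the concept_ancestor table)
links <- data.frame(
  parent = c(1, 1, 3, 3),
  child  = c(2, 3, 4, 5)
)

# Hypothetical helper: collect every descendant of the starting concept ids
getDescendants <- function(ids, links) {
  found <- integer(0)
  frontier <- ids
  while (length(frontier) > 0) {
    children <- links$child[links$parent %in% frontier]
    # Only follow concepts we have not already visited
    frontier <- setdiff(children, found)
    found <- union(found, frontier)
  }
  sort(found)
}

descendants <- getDescendants(1, links)
```

With these assumed links, starting from concept 1 yields concepts 2, 3, 4 and 5, matching the four “From descendants” rows above.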
Multiple search terms
We can also search for several keywords at once. The following search picks up all of the concepts found above:
codes <- getCandidateCodes(
cdm = cdm,
keywords = c(
"Musculoskeletal disorder",
"arthritis",
"arthrosis"
),
domains = "Condition",
includeDescendants = FALSE
)
#> Limiting to domains of interest
#> Getting concepts to include
#> Search completed. Finishing up.
#> ✔ 5 candidate concepts identified
#>
#> Time taken: 0 minutes and 0 seconds
codes |>
glimpse()
#> Rows: 5
#> Columns: 6
#> $ concept_id <int> 1, 3, 4, 5, 2
#> $ found_from <chr> "From initial search", "From initial search", "From i…
#> $ concept_name <chr> "Musculoskeletal disorder", "Arthritis", "Osteoarthri…
#> $ domain_id <chr> "Condition", "Condition", "Condition", "Condition", "…
#> $ vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOMED"
#> $ standard_concept <chr> "S", "S", "S", "S", "S"
Add ancestor
To include the ancestors one level above the identified concepts we can set includeAncestor to TRUE
codes <- getCandidateCodes(
cdm = cdm,
keywords = "Osteoarthritis of knee",
includeAncestor = TRUE,
domains = "Condition"
)
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Adding ancestor
#> Search completed. Finishing up.
#> ✔ 2 candidate concepts identified
#>
#> Time taken: 0 minutes and 0 seconds
codes |>
glimpse()
#> Rows: 2
#> Columns: 6
#> $ concept_id <int> 4, 3
#> $ found_from <chr> "From initial search", "From ancestor"
#> $ concept_name <chr> "Osteoarthritis of knee", "Arthritis"
#> $ domain_id <chr> "Condition", "Condition"
#> $ vocabulary_id <chr> "SNOMED", "SNOMED"
#> $ standard_concept <chr> "S", "S"
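Restricting ancestors to one level above can be pictured as a filter on the separation between two concepts. The sketch below assumes a toy table shaped like the OMOP `concept_ancestor` table; the rows are hypothetical, and whether the package filters in exactly this way is an assumption.

```r
# Toy rows shaped like the OMOP concept_ancestor table (values are hypothetical)
conceptAncestor <- data.frame(
  ancestor_concept_id      = c(1, 1, 3),
  descendant_concept_id    = c(3, 4, 4),
  min_levels_of_separation = c(1, 2, 1)
)

# Keep only ancestors exactly one level above concept 4
directAncestors <- conceptAncestor$ancestor_concept_id[
  conceptAncestor$descendant_concept_id == 4 &
    conceptAncestor$min_levels_of_separation == 1
]
```

Here `directAncestors` holds concept 3 only; concept 1 is two levels above concept 4 in this toy table and is therefore left out, as in the output above.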
Searches with multiple words
We can also find concepts with multiple words even if they are in a different order. For example, a search for “Knee osteoarthritis” will pick up “Osteoarthritis of knee”.
codes <- getCandidateCodes(
cdm = cdm,
keywords = "Knee osteoarthritis",
domains = "Condition",
includeDescendants = TRUE
)
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 1 candidate concept identified
#>
#> Time taken: 0 minutes and 0 seconds
codes |>
glimpse()
#> Rows: 1
#> Columns: 6
#> $ concept_id <int> 4
#> $ found_from <chr> "From initial search"
#> $ concept_name <chr> "Osteoarthritis of knee"
#> $ domain_id <chr> "Condition"
#> $ vocabulary_id <chr> "SNOMED"
#> $ standard_concept <chr> "S"
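Order-independent matching can be thought of as requiring every word of the keyword to appear somewhere in the concept name. The base R sketch below illustrates this; `matchesAllWords()` is a hypothetical helper and not necessarily how the package performs the match.

```r
# Hypothetical helper: TRUE when every word of the keyword occurs in the name
matchesAllWords <- function(keyword, name) {
  words <- strsplit(tolower(keyword), "[[:space:]]+")[[1]]
  all(vapply(words, grepl, logical(1), x = tolower(name), fixed = TRUE))
}

kneeMatch <- matchesAllWords("Knee osteoarthritis", "Osteoarthritis of knee")
arthritisMatch <- matchesAllWords("Knee osteoarthritis", "Arthritis")
```

`kneeMatch` is `TRUE` because both words occur in the name regardless of order, while `arthritisMatch` is `FALSE` because “knee” is missing.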
With exclusions
We can also exclude specific terms from the results:
codes <- getCandidateCodes(
cdm = cdm,
keywords = "arthritis",
exclude = "Hip osteoarthritis",
domains = "Condition"
)
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 2 candidate concepts identified
#>
#> Time taken: 0 minutes and 0 seconds
codes |>
glimpse()
#> Rows: 2
#> Columns: 6
#> $ concept_id <int> 3, 4
#> $ found_from <chr> "From initial search", "From initial search"
#> $ concept_name <chr> "Arthritis", "Osteoarthritis of knee"
#> $ domain_id <chr> "Condition", "Condition"
#> $ vocabulary_id <chr> "SNOMED", "SNOMED"
#> $ standard_concept <chr> "S", "S"
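An exclusion can be pictured as a second match whose hits are removed from the candidates. In the sketch below the candidate table is a toy: the name given to concept 5 is an assumption, since the outputs in this vignette truncate it, and `containsAllWords()` is a hypothetical helper.

```r
# Toy candidates for the keyword "arthritis"; the name of concept 5 is assumed
candidates <- data.frame(
  concept_id = c(3, 4, 5),
  concept_name = c("Arthritis", "Osteoarthritis of knee", "Osteoarthritis of hip")
)

# Hypothetical helper: TRUE for each name that contains every word of the term
containsAllWords <- function(term, names) {
  words <- strsplit(tolower(term), "[[:space:]]+")[[1]]
  Reduce(`&`, lapply(words, grepl, x = tolower(names), fixed = TRUE))
}

excluded <- containsAllWords("Hip osteoarthritis", candidates$concept_name)
kept <- candidates[!excluded, ]
```

Under these assumptions `kept` retains concepts 3 and 4, the same two concepts returned above.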
Search using synonyms
We can also pick up codes based on their synonyms. In this case, “Arthritis” has “Osteoarthrosis” as a synonym, so a search that covers both concept names and their associated synonyms matches “Arthritis” through that synonym and includes it.
codes <- getCandidateCodes(
cdm = cdm,
keywords = "osteoarthrosis",
domains = "Condition",
searchInSynonyms = TRUE
)
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding concepts using synonymns
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 4 candidate concepts identified
#>
#> Time taken: 0 minutes and 0 seconds
codes |>
glimpse()
#> Rows: 4
#> Columns: 6
#> $ concept_id <int> 2, 3, 4, 5
#> $ found_from <chr> "From initial search", "In synonyms", "From descendan…
#> $ concept_name <chr> "Osteoarthrosis", "Arthritis", "Osteoarthritis of kne…
#> $ domain_id <chr> "Condition", "Condition", "Condition", "Condition"
#> $ vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED"
#> $ standard_concept <chr> "S", "S", "S", "S"
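The synonym search can be sketched as two lookups whose results are combined: one against concept names and one against a synonym table. The tables below are toys shaped like the OMOP `concept` and `concept_synonym` tables, with the single synonym row taken from the description above. This is an illustration and not the package's code.

```r
# Toy concept table with two of the concepts from this vignette
concepts <- data.frame(
  concept_id = c(2, 3),
  concept_name = c("Osteoarthrosis", "Arthritis")
)

# Toy synonym table: "Arthritis" (concept 3) has the synonym "Osteoarthrosis"
synonyms <- data.frame(
  concept_id = 3,
  concept_synonym_name = "Osteoarthrosis"
)

keyword <- "osteoarthrosis"

# Match the keyword against primary names, then against synonyms
fromNames <- concepts$concept_id[
  grepl(keyword, tolower(concepts$concept_name), fixed = TRUE)
]
fromSynonyms <- synonyms$concept_id[
  grepl(keyword, tolower(synonyms$concept_synonym_name), fixed = TRUE)
]

matched <- union(fromNames, fromSynonyms)
```

`fromNames` finds concept 2 and `fromSynonyms` adds concept 3, mirroring the “From initial search” and “In synonyms” rows in the output above.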
Search via non-standard
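Alternatively, we could have picked up “Osteoarthrosis” by searching via non-standard concepts, that is, by matching keywords against non-standard concepts and then including the standard concepts they map to.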
codes <- getCandidateCodes(
cdm = cdm,
keywords = c("arthritis", "arthropathy"),
domains = "Condition",
searchNonStandard = TRUE
)
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Adding codes from non-standard
#> Search completed. Finishing up.
#> ✔ 4 candidate concepts identified
#>
#> Time taken: 0 minutes and 0 seconds
codes |>
glimpse()
#> Rows: 4
#> Columns: 6
#> $ concept_id <int> 3, 4, 5, 2
#> $ found_from <chr> "From initial search", "From initial search", "From i…
#> $ concept_name <chr> "Arthritis", "Osteoarthritis of knee", "Osteoarthriti…
#> $ domain_id <chr> "Condition", "Condition", "Condition", "Condition"
#> $ vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED"
#> $ standard_concept <chr> "S", "S", "S", "S"
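Searching via non-standard concepts can be pictured as a match followed by a mapping step. The sketch below assumes a toy table shaped like the OMOP `concept_relationship` table with a “Maps to” row. The non-standard concept (id 101 and its name) and the mapping row are hypothetical and are not taken from the mock vocabulary.

```r
# Hypothetical non-standard concept whose name matches the keyword "arthropathy"
nonStandard <- data.frame(
  concept_id = 101,
  concept_name = "Example arthropathy term"
)

# Toy row shaped like the OMOP concept_relationship table (hypothetical mapping)
conceptRelationship <- data.frame(
  concept_id_1 = 101,
  concept_id_2 = 2,
  relationship_id = "Maps to"
)

# Step 1: match the keyword against non-standard concept names
hits <- nonStandard$concept_id[
  grepl("arthropathy", tolower(nonStandard$concept_name), fixed = TRUE)
]

# Step 2: follow "Maps to" relationships to the standard concepts
mappedStandard <- conceptRelationship$concept_id_2[
  conceptRelationship$concept_id_1 %in% hits &
    conceptRelationship$relationship_id == "Maps to"
]
```

In this toy setup `mappedStandard` is concept 2, which is how a keyword that matches no standard concept name can still bring a standard concept into the results.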
Search for both standard and non-standard concepts
We can also include non-standard codes in our results:
codes <- getCandidateCodes(
cdm = cdm,
keywords = c(
"Musculoskeletal disorder",
"arthritis",
"arthropathy",
"arthrosis"
),
domains = "Condition",
standardConcept = c("Standard", "Non-standard")
)
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 8 candidate concepts identified
#>
#> Time taken: 0 minutes and 0 seconds
codes |>
glimpse()
#> Rows: 8
#> Columns: 6
#> $ concept_id <int> 1, 3, 4, 5, 8, 17, 7, 2
#> $ found_from <chr> "From initial search", "From initial search", "From i…
#> $ concept_name <chr> "Musculoskeletal disorder", "Arthritis", "Osteoarthri…
#> $ domain_id <chr> "Condition", "Condition", "Condition", "Condition", "…
#> $ vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "Read", "ICD1…
#> $ standard_concept <chr> "S", "S", "S", "S", NA, NA, NA, "S"
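In the output above, non-standard concepts are those with a missing `standard_concept`. The sketch below shows how the two groups could be separated in base R. The data frame `shownCodes` is a hypothetical stand-in that copies only the `concept_id` and `standard_concept` values printed above.

```r
# concept_id and standard_concept values copied from the output above
shownCodes <- data.frame(
  concept_id = c(1, 3, 4, 5, 8, 17, 7, 2),
  standard_concept = c("S", "S", "S", "S", NA, NA, NA, "S")
)

# Standard concepts are flagged "S"; non-standard ones have NA
isStandard <- !is.na(shownCodes$standard_concept) &
  shownCodes$standard_concept == "S"

standardIds <- shownCodes$concept_id[isStandard]
nonStandardIds <- shownCodes$concept_id[is.na(shownCodes$standard_concept)]
```

This separates the five standard concepts (1, 3, 4, 5, 2) from the three non-standard ones (8, 17, 7).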