Summarise cohort entries • CohortCharacteristics

Introduction

In this example we’re going to summarise the characteristics of individuals with an ankle sprain, ankle fracture, forearm fracture, or hip fracture using the synthetic GiBleed dataset from Eunomia.

We’ll begin by creating our study cohorts.

library(omock)
library(CDMConnector)
library(dplyr, warn.conflicts = FALSE)
library(PatientProfiles)
library(CohortCharacteristics)
library(clock)

cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "injuries",
  conceptSet = list(
    "ankle_sprain" = 81151,
    "ankle_fracture" = 4059173,
    "forearm_fracture" = 4278672,
    "hip_fracture" = 4230399
  ),
  end = "event_end_date",
  limit = "all"
)

Summarising cohort counts

We can first quickly summarise and present the overall counts of our cohorts.

cohortCounts <- summariseCohortCount(cdm$injuries)
tableCohortCount(cohortCounts)

CDM name	Variable name	Estimate name	ankle_sprain	ankle_fracture	forearm_fracture	hip_fracture
GiBleed	Number records	N	1,915	464	569	138
GiBleed	Number subjects	N	1,357	427	510	132
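Number records counts cohort entries and Number subjects counts distinct people. The cohorts were generated with limit = "all", so one person can contribute several entries, and records can therefore exceed subjects. The base R sketch below shows this distinction on a small made-up cohort table. The data and variable names are illustrative only, and this is not how summariseCohortCount() is implemented.

```r
# Made-up cohort table for illustration (not GiBleed data):
# person 10 has two entries in cohort 1, person 11 has one, person 12 is in cohort 2.
toyCohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 2L),
  subject_id = c(10L, 10L, 11L, 12L)
)

# Split subject ids by cohort, then count entries and distinct people per cohort
byCohort <- split(toyCohort$subject_id, toyCohort$cohort_definition_id)
records <- vapply(byCohort, length, integer(1))
subjects <- vapply(byCohort, function(s) length(unique(s)), integer(1))

rbind(records, subjects)
```

Cohort 1 has 3 records but only 2 subjects. This is the same pattern seen for every injury cohort in the table above.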

We can also stratify these counts. For example, here we add age groups and then stratify our counts by them.

cdm$injuries <- cdm$injuries |>
  addAge(
    ageGroup = list(c(0, 3), c(4, 17), c(18, Inf)),
    name = "injuries"
  )

cohortCounts <- summariseCohortCount(cdm[["injuries"]], strata = "age_group")
tableCohortCount(cohortCounts)

CDM name	Age group	Variable name	Estimate name	ankle_sprain	ankle_fracture	forearm_fracture	hip_fracture
GiBleed	overall	Number records	N	1,915	464	569	138
GiBleed	overall	Number subjects	N	1,357	427	510	132
GiBleed	0 to 3	Number records	N	202	49	51	7
GiBleed	0 to 3	Number subjects	N	196	49	51	7
GiBleed	18 or above	Number records	N	1,047	213	268	88
GiBleed	18 or above	Number subjects	N	847	204	249	83
GiBleed	4 to 17	Number records	N	666	202	250	43
GiBleed	4 to 17	Number subjects	N	597	195	239	43
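Each entry is placed in the age group whose bounds contain the person's age at cohort start. Both bounds are inclusive, and the labels in the table follow from the list passed to ageGroup. The sketch below is a hypothetical base R helper, assignAgeGroup(), that mimics this labelling for illustration. It is not the PatientProfiles implementation, and returning NA for ages outside every group is an assumption of the sketch.

```r
# Hypothetical helper (not part of PatientProfiles): label each age with the
# first group in ageGroup whose inclusive bounds [lower, upper] contain it.
assignAgeGroup <- function(age, ageGroup) {
  # Build labels such as "0 to 3" or "18 or above" from the bounds
  labels <- vapply(ageGroup, function(g) {
    if (is.infinite(g[2])) paste(g[1], "or above") else paste(g[1], "to", g[2])
  }, character(1))
  # For each age, pick the first group with lower <= age <= upper
  vapply(age, function(a) {
    inGroup <- vapply(ageGroup, function(g) a >= g[1] && a <= g[2], logical(1))
    if (any(inGroup)) labels[which(inGroup)[1]] else NA_character_
  }, character(1))
}

ageGroup <- list(c(0, 3), c(4, 17), c(18, Inf))
assignAgeGroup(c(2, 4, 17, 18, 65), ageGroup)
```

A 17-year-old falls in "4 to 17" and an 18-year-old in "18 or above". Integer ages therefore never fall between two groups.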

We can also apply minimum cell count suppression to our cohort counts. In this case we will obscure any counts below 10.

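# suppress() comes from omopgenerics; namespace it explicitly so the call
# works regardless of which packages re-export it
cohortCounts <- cohortCounts |>
  omopgenerics::suppress(minCellCount = 10)
tableCohortCount(cohortCounts)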

CDM name	Age group	Variable name	Estimate name	ankle_sprain	ankle_fracture	forearm_fracture	hip_fracture
GiBleed	overall	Number records	N	1,915	464	569	138
GiBleed	overall	Number subjects	N	1,357	427	510	132
GiBleed	0 to 3	Number records	N	202	49	51	<10
GiBleed	0 to 3	Number subjects	N	196	49	51	<10
GiBleed	18 or above	Number records	N	1,047	213	268	88
GiBleed	18 or above	Number subjects	N	847	204	249	83
GiBleed	4 to 17	Number records	N	666	202	250	43
GiBleed	4 to 17	Number subjects	N	597	195	239	43
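Conceptually, suppression replaces each small count with a placeholder so that rare cells cannot be used to identify individuals. Below is a minimal base R sketch of that idea applied to a plain numeric vector. It assumes that counts from 1 to minCellCount - 1 are shown as "<10" (as in the table above) and that zero counts are left unchanged. suppressCounts() is a hypothetical helper, not the omopgenerics implementation, which operates on whole summarised results rather than a single vector.

```r
# Hypothetical helper (not omopgenerics::suppress): mask small non-zero counts
suppressCounts <- function(n, minCellCount = 10) {
  # Format with thousands separators, as in the tables above
  out <- format(n, big.mark = ",", trim = TRUE)
  # Assumption: only counts in 1..(minCellCount - 1) are masked
  out[n > 0 & n < minCellCount] <- paste0("<", minCellCount)
  out
}

# The 0 to 3 records row from the table above
suppressCounts(c(202, 49, 51, 7))
```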

Summarising cohort attrition

Say we specify two inclusion criteria. First, we keep only cohort entries starting in or after the year 2000. Second, we keep only cohort entries for people aged 18 or older at cohort start. After each step we record the attrition, which lets us plot how many records and subjects each criterion removes.

cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "ankle_sprain",
  conceptSet = list("ankle_sprain" = 81151),
  end = "event_end_date",
  limit = "all"
)

cdm$ankle_sprain <- cdm$ankle_sprain |>
  filter(get_year(cohort_start_date) >= 2000) |>
  compute(temporary = FALSE, name = "ankle_sprain") |>
  recordCohortAttrition("Restrict to cohort_start_date >= 2000")

attritionSummary <- summariseCohortAttrition(cdm$ankle_sprain)

plotCohortAttrition(attritionSummary)

cdm$ankle_sprain <- cdm$ankle_sprain |>
  addAge() |>
  filter(age >= 18) |>
  compute(temporary = FALSE, name = "ankle_sprain") |>
  recordCohortAttrition("Restrict to age >= 18")

attritionSummary <- summariseCohortAttrition(cdm$ankle_sprain)

plotCohortAttrition(attritionSummary)

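We could, of course, have applied these requirements the other way around. The final cohort is the same, but the number of records and subjects excluded at each step changes.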

cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "ankle_sprain",
  conceptSet = list("ankle_sprain" = 81151),
  end = "event_end_date",
  limit = "all"
)

cdm$ankle_sprain <- cdm$ankle_sprain |>
  addAge() |>
  filter(age >= 18) |>
  compute(temporary = FALSE, name = "ankle_sprain") |>
  recordCohortAttrition("Restrict to age >= 18")

cdm$ankle_sprain <- cdm$ankle_sprain |>
  filter(get_year(cohort_start_date) >= 2000) |>
  compute(temporary = FALSE, name = "ankle_sprain") |>
  recordCohortAttrition("Restrict to cohort_start_date >= 2000")

attritionSummary <- summariseCohortAttrition(cdm$ankle_sprain)

plotCohortAttrition(attritionSummary)
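Because each criterion is evaluated on every entry independently of the other, applying the two filters in either order should produce the same final cohort. The number of entries removed at each intermediate step, however, depends on the order. The toy base R example below illustrates this. The rows are made up, and keepYear() and keepAdult() are hypothetical helpers rather than anything from the packages used above.

```r
# Made-up cohort entries for illustration (not GiBleed data)
toy <- data.frame(
  subject_id = c(1, 2, 3, 4, 5),
  start_year = c(1995, 2003, 2010, 1999, 2015),
  age        = c(30, 12, 45, 50, 60)
)

# Hypothetical helpers, one per inclusion criterion
keepYear  <- function(d) d[d$start_year >= 2000, ]
keepAdult <- function(d) d[d$age >= 18, ]

# Year first, then age
yearFirst   <- keepYear(toy)
yearThenAge <- keepAdult(yearFirst)

# Age first, then year
ageFirst    <- keepAdult(toy)
ageThenYear <- keepYear(ageFirst)

# Intermediate counts differ (3 vs 4), but the final cohorts match
c(afterYearOnly = nrow(yearFirst), afterAgeOnly = nrow(ageFirst))
identical(yearThenAge$subject_id, ageThenYear$subject_id)
```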

As well as plotting cohort attrition, we can also create a table of our results.

tableCohortAttrition(attritionSummary)

GiBleed; ankle_sprain
Reason	number_records	number_subjects	excluded_records	excluded_subjects
Initial qualifying events	1,915	1,357	0	0
Restrict to age >= 18	1,047	847	868	510
Restrict to cohort_start_date >= 2000	454	420	593	427
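The excluded columns are the differences between consecutive rows. For example, restricting to age >= 18 removes 1,915 - 1,047 = 868 records and 1,357 - 847 = 510 subjects. The same arithmetic in base R, using the counts from the table above, is a quick way to sanity-check an attrition table:

```r
# Counts copied from the attrition table (age restriction applied first)
records  <- c(1915, 1047, 454)
subjects <- c(1357, 847, 420)

# Each step excludes what the previous step had minus what remains
excludedRecords  <- c(0, -diff(records))
excludedSubjects <- c(0, -diff(subjects))

data.frame(records, subjects, excludedRecords, excludedSubjects)
```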