Integration Unit Tests for Pharmacoepidemiological Studies • TestGenerator

Did my cohort select the correct number of patients? Am I calculating an intersection the right way? Is that the expected value for treatment duration? A single incorrect parameter is enough to produce incoherent results in a pharmacoepidemiological study. Testing calculations on huge, complex databases is also very challenging, because there you rarely know the correct answer in advance.

That is why TestGenerator is useful: it lets you unit test a study on the OMOP CDM against a small sample of patients that you design yourself. It includes tools to create a blank CDM with a complete vocabulary, load that sample into it, and check whether your code does what you expect in very specific cases.

This package is based on the unit tests written for the Erasmus MC Ranitidine Study.

Installation

Install the released version from CRAN:

install.packages("TestGenerator")

Basic workflow

TestGenerator starts from a small patient dataset. The data can be stored in an Excel workbook, with one sheet per OMOP CDM table, or in a folder of CSV files, with one file per table.
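As a rough illustration of the CSV layout, the sketch below writes a two-patient micro population with one file per table using base R only. It assumes that each file is named after its CDM table (`person.csv`, `observation_period.csv`) and that these columns are enough. Neither assumption is confirmed here, so check the package documentation for the exact schema TestGenerator expects. The patients themselves are hypothetical.

```r
# Sketch only: a tiny CSV input folder, one file per OMOP CDM table.
# ASSUMPTION: file names match CDM table names and these columns suffice;
# verify against the TestGenerator documentation before relying on it.
csv_dir <- file.path(tempdir(), "micro_population_csv")
dir.create(csv_dir, recursive = TRUE, showWarnings = FALSE)

# Two hypothetical patients. 8507 (MALE) and 8532 (FEMALE) are standard
# OMOP gender concepts; 0 means "no matching concept".
person <- data.frame(
  person_id = c(1L, 2L),
  gender_concept_id = c(8507L, 8532L),
  year_of_birth = c(1980L, 1975L),
  race_concept_id = c(0L, 0L),
  ethnicity_concept_id = c(0L, 0L)
)

# One observation period per patient; 32817 is the "EHR" type concept.
observation_period <- data.frame(
  observation_period_id = c(1L, 2L),
  person_id = c(1L, 2L),
  observation_period_start_date = as.Date(c("2010-01-01", "2012-06-01")),
  observation_period_end_date = as.Date(c("2020-12-31", "2021-12-31")),
  period_type_concept_id = c(32817L, 32817L)
)

# Write one CSV per table, using the table name as the file name.
write.csv(person, file.path(csv_dir, "person.csv"), row.names = FALSE)
write.csv(
  observation_period,
  file.path(csv_dir, "observation_period.csv"),
  row.names = FALSE
)

list.files(csv_dir)
```

A folder like `csv_dir` is what you would then pass as `filePath` to the CSV reader.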

For help creating an Excel input from scratch, see the Start from a Blank Excel Template section of the website.

The package then converts those files into a Unit Test Definition JSON file. This JSON file is what you keep alongside your package tests.

TestGenerator::readPatients(
  filePath = "inst/extdata/icu_sample_population.xlsx",
  testName = "icu_sample",
  outputPath = "tests/testthat/testCases",
  cdmVersion = "5.4"
)

If outputPath = NULL, the JSON file is written to tests/testthat/testCases, which is the usual location for package tests.

You can also call the Excel and CSV readers directly:

TestGenerator::readPatients.xl(
  filePath = "inst/extdata/icu_sample_population.xlsx",
  testName = "icu_sample",
  outputPath = "tests/testthat/testCases",
  cdmVersion = "5.4"
)

TestGenerator::readPatients.csv(
  filePath = "inst/extdata/icu_sample_population_csv",
  testName = "icu_sample",
  outputPath = "tests/testthat/testCases",
  cdmVersion = "5.4",
  reduceLargeIds = FALSE
)

Create a test CDM

Use patientsCDM() to load one Unit Test Definition into a blank OMOP CDM. By default, this creates a local DuckDB CDM with the small patient population and the vocabulary needed for testing.

cdm <- TestGenerator::patientsCDM(
  pathJson = "tests/testthat/testCases",
  testName = "icu_sample",
  cdmVersion = "5.4"
)

If pathJson = NULL, TestGenerator looks for the JSON file in tests/testthat/testCases.
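The default lookup rule can be pictured with the small helper below. `resolve_test_case()` is hypothetical and not part of TestGenerator, and the `<testName>.json` file name is an assumption. The point it illustrates is that the default is a relative path, so it is resolved against the current working directory.

```r
# Hypothetical helper (not a TestGenerator function) that mirrors the
# documented default: with no path, fall back to tests/testthat/testCases.
# ASSUMPTION: the JSON file is named "<testName>.json".
resolve_test_case <- function(test_name, path_json = NULL) {
  if (is.null(path_json)) {
    # Relative path: it is interpreted against getwd() when files are read.
    path_json <- file.path("tests", "testthat", "testCases")
  }
  file.path(path_json, paste0(test_name, ".json"))
}

resolve_test_case("icu_sample")
resolve_test_case("icu_sample", path_json = "my_cases")
```

Because the default is relative, the same call can find the file from the package root but not from another working directory.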

The example below uses the sample ICU population included in the package.

file_path <- system.file(
  "extdata",
  "icu_sample_population.xlsx",
  package = "TestGenerator"
)

output_path <- file.path(tempdir(), "testgenerator-example")
dir.create(output_path, recursive = TRUE, showWarnings = FALSE)

TestGenerator::readPatients(
  filePath = file_path,
  testName = "icu_sample",
  outputPath = output_path,
  cdmVersion = "5.4"
)

cdm <- TestGenerator::patientsCDM(
  pathJson = output_path,
  testName = "icu_sample",
  cdmVersion = "5.4"
)

DBI::dbDisconnect(CDMConnector::cdmCon(cdm), shutdown = TRUE)
unlink(output_path, recursive = TRUE)
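In the example above, `unlink()` only runs if every earlier call succeeds. One way to guarantee cleanup is to wrap the steps in a function and register the removal with `on.exit()`, which runs whether the function returns normally or fails. `with_scratch_dir()` below is a hypothetical helper, not part of TestGenerator or CDMConnector.

```r
# Hypothetical helper: run `fn` against a scratch directory and always
# remove that directory afterwards, even if `fn` throws an error.
with_scratch_dir <- function(fn, name = "testgenerator-example") {
  dir <- file.path(tempdir(), name)
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  # on.exit() fires on normal return and on error alike.
  on.exit(unlink(dir, recursive = TRUE), add = TRUE)
  fn(dir)
}

# Stand-in for the real steps: write a placeholder file and list the directory.
created <- with_scratch_dir(function(dir) {
  writeLines("{}", file.path(dir, "icu_sample.json"))
  list.files(dir)
})
created
```

In real use, `fn` would call `readPatients(outputPath = dir, ...)` and `patientsCDM(pathJson = dir, ...)`, then disconnect from the CDM before returning, so that the connection is closed before the directory is removed.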

Use it in a package test

The most useful pattern is to keep a small JSON test case in tests/testthat/testCases, build a CDM inside a testthat test, run your study code, and assert the expected result.

testthat::test_that("cohort construction returns the expected patients", {
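  # testthat runs tests with tests/testthat as the working directory, so
  # test_path() is used to resolve tests/testthat/testCases both
  # interactively and during R CMD check.
  cdm <- TestGenerator::patientsCDM(
    pathJson = testthat::test_path("testCases"),
    testName = "icu_sample",
    cdmVersion = "5.4"
  )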
  withr::defer(
    DBI::dbDisconnect(CDMConnector::cdmCon(cdm), shutdown = TRUE)
  )

  cohort_set <- CDMConnector::readCohortSet(
    system.file("extdata", "test_cohorts", package = "TestGenerator")
  )

  cdm <- CDMConnector::generateCohortSet(
    cdm = cdm,
    cohortSet = cohort_set,
    name = "test_cohorts"
  )

  result <- cdm[["test_cohorts"]] |>
    dplyr::collect()

  testthat::expect_equal(
    sort(unique(result$subject_id)),
    c(1, 2, 4, 5, 6, 7)
  )
})

The exact expectation should come from the micro population you designed. Good tests usually check subject counts, inclusion or exclusion rules, cohort dates, or treatment durations that are easy to verify by hand.
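For example, expected treatment durations can be worked out directly from the rows you put in the micro population. The sketch below uses a hypothetical three-row `drug_exposure` table and counts days inclusively (start and end day both count). That is a convention choice for illustration, and your study's definition may differ.

```r
# Hypothetical micro population: three drug exposures for two patients.
drug_exposure <- data.frame(
  person_id = c(1L, 1L, 2L),
  drug_exposure_start_date = as.Date(c("2020-01-01", "2020-03-01", "2020-02-10")),
  drug_exposure_end_date = as.Date(c("2020-01-30", "2020-03-15", "2020-02-19"))
)

# Inclusive duration: an exposure from 1 to 30 January lasts 30 days.
drug_exposure$duration_days <- as.integer(
  drug_exposure$drug_exposure_end_date - drug_exposure$drug_exposure_start_date
) + 1L

# Total exposed days per person, computed by hand as the test expectation.
total_days <- tapply(drug_exposure$duration_days, drug_exposure$person_id, sum)
total_days
```

Here patient 1 has 30 + 15 = 45 exposed days and patient 2 has 10. Values like these can go straight into `testthat::expect_equal()` against the durations your study code returns.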