
Comparing algorithm output with the Pregnancy Extension Table (PET)
Source: vignettes/compareWithPET.Rmd
Overview
The Pregnancy Extension Table (PET) is an OMOP CDM
extension that stores pregnancy episodes (start date, end date, outcome)
identified by a separate process (e.g. chart review or another
algorithm). The function
comparePregnancyIdentifierWithPET()
compares the episodes produced by the PregnancyIdentifier pipeline to
the PET and writes comparison summaries to CSV files. This vignette
describes how to run the comparison using the mock CDM, how
matching is done, and what each output contains.
How to run the PET comparison
You need:
- Algorithm output: a directory containing final_pregnancy_episodes.rds (from runPregnancyIdentifier()).
- A PET table in the same CDM: a table with at least person_id, pregnancy_start_date, pregnancy_end_date, and pregnancy_outcome (concept_id).
Below we run the pipeline with mockPregnancyCdm() (which
includes a PET table pregnancy_extension in schema
main), then run the comparison.
library(PregnancyIdentifier)
library(CDMConnector)
library(dplyr)
library(tidyr)
library(knitr)
# Helper: get the results table from a summarised_result (handles list with $results or single table)
sr_results <- function(sr) {
  if (is.list(sr) && "results" %in% names(sr) && is.data.frame(sr$results)) {
    sr$results
  } else {
    as.data.frame(sr)
  }
}
# Helper: extract a wide table for one variable from the summarised result
sr_table <- function(sr, var, level_name = "variable_level") {
  tbl <- sr_results(sr)
  d <- dplyr::filter(tbl, .data$variable_name == .env$var)
  if (nrow(d) == 0) return(NULL)
  d <- dplyr::select(d, "variable_level", "estimate_name", "estimate_value")
  wide <- tidyr::pivot_wider(d, names_from = "estimate_name", values_from = "estimate_value")
  if (level_name != "variable_level") wide <- dplyr::rename(wide, !!level_name := "variable_level")
  wide
}
# Directories: pipeline output (episode data), export (comparison results and log)
td <- tempdir()
if (!dir.exists(td)) dir.create(td, recursive = TRUE, showWarnings = FALSE)
outputDir <- file.path(td, "pet_vignette_pipeline")
exportFolder <- file.path(td, "pet_vignette_comparison")
dir.create(outputDir, recursive = TRUE, showWarnings = FALSE)
dir.create(exportFolder, recursive = TRUE, showWarnings = FALSE)
# 1) Build mock CDM and run the pipeline (export runs by default to outputDir/export)
cdm <- mockPregnancyCdm()
#>
#> Download completed!
runPregnancyIdentifier(
  cdm = cdm,
  outputFolder = outputDir,
  outputLogToConsole = FALSE
)
# The mock CDM includes a PET table `pregnancy_extension` in schema `main` with the
# required columns. We run the comparison against it. (Alternatively, you could build
# a PET from the algorithm output as in the `insert-mock-pet` chunk and use
# petTable = "pregnancy_episode".)
# 2) Run the PET comparison (writes summarised result and log to exportFolder)
pet_comparison <- comparePregnancyIdentifierWithPET(
  cdm = cdm,
  outputFolder = outputDir,
  exportFolder = exportFolder,
  petSchema = "main",
  petTable = "pregnancy_extension",
  minOverlapDays = 1L,
  outputLogToConsole = FALSE
)
# Load the written summarised result for display and programmatic use
res <- omopgenerics::importSummarisedResult(file.path(exportFolder, "pet_comparison_summarised_result.csv"))
How matching is done
Episodes are matched by:
- Same person: only algorithm and PET episodes from the same person_id are considered.
- Overlapping dates: for each (algorithm episode, PET episode) pair, overlap in days is max(0, min(alg_end, pet_end) - max(alg_start, pet_start) + 1). Pairs with overlap ≥ minOverlapDays (default 1) are candidate pairs.
- One-to-one assignment: within each person, candidate pairs are sorted by overlap (descending). A greedy algorithm assigns each PET episode to at most one algorithm episode and vice versa: it repeatedly picks the pair with the largest overlap among those whose PET and algorithm indices are not yet used. This avoids double-counting and yields consistent Venn and confusion counts.
Optional: if removeWithinSourceOverlaps = TRUE,
overlapping episodes within PET and within the algorithm are removed
(greedy non-overlapping by start date, max 400 days) before
matching, which can reduce many-to-many pairs.
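The matching steps above can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: the function names overlap_days() and greedy_match(), the start/end column names, and the tie-breaking rule (ties keep their original order) are all assumptions made for the example.

```r
# Overlap in days between two closed date intervals (0 if they do not overlap).
# Hypothetical helper; mirrors the formula given in the text.
overlap_days <- function(alg_start, alg_end, pet_start, pet_end) {
  pmax(0, as.integer(pmin(alg_end, pet_end) - pmax(alg_start, pet_start)) + 1L)
}

# Greedy one-to-one matching for a single person.
# `alg` and `pet` are data frames with Date columns `start` and `end` (assumed names).
greedy_match <- function(alg, pet, min_overlap_days = 1L) {
  # Every (algorithm episode, PET episode) index pair for this person
  pairs <- expand.grid(alg_idx = seq_len(nrow(alg)), pet_idx = seq_len(nrow(pet)))
  pairs$overlap <- overlap_days(
    alg$start[pairs$alg_idx], alg$end[pairs$alg_idx],
    pet$start[pairs$pet_idx], pet$end[pairs$pet_idx]
  )
  # Keep candidate pairs and sort by overlap, largest first
  pairs <- pairs[pairs$overlap >= min_overlap_days, , drop = FALSE]
  pairs <- pairs[order(-pairs$overlap), , drop = FALSE]

  keep <- logical(nrow(pairs))
  used_alg <- integer(0)
  used_pet <- integer(0)
  for (i in seq_len(nrow(pairs))) {
    # Accept a pair only if neither episode has been assigned yet
    if (!(pairs$alg_idx[i] %in% used_alg) && !(pairs$pet_idx[i] %in% used_pet)) {
      keep[i] <- TRUE
      used_alg <- c(used_alg, pairs$alg_idx[i])
      used_pet <- c(used_pet, pairs$pet_idx[i])
    }
  }
  pairs[keep, , drop = FALSE]
}

# Toy data for one person: two algorithm episodes, two PET episodes
alg <- data.frame(
  start = as.Date(c("2020-01-01", "2021-06-01")),
  end   = as.Date(c("2020-09-30", "2022-02-15"))
)
pet <- data.frame(
  start = as.Date(c("2020-01-10", "2020-09-20")),
  end   = as.Date(c("2020-10-05", "2020-09-25"))
)
matched <- greedy_match(alg, pet)
matched
```

In this toy example both PET episodes overlap the first algorithm episode, but only the pair with the larger overlap is kept. The second PET episode would count as PET-only and the second algorithm episode as algorithm-only.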
Outputs generated
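The function writes a single CSV in SummarisedResult format to exportFolder (pet_comparison_summarised_result.csv) and returns nothing. Re-import it with omopgenerics::importSummarisedResult() and display it with visOmopResults::visOmopTable():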
# Display as a gt table (optional: requires visOmopResults)
if (requireNamespace("visOmopResults", quietly = TRUE)) {
  visOmopResults::visOmopTable(
    result = res,
    header = "cdm_name",
    rename = c("Data source" = "cdm_name"),
    hide = c("result_id", "group_name", "group_level", "strata_name", "strata_level", "pet_comparison"))
}
| Variable name | Variable level | Estimate name | Data source: TestData_P4_C5_002_1 |
|---|---|---|---|
| episode_counts | algorithm | n_episodes | 35 |
| n_persons | 32 | ||
| pet | n_episodes | 33 | |
| n_persons | 25 | ||
| person_overlap | raw_person_overlap | n_persons | 25 |
| cohort_person_overlap | n_persons | 23 | |
| venn_counts | both | n_episodes | 27 |
| n_pet_matched | 27 | ||
| n_alg_matched | 27 | ||
| pet_only | n_episodes | 6 | |
| n_pet_matched | 27 | ||
| n_alg_matched | 27 | ||
| algorithm_only | n_episodes | 8 | |
| n_pet_matched | 27 | ||
| n_alg_matched | 27 | ||
| protocol_summary | overall | total_pet_episodes | 33 |
| total_algorithm_episodes | 35 | ||
| total_matched_episodes | 27 | ||
| confusion_2x2 | TP | count | 27 |
| FN | count | 6 | |
| FP | count | 8 | |
| TN | count | – | |
| ppv_sensitivity | sensitivity | value | 0.82 |
| numerator | 27 | ||
| denominator | 33 | ||
| ppv | value | 0.77 | |
| numerator | 27 | ||
| denominator | 35 | ||
| date_difference_summary | Start date difference (PET - Algorithm, days) | mean | 1.37 |
| median | 2.00 | ||
| sd | 2.73 | ||
| min | -5.00 | ||
| q25 | -0.50 | ||
| q75 | 3.00 | ||
| max | 5.00 | ||
| n_matched | 27 | ||
| End date difference (PET - Algorithm, days) | mean | -0.67 | |
| median | 0.00 | ||
| sd | 2.54 | ||
| min | -5.00 | ||
| q25 | -3.00 | ||
| q75 | 1.00 | ||
| max | 4.00 | ||
| n_matched | 27 | ||
| Duration difference (PET - Algorithm, days) | mean | -2.04 | |
| median | -2.00 | ||
| sd | 4.16 | ||
| min | -9.00 | ||
| q25 | -4.50 | ||
| q75 | -1.00 | ||
| max | 8.00 | ||
| n_matched | 27 | ||
| date_difference_by_outcome | Start date difference (PET - Algorithm, days) [AB] | mean | 3.00 |
| median | 3.00 | ||
| sd | – | ||
| min | 3.00 | ||
| q25 | 3.00 | ||
| q75 | 3.00 | ||
| max | 3.00 | ||
| n_matched | – | ||
| End date difference (PET - Algorithm, days) [AB] | mean | 1.00 | |
| median | 1.00 | ||
| sd | – | ||
| min | 1.00 | ||
| q25 | 1.00 | ||
| q75 | 1.00 | ||
| max | 1.00 | ||
| n_matched | – | ||
| Duration difference (PET - Algorithm, days) [AB] | mean | -2.00 | |
| median | -2.00 | ||
| sd | – | ||
| min | -2.00 | ||
| q25 | -2.00 | ||
| q75 | -2.00 | ||
| max | -2.00 | ||
| n_matched | – | ||
| Start date difference (PET - Algorithm, days) [ECT] | mean | 0.00 | |
| median | 0.00 | ||
| sd | – | ||
| min | 0.00 | ||
| q25 | 0.00 | ||
| q75 | 0.00 | ||
| max | 0.00 | ||
| n_matched | – | ||
| End date difference (PET - Algorithm, days) [ECT] | mean | 0.00 | |
| median | 0.00 | ||
| sd | – | ||
| min | 0.00 | ||
| q25 | 0.00 | ||
| q75 | 0.00 | ||
| max | 0.00 | ||
| n_matched | – | ||
| Duration difference (PET - Algorithm, days) [ECT] | mean | 0.00 | |
| median | 0.00 | ||
| sd | – | ||
| min | 0.00 | ||
| q25 | 0.00 | ||
| q75 | 0.00 | ||
| max | 0.00 | ||
| n_matched | – | ||
| Start date difference (PET - Algorithm, days) [LB] | mean | -0.25 | |
| median | -0.50 | ||
| sd | 3.15 | ||
| min | -5.00 | ||
| q25 | -1.75 | ||
| q75 | 3.00 | ||
| max | 3.00 | ||
| n_matched | 8 | ||
| End date difference (PET - Algorithm, days) [LB] | mean | 0.12 | |
| median | 1.50 | ||
| sd | 3.27 | ||
| min | -5.00 | ||
| q25 | -3.00 | ||
| q75 | 3.00 | ||
| max | 3.00 | ||
| n_matched | 8 | ||
| Duration difference (PET - Algorithm, days) [LB] | mean | 0.38 | |
| median | -1.50 | ||
| sd | 5.10 | ||
| min | -6.00 | ||
| q25 | -2.50 | ||
| q75 | 4.00 | ||
| max | 8.00 | ||
| n_matched | 8 | ||
| Start date difference (PET - Algorithm, days) [PREG] | mean | 2.20 | |
| median | 3.00 | ||
| sd | 2.51 | ||
| min | -2.00 | ||
| q25 | 0.00 | ||
| q75 | 4.50 | ||
| max | 5.00 | ||
| n_matched | 15 | ||
| End date difference (PET - Algorithm, days) [PREG] | mean | -1.07 | |
| median | -1.00 | ||
| sd | 2.25 | ||
| min | -5.00 | ||
| q25 | -2.50 | ||
| q75 | 0.00 | ||
| max | 4.00 | ||
| n_matched | 15 | ||
| Duration difference (PET - Algorithm, days) [PREG] | mean | -3.27 | |
| median | -2.00 | ||
| sd | 3.65 | ||
| min | -9.00 | ||
| q25 | -6.50 | ||
| q75 | -1.00 | ||
| max | 3.00 | ||
| n_matched | 15 | ||
| Start date difference (PET - Algorithm, days) [SA] | mean | 1.00 | |
| median | 1.00 | ||
| sd | – | ||
| min | 1.00 | ||
| q25 | 1.00 | ||
| q75 | 1.00 | ||
| max | 1.00 | ||
| n_matched | – | ||
| End date difference (PET - Algorithm, days) [SA] | mean | -4.00 | |
| median | -4.00 | ||
| sd | – | ||
| min | -4.00 | ||
| q25 | -4.00 | ||
| q75 | -4.00 | ||
| max | -4.00 | ||
| n_matched | – | ||
| Duration difference (PET - Algorithm, days) [SA] | mean | -5.00 | |
| median | -5.00 | ||
| sd | – | ||
| min | -5.00 | ||
| q25 | -5.00 | ||
| q75 | -5.00 | ||
| max | -5.00 | ||
| n_matched | – | ||
| Start date difference (PET - Algorithm, days) [SB] | mean | 2.00 | |
| median | 2.00 | ||
| sd | – | ||
| min | 2.00 | ||
| q25 | 2.00 | ||
| q75 | 2.00 | ||
| max | 2.00 | ||
| n_matched | – | ||
| End date difference (PET - Algorithm, days) [SB] | mean | 0.00 | |
| median | 0.00 | ||
| sd | – | ||
| min | 0.00 | ||
| q25 | 0.00 | ||
| q75 | 0.00 | ||
| max | 0.00 | ||
| n_matched | – | ||
| Duration difference (PET - Algorithm, days) [SB] | mean | -2.00 | |
| median | -2.00 | ||
| sd | – | ||
| min | -2.00 | ||
| q25 | -2.00 | ||
| q75 | -2.00 | ||
| max | -2.00 | ||
| n_matched | – | ||
| date_difference_distribution | duration_diff::≤ -30 | n | – |
| duration_diff::-29 to -15 | n | – | |
| duration_diff::-14 to -8 | n | – | |
| duration_diff::-7 to -1 | n | 18 | |
| duration_diff::0 | n | – | |
| duration_diff::1 to 7 | n | – | |
| duration_diff::8 to 14 | n | – | |
| duration_diff::15 to 29 | n | – | |
| duration_diff::≥ 30 | n | – | |
| end_diff::≤ -30 | n | – | |
| end_diff::-29 to -15 | n | – | |
| end_diff::-14 to -8 | n | – | |
| end_diff::-7 to -1 | n | 13 | |
| end_diff::0 | n | 5 | |
| end_diff::1 to 7 | n | 9 | |
| end_diff::8 to 14 | n | – | |
| end_diff::15 to 29 | n | – | |
| end_diff::≥ 30 | n | – | |
| start_diff::≤ -30 | n | – | |
| start_diff::-29 to -15 | n | – | |
| start_diff::-14 to -8 | n | – | |
| start_diff::-7 to -1 | n | 7 | |
| start_diff::0 | n | – | |
| start_diff::1 to 7 | n | 16 | |
| start_diff::8 to 14 | n | – | |
| start_diff::15 to 29 | n | – | |
| start_diff::≥ 30 | n | – | |
| date_difference_distribution_by_outcome | duration_diff::AB::≤ -30 | n | – |
| duration_diff::AB::-29 to -15 | n | – | |
| duration_diff::AB::-14 to -8 | n | – | |
| duration_diff::AB::-7 to -1 | n | – | |
| duration_diff::AB::0 | n | – | |
| duration_diff::AB::1 to 7 | n | – | |
| duration_diff::AB::8 to 14 | n | – | |
| duration_diff::AB::15 to 29 | n | – | |
| duration_diff::AB::≥ 30 | n | – | |
| end_diff::AB::≤ -30 | n | – | |
| end_diff::AB::-29 to -15 | n | – | |
| end_diff::AB::-14 to -8 | n | – | |
| end_diff::AB::-7 to -1 | n | – | |
| end_diff::AB::0 | n | – | |
| end_diff::AB::1 to 7 | n | – | |
| end_diff::AB::8 to 14 | n | – | |
| end_diff::AB::15 to 29 | n | – | |
| end_diff::AB::≥ 30 | n | – | |
| start_diff::AB::≤ -30 | n | – | |
| start_diff::AB::-29 to -15 | n | – | |
| start_diff::AB::-14 to -8 | n | – | |
| start_diff::AB::-7 to -1 | n | – | |
| start_diff::AB::0 | n | – | |
| start_diff::AB::1 to 7 | n | – | |
| start_diff::AB::8 to 14 | n | – | |
| start_diff::AB::15 to 29 | n | – | |
| start_diff::AB::≥ 30 | n | – | |
| duration_diff::ECT::≤ -30 | n | – | |
| duration_diff::ECT::-29 to -15 | n | – | |
| duration_diff::ECT::-14 to -8 | n | – | |
| duration_diff::ECT::-7 to -1 | n | – | |
| duration_diff::ECT::0 | n | – | |
| duration_diff::ECT::1 to 7 | n | – | |
| duration_diff::ECT::8 to 14 | n | – | |
| duration_diff::ECT::15 to 29 | n | – | |
| duration_diff::ECT::≥ 30 | n | – | |
| end_diff::ECT::≤ -30 | n | – | |
| end_diff::ECT::-29 to -15 | n | – | |
| end_diff::ECT::-14 to -8 | n | – | |
| end_diff::ECT::-7 to -1 | n | – | |
| end_diff::ECT::0 | n | – | |
| end_diff::ECT::1 to 7 | n | – | |
| end_diff::ECT::8 to 14 | n | – | |
| end_diff::ECT::15 to 29 | n | – | |
| end_diff::ECT::≥ 30 | n | – | |
| start_diff::ECT::≤ -30 | n | – | |
| start_diff::ECT::-29 to -15 | n | – | |
| start_diff::ECT::-14 to -8 | n | – | |
| start_diff::ECT::-7 to -1 | n | – | |
| start_diff::ECT::0 | n | – | |
| start_diff::ECT::1 to 7 | n | – | |
| start_diff::ECT::8 to 14 | n | – | |
| start_diff::ECT::15 to 29 | n | – | |
| start_diff::ECT::≥ 30 | n | – | |
| duration_diff::LB::≤ -30 | n | – | |
| duration_diff::LB::-29 to -15 | n | – | |
| duration_diff::LB::-14 to -8 | n | – | |
| duration_diff::LB::-7 to -1 | n | 5 | |
| duration_diff::LB::0 | n | – | |
| duration_diff::LB::1 to 7 | n | – | |
| duration_diff::LB::8 to 14 | n | – | |
| duration_diff::LB::15 to 29 | n | – | |
| duration_diff::LB::≥ 30 | n | – | |
| end_diff::LB::≤ -30 | n | – | |
| end_diff::LB::-29 to -15 | n | – | |
| end_diff::LB::-14 to -8 | n | – | |
| end_diff::LB::-7 to -1 | n | – | |
| end_diff::LB::0 | n | – | |
| end_diff::LB::1 to 7 | n | 5 | |
| end_diff::LB::8 to 14 | n | – | |
| end_diff::LB::15 to 29 | n | – | |
| end_diff::LB::≥ 30 | n | – | |
| start_diff::LB::≤ -30 | n | – | |
| start_diff::LB::-29 to -15 | n | – | |
| start_diff::LB::-14 to -8 | n | – | |
| start_diff::LB::-7 to -1 | n | – | |
| start_diff::LB::0 | n | – | |
| start_diff::LB::1 to 7 | n | – | |
| start_diff::LB::8 to 14 | n | – | |
| start_diff::LB::15 to 29 | n | – | |
| start_diff::LB::≥ 30 | n | – | |
| duration_diff::PREG::≤ -30 | n | – | |
| duration_diff::PREG::-29 to -15 | n | – | |
| duration_diff::PREG::-14 to -8 | n | – | |
| duration_diff::PREG::-7 to -1 | n | 10 | |
| duration_diff::PREG::0 | n | – | |
| duration_diff::PREG::1 to 7 | n | – | |
| duration_diff::PREG::8 to 14 | n | – | |
| duration_diff::PREG::15 to 29 | n | – | |
| duration_diff::PREG::≥ 30 | n | – | |
| end_diff::PREG::≤ -30 | n | – | |
| end_diff::PREG::-29 to -15 | n | – | |
| end_diff::PREG::-14 to -8 | n | – | |
| end_diff::PREG::-7 to -1 | n | 9 | |
| end_diff::PREG::0 | n | – | |
| end_diff::PREG::1 to 7 | n | – | |
| end_diff::PREG::8 to 14 | n | – | |
| end_diff::PREG::15 to 29 | n | – | |
| end_diff::PREG::≥ 30 | n | – | |
| start_diff::PREG::≤ -30 | n | – | |
| start_diff::PREG::-29 to -15 | n | – | |
| start_diff::PREG::-14 to -8 | n | – | |
| start_diff::PREG::-7 to -1 | n | – | |
| start_diff::PREG::0 | n | – | |
| start_diff::PREG::1 to 7 | n | 10 | |
| start_diff::PREG::8 to 14 | n | – | |
| start_diff::PREG::15 to 29 | n | – | |
| start_diff::PREG::≥ 30 | n | – | |
| duration_diff::SA::≤ -30 | n | – | |
| duration_diff::SA::-29 to -15 | n | – | |
| duration_diff::SA::-14 to -8 | n | – | |
| duration_diff::SA::-7 to -1 | n | – | |
| duration_diff::SA::0 | n | – | |
| duration_diff::SA::1 to 7 | n | – | |
| duration_diff::SA::8 to 14 | n | – | |
| duration_diff::SA::15 to 29 | n | – | |
| duration_diff::SA::≥ 30 | n | – | |
| end_diff::SA::≤ -30 | n | – | |
| end_diff::SA::-29 to -15 | n | – | |
| end_diff::SA::-14 to -8 | n | – | |
| end_diff::SA::-7 to -1 | n | – | |
| end_diff::SA::0 | n | – | |
| end_diff::SA::1 to 7 | n | – | |
| end_diff::SA::8 to 14 | n | – | |
| end_diff::SA::15 to 29 | n | – | |
| end_diff::SA::≥ 30 | n | – | |
| start_diff::SA::≤ -30 | n | – | |
| start_diff::SA::-29 to -15 | n | – | |
| start_diff::SA::-14 to -8 | n | – | |
| start_diff::SA::-7 to -1 | n | – | |
| start_diff::SA::0 | n | – | |
| start_diff::SA::1 to 7 | n | – | |
| start_diff::SA::8 to 14 | n | – | |
| start_diff::SA::15 to 29 | n | – | |
| start_diff::SA::≥ 30 | n | – | |
| duration_diff::SB::≤ -30 | n | – | |
| duration_diff::SB::-29 to -15 | n | – | |
| duration_diff::SB::-14 to -8 | n | – | |
| duration_diff::SB::-7 to -1 | n | – | |
| duration_diff::SB::0 | n | – | |
| duration_diff::SB::1 to 7 | n | – | |
| duration_diff::SB::8 to 14 | n | – | |
| duration_diff::SB::15 to 29 | n | – | |
| duration_diff::SB::≥ 30 | n | – | |
| end_diff::SB::≤ -30 | n | – | |
| end_diff::SB::-29 to -15 | n | – | |
| end_diff::SB::-14 to -8 | n | – | |
| end_diff::SB::-7 to -1 | n | – | |
| end_diff::SB::0 | n | – | |
| end_diff::SB::1 to 7 | n | – | |
| end_diff::SB::8 to 14 | n | – | |
| end_diff::SB::15 to 29 | n | – | |
| end_diff::SB::≥ 30 | n | – | |
| start_diff::SB::≤ -30 | n | – | |
| start_diff::SB::-29 to -15 | n | – | |
| start_diff::SB::-14 to -8 | n | – | |
| start_diff::SB::-7 to -1 | n | – | |
| start_diff::SB::0 | n | – | |
| start_diff::SB::1 to 7 | n | – | |
| start_diff::SB::8 to 14 | n | – | |
| start_diff::SB::15 to 29 | n | – | |
| start_diff::SB::≥ 30 | n | – | |
| outcome_accuracy | overall | n_correct | 11 |
| n_total | 11 | ||
| accuracy | 1.00 | ||
| outcome_by_year | same_year_pairs | overall_equal | 11 |
| overall_diff | 16 | ||
| lb_lb | 8 | ||
| lb_miscarriage | – | ||
| lb_ab | – | ||
| lb_sb | – | ||
| lb_unknown | – | ||
| sb_sb | – | ||
| sb_miscarriage | – | ||
| sb_ab | – | ||
| sb_lb | – | ||
| sb_unknown | – | ||
| ab_ab | – | ||
| ab_miscarriage | – | ||
| ab_lb | – | ||
| ab_sb | – | ||
| ab_unknown | – | ||
| duration_summary | algorithm | n | 35 |
| mean | 250.51 | ||
| median | 224.00 | ||
| sd | 277.23 | ||
| min | 21.00 | ||
| q25 | 147.00 | ||
| q75 | 280.00 | ||
| max | 1,749.00 | ||
| pet | n | 33 | |
| mean | 210.45 | ||
| median | 260.00 | ||
| sd | 97.42 | ||
| min | 15.00 | ||
| q25 | 140.00 | ||
| q75 | 280.00 | ||
| max | 377.00 | ||
| duration_matched_summary | algorithm | n | 27 |
| mean | 196.00 | ||
| median | 147.00 | ||
| sd | 99.40 | ||
| min | 21.00 | ||
| q25 | 147.00 | ||
| q75 | 280.00 | ||
| max | 380.00 | ||
| pet | n | 27 | |
| mean | 193.96 | ||
| median | 150.00 | ||
| sd | 100.51 | ||
| min | 15.00 | ||
| q25 | 138.50 | ||
| q75 | 279.50 | ||
| max | 377.00 | ||
| group_gestational_time | matched | n | 27 |
| mean | 196.00 | ||
| median | 147.00 | ||
| sd | 99.40 | ||
| min | 21.00 | ||
| q25 | 147.00 | ||
| q75 | 280.00 | ||
| max | 380.00 | ||
| algorithm_only | n | 8 | |
| mean | 434.50 | ||
| median | 287.50 | ||
| sd | 535.20 | ||
| min | 147.00 | ||
| q25 | 212.50 | ||
| q75 | 303.75 | ||
| max | 1,749.00 | ||
| pet_only | n | 6 | |
| mean | 284.67 | ||
| median | 280.00 | ||
| sd | 11.43 | ||
| min | 280.00 | ||
| q25 | 280.00 | ||
| q75 | 280.00 | ||
| max | 308.00 | ||
| group_outcome | matched:AB | n | – |
| pct | 3.70 | ||
| matched:ECT | n | – | |
| pct | 3.70 | ||
| matched:LB | n | 8 | |
| pct | 29.63 | ||
| matched:PREG | n | 15 | |
| pct | 55.56 | ||
| matched:SA | n | – | |
| pct | 3.70 | ||
| matched:SB | n | – | |
| pct | 3.70 | ||
| algorithm_only:DELIV | n | – | |
| pct | 12.50 | ||
| algorithm_only:LB | n | – | |
| pct | 12.50 | ||
| algorithm_only:PREG | n | 6 | |
| pct | 75.00 | ||
| pet_only:LB | n | 6 | |
| pct | 100.00 | ||
| group_source | matched:both | n | 24 |
| pct | 88.89 | ||
| matched:hip_only | n | – | |
| pct | 3.70 | ||
| matched:pps_only | n | – | |
| pct | 7.41 | ||
| algorithm_only:both | n | 6 | |
| pct | 75.00 | ||
| algorithm_only:hip_only | n | – | |
| pct | 25.00 | ||
| person_venn_counts | both | n_persons | 25 |
| n_alg_episodes | 28 | ||
| n_pet_episodes | 33 | ||
| algorithm_only | n_persons | 7 | |
| n_episodes | 7 | ||
| pet_only | n_persons | – | |
| n_episodes | – | ||
| person_episodes_per_person | both:algorithm | mean | 1.12 |
| median | 1.00 | ||
| sd | 0.33 | ||
| min | 1.00 | ||
| q25 | 1.00 | ||
| q75 | 1.00 | ||
| max | 2.00 | ||
| both:pet | mean | 1.32 | |
| median | 1.00 | ||
| sd | 0.56 | ||
| min | 1.00 | ||
| q25 | 1.00 | ||
| q75 | 2.00 | ||
| max | 3.00 | ||
| algorithm_only | mean | 1.00 | |
| median | 1.00 | ||
| sd | 0.00 | ||
| min | 1.00 | ||
| q25 | 1.00 | ||
| q75 | 1.00 | ||
| max | 1.00 | ||
| pet_only | mean | – | |
| median | – | ||
| sd | – | ||
| min | – | ||
| q25 | – | ||
| q75 | – | ||
| max | – | ||
| pet_only_hip_coverage | overall | n_total | 6 |
| n_with_hip | – | ||
| n_without_hip | 6 | ||
| pct_with_hip | 0.00 | ||
| pet_only_pps_coverage | overall | n_with_pps | – |
| n_without_pps | 6 | ||
| pct_with_pps | 0.00 | ||
| pet_only_any_record_coverage | overall | n_with_any | – |
| n_without_any | 6 | ||
| pct_with_any | 0.00 | ||
| pet_only_hip_record_count | overall | n | 6 |
| mean | 0.00 | ||
| median | 0.00 | ||
| sd | 0.00 | ||
| min | 0.00 | ||
| q25 | 0.00 | ||
| q75 | 0.00 | ||
| max | 0.00 | ||
| pet_only_pps_record_count | overall | n | 6 |
| mean | 0.00 | ||
| median | 0.00 | ||
| sd | 0.00 | ||
| min | 0.00 | ||
| q25 | 0.00 | ||
| q75 | 0.00 | ||
| max | 0.00 | ||
| delivery_mode | matched:2014 | n | – |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| matched:2015 | n | – | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| matched:2017 | n | – | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| matched:2018 | n | – | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| matched:2019 | n | – | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| matched:2021 | n | – | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| matched:2023 | n | 11 | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| matched:2024 | n | – | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| matched:overall | n | 27 | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| algorithm_only:2015 | n | – | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| algorithm_only:2018 | n | – | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| algorithm_only:2020 | n | – | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| algorithm_only:2023 | n | – | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – | ||
| algorithm_only:overall | n | 8 | |
| n_cesarean | – | ||
| n_vaginal | – | ||
| n_known | – | ||
| pct_cesarean | – | ||
| pct_vaginal | – |
The summarised result is in long format: each row has variable_name, variable_level, estimate_name, and estimate_value. The sr_table() helper used below extracts and pivots one variable into a wide table for display. The following sections describe each metric.
Episode and person counts
kable(sr_table(res, "episode_counts", "source"), format = "html", caption = "Episode counts: algorithm vs PET")
| source | n_episodes | n_persons |
|---|---|---|
| algorithm | 35 | 32 |
| pet | 33 | 25 |
Number of episodes and distinct persons in the algorithm output and in the PET table.
Protocol summary
kable(sr_table(res, "protocol_summary"), format = "html", caption = "Protocol summary (for reporting)")
| variable_level | total_pet_episodes | total_algorithm_episodes | total_matched_episodes |
|---|---|---|---|
| overall | 33 | 35 | 27 |
Totals and number of matched episodes (one-to-one pairs).
Person overlap
kable(sr_table(res, "person_overlap", "metric"), format = "html", caption = "Person overlap")
| metric | n_persons |
|---|---|
| raw_person_overlap | 25 |
| cohort_person_overlap | 23 |
- raw_person_overlap: distinct persons with at least one PET episode and one algorithm episode.
- cohort_person_overlap: same, but after filtering both sources to gestation 0–308 days and end ≥ start.
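As a rough illustration of the cohort filter described above, the check can be written as a small predicate. The function name in_cohort() and the inclusive treatment of both bounds are assumptions for this sketch; the package may apply the filter differently (for example, inside the database).

```r
# TRUE when an episode has end >= start and a gestation of 0 to 308 days.
# Hypothetical helper; inclusive bounds are an assumption.
in_cohort <- function(start, end, max_days = 308L) {
  gestation_days <- as.integer(end - start)
  end >= start & gestation_days >= 0L & gestation_days <= max_days
}

episode_start <- as.Date(c("2023-01-01", "2023-01-01", "2023-05-01"))
episode_end   <- as.Date(c("2023-11-05", "2023-11-06", "2023-04-01"))
keep <- in_cohort(episode_start, episode_end)
keep
```

The three toy episodes cover a gestation exactly at the limit, one day over the limit, and an end date before the start date.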
Venn counts (matched / PET-only / algorithm-only)
kable(sr_table(res, "venn_counts", "category"), format = "html", caption = "Venn counts (one-to-one matching)")
| category | n_episodes | n_pet_matched | n_alg_matched |
|---|---|---|---|
| both | 27 | 27 | 27 |
| pet_only | 6 | 27 | 27 |
| algorithm_only | 8 | 27 | 27 |
- both: number of matched episode pairs.
- pet_only: PET episodes with no matched algorithm episode.
- algorithm_only: algorithm episodes with no matched PET episode.
2×2 confusion matrix (PET as reference)
kable(sr_table(res, "confusion_2x2", "cell"), format = "html", caption = "2×2 confusion matrix (PET = reference)")
| cell | count |
|---|---|
| TP | 27 |
| FN | 6 |
| FP | 8 |
| TN | NA |
- TP: PET episode has a matched algorithm episode.
- FN: PET episode has no match (algorithm “miss”).
- FP: Algorithm episode has no match (algorithm “extra”).
- TN: Not defined at episode level (no negative population).
Sensitivity, PPV, NPV
kable(sr_table(res, "ppv_sensitivity", "metric"), format = "html", caption = "Sensitivity, specificity, PPV, NPV")
| metric | value | numerator | denominator |
|---|---|---|---|
| sensitivity | 0.818181818181818 | 27 | 33 |
| ppv | 0.771428571428571 | 27 | 35 |
- Sensitivity = TP / (TP + FN): fraction of PET episodes that have a match.
- PPV = TP / (TP + FP): fraction of algorithm episodes that match a PET episode.
- Specificity and NPV use TN and are NA at episode level.
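The two reported values can be reproduced by hand from the confusion counts shown earlier (TP = 27, FN = 6, FP = 8). The variable names below are chosen for this example only.

```r
# Counts taken from the 2x2 confusion table in this vignette
tp <- 27
fn <- 6
fp <- 8

sensitivity <- tp / (tp + fn)  # 27 / 33: share of PET episodes with a match
ppv <- tp / (tp + fp)          # 27 / 35: share of algorithm episodes with a match

round(c(sensitivity = sensitivity, ppv = ppv), 2)
```

The denominators are the PET and algorithm episode totals (33 and 35), which is why no TN count is needed for either metric.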
Date differences (matched pairs only)
For each matched pair, date differences are PET − algorithm (positive = PET is later, i.e. algorithm starts/ends too early). Three measures are reported:
- Start date difference: PET start − algorithm start. Positive = algorithm starts too early; negative = algorithm starts too late.
- End date difference: PET end − algorithm end. Same sign convention.
- Duration difference: PET duration − algorithm duration. Positive = algorithm episodes are shorter; negative = longer.
dd_summary <- sr_table(res, "date_difference_summary")
if (!is.null(dd_summary) && nrow(dd_summary) > 0) {
  kable(dd_summary, format = "html", caption = "Date difference summary (PET − algorithm, days)")
} else {
  cat("No matched pairs; date differences not computed.\n")
}
| variable_level | mean | median | sd | min | q25 | q75 | max | n_matched |
|---|---|---|---|---|---|---|---|---|
| Start date difference (PET - Algorithm, days) | 1.37037037037037 | 2 | 2.73366685478214 | -5 | -0.5 | 3 | 5 | 27 |
| End date difference (PET - Algorithm, days) | -0.666666666666667 | 0 | 2.54195563720897 | -5 | -3 | 1 | 4 | 27 |
| Duration difference (PET - Algorithm, days) | -2.03703703703704 | -2 | 4.16470039075194 | -9 | -4.5 | -1 | 8 | 27 |
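To make the sign convention concrete, here is a worked example for a single hypothetical matched pair. The dates and variable names are invented for illustration, and duration is taken as end minus start, as in the duration summaries later in this vignette.

```r
# One hypothetical matched pair (dates are made up for illustration)
alg_start <- as.Date("2023-01-01")
alg_end   <- as.Date("2023-10-01")
pet_start <- as.Date("2023-01-04")
pet_end   <- as.Date("2023-09-29")

# All differences are PET minus algorithm, in days
start_diff <- as.integer(pet_start - alg_start)  # positive: algorithm starts earlier than PET
end_diff   <- as.integer(pet_end - alg_end)      # negative: algorithm ends later than PET

alg_duration  <- as.integer(alg_end - alg_start)
pet_duration  <- as.integer(pet_end - pet_start)
duration_diff <- pet_duration - alg_duration     # negative: algorithm episode is longer

c(start_diff = start_diff, end_diff = end_diff, duration_diff = duration_diff)
```

For any single pair the duration difference equals the end difference minus the start difference, so the three measures are not independent.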
Date differences by outcome
The same start/end/duration differences are also reported stratified by algorithm outcome category (LB, SB, SA, AB, PREG, etc.), since different outcomes have very different expected durations.
dd_by_outcome <- sr_table(res, "date_difference_by_outcome")
if (!is.null(dd_by_outcome) && nrow(dd_by_outcome) > 0) {
  kable(dd_by_outcome, format = "html", caption = "Date differences by algorithm outcome (PET − algorithm, days)")
} else {
  cat("No outcome-stratified date differences available.\n")
}
| variable_level | mean | median | sd | min | q25 | q75 | max | n_matched |
|---|---|---|---|---|---|---|---|---|
| Start date difference (PET - Algorithm, days) [AB] | 3 | 3 | NA | 3 | 3 | 3 | 3 | NA |
| End date difference (PET - Algorithm, days) [AB] | 1 | 1 | NA | 1 | 1 | 1 | 1 | NA |
| Duration difference (PET - Algorithm, days) [AB] | -2 | -2 | NA | -2 | -2 | -2 | -2 | NA |
| Start date difference (PET - Algorithm, days) [ECT] | 0 | 0 | NA | 0 | 0 | 0 | 0 | NA |
| End date difference (PET - Algorithm, days) [ECT] | 0 | 0 | NA | 0 | 0 | 0 | 0 | NA |
| Duration difference (PET - Algorithm, days) [ECT] | 0 | 0 | NA | 0 | 0 | 0 | 0 | NA |
| Start date difference (PET - Algorithm, days) [LB] | -0.25 | -0.5 | 3.15096357144468 | -5 | -1.75 | 3 | 3 | 8 |
| End date difference (PET - Algorithm, days) [LB] | 0.125 | 1.5 | 3.27053949241231 | -5 | -3 | 3 | 3 | 8 |
| Duration difference (PET - Algorithm, days) [LB] | 0.375 | -1.5 | 5.0972681759098 | -6 | -2.5 | 4 | 8 | 8 |
| Start date difference (PET - Algorithm, days) [PREG] | 2.2 | 3 | 2.51282425057657 | -2 | 0 | 4.5 | 5 | 15 |
| End date difference (PET - Algorithm, days) [PREG] | -1.06666666666667 | -1 | 2.25092573548455 | -5 | -2.5 | 0 | 4 | 15 |
| Duration difference (PET - Algorithm, days) [PREG] | -3.26666666666667 | -2 | 3.65409098851971 | -9 | -6.5 | -1 | 3 | 15 |
| Start date difference (PET - Algorithm, days) [SA] | 1 | 1 | NA | 1 | 1 | 1 | 1 | NA |
| End date difference (PET - Algorithm, days) [SA] | -4 | -4 | NA | -4 | -4 | -4 | -4 | NA |
| Duration difference (PET - Algorithm, days) [SA] | -5 | -5 | NA | -5 | -5 | -5 | -5 | NA |
| Start date difference (PET - Algorithm, days) [SB] | 2 | 2 | NA | 2 | 2 | 2 | 2 | NA |
| End date difference (PET - Algorithm, days) [SB] | 0 | 0 | NA | 0 | 0 | 0 | 0 | NA |
| Duration difference (PET - Algorithm, days) [SB] | -2 | -2 | NA | -2 | -2 | -2 | -2 | NA |
Alignment distribution
The date differences can also be viewed as a binned distribution (histogram), which shows at a glance what fraction of matched episodes are perfectly aligned (bin = 0), close (within ±7 days), or far off (±30+ days). The bins are coloured on a red-yellow-green gradient where green = exact match and red = large discrepancy.
This visualisation is available interactively in the Shiny app’s Alignment tab. It can be filtered by measure (start/end/duration) and stratified by outcome.
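The binning itself is simple to reproduce outside the Shiny app. The sketch below uses base R cut() with the same bin boundaries that appear in the date_difference_distribution rows of the summarised result. The function name bin_date_diff() is hypothetical, the labels use ASCII comparison signs, and whole-day differences are assumed.

```r
# Assign whole-day differences (PET - algorithm) to the nine alignment bins.
# Hypothetical helper; intervals are right-closed, e.g. (-30, -15] holds -29 to -15.
bin_date_diff <- function(diff_days) {
  cut(
    diff_days,
    breaks = c(-Inf, -30, -15, -8, -1, 0, 7, 14, 29, Inf),
    labels = c("<= -30", "-29 to -15", "-14 to -8", "-7 to -1", "0",
               "1 to 7", "8 to 14", "15 to 29", ">= 30"),
    right = TRUE
  )
}

# Example differences in days (made up for illustration)
diffs <- c(-35, -7, 0, 0, 3, 5, 30)
bin_counts <- table(bin_date_diff(diffs))
bin_counts
```

Because cut() returns a factor with all nine levels, table() also reports empty bins, which keeps the histogram axis stable across measures and outcomes.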
Outcome confusion (matched pairs)
Cross-tabulation of PET outcome (concept_id) vs algorithm outcome (LB, SB, AB, SA, etc.) is not included in the summarised result CSV; only aggregate metrics are exported.
Outcome accuracy
Among matched pairs with a mappable algorithm outcome (LB/SB/AB/SA/DELIV → concept_id), the fraction where PET outcome concept_id equals the algorithm-mapped concept_id.
kable(sr_table(res, "outcome_accuracy"), format = "html", caption = "Outcome accuracy (matched pairs)")
| variable_level | n_correct | n_total | accuracy |
|---|---|---|---|
| overall | 11 | 11 | 1 |
Outcome by year (same-year pairs)
Among matched pairs in the same year (algorithm start year = PET start year), counts of agreement (e.g. lb_lb, sb_sb) and disagreement (e.g. lb_sb, sb_lb). Used for same-year outcome cross-tabs.
kable(sr_table(res, "outcome_by_year"), format = "html", caption = "Outcome by year (same-year pairs)")
| variable_level | overall_equal | overall_diff | lb_lb | lb_miscarriage | lb_ab | lb_sb | lb_unknown | sb_sb | sb_miscarriage | sb_ab | sb_lb | sb_unknown | ab_ab | ab_miscarriage | ab_lb | ab_sb | ab_unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| same_year_pairs | 11 | 16 | 8 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Duration summary (all episodes)
kable(sr_table(res, "duration_summary", "source"), format = "html", caption = "Pregnancy duration (days) by source")
| source | n | mean | median | sd | min | q25 | q75 | max |
|---|---|---|---|---|---|---|---|---|
| algorithm | 35 | 250.514285714286 | 224 | 277.22568544828 | 21 | 147 | 280 | 1749 |
| pet | 33 | 210.454545454545 | 260 | 97.4211511008681 | 15 | 140 | 280 | 377 |
Summary statistics of episode length (end − start) for all algorithm episodes and all PET episodes.
Duration matched summary
dm <- sr_table(res, "duration_matched_summary", "source")
if (!is.null(dm) && nrow(dm) > 0) {
  kable(dm, format = "html", caption = "Duration (matched pairs only)")
} else {
  cat("No duration matched summary (no matched pairs).\n")
}
| source | n | mean | median | sd | min | q25 | q75 | max |
|---|---|---|---|---|---|---|---|---|
| algorithm | 27 | 196 | 147 | 99.401284622561 | 21 | 147 | 280 | 380 |
| pet | 27 | 193.962962962963 | 150 | 100.511568978777 | 15 | 138.5 | 279.5 | 377 |
Duration statistics for the matched episodes only (algorithm vs PET).
Files written to exportFolder
| File | Content |
|---|---|
| pet_comparison_summarised_result.csv | All comparison metrics in SummarisedResult format (episode counts, protocol summary, person overlap, Venn counts, time overlap, confusion 2x2, PPV/sensitivity, date-difference summary, outcome accuracy, outcome by year, duration summaries). Includes the run settings for traceability. Use omopgenerics::importSummarisedResult() to read and visOmopResults::visOmopTable() to display. |
| log.txt | Run log (appended if file already exists). |
Using your own PET table
If your CDM already has a PET table
(e.g. omop_cmbd.pregnancy_episode), call:
comparePregnancyIdentifierWithPET(
  cdm = your_cdm,
  outputFolder = "/path/to/pipeline/output",
  exportFolder = "/path/to/comparison/results",
  petSchema = "omop_cmbd",
  petTable = "pregnancy_episode",
  minOverlapDays = 1L,
  removeWithinSourceOverlaps = FALSE
)
res <- omopgenerics::importSummarisedResult(file.path("/path/to/comparison/results", "pet_comparison_summarised_result.csv"))
Ensure the PET table has at least: person_id,
pregnancy_start_date, pregnancy_end_date,
pregnancy_outcome (concept_id). Gestational length in days
is computed in the database from the start and end dates.
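Before running the comparison on your own data, it can help to confirm that the required columns are present. The check below is a minimal sketch on a plain character vector of column names. The helper missing_pet_columns() is hypothetical and not part of PregnancyIdentifier, and the extra column in the example is invented.

```r
# Columns the comparison expects in the PET table (as listed above)
required_pet_cols <- c(
  "person_id", "pregnancy_start_date", "pregnancy_end_date", "pregnancy_outcome"
)

# Hypothetical helper: returns the required columns that are absent
missing_pet_columns <- function(pet_colnames) {
  setdiff(required_pet_cols, pet_colnames)
}

# Example: a PET table that lacks the outcome column
example_cols <- c("person_id", "pregnancy_start_date", "pregnancy_end_date", "gestational_length")
missing_cols <- missing_pet_columns(example_cols)
if (length(missing_cols) > 0) {
  message("PET table is missing: ", paste(missing_cols, collapse = ", "))
}
```

For a real table you would pass its column names (for example from colnames() on the table reference) instead of example_cols.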