Comparing algorithm output with the Pregnancy Extension Table (PET)

Overview

The Pregnancy Extension Table (PET) is an OMOP CDM extension that stores pregnancy episodes (start date, end date, outcome) identified by a separate process (e.g. chart review or another algorithm). The function comparePregnancyIdentifierWithPET() compares the episodes produced by the PregnancyIdentifier pipeline with those in the PET and writes the comparison summaries to a single CSV file. This vignette describes how to run the comparison on the mock CDM, how matching is done, and what each output contains.

How to run the PET comparison

You need:

Algorithm output: a directory containing final_pregnancy_episodes.rds (from runPregnancyIdentifier()).
A PET table in the same CDM: a table with at least person_id, pregnancy_start_date, pregnancy_end_date, and pregnancy_outcome (concept_id).
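
Before running the comparison on your own data, it can help to confirm that the PET table exposes the four required columns. The snippet below is a minimal sketch on a local data frame; the helper name missing_pet_cols() is hypothetical and not part of PregnancyIdentifier.

```r
# Required PET columns, as listed above
required_pet_cols <- c("person_id", "pregnancy_start_date",
                       "pregnancy_end_date", "pregnancy_outcome")

# Hypothetical helper: returns the required columns that are missing
missing_pet_cols <- function(pet) {
  setdiff(required_pet_cols, names(pet))
}

# Toy PET table that lacks the outcome column
toy_pet <- data.frame(
  person_id = 1L,
  pregnancy_start_date = as.Date("2020-01-01"),
  pregnancy_end_date = as.Date("2020-09-30")
)
missing_pet_cols(toy_pet)
```

An empty result means all required columns are present.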

Below we run the pipeline with mockPregnancyCdm() (which includes a PET table pregnancy_extension in schema main), then run the comparison.

library(PregnancyIdentifier)
library(CDMConnector)
library(dplyr)
library(tidyr)
library(knitr)
# Helper: get the results table from a summarised_result (handles list with $results or single table)
sr_results <- function(sr) {
  if (is.list(sr) && "results" %in% names(sr) && is.data.frame(sr$results)) {
    sr$results
  } else {
    as.data.frame(sr)
  }
}
# Helper: extract a wide table for one variable from the summarised result
sr_table <- function(sr, var, level_name = "variable_level") {
  tbl <- sr_results(sr)
  d <- dplyr::filter(tbl, .data$variable_name == .env$var)
  if (nrow(d) == 0) return(NULL)
  d <- dplyr::select(d, "variable_level", "estimate_name", "estimate_value")
  wide <- tidyr::pivot_wider(d, names_from = "estimate_name", values_from = "estimate_value")
  if (level_name != "variable_level") wide <- dplyr::rename(wide, !!level_name := "variable_level")
  wide
}

# Directories: pipeline output (episode data), export (comparison results and log)
td <- tempdir()
if (!dir.exists(td)) dir.create(td, recursive = TRUE, showWarnings = FALSE)
outputDir <- file.path(td, "pet_vignette_pipeline")
exportFolder <- file.path(td, "pet_vignette_comparison")
dir.create(outputDir, recursive = TRUE, showWarnings = FALSE)
dir.create(exportFolder, recursive = TRUE, showWarnings = FALSE)

# 1) Build mock CDM and run the pipeline (export runs by default to outputDir/export)
cdm <- mockPregnancyCdm()
#> 
#> Download completed!
runPregnancyIdentifier(
  cdm = cdm,
  outputFolder = outputDir,
  outputLogToConsole = FALSE
)

# The mock CDM includes a PET table `pregnancy_extension` in schema `main` with the
# required columns, so we run the comparison against it. (Alternatively, you could
# build a PET table from the algorithm output and pass its name via petTable, e.g.
# petTable = "pregnancy_episode".)

# 2) Run the PET comparison (writes summarised result and log to exportFolder)
pet_comparison <- comparePregnancyIdentifierWithPET(
  cdm = cdm,
  outputFolder = outputDir,
  exportFolder = exportFolder,
  petSchema = "main",
  petTable = "pregnancy_extension",
  minOverlapDays = 1L,
  outputLogToConsole = FALSE
)
# Load the written summarised result for display and programmatic use
res <- omopgenerics::importSummarisedResult(file.path(exportFolder, "pet_comparison_summarised_result.csv"))

How matching is done

Episodes are matched by:

Same person: only algorithm and PET episodes from the same person_id are considered.
Overlapping dates: for each (algorithm episode, PET episode) pair, overlap in days is max(0, min(alg_end, pet_end) - max(alg_start, pet_start) + 1). Pairs with overlap ≥ minOverlapDays (default 1) are candidate pairs.
One-to-one assignment: within each person, candidate pairs are sorted by overlap (descending). A greedy algorithm assigns each PET episode to at most one algorithm episode and vice versa: it repeatedly picks the pair with the largest overlap among those whose PET and algorithm indices are not yet used. This avoids double-counting and yields consistent Venn and confusion counts.
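
The three steps above can be sketched in base R for a single person. This is an illustration under stated assumptions, not the package's internal code: the function names overlap_days() and greedy_match() and the column names start and end are hypothetical.

```r
# Inclusive overlap in days between two date ranges (0 if they are disjoint)
overlap_days <- function(alg_start, alg_end, pet_start, pet_end) {
  s <- pmax(as.numeric(alg_start), as.numeric(pet_start))
  e <- pmin(as.numeric(alg_end), as.numeric(pet_end))
  pmax(0, e - s + 1)
}

# Greedy one-to-one assignment for one person.
# `alg` and `pet` are data frames with Date columns `start` and `end`.
greedy_match <- function(alg, pet, min_overlap_days = 1) {
  pairs <- expand.grid(alg_idx = seq_len(nrow(alg)),
                       pet_idx = seq_len(nrow(pet)))
  pairs$overlap <- overlap_days(alg$start[pairs$alg_idx], alg$end[pairs$alg_idx],
                                pet$start[pairs$pet_idx], pet$end[pairs$pet_idx])
  # Candidate pairs, largest overlap first
  pairs <- pairs[pairs$overlap >= min_overlap_days, , drop = FALSE]
  pairs <- pairs[order(-pairs$overlap), , drop = FALSE]
  keep <- logical(nrow(pairs))
  used_alg <- integer(0)
  used_pet <- integer(0)
  for (i in seq_len(nrow(pairs))) {
    # Accept a pair only if neither episode has been assigned yet
    if (!(pairs$alg_idx[i] %in% used_alg) && !(pairs$pet_idx[i] %in% used_pet)) {
      keep[i] <- TRUE
      used_alg <- c(used_alg, pairs$alg_idx[i])
      used_pet <- c(used_pet, pairs$pet_idx[i])
    }
  }
  pairs[keep, , drop = FALSE]
}

# Toy data: the second PET episode overlaps both algorithm episodes
alg <- data.frame(start = as.Date(c("2020-01-01", "2021-03-01")),
                  end   = as.Date(c("2020-09-30", "2021-11-30")))
pet <- data.frame(start = as.Date(c("2020-01-10", "2020-09-20")),
                  end   = as.Date(c("2020-10-05", "2021-03-15")))
matches <- greedy_match(alg, pet)
matches
```

In the toy data the second PET episode is a candidate for both algorithm episodes, but the first algorithm episode is already taken by the pair with the largest overlap, so it is assigned to the second algorithm episode.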

Optional: if removeWithinSourceOverlaps = TRUE, overlapping episodes within PET and within the algorithm are removed (greedy non-overlapping by start date, max 400 days) before matching, which can reduce many-to-many pairs.

Outputs generated

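The function writes a single CSV in SummarisedResult format to exportFolder (pet_comparison_summarised_result.csv) and returns nothing. Re-import it with omopgenerics::importSummarisedResult() and display it with visOmopResults::visOmopTable():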

# Display as a gt table (optional: requires visOmopResults)
if (requireNamespace("visOmopResults", quietly = TRUE)) {
  visOmopResults::visOmopTable(
    result = res,
    header = "cdm_name",
    rename = c("Data source" = "cdm_name"),
    hide =  c("result_id", "group_name", "group_level", "strata_name", "strata_level", "pet_comparison"))
}

Variable name	Variable level	Estimate name	Data source: TestData_P4_C5_002_1
episode_counts	algorithm	n_episodes	35
		n_persons	32
	pet	n_episodes	33
		n_persons	25
person_overlap	raw_person_overlap	n_persons	25
	cohort_person_overlap	n_persons	23
venn_counts	both	n_episodes	27
		n_pet_matched	27
		n_alg_matched	27
	pet_only	n_episodes	6
		n_pet_matched	27
		n_alg_matched	27
	algorithm_only	n_episodes	8
		n_pet_matched	27
		n_alg_matched	27
protocol_summary	overall	total_pet_episodes	33
		total_algorithm_episodes	35
		total_matched_episodes	27
confusion_2x2	TP	count	27
	FN	count	6
	FP	count	8
	TN	count	–
ppv_sensitivity	sensitivity	value	0.82
		numerator	27
		denominator	33
	ppv	value	0.77
		numerator	27
		denominator	35
date_difference_summary	Start date difference (PET - Algorithm, days)	mean	1.37
		median	2.00
		sd	2.73
		min	-5.00
		q25	-0.50
		q75	3.00
		max	5.00
		n_matched	27
	End date difference (PET - Algorithm, days)	mean	-0.67
		median	0.00
		sd	2.54
		min	-5.00
		q25	-3.00
		q75	1.00
		max	4.00
		n_matched	27
	Duration difference (PET - Algorithm, days)	mean	-2.04
		median	-2.00
		sd	4.16
		min	-9.00
		q25	-4.50
		q75	-1.00
		max	8.00
		n_matched	27
date_difference_by_outcome	Start date difference (PET - Algorithm, days) [AB]	mean	3.00
		median	3.00
		sd	–
		min	3.00
		q25	3.00
		q75	3.00
		max	3.00
		n_matched	–
	End date difference (PET - Algorithm, days) [AB]	mean	1.00
		median	1.00
		sd	–
		min	1.00
		q25	1.00
		q75	1.00
		max	1.00
		n_matched	–
	Duration difference (PET - Algorithm, days) [AB]	mean	-2.00
		median	-2.00
		sd	–
		min	-2.00
		q25	-2.00
		q75	-2.00
		max	-2.00
		n_matched	–
	Start date difference (PET - Algorithm, days) [ECT]	mean	0.00
		median	0.00
		sd	–
		min	0.00
		q25	0.00
		q75	0.00
		max	0.00
		n_matched	–
	End date difference (PET - Algorithm, days) [ECT]	mean	0.00
		median	0.00
		sd	–
		min	0.00
		q25	0.00
		q75	0.00
		max	0.00
		n_matched	–
	Duration difference (PET - Algorithm, days) [ECT]	mean	0.00
		median	0.00
		sd	–
		min	0.00
		q25	0.00
		q75	0.00
		max	0.00
		n_matched	–
	Start date difference (PET - Algorithm, days) [LB]	mean	-0.25
		median	-0.50
		sd	3.15
		min	-5.00
		q25	-1.75
		q75	3.00
		max	3.00
		n_matched	8
	End date difference (PET - Algorithm, days) [LB]	mean	0.12
		median	1.50
		sd	3.27
		min	-5.00
		q25	-3.00
		q75	3.00
		max	3.00
		n_matched	8
	Duration difference (PET - Algorithm, days) [LB]	mean	0.38
		median	-1.50
		sd	5.10
		min	-6.00
		q25	-2.50
		q75	4.00
		max	8.00
		n_matched	8
	Start date difference (PET - Algorithm, days) [PREG]	mean	2.20
		median	3.00
		sd	2.51
		min	-2.00
		q25	0.00
		q75	4.50
		max	5.00
		n_matched	15
	End date difference (PET - Algorithm, days) [PREG]	mean	-1.07
		median	-1.00
		sd	2.25
		min	-5.00
		q25	-2.50
		q75	0.00
		max	4.00
		n_matched	15
	Duration difference (PET - Algorithm, days) [PREG]	mean	-3.27
		median	-2.00
		sd	3.65
		min	-9.00
		q25	-6.50
		q75	-1.00
		max	3.00
		n_matched	15
	Start date difference (PET - Algorithm, days) [SA]	mean	1.00
		median	1.00
		sd	–
		min	1.00
		q25	1.00
		q75	1.00
		max	1.00
		n_matched	–
	End date difference (PET - Algorithm, days) [SA]	mean	-4.00
		median	-4.00
		sd	–
		min	-4.00
		q25	-4.00
		q75	-4.00
		max	-4.00
		n_matched	–
	Duration difference (PET - Algorithm, days) [SA]	mean	-5.00
		median	-5.00
		sd	–
		min	-5.00
		q25	-5.00
		q75	-5.00
		max	-5.00
		n_matched	–
	Start date difference (PET - Algorithm, days) [SB]	mean	2.00
		median	2.00
		sd	–
		min	2.00
		q25	2.00
		q75	2.00
		max	2.00
		n_matched	–
	End date difference (PET - Algorithm, days) [SB]	mean	0.00
		median	0.00
		sd	–
		min	0.00
		q25	0.00
		q75	0.00
		max	0.00
		n_matched	–
	Duration difference (PET - Algorithm, days) [SB]	mean	-2.00
		median	-2.00
		sd	–
		min	-2.00
		q25	-2.00
		q75	-2.00
		max	-2.00
		n_matched	–
date_difference_distribution	duration_diff::≤ -30	n	–
	duration_diff::-29 to -15	n	–
	duration_diff::-14 to -8	n	–
	duration_diff::-7 to -1	n	18
	duration_diff::0	n	–
	duration_diff::1 to 7	n	–
	duration_diff::8 to 14	n	–
	duration_diff::15 to 29	n	–
	duration_diff::≥ 30	n	–
	end_diff::≤ -30	n	–
	end_diff::-29 to -15	n	–
	end_diff::-14 to -8	n	–
	end_diff::-7 to -1	n	13
	end_diff::0	n	5
	end_diff::1 to 7	n	9
	end_diff::8 to 14	n	–
	end_diff::15 to 29	n	–
	end_diff::≥ 30	n	–
	start_diff::≤ -30	n	–
	start_diff::-29 to -15	n	–
	start_diff::-14 to -8	n	–
	start_diff::-7 to -1	n	7
	start_diff::0	n	–
	start_diff::1 to 7	n	16
	start_diff::8 to 14	n	–
	start_diff::15 to 29	n	–
	start_diff::≥ 30	n	–
date_difference_distribution_by_outcome	duration_diff::AB::≤ -30	n	–
	duration_diff::AB::-29 to -15	n	–
	duration_diff::AB::-14 to -8	n	–
	duration_diff::AB::-7 to -1	n	–
	duration_diff::AB::0	n	–
	duration_diff::AB::1 to 7	n	–
	duration_diff::AB::8 to 14	n	–
	duration_diff::AB::15 to 29	n	–
	duration_diff::AB::≥ 30	n	–
	end_diff::AB::≤ -30	n	–
	end_diff::AB::-29 to -15	n	–
	end_diff::AB::-14 to -8	n	–
	end_diff::AB::-7 to -1	n	–
	end_diff::AB::0	n	–
	end_diff::AB::1 to 7	n	–
	end_diff::AB::8 to 14	n	–
	end_diff::AB::15 to 29	n	–
	end_diff::AB::≥ 30	n	–
	start_diff::AB::≤ -30	n	–
	start_diff::AB::-29 to -15	n	–
	start_diff::AB::-14 to -8	n	–
	start_diff::AB::-7 to -1	n	–
	start_diff::AB::0	n	–
	start_diff::AB::1 to 7	n	–
	start_diff::AB::8 to 14	n	–
	start_diff::AB::15 to 29	n	–
	start_diff::AB::≥ 30	n	–
	duration_diff::ECT::≤ -30	n	–
	duration_diff::ECT::-29 to -15	n	–
	duration_diff::ECT::-14 to -8	n	–
	duration_diff::ECT::-7 to -1	n	–
	duration_diff::ECT::0	n	–
	duration_diff::ECT::1 to 7	n	–
	duration_diff::ECT::8 to 14	n	–
	duration_diff::ECT::15 to 29	n	–
	duration_diff::ECT::≥ 30	n	–
	end_diff::ECT::≤ -30	n	–
	end_diff::ECT::-29 to -15	n	–
	end_diff::ECT::-14 to -8	n	–
	end_diff::ECT::-7 to -1	n	–
	end_diff::ECT::0	n	–
	end_diff::ECT::1 to 7	n	–
	end_diff::ECT::8 to 14	n	–
	end_diff::ECT::15 to 29	n	–
	end_diff::ECT::≥ 30	n	–
	start_diff::ECT::≤ -30	n	–
	start_diff::ECT::-29 to -15	n	–
	start_diff::ECT::-14 to -8	n	–
	start_diff::ECT::-7 to -1	n	–
	start_diff::ECT::0	n	–
	start_diff::ECT::1 to 7	n	–
	start_diff::ECT::8 to 14	n	–
	start_diff::ECT::15 to 29	n	–
	start_diff::ECT::≥ 30	n	–
	duration_diff::LB::≤ -30	n	–
	duration_diff::LB::-29 to -15	n	–
	duration_diff::LB::-14 to -8	n	–
	duration_diff::LB::-7 to -1	n	5
	duration_diff::LB::0	n	–
	duration_diff::LB::1 to 7	n	–
	duration_diff::LB::8 to 14	n	–
	duration_diff::LB::15 to 29	n	–
	duration_diff::LB::≥ 30	n	–
	end_diff::LB::≤ -30	n	–
	end_diff::LB::-29 to -15	n	–
	end_diff::LB::-14 to -8	n	–
	end_diff::LB::-7 to -1	n	–
	end_diff::LB::0	n	–
	end_diff::LB::1 to 7	n	5
	end_diff::LB::8 to 14	n	–
	end_diff::LB::15 to 29	n	–
	end_diff::LB::≥ 30	n	–
	start_diff::LB::≤ -30	n	–
	start_diff::LB::-29 to -15	n	–
	start_diff::LB::-14 to -8	n	–
	start_diff::LB::-7 to -1	n	–
	start_diff::LB::0	n	–
	start_diff::LB::1 to 7	n	–
	start_diff::LB::8 to 14	n	–
	start_diff::LB::15 to 29	n	–
	start_diff::LB::≥ 30	n	–
	duration_diff::PREG::≤ -30	n	–
	duration_diff::PREG::-29 to -15	n	–
	duration_diff::PREG::-14 to -8	n	–
	duration_diff::PREG::-7 to -1	n	10
	duration_diff::PREG::0	n	–
	duration_diff::PREG::1 to 7	n	–
	duration_diff::PREG::8 to 14	n	–
	duration_diff::PREG::15 to 29	n	–
	duration_diff::PREG::≥ 30	n	–
	end_diff::PREG::≤ -30	n	–
	end_diff::PREG::-29 to -15	n	–
	end_diff::PREG::-14 to -8	n	–
	end_diff::PREG::-7 to -1	n	9
	end_diff::PREG::0	n	–
	end_diff::PREG::1 to 7	n	–
	end_diff::PREG::8 to 14	n	–
	end_diff::PREG::15 to 29	n	–
	end_diff::PREG::≥ 30	n	–
	start_diff::PREG::≤ -30	n	–
	start_diff::PREG::-29 to -15	n	–
	start_diff::PREG::-14 to -8	n	–
	start_diff::PREG::-7 to -1	n	–
	start_diff::PREG::0	n	–
	start_diff::PREG::1 to 7	n	10
	start_diff::PREG::8 to 14	n	–
	start_diff::PREG::15 to 29	n	–
	start_diff::PREG::≥ 30	n	–
	duration_diff::SA::≤ -30	n	–
	duration_diff::SA::-29 to -15	n	–
	duration_diff::SA::-14 to -8	n	–
	duration_diff::SA::-7 to -1	n	–
	duration_diff::SA::0	n	–
	duration_diff::SA::1 to 7	n	–
	duration_diff::SA::8 to 14	n	–
	duration_diff::SA::15 to 29	n	–
	duration_diff::SA::≥ 30	n	–
	end_diff::SA::≤ -30	n	–
	end_diff::SA::-29 to -15	n	–
	end_diff::SA::-14 to -8	n	–
	end_diff::SA::-7 to -1	n	–
	end_diff::SA::0	n	–
	end_diff::SA::1 to 7	n	–
	end_diff::SA::8 to 14	n	–
	end_diff::SA::15 to 29	n	–
	end_diff::SA::≥ 30	n	–
	start_diff::SA::≤ -30	n	–
	start_diff::SA::-29 to -15	n	–
	start_diff::SA::-14 to -8	n	–
	start_diff::SA::-7 to -1	n	–
	start_diff::SA::0	n	–
	start_diff::SA::1 to 7	n	–
	start_diff::SA::8 to 14	n	–
	start_diff::SA::15 to 29	n	–
	start_diff::SA::≥ 30	n	–
	duration_diff::SB::≤ -30	n	–
	duration_diff::SB::-29 to -15	n	–
	duration_diff::SB::-14 to -8	n	–
	duration_diff::SB::-7 to -1	n	–
	duration_diff::SB::0	n	–
	duration_diff::SB::1 to 7	n	–
	duration_diff::SB::8 to 14	n	–
	duration_diff::SB::15 to 29	n	–
	duration_diff::SB::≥ 30	n	–
	end_diff::SB::≤ -30	n	–
	end_diff::SB::-29 to -15	n	–
	end_diff::SB::-14 to -8	n	–
	end_diff::SB::-7 to -1	n	–
	end_diff::SB::0	n	–
	end_diff::SB::1 to 7	n	–
	end_diff::SB::8 to 14	n	–
	end_diff::SB::15 to 29	n	–
	end_diff::SB::≥ 30	n	–
	start_diff::SB::≤ -30	n	–
	start_diff::SB::-29 to -15	n	–
	start_diff::SB::-14 to -8	n	–
	start_diff::SB::-7 to -1	n	–
	start_diff::SB::0	n	–
	start_diff::SB::1 to 7	n	–
	start_diff::SB::8 to 14	n	–
	start_diff::SB::15 to 29	n	–
	start_diff::SB::≥ 30	n	–
outcome_accuracy	overall	n_correct	11
		n_total	11
		accuracy	1.00
outcome_by_year	same_year_pairs	overall_equal	11
		overall_diff	16
		lb_lb	8
		lb_miscarriage	–
		lb_ab	–
		lb_sb	–
		lb_unknown	–
		sb_sb	–
		sb_miscarriage	–
		sb_ab	–
		sb_lb	–
		sb_unknown	–
		ab_ab	–
		ab_miscarriage	–
		ab_lb	–
		ab_sb	–
		ab_unknown	–
duration_summary	algorithm	n	35
		mean	250.51
		median	224.00
		sd	277.23
		min	21.00
		q25	147.00
		q75	280.00
		max	1,749.00
	pet	n	33
		mean	210.45
		median	260.00
		sd	97.42
		min	15.00
		q25	140.00
		q75	280.00
		max	377.00
duration_matched_summary	algorithm	n	27
		mean	196.00
		median	147.00
		sd	99.40
		min	21.00
		q25	147.00
		q75	280.00
		max	380.00
	pet	n	27
		mean	193.96
		median	150.00
		sd	100.51
		min	15.00
		q25	138.50
		q75	279.50
		max	377.00
group_gestational_time	matched	n	27
		mean	196.00
		median	147.00
		sd	99.40
		min	21.00
		q25	147.00
		q75	280.00
		max	380.00
	algorithm_only	n	8
		mean	434.50
		median	287.50
		sd	535.20
		min	147.00
		q25	212.50
		q75	303.75
		max	1,749.00
	pet_only	n	6
		mean	284.67
		median	280.00
		sd	11.43
		min	280.00
		q25	280.00
		q75	280.00
		max	308.00
group_outcome	matched:AB	n	–
		pct	3.70
	matched:ECT	n	–
		pct	3.70
	matched:LB	n	8
		pct	29.63
	matched:PREG	n	15
		pct	55.56
	matched:SA	n	–
		pct	3.70
	matched:SB	n	–
		pct	3.70
	algorithm_only:DELIV	n	–
		pct	12.50
	algorithm_only:LB	n	–
		pct	12.50
	algorithm_only:PREG	n	6
		pct	75.00
	pet_only:LB	n	6
		pct	100.00
group_source	matched:both	n	24
		pct	88.89
	matched:hip_only	n	–
		pct	3.70
	matched:pps_only	n	–
		pct	7.41
	algorithm_only:both	n	6
		pct	75.00
	algorithm_only:hip_only	n	–
		pct	25.00
person_venn_counts	both	n_persons	25
		n_alg_episodes	28
		n_pet_episodes	33
	algorithm_only	n_persons	7
		n_episodes	7
	pet_only	n_persons	–
		n_episodes	–
person_episodes_per_person	both:algorithm	mean	1.12
		median	1.00
		sd	0.33
		min	1.00
		q25	1.00
		q75	1.00
		max	2.00
	both:pet	mean	1.32
		median	1.00
		sd	0.56
		min	1.00
		q25	1.00
		q75	2.00
		max	3.00
	algorithm_only	mean	1.00
		median	1.00
		sd	0.00
		min	1.00
		q25	1.00
		q75	1.00
		max	1.00
	pet_only	mean	–
		median	–
		sd	–
		min	–
		q25	–
		q75	–
		max	–
pet_only_hip_coverage	overall	n_total	6
		n_with_hip	–
		n_without_hip	6
		pct_with_hip	0.00
pet_only_pps_coverage	overall	n_with_pps	–
		n_without_pps	6
		pct_with_pps	0.00
pet_only_any_record_coverage	overall	n_with_any	–
		n_without_any	6
		pct_with_any	0.00
pet_only_hip_record_count	overall	n	6
		mean	0.00
		median	0.00
		sd	0.00
		min	0.00
		q25	0.00
		q75	0.00
		max	0.00
pet_only_pps_record_count	overall	n	6
		mean	0.00
		median	0.00
		sd	0.00
		min	0.00
		q25	0.00
		q75	0.00
		max	0.00
delivery_mode	matched:2014	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	matched:2015	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	matched:2017	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	matched:2018	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	matched:2019	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	matched:2021	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	matched:2023	n	11
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	matched:2024	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	matched:overall	n	27
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	algorithm_only:2015	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	algorithm_only:2018	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	algorithm_only:2020	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	algorithm_only:2023	n	–
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–
	algorithm_only:overall	n	8
		n_cesarean	–
		n_vaginal	–
		n_known	–
		pct_cesarean	–
		pct_vaginal	–

The summarised result is in long format: each row has variable_name, variable_level, estimate_name, and estimate_value. The helper sr_table() used below extracts one variable and pivots it into a wide table for display. The following sections describe each metric.
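
To make the long layout concrete, here is a small base-R sketch on toy rows copied from the episode_counts entries above. It assumes estimate_value is stored as text, so values are converted before any arithmetic; the data frame itself is illustrative and not read from the CSV.

```r
# Toy rows in the long layout (values taken from the episode_counts rows above)
long <- data.frame(
  variable_name  = "episode_counts",
  variable_level = c("algorithm", "algorithm", "pet", "pet"),
  estimate_name  = c("n_episodes", "n_persons", "n_episodes", "n_persons"),
  estimate_value = c("35", "32", "33", "25"),
  stringsAsFactors = FALSE
)

# One row per level, one column per estimate
wide <- reshape(
  long[, c("variable_level", "estimate_name", "estimate_value")],
  idvar = "variable_level", timevar = "estimate_name", direction = "wide"
)
names(wide) <- sub("^estimate_value\\.", "", names(wide))

# Assumption: values arrive as text, so convert before computing with them
wide$n_episodes <- as.numeric(wide$n_episodes)
wide$n_persons  <- as.numeric(wide$n_persons)
wide
```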

Episode and person counts

kable(sr_table(res, "episode_counts", "source"), format = "html", caption = "Episode counts: algorithm vs PET")

Episode counts: algorithm vs PET
source	n_episodes	n_persons
algorithm	35	32
pet	33	25

Number of episodes and distinct persons in the algorithm output and in the PET table.

Protocol summary

kable(sr_table(res, "protocol_summary"), format = "html", caption = "Protocol summary (for reporting)")

Protocol summary (for reporting)
variable_level	total_pet_episodes	total_algorithm_episodes	total_matched_episodes
overall	33	35	27

Totals and number of matched episodes (one-to-one pairs).

Person overlap

kable(sr_table(res, "person_overlap", "metric"), format = "html", caption = "Person overlap")

Person overlap
metric	n_persons
raw_person_overlap	25
cohort_person_overlap	23

raw_person_overlap: distinct persons with at least one PET episode and one algorithm episode.
cohort_person_overlap: same, but after filtering both sources to gestation 0–308 days and end ≥ start.
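
The difference between the two overlap counts can be illustrated on toy data. The sketch below assumes gestation is computed as end − start in days and that the filter is applied to each source before intersecting person IDs; cohort_filter() and the column names are hypothetical.

```r
# Keep episodes with end >= start and gestation between 0 and 308 days
cohort_filter <- function(ep) {
  gest <- as.numeric(ep$end - ep$start)
  ep[!is.na(gest) & gest >= 0 & gest <= 308, , drop = FALSE]
}

alg <- data.frame(
  person_id = c(1L, 2L, 3L),
  start = as.Date(c("2020-01-01", "2020-01-01", "2019-01-01")),
  end   = as.Date(c("2020-09-30", "2020-08-01", "2020-06-01"))  # person 3: 517 days
)
pet <- data.frame(
  person_id = c(1L, 3L, 4L),
  start = as.Date(c("2020-01-05", "2019-09-01", "2021-01-01")),
  end   = as.Date(c("2020-10-01", "2020-06-01", "2021-09-01"))
)

# Persons present in both sources, before and after the cohort filter
raw_person_overlap <- length(intersect(alg$person_id, pet$person_id))
cohort_person_overlap <- length(intersect(cohort_filter(alg)$person_id,
                                          cohort_filter(pet)$person_id))
c(raw = raw_person_overlap, cohort = cohort_person_overlap)
```

Person 3 counts towards the raw overlap but drops out of the cohort overlap because the algorithm episode exceeds 308 days.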

Venn counts (matched / PET-only / algorithm-only)

kable(sr_table(res, "venn_counts", "category"), format = "html", caption = "Venn counts (one-to-one matching)")

Venn counts (one-to-one matching)
category	n_episodes	n_pet_matched	n_alg_matched
both	27	27	27
pet_only	6	27	27
algorithm_only	8	27	27

both: number of matched episode pairs.
pet_only: PET episodes with no matched algorithm episode.
algorithm_only: algorithm episodes with no matched PET episode.

2×2 confusion matrix (PET as reference)

kable(sr_table(res, "confusion_2x2", "cell"), format = "html", caption = "2×2 confusion matrix (PET = reference)")

2×2 confusion matrix (PET = reference)
cell	count
TP	27
FN	6
FP	8
TN	NA

TP: PET episode has a matched algorithm episode.
FN: PET episode has no match (algorithm “miss”).
FP: Algorithm episode has no match (algorithm “extra”).
TN: Not defined at episode level (no negative population).

Sensitivity, specificity, PPV, NPV

kable(sr_table(res, "ppv_sensitivity", "metric"), format = "html", caption = "Sensitivity, specificity, PPV, NPV")

Sensitivity, specificity, PPV, NPV
metric	value	numerator	denominator
sensitivity	0.818181818181818	27	33
ppv	0.771428571428571	27	35

Sensitivity = TP / (TP + FN): fraction of PET episodes that have a match.
PPV = TP / (TP + FP): fraction of algorithm episodes that match a PET episode.
Specificity and NPV use TN and are NA at episode level.
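
As a worked check, the reported values can be reproduced from the episode totals shown earlier (33 PET episodes, 35 algorithm episodes, 27 matched pairs). The variable names are illustrative only.

```r
# Totals from the protocol summary above
total_pet <- 33
total_alg <- 35
matched   <- 27

# Confusion cells with PET as reference
tp <- matched
fn <- total_pet - matched  # PET episodes without a match
fp <- total_alg - matched  # algorithm episodes without a match

sensitivity <- tp / (tp + fn)
ppv         <- tp / (tp + fp)
round(c(sensitivity = sensitivity, ppv = ppv), 2)
```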

Date differences (matched pairs only)

For each matched pair, date differences are PET − algorithm (positive = PET is later, i.e. algorithm starts/ends too early). Three measures are reported:

Start date difference: PET start − algorithm start. Positive = algorithm starts too early; negative = algorithm starts too late.
End date difference: PET end − algorithm end. Same sign convention.
Duration difference: PET duration − algorithm duration. Positive = algorithm episodes are shorter; negative = longer.
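
The sign convention can be checked on a single toy pair. The dates below are made up for illustration; duration is taken as end − start, as in the duration summaries later in this vignette.

```r
# One hypothetical matched pair
pair <- data.frame(
  alg_start = as.Date("2023-01-10"), alg_end = as.Date("2023-10-05"),
  pet_start = as.Date("2023-01-12"), pet_end = as.Date("2023-10-04")
)

# All differences are PET minus algorithm, in days
start_diff <- as.numeric(pair$pet_start - pair$alg_start)
end_diff   <- as.numeric(pair$pet_end - pair$alg_end)
duration_diff <- as.numeric(pair$pet_end - pair$pet_start) -
  as.numeric(pair$alg_end - pair$alg_start)

c(start = start_diff, end = end_diff, duration = duration_diff)
```

Here the algorithm starts 2 days before the PET and ends 1 day after it, so the algorithm episode is 3 days longer. For any single pair the duration difference equals the end difference minus the start difference, which is also why the means in the summary table below satisfy the same relation.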

dd_summary <- sr_table(res, "date_difference_summary")
if (!is.null(dd_summary) && nrow(dd_summary) > 0) {
  kable(dd_summary, format = "html", caption = "Date difference summary (PET − algorithm, days)")
} else {
  cat("No matched pairs; date differences not computed.\n")
}

Date difference summary (PET − algorithm, days)
variable_level	mean	median	sd	min	q25	q75	max	n_matched
Start date difference (PET - Algorithm, days)	1.37037037037037	2	2.73366685478214	-5	-0.5	3	5	27
End date difference (PET - Algorithm, days)	-0.666666666666667	0	2.54195563720897	-5	-3	1	4	27
Duration difference (PET - Algorithm, days)	-2.03703703703704	-2	4.16470039075194	-9	-4.5	-1	8	27

Date differences by outcome

The same start/end/duration differences are also reported stratified by algorithm outcome category (LB, SB, SA, AB, PREG, etc.), since different outcomes have very different expected durations.

dd_by_outcome <- sr_table(res, "date_difference_by_outcome")
if (!is.null(dd_by_outcome) && nrow(dd_by_outcome) > 0) {
  kable(dd_by_outcome, format = "html", caption = "Date differences by algorithm outcome (PET − algorithm, days)")
} else {
  cat("No outcome-stratified date differences available.\n")
}

Date differences by algorithm outcome (PET − algorithm, days)
variable_level	mean	median	sd	min	q25	q75	max	n_matched
Start date difference (PET - Algorithm, days) [AB]	3	3	NA	3	3	3	3	NA
End date difference (PET - Algorithm, days) [AB]	1	1	NA	1	1	1	1	NA
Duration difference (PET - Algorithm, days) [AB]	-2	-2	NA	-2	-2	-2	-2	NA
Start date difference (PET - Algorithm, days) [ECT]	0	0	NA	0	0	0	0	NA
End date difference (PET - Algorithm, days) [ECT]	0	0	NA	0	0	0	0	NA
Duration difference (PET - Algorithm, days) [ECT]	0	0	NA	0	0	0	0	NA
Start date difference (PET - Algorithm, days) [LB]	-0.25	-0.5	3.15096357144468	-5	-1.75	3	3	8
End date difference (PET - Algorithm, days) [LB]	0.125	1.5	3.27053949241231	-5	-3	3	3	8
Duration difference (PET - Algorithm, days) [LB]	0.375	-1.5	5.0972681759098	-6	-2.5	4	8	8
Start date difference (PET - Algorithm, days) [PREG]	2.2	3	2.51282425057657	-2	0	4.5	5	15
End date difference (PET - Algorithm, days) [PREG]	-1.06666666666667	-1	2.25092573548455	-5	-2.5	0	4	15
Duration difference (PET - Algorithm, days) [PREG]	-3.26666666666667	-2	3.65409098851971	-9	-6.5	-1	3	15
Start date difference (PET - Algorithm, days) [SA]	1	1	NA	1	1	1	1	NA
End date difference (PET - Algorithm, days) [SA]	-4	-4	NA	-4	-4	-4	-4	NA
Duration difference (PET - Algorithm, days) [SA]	-5	-5	NA	-5	-5	-5	-5	NA
Start date difference (PET - Algorithm, days) [SB]	2	2	NA	2	2	2	2	NA
End date difference (PET - Algorithm, days) [SB]	0	0	NA	0	0	0	0	NA
Duration difference (PET - Algorithm, days) [SB]	-2	-2	NA	-2	-2	-2	-2	NA

Alignment distribution

The date differences can also be viewed as a binned distribution (histogram), which shows at a glance what fraction of matched episodes are perfectly aligned (bin = 0), close (within ±7 days), or far off (±30+ days). The bins are coloured on a red-yellow-green gradient where green = exact match and red = large discrepancy.

This visualisation is available interactively in the Shiny app’s Alignment tab. It can be filtered by measure (start/end/duration) and stratified by outcome.
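
The binning itself can be sketched with cut(). The breaks follow the bin labels shown in the date_difference_distribution rows above, assuming whole-day differences; the labels use ASCII stand-ins ("<=" and ">=") and bin_diff() is a hypothetical name, not the package's function.

```r
# Bin labels in the same order as the distribution rows above
bin_labels <- c("<= -30", "-29 to -15", "-14 to -8", "-7 to -1", "0",
                "1 to 7", "8 to 14", "15 to 29", ">= 30")

# Right-closed intervals: (-Inf,-30], (-30,-15], ..., (-1,0], (0,7], ..., (29,Inf)
bin_diff <- function(diff_days) {
  cut(diff_days,
      breaks = c(-Inf, -30, -15, -8, -1, 0, 7, 14, 29, Inf),
      labels = bin_labels, right = TRUE)
}

# Toy differences in days (PET minus algorithm)
diffs <- c(-45, -30, -29, -7, -1, 0, 1, 7, 8, 30)
bin_counts <- table(bin_diff(diffs))
bin_counts
```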

Outcome confusion (matched pairs)

Cross-tabulation of PET outcome (concept_id) vs algorithm outcome (LB, SB, AB, SA, etc.) is not included in the summarised result CSV; only aggregate metrics are exported.

Outcome accuracy

Among matched pairs with a mappable algorithm outcome (LB/SB/AB/SA/DELIV → concept_id), the fraction where PET outcome concept_id equals the algorithm-mapped concept_id.

kable(sr_table(res, "outcome_accuracy"), format = "html", caption = "Outcome accuracy (matched pairs)")

Outcome accuracy (matched pairs)
variable_level	n_correct	n_total	accuracy
overall	11	11	1

Outcome by year (same-year pairs)

Among matched pairs in the same year (algorithm start year = PET start year), counts of agreement (e.g. lb_lb, sb_sb) and disagreement (e.g. lb_sb, sb_lb). Used for same-year outcome cross-tabs.

kable(sr_table(res, "outcome_by_year"), format = "html", caption = "Outcome by year (same-year pairs)")

Outcome by year (same-year pairs)
variable_level	overall_equal	overall_diff	lb_lb	lb_miscarriage	lb_ab	lb_sb	lb_unknown	sb_sb	sb_miscarriage	sb_ab	sb_lb	sb_unknown	ab_ab	ab_miscarriage	ab_lb	ab_sb	ab_unknown
same_year_pairs	11	16	8	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

Duration summary (all episodes)

kable(sr_table(res, "duration_summary", "source"), format = "html", caption = "Pregnancy duration (days) by source")

Pregnancy duration (days) by source
source	n	mean	median	sd	min	q25	q75	max
algorithm	35	250.514285714286	224	277.22568544828	21	147	280	1749
pet	33	210.454545454545	260	97.4211511008681	15	140	280	377

Summary statistics of episode length (end − start) for all algorithm episodes and all PET episodes.
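
For reference, the same set of statistics can be computed on a local vector of durations as follows. This is a sketch: summarise_days() is a hypothetical helper, and it uses R's default quantile type, which may differ from what the package or the database computes.

```r
# Hypothetical helper returning the statistics reported in the table above
summarise_days <- function(x) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), names = FALSE)
  c(n = length(x), mean = mean(x), median = stats::median(x), sd = stats::sd(x),
    min = min(x), q25 = q[1], q75 = q[2], max = max(x))
}

# Toy episodes; duration is end minus start, in days
start <- as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01"))
end   <- start + c(147, 280, 224, 21)
duration_stats <- summarise_days(as.numeric(end - start))
duration_stats
```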

Duration matched summary

dm <- sr_table(res, "duration_matched_summary", "source")
if (!is.null(dm) && nrow(dm) > 0) {
  kable(dm, format = "html", caption = "Duration (matched pairs only)")
} else {
  cat("No duration matched summary (no matched pairs).\n")
}

Duration (matched pairs only)
source	n	mean	median	sd	min	q25	q75	max
algorithm	27	196	147	99.401284622561	21	147	280	380
pet	27	193.962962962963	150	100.511568978777	15	138.5	279.5	377

Duration statistics for the matched episodes only (algorithm vs PET).

Files written to `exportFolder`

File	Content
`pet_comparison_summarised_result.csv`	All comparison metrics in SummarisedResult format (episode counts, protocol summary, person overlap, Venn counts, time overlap, confusion 2x2, PPV/sensitivity, date-difference summary, outcome accuracy, outcome by year, duration summaries). Includes the comparison settings for traceability. Use omopgenerics::importSummarisedResult() to read it and visOmopResults::visOmopTable() to display it.
`log.txt`	Run log (appended if file already exists).

Using your own PET table

If your CDM already has a PET table (e.g. omop_cmbd.pregnancy_episode), call:

comparePregnancyIdentifierWithPET(
  cdm = your_cdm,
  outputFolder = "/path/to/pipeline/output",
  exportFolder = "/path/to/comparison/results",
  petSchema = "omop_cmbd",
  petTable = "pregnancy_episode",
  minOverlapDays = 1L,
  removeWithinSourceOverlaps = FALSE
)
res <- omopgenerics::importSummarisedResult(file.path("/path/to/comparison/results", "pet_comparison_summarised_result.csv"))

Ensure the PET table has at least: person_id, pregnancy_start_date, pregnancy_end_date, pregnancy_outcome (concept_id). Gestational length in days is computed in the database from the start and end dates.

# Clean up temp dirs (optional; disconnect from cdm when done in your session)
unlink(outputDir, recursive = TRUE)
unlink(exportFolder, recursive = TRUE)