Overview

The Pregnancy Extension Table (PET) is an OMOP CDM extension that stores pregnancy episodes (start date, end date, outcome) identified by a separate process (e.g. chart review or another algorithm). The function comparePregnancyIdentifierWithPET() compares the episodes produced by the PregnancyIdentifier pipeline to the PET and writes the comparison summary to a CSV file, along with a run log. This vignette describes how to run the comparison using the mock CDM, how matching is done, and what each output contains.

How to run the PET comparison

You need:

  1. Algorithm output: a directory containing final_pregnancy_episodes.rds (from runPregnancyIdentifier()).
  2. A PET table in the same CDM: a table with at least person_id, pregnancy_start_date, pregnancy_end_date, and pregnancy_outcome (concept_id).

Below we run the pipeline with mockPregnancyCdm() (which includes a PET table pregnancy_extension in schema main), then run the comparison.

library(PregnancyIdentifier)
library(CDMConnector)
library(dplyr)
library(tidyr)
library(knitr)
# Helper: get the results table from a summarised_result (handles list with $results or single table)
sr_results <- function(sr) {
  if (is.list(sr) && "results" %in% names(sr) && is.data.frame(sr$results)) {
    sr$results
  } else {
    as.data.frame(sr)
  }
}
# Helper: extract a wide table for one variable from the summarised result
sr_table <- function(sr, var, level_name = "variable_level") {
  tbl <- sr_results(sr)
  d <- dplyr::filter(tbl, .data$variable_name == .env$var)
  if (nrow(d) == 0) return(NULL)
  d <- dplyr::select(d, "variable_level", "estimate_name", "estimate_value")
  wide <- tidyr::pivot_wider(d, names_from = "estimate_name", values_from = "estimate_value")
  if (level_name != "variable_level") wide <- dplyr::rename(wide, !!level_name := "variable_level")
  wide
}
# Directories: pipeline output (episode data), export (comparison results and log)
td <- tempdir()
if (!dir.exists(td)) dir.create(td, recursive = TRUE, showWarnings = FALSE)
outputDir    <- file.path(td, "pet_vignette_pipeline")
exportFolder    <- file.path(td, "pet_vignette_comparison")
dir.create(outputDir, recursive = TRUE, showWarnings = FALSE)
dir.create(exportFolder, recursive = TRUE, showWarnings = FALSE)

# 1) Build mock CDM and run the pipeline (export runs by default to outputDir/export)
cdm <- mockPregnancyCdm()
#> 
#> Download completed!
runPregnancyIdentifier(
  cdm = cdm,
  outputFolder = outputDir,
  outputLogToConsole = FALSE
)
# 2) The mock CDM includes a PET table `pregnancy_extension` in schema `main` with the
# required columns, so no extra setup is needed; we run the comparison against it.
# (Alternatively, you could build a PET from the algorithm output as in the
# `insert-mock-pet` chunk and use petTable = "pregnancy_episode".)

# 3) Run the PET comparison (writes summarised result and log to exportFolder)
pet_comparison <- comparePregnancyIdentifierWithPET(
  cdm = cdm,
  outputFolder = outputDir,
  exportFolder = exportFolder,
  petSchema = "main",
  petTable = "pregnancy_extension",
  minOverlapDays = 1L,
  outputLogToConsole = FALSE
)
# Load the written summarised result for display and programmatic use
res <- omopgenerics::importSummarisedResult(file.path(exportFolder, "pet_comparison_summarised_result.csv"))

How matching is done

Episodes are matched by:

  1. Same person: only algorithm and PET episodes from the same person_id are considered.
  2. Overlapping dates: for each (algorithm episode, PET episode) pair, overlap in days is
    max(0, min(alg_end, pet_end) - max(alg_start, pet_start) + 1).
    Pairs with overlap ≥ minOverlapDays (default 1) are candidate pairs.
  3. One-to-one assignment: within each person, candidate pairs are sorted by overlap (descending). A greedy algorithm assigns each PET episode to at most one algorithm episode and vice versa: it repeatedly picks the pair with the largest overlap among those whose PET and algorithm indices are not yet used. This avoids double-counting and yields consistent Venn and confusion counts.
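
The three steps above can be sketched in base R for a single person. This is a minimal illustration under stated assumptions, not the package's internal code: the function names overlap_days() and greedy_match(), the column names start and end, and the example dates are all hypothetical.

```r
# Inclusive overlap in days between two date intervals (0 if they do not overlap).
overlap_days <- function(alg_start, alg_end, pet_start, pet_end) {
  latest_start <- pmax(as.numeric(alg_start), as.numeric(pet_start))
  earliest_end <- pmin(as.numeric(alg_end), as.numeric(pet_end))
  pmax(0, earliest_end - latest_start + 1)
}

# Greedy one-to-one matching for ONE person.
# `alg` and `pet` are data frames with Date columns `start` and `end`
# (hypothetical column names chosen for this sketch).
greedy_match <- function(alg, pet, min_overlap_days = 1L) {
  pairs <- expand.grid(alg_idx = seq_len(nrow(alg)), pet_idx = seq_len(nrow(pet)))
  pairs$overlap <- overlap_days(
    alg$start[pairs$alg_idx], alg$end[pairs$alg_idx],
    pet$start[pairs$pet_idx], pet$end[pairs$pet_idx]
  )
  # Keep candidate pairs only, largest overlap first
  pairs <- pairs[pairs$overlap >= min_overlap_days, , drop = FALSE]
  pairs <- pairs[order(-pairs$overlap), , drop = FALSE]

  used_alg <- integer(0)
  used_pet <- integer(0)
  keep <- logical(nrow(pairs))
  for (i in seq_len(nrow(pairs))) {
    # Accept a pair only if neither episode has been assigned yet
    if (!(pairs$alg_idx[i] %in% used_alg) && !(pairs$pet_idx[i] %in% used_pet)) {
      keep[i] <- TRUE
      used_alg <- c(used_alg, pairs$alg_idx[i])
      used_pet <- c(used_pet, pairs$pet_idx[i])
    }
  }
  pairs[keep, , drop = FALSE]
}

# Illustrative episodes for one person (made-up dates)
alg <- data.frame(start = as.Date(c("2020-01-01", "2021-03-01")),
                  end   = as.Date(c("2020-09-30", "2021-11-30")))
pet <- data.frame(start = as.Date(c("2020-01-10", "2020-09-20")),
                  end   = as.Date(c("2020-10-05", "2020-09-25")))
matched <- greedy_match(alg, pet)
matched
```

Here both PET episodes overlap the first algorithm episode, but only the pair with the larger overlap is kept. The second PET episode and the second algorithm episode remain unmatched, so they would count as PET-only and algorithm-only respectively.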

Optional: if removeWithinSourceOverlaps = TRUE, overlapping episodes within PET and within the algorithm are removed (greedy non-overlapping by start date, max 400 days) before matching, which can reduce many-to-many pairs.

Outputs generated

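The function writes a single CSV in SummarisedResult format to exportFolder (pet_comparison_summarised_result.csv) and returns nothing. Re-import it with omopgenerics::importSummarisedResult() and display it with visOmopResults::visOmopTable():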

# Display as a gt table (optional: requires visOmopResults)
if (requireNamespace("visOmopResults", quietly = TRUE)) {
  visOmopResults::visOmopTable(
    result = res,
    header = "cdm_name",
    rename = c("Data source" = "cdm_name"),
    hide =  c("result_id", "group_name", "group_level", "strata_name", "strata_level", "pet_comparison"))
}
Variable name  Variable level  Estimate name  TestData_P4_C5_002_1 (Data source)
episode_counts algorithm n_episodes 35
n_persons 32
pet n_episodes 33
n_persons 25
person_overlap raw_person_overlap n_persons 25
cohort_person_overlap n_persons 23
venn_counts both n_episodes 27
n_pet_matched 27
n_alg_matched 27
pet_only n_episodes 6
n_pet_matched 27
n_alg_matched 27
algorithm_only n_episodes 8
n_pet_matched 27
n_alg_matched 27
protocol_summary overall total_pet_episodes 33
total_algorithm_episodes 35
total_matched_episodes 27
confusion_2x2 TP count 27
FN count 6
FP count 8
TN count
ppv_sensitivity sensitivity value 0.82
numerator 27
denominator 33
ppv value 0.77
numerator 27
denominator 35
date_difference_summary Start date difference (PET - Algorithm, days) mean 1.37
median 2.00
sd 2.73
min -5.00
q25 -0.50
q75 3.00
max 5.00
n_matched 27
End date difference (PET - Algorithm, days) mean -0.67
median 0.00
sd 2.54
min -5.00
q25 -3.00
q75 1.00
max 4.00
n_matched 27
Duration difference (PET - Algorithm, days) mean -2.04
median -2.00
sd 4.16
min -9.00
q25 -4.50
q75 -1.00
max 8.00
n_matched 27
date_difference_by_outcome Start date difference (PET - Algorithm, days) [AB] mean 3.00
median 3.00
sd
min 3.00
q25 3.00
q75 3.00
max 3.00
n_matched
End date difference (PET - Algorithm, days) [AB] mean 1.00
median 1.00
sd
min 1.00
q25 1.00
q75 1.00
max 1.00
n_matched
Duration difference (PET - Algorithm, days) [AB] mean -2.00
median -2.00
sd
min -2.00
q25 -2.00
q75 -2.00
max -2.00
n_matched
Start date difference (PET - Algorithm, days) [ECT] mean 0.00
median 0.00
sd
min 0.00
q25 0.00
q75 0.00
max 0.00
n_matched
End date difference (PET - Algorithm, days) [ECT] mean 0.00
median 0.00
sd
min 0.00
q25 0.00
q75 0.00
max 0.00
n_matched
Duration difference (PET - Algorithm, days) [ECT] mean 0.00
median 0.00
sd
min 0.00
q25 0.00
q75 0.00
max 0.00
n_matched
Start date difference (PET - Algorithm, days) [LB] mean -0.25
median -0.50
sd 3.15
min -5.00
q25 -1.75
q75 3.00
max 3.00
n_matched 8
End date difference (PET - Algorithm, days) [LB] mean 0.12
median 1.50
sd 3.27
min -5.00
q25 -3.00
q75 3.00
max 3.00
n_matched 8
Duration difference (PET - Algorithm, days) [LB] mean 0.38
median -1.50
sd 5.10
min -6.00
q25 -2.50
q75 4.00
max 8.00
n_matched 8
Start date difference (PET - Algorithm, days) [PREG] mean 2.20
median 3.00
sd 2.51
min -2.00
q25 0.00
q75 4.50
max 5.00
n_matched 15
End date difference (PET - Algorithm, days) [PREG] mean -1.07
median -1.00
sd 2.25
min -5.00
q25 -2.50
q75 0.00
max 4.00
n_matched 15
Duration difference (PET - Algorithm, days) [PREG] mean -3.27
median -2.00
sd 3.65
min -9.00
q25 -6.50
q75 -1.00
max 3.00
n_matched 15
Start date difference (PET - Algorithm, days) [SA] mean 1.00
median 1.00
sd
min 1.00
q25 1.00
q75 1.00
max 1.00
n_matched
End date difference (PET - Algorithm, days) [SA] mean -4.00
median -4.00
sd
min -4.00
q25 -4.00
q75 -4.00
max -4.00
n_matched
Duration difference (PET - Algorithm, days) [SA] mean -5.00
median -5.00
sd
min -5.00
q25 -5.00
q75 -5.00
max -5.00
n_matched
Start date difference (PET - Algorithm, days) [SB] mean 2.00
median 2.00
sd
min 2.00
q25 2.00
q75 2.00
max 2.00
n_matched
End date difference (PET - Algorithm, days) [SB] mean 0.00
median 0.00
sd
min 0.00
q25 0.00
q75 0.00
max 0.00
n_matched
Duration difference (PET - Algorithm, days) [SB] mean -2.00
median -2.00
sd
min -2.00
q25 -2.00
q75 -2.00
max -2.00
n_matched
date_difference_distribution duration_diff::≤ -30 n
duration_diff::-29 to -15 n
duration_diff::-14 to -8 n
duration_diff::-7 to -1 n 18
duration_diff::0 n
duration_diff::1 to 7 n
duration_diff::8 to 14 n
duration_diff::15 to 29 n
duration_diff::≥ 30 n
end_diff::≤ -30 n
end_diff::-29 to -15 n
end_diff::-14 to -8 n
end_diff::-7 to -1 n 13
end_diff::0 n 5
end_diff::1 to 7 n 9
end_diff::8 to 14 n
end_diff::15 to 29 n
end_diff::≥ 30 n
start_diff::≤ -30 n
start_diff::-29 to -15 n
start_diff::-14 to -8 n
start_diff::-7 to -1 n 7
start_diff::0 n
start_diff::1 to 7 n 16
start_diff::8 to 14 n
start_diff::15 to 29 n
start_diff::≥ 30 n
date_difference_distribution_by_outcome duration_diff::AB::≤ -30 n
duration_diff::AB::-29 to -15 n
duration_diff::AB::-14 to -8 n
duration_diff::AB::-7 to -1 n
duration_diff::AB::0 n
duration_diff::AB::1 to 7 n
duration_diff::AB::8 to 14 n
duration_diff::AB::15 to 29 n
duration_diff::AB::≥ 30 n
end_diff::AB::≤ -30 n
end_diff::AB::-29 to -15 n
end_diff::AB::-14 to -8 n
end_diff::AB::-7 to -1 n
end_diff::AB::0 n
end_diff::AB::1 to 7 n
end_diff::AB::8 to 14 n
end_diff::AB::15 to 29 n
end_diff::AB::≥ 30 n
start_diff::AB::≤ -30 n
start_diff::AB::-29 to -15 n
start_diff::AB::-14 to -8 n
start_diff::AB::-7 to -1 n
start_diff::AB::0 n
start_diff::AB::1 to 7 n
start_diff::AB::8 to 14 n
start_diff::AB::15 to 29 n
start_diff::AB::≥ 30 n
duration_diff::ECT::≤ -30 n
duration_diff::ECT::-29 to -15 n
duration_diff::ECT::-14 to -8 n
duration_diff::ECT::-7 to -1 n
duration_diff::ECT::0 n
duration_diff::ECT::1 to 7 n
duration_diff::ECT::8 to 14 n
duration_diff::ECT::15 to 29 n
duration_diff::ECT::≥ 30 n
end_diff::ECT::≤ -30 n
end_diff::ECT::-29 to -15 n
end_diff::ECT::-14 to -8 n
end_diff::ECT::-7 to -1 n
end_diff::ECT::0 n
end_diff::ECT::1 to 7 n
end_diff::ECT::8 to 14 n
end_diff::ECT::15 to 29 n
end_diff::ECT::≥ 30 n
start_diff::ECT::≤ -30 n
start_diff::ECT::-29 to -15 n
start_diff::ECT::-14 to -8 n
start_diff::ECT::-7 to -1 n
start_diff::ECT::0 n
start_diff::ECT::1 to 7 n
start_diff::ECT::8 to 14 n
start_diff::ECT::15 to 29 n
start_diff::ECT::≥ 30 n
duration_diff::LB::≤ -30 n
duration_diff::LB::-29 to -15 n
duration_diff::LB::-14 to -8 n
duration_diff::LB::-7 to -1 n 5
duration_diff::LB::0 n
duration_diff::LB::1 to 7 n
duration_diff::LB::8 to 14 n
duration_diff::LB::15 to 29 n
duration_diff::LB::≥ 30 n
end_diff::LB::≤ -30 n
end_diff::LB::-29 to -15 n
end_diff::LB::-14 to -8 n
end_diff::LB::-7 to -1 n
end_diff::LB::0 n
end_diff::LB::1 to 7 n 5
end_diff::LB::8 to 14 n
end_diff::LB::15 to 29 n
end_diff::LB::≥ 30 n
start_diff::LB::≤ -30 n
start_diff::LB::-29 to -15 n
start_diff::LB::-14 to -8 n
start_diff::LB::-7 to -1 n
start_diff::LB::0 n
start_diff::LB::1 to 7 n
start_diff::LB::8 to 14 n
start_diff::LB::15 to 29 n
start_diff::LB::≥ 30 n
duration_diff::PREG::≤ -30 n
duration_diff::PREG::-29 to -15 n
duration_diff::PREG::-14 to -8 n
duration_diff::PREG::-7 to -1 n 10
duration_diff::PREG::0 n
duration_diff::PREG::1 to 7 n
duration_diff::PREG::8 to 14 n
duration_diff::PREG::15 to 29 n
duration_diff::PREG::≥ 30 n
end_diff::PREG::≤ -30 n
end_diff::PREG::-29 to -15 n
end_diff::PREG::-14 to -8 n
end_diff::PREG::-7 to -1 n 9
end_diff::PREG::0 n
end_diff::PREG::1 to 7 n
end_diff::PREG::8 to 14 n
end_diff::PREG::15 to 29 n
end_diff::PREG::≥ 30 n
start_diff::PREG::≤ -30 n
start_diff::PREG::-29 to -15 n
start_diff::PREG::-14 to -8 n
start_diff::PREG::-7 to -1 n
start_diff::PREG::0 n
start_diff::PREG::1 to 7 n 10
start_diff::PREG::8 to 14 n
start_diff::PREG::15 to 29 n
start_diff::PREG::≥ 30 n
duration_diff::SA::≤ -30 n
duration_diff::SA::-29 to -15 n
duration_diff::SA::-14 to -8 n
duration_diff::SA::-7 to -1 n
duration_diff::SA::0 n
duration_diff::SA::1 to 7 n
duration_diff::SA::8 to 14 n
duration_diff::SA::15 to 29 n
duration_diff::SA::≥ 30 n
end_diff::SA::≤ -30 n
end_diff::SA::-29 to -15 n
end_diff::SA::-14 to -8 n
end_diff::SA::-7 to -1 n
end_diff::SA::0 n
end_diff::SA::1 to 7 n
end_diff::SA::8 to 14 n
end_diff::SA::15 to 29 n
end_diff::SA::≥ 30 n
start_diff::SA::≤ -30 n
start_diff::SA::-29 to -15 n
start_diff::SA::-14 to -8 n
start_diff::SA::-7 to -1 n
start_diff::SA::0 n
start_diff::SA::1 to 7 n
start_diff::SA::8 to 14 n
start_diff::SA::15 to 29 n
start_diff::SA::≥ 30 n
duration_diff::SB::≤ -30 n
duration_diff::SB::-29 to -15 n
duration_diff::SB::-14 to -8 n
duration_diff::SB::-7 to -1 n
duration_diff::SB::0 n
duration_diff::SB::1 to 7 n
duration_diff::SB::8 to 14 n
duration_diff::SB::15 to 29 n
duration_diff::SB::≥ 30 n
end_diff::SB::≤ -30 n
end_diff::SB::-29 to -15 n
end_diff::SB::-14 to -8 n
end_diff::SB::-7 to -1 n
end_diff::SB::0 n
end_diff::SB::1 to 7 n
end_diff::SB::8 to 14 n
end_diff::SB::15 to 29 n
end_diff::SB::≥ 30 n
start_diff::SB::≤ -30 n
start_diff::SB::-29 to -15 n
start_diff::SB::-14 to -8 n
start_diff::SB::-7 to -1 n
start_diff::SB::0 n
start_diff::SB::1 to 7 n
start_diff::SB::8 to 14 n
start_diff::SB::15 to 29 n
start_diff::SB::≥ 30 n
outcome_accuracy overall n_correct 11
n_total 11
accuracy 1.00
outcome_by_year same_year_pairs overall_equal 11
overall_diff 16
lb_lb 8
lb_miscarriage
lb_ab
lb_sb
lb_unknown
sb_sb
sb_miscarriage
sb_ab
sb_lb
sb_unknown
ab_ab
ab_miscarriage
ab_lb
ab_sb
ab_unknown
duration_summary algorithm n 35
mean 250.51
median 224.00
sd 277.23
min 21.00
q25 147.00
q75 280.00
max 1,749.00
pet n 33
mean 210.45
median 260.00
sd 97.42
min 15.00
q25 140.00
q75 280.00
max 377.00
duration_matched_summary algorithm n 27
mean 196.00
median 147.00
sd 99.40
min 21.00
q25 147.00
q75 280.00
max 380.00
pet n 27
mean 193.96
median 150.00
sd 100.51
min 15.00
q25 138.50
q75 279.50
max 377.00
group_gestational_time matched n 27
mean 196.00
median 147.00
sd 99.40
min 21.00
q25 147.00
q75 280.00
max 380.00
algorithm_only n 8
mean 434.50
median 287.50
sd 535.20
min 147.00
q25 212.50
q75 303.75
max 1,749.00
pet_only n 6
mean 284.67
median 280.00
sd 11.43
min 280.00
q25 280.00
q75 280.00
max 308.00
group_outcome matched:AB n
pct 3.70
matched:ECT n
pct 3.70
matched:LB n 8
pct 29.63
matched:PREG n 15
pct 55.56
matched:SA n
pct 3.70
matched:SB n
pct 3.70
algorithm_only:DELIV n
pct 12.50
algorithm_only:LB n
pct 12.50
algorithm_only:PREG n 6
pct 75.00
pet_only:LB n 6
pct 100.00
group_source matched:both n 24
pct 88.89
matched:hip_only n
pct 3.70
matched:pps_only n
pct 7.41
algorithm_only:both n 6
pct 75.00
algorithm_only:hip_only n
pct 25.00
person_venn_counts both n_persons 25
n_alg_episodes 28
n_pet_episodes 33
algorithm_only n_persons 7
n_episodes 7
pet_only n_persons
n_episodes
person_episodes_per_person both:algorithm mean 1.12
median 1.00
sd 0.33
min 1.00
q25 1.00
q75 1.00
max 2.00
both:pet mean 1.32
median 1.00
sd 0.56
min 1.00
q25 1.00
q75 2.00
max 3.00
algorithm_only mean 1.00
median 1.00
sd 0.00
min 1.00
q25 1.00
q75 1.00
max 1.00
pet_only mean
median
sd
min
q25
q75
max
pet_only_hip_coverage overall n_total 6
n_with_hip
n_without_hip 6
pct_with_hip 0.00
pet_only_pps_coverage overall n_with_pps
n_without_pps 6
pct_with_pps 0.00
pet_only_any_record_coverage overall n_with_any
n_without_any 6
pct_with_any 0.00
pet_only_hip_record_count overall n 6
mean 0.00
median 0.00
sd 0.00
min 0.00
q25 0.00
q75 0.00
max 0.00
pet_only_pps_record_count overall n 6
mean 0.00
median 0.00
sd 0.00
min 0.00
q25 0.00
q75 0.00
max 0.00
delivery_mode matched:2014 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
matched:2015 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
matched:2017 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
matched:2018 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
matched:2019 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
matched:2021 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
matched:2023 n 11
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
matched:2024 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
matched:overall n 27
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
algorithm_only:2015 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
algorithm_only:2018 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
algorithm_only:2020 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
algorithm_only:2023 n
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal
algorithm_only:overall n 8
n_cesarean
n_vaginal
n_known
pct_cesarean
pct_vaginal

The summarised result is in long format: each row has variable_name, variable_level, estimate_name, and estimate_value. The helper sr_table() used below extracts and pivots one variable into a wide table for display. The following sections describe each metric.
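
For programmatic use it can be handy to pull a single number out of the long format. The sketch below uses a small toy data frame with the same four columns (values copied from the episode counts shown in this vignette). The helper get_estimate() is hypothetical, and the sketch assumes estimate_value is stored as character, hence the as.numeric() conversion.

```r
# Toy long-format table mimicking the four columns of the summarised result
toy <- data.frame(
  variable_name  = rep("episode_counts", 4),
  variable_level = c("algorithm", "algorithm", "pet", "pet"),
  estimate_name  = c("n_episodes", "n_persons", "n_episodes", "n_persons"),
  estimate_value = c("35", "32", "33", "25"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one estimate as a number
get_estimate <- function(tbl, var, level, estimate) {
  hit <- tbl$variable_name == var &
    tbl$variable_level == level &
    tbl$estimate_name == estimate
  as.numeric(tbl$estimate_value[hit])
}

n_pet_episodes <- get_estimate(toy, "episode_counts", "pet", "n_episodes")
n_pet_episodes
```

With the real result, the same helper could be applied to sr_results(res) instead of the toy table.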

Episode and person counts

kable(sr_table(res, "episode_counts", "source"), format = "html", caption = "Episode counts: algorithm vs PET")
Episode counts: algorithm vs PET
source n_episodes n_persons
algorithm 35 32
pet 33 25

Number of episodes and distinct persons in the algorithm output and in the PET table.

Protocol summary

kable(sr_table(res, "protocol_summary"), format = "html", caption = "Protocol summary (for reporting)")
Protocol summary (for reporting)
variable_level total_pet_episodes total_algorithm_episodes total_matched_episodes
overall 33 35 27

Totals and number of matched episodes (one-to-one pairs).

Person overlap

kable(sr_table(res, "person_overlap", "metric"), format = "html", caption = "Person overlap")
Person overlap
metric n_persons
raw_person_overlap 25
cohort_person_overlap 23
  • raw_person_overlap: distinct persons with at least one PET episode and one algorithm episode.
  • cohort_person_overlap: same, but after filtering both sources to gestation 0–308 days and end ≥ start.
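
The difference between the two overlap counts can be illustrated with a few made-up episodes. This is a sketch only: the column names, the helper valid_episodes(), and the assumption that gestation is measured as end minus start in days are illustrative and not taken from the package internals.

```r
# Made-up episodes: person 2 has an implausibly long algorithm episode
alg <- data.frame(
  person_id = c(1, 2, 3),
  start = as.Date(c("2020-01-01", "2020-01-01", "2021-05-01")),
  end   = as.Date(c("2020-10-01", "2021-02-04", "2022-01-15"))
)
pet <- data.frame(
  person_id = c(1, 2),
  start = as.Date(c("2020-01-05", "2020-01-03")),
  end   = as.Date(c("2020-10-02", "2020-10-01"))
)

# Keep episodes with end >= start and gestation of 0 to 308 days
valid_episodes <- function(ep) {
  gestation <- as.numeric(ep$end) - as.numeric(ep$start)
  ep[gestation >= 0 & gestation <= 308, , drop = FALSE]
}

raw_person_overlap    <- length(intersect(alg$person_id, pet$person_id))
cohort_person_overlap <- length(intersect(valid_episodes(alg)$person_id,
                                          valid_episodes(pet)$person_id))
c(raw = raw_person_overlap, cohort = cohort_person_overlap)
```

Person 2 appears in both sources, so they count towards the raw overlap, but their only algorithm episode exceeds 308 days and is filtered out before the cohort overlap is counted.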

Venn counts (matched / PET-only / algorithm-only)

kable(sr_table(res, "venn_counts", "category"), format = "html", caption = "Venn counts (one-to-one matching)")
Venn counts (one-to-one matching)
category n_episodes n_pet_matched n_alg_matched
both 27 27 27
pet_only 6 27 27
algorithm_only 8 27 27
  • both: number of matched episode pairs.
  • pet_only: PET episodes with no matched algorithm episode.
  • algorithm_only: algorithm episodes with no matched PET episode.

2×2 confusion matrix (PET as reference)

kable(sr_table(res, "confusion_2x2", "cell"), format = "html", caption = "2×2 confusion matrix (PET = reference)")
2×2 confusion matrix (PET = reference)
cell count
TP 27
FN 6
FP 8
TN NA
  • TP: PET episode has a matched algorithm episode.
  • FN: PET episode has no match (algorithm “miss”).
  • FP: Algorithm episode has no match (algorithm “extra”).
  • TN: Not defined at episode level (no negative population).

Sensitivity, PPV, NPV

kable(sr_table(res, "ppv_sensitivity", "metric"), format = "html", caption = "Sensitivity, specificity, PPV, NPV")
Sensitivity, specificity, PPV, NPV
metric value numerator denominator
sensitivity 0.818181818181818 27 33
ppv 0.771428571428571 27 35
  • Sensitivity = TP / (TP + FN): fraction of PET episodes that have a match.
  • PPV = TP / (TP + FP): fraction of algorithm episodes that match a PET episode.
  • Specificity and NPV use TN and are NA at episode level.
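
As a quick check, the two reported metrics can be recomputed by hand from the confusion counts shown above (TP = 27, FN = 6, FP = 8):

```r
# Confusion counts taken from the 2x2 table in this vignette
tp <- 27
fn <- 6
fp <- 8

sensitivity <- tp / (tp + fn)  # 27 / 33
ppv         <- tp / (tp + fp)  # 27 / 35

metrics <- round(c(sensitivity = sensitivity, ppv = ppv), 2)
metrics
```

The denominators are simply the total number of PET episodes (33) and the total number of algorithm episodes (35).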

Date differences (matched pairs only)

For each matched pair, date differences are PET − algorithm (positive = PET is later, i.e. algorithm starts/ends too early). Three measures are reported:

  • Start date difference: PET start − algorithm start. Positive = algorithm starts too early; negative = algorithm starts too late.
  • End date difference: PET end − algorithm end. Same sign convention.
  • Duration difference: PET duration − algorithm duration. Positive = algorithm episodes are shorter; negative = longer.
dd_summary <- sr_table(res, "date_difference_summary")
if (!is.null(dd_summary) && nrow(dd_summary) > 0) {
  kable(dd_summary, format = "html", caption = "Date difference summary (PET − algorithm, days)")
} else {
  cat("No matched pairs; date differences not computed.\n")
}
Date difference summary (PET − algorithm, days)
variable_level mean median sd min q25 q75 max n_matched
Start date difference (PET - Algorithm, days) 1.37037037037037 2 2.73366685478214 -5 -0.5 3 5 27
End date difference (PET - Algorithm, days) -0.666666666666667 0 2.54195563720897 -5 -3 1 4 27
Duration difference (PET - Algorithm, days) -2.03703703703704 -2 4.16470039075194 -9 -4.5 -1 8 27
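
To make the sign convention concrete, here is a small worked example for one matched pair. The dates are made up for illustration.

```r
# One hypothetical matched pair
pet_start <- as.Date("2023-01-05")
pet_end   <- as.Date("2023-10-01")
alg_start <- as.Date("2023-01-02")
alg_end   <- as.Date("2023-10-03")

# All differences are PET minus algorithm, in days
start_diff <- as.integer(pet_start - alg_start)  # positive: algorithm starts earlier than PET
end_diff   <- as.integer(pet_end - alg_end)      # negative: algorithm ends later than PET

pet_duration  <- as.integer(pet_end - pet_start)
alg_duration  <- as.integer(alg_end - alg_start)
duration_diff <- pet_duration - alg_duration     # negative: algorithm episode is longer

c(start = start_diff, end = end_diff, duration = duration_diff)
```

For any single pair the duration difference equals the end difference minus the start difference, which is why the means in the summary table satisfy the same relationship (-0.67 - 1.37 = -2.04).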

Date differences by outcome

The same start/end/duration differences are also reported stratified by algorithm outcome category (LB, SB, SA, AB, PREG, etc.), since different outcomes have very different expected durations.

dd_by_outcome <- sr_table(res, "date_difference_by_outcome")
if (!is.null(dd_by_outcome) && nrow(dd_by_outcome) > 0) {
  kable(dd_by_outcome, format = "html", caption = "Date differences by algorithm outcome (PET − algorithm, days)")
} else {
  cat("No outcome-stratified date differences available.\n")
}
Date differences by algorithm outcome (PET − algorithm, days)
variable_level mean median sd min q25 q75 max n_matched
Start date difference (PET - Algorithm, days) [AB] 3 3 NA 3 3 3 3 NA
End date difference (PET - Algorithm, days) [AB] 1 1 NA 1 1 1 1 NA
Duration difference (PET - Algorithm, days) [AB] -2 -2 NA -2 -2 -2 -2 NA
Start date difference (PET - Algorithm, days) [ECT] 0 0 NA 0 0 0 0 NA
End date difference (PET - Algorithm, days) [ECT] 0 0 NA 0 0 0 0 NA
Duration difference (PET - Algorithm, days) [ECT] 0 0 NA 0 0 0 0 NA
Start date difference (PET - Algorithm, days) [LB] -0.25 -0.5 3.15096357144468 -5 -1.75 3 3 8
End date difference (PET - Algorithm, days) [LB] 0.125 1.5 3.27053949241231 -5 -3 3 3 8
Duration difference (PET - Algorithm, days) [LB] 0.375 -1.5 5.0972681759098 -6 -2.5 4 8 8
Start date difference (PET - Algorithm, days) [PREG] 2.2 3 2.51282425057657 -2 0 4.5 5 15
End date difference (PET - Algorithm, days) [PREG] -1.06666666666667 -1 2.25092573548455 -5 -2.5 0 4 15
Duration difference (PET - Algorithm, days) [PREG] -3.26666666666667 -2 3.65409098851971 -9 -6.5 -1 3 15
Start date difference (PET - Algorithm, days) [SA] 1 1 NA 1 1 1 1 NA
End date difference (PET - Algorithm, days) [SA] -4 -4 NA -4 -4 -4 -4 NA
Duration difference (PET - Algorithm, days) [SA] -5 -5 NA -5 -5 -5 -5 NA
Start date difference (PET - Algorithm, days) [SB] 2 2 NA 2 2 2 2 NA
End date difference (PET - Algorithm, days) [SB] 0 0 NA 0 0 0 0 NA
Duration difference (PET - Algorithm, days) [SB] -2 -2 NA -2 -2 -2 -2 NA

Alignment distribution

The date differences can also be viewed as a binned distribution (histogram), which shows at a glance what fraction of matched episodes are perfectly aligned (bin = 0), close (within ±7 days), or far off (±30+ days). The bins are coloured on a red-yellow-green gradient where green = exact match and red = large discrepancy.

This visualisation is available interactively in the Shiny app’s Alignment tab. It can be filtered by measure (start/end/duration) and stratified by outcome.
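
The binning itself can be reproduced with base R's cut(). The sketch below assumes whole-day differences and uses the same bin boundaries as the date_difference_distribution rows in the summarised result. The function name bin_date_diff() is hypothetical, and the labels use ASCII "<=" and ">=" in place of the symbols shown in the table.

```r
# Assign whole-day differences to the nine alignment bins
bin_date_diff <- function(diff_days) {
  cut(
    diff_days,
    breaks = c(-Inf, -30, -15, -8, -1, 0, 7, 14, 29, Inf),
    labels = c("<= -30", "-29 to -15", "-14 to -8", "-7 to -1", "0",
               "1 to 7", "8 to 14", "15 to 29", ">= 30"),
    right = TRUE  # intervals are closed on the right, e.g. (-8, -1]
  )
}

# Illustrative differences in days
binned <- bin_date_diff(c(-9, -2, -1, 0, 3, 8))
table(binned)
```

Because the intervals are right-closed, a difference of exactly 0 falls into its own bin and a difference of exactly -30 or 30 falls into the outermost bins.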

Outcome confusion (matched pairs)

Cross-tabulation of PET outcome (concept_id) vs algorithm outcome (LB, SB, AB, SA, etc.) is not included in the summarised result CSV; only aggregate metrics are exported.

Outcome accuracy

Among matched pairs with a mappable algorithm outcome (LB/SB/AB/SA/DELIV → concept_id), the fraction where PET outcome concept_id equals the algorithm-mapped concept_id.

kable(sr_table(res, "outcome_accuracy"), format = "html", caption = "Outcome accuracy (matched pairs)")
Outcome accuracy (matched pairs)
variable_level n_correct n_total accuracy
overall 11 11 1

Outcome by year (same-year pairs)

Among matched pairs in the same year (algorithm start year = PET start year), counts of agreement (e.g. lb_lb, sb_sb) and disagreement (e.g. lb_sb, sb_lb). Used for same-year outcome cross-tabs.

kable(sr_table(res, "outcome_by_year"), format = "html", caption = "Outcome by year (same-year pairs)")
Outcome by year (same-year pairs)
variable_level overall_equal overall_diff lb_lb lb_miscarriage lb_ab lb_sb lb_unknown sb_sb sb_miscarriage sb_ab sb_lb sb_unknown ab_ab ab_miscarriage ab_lb ab_sb ab_unknown
same_year_pairs 11 16 8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Duration summary (all episodes)

kable(sr_table(res, "duration_summary", "source"), format = "html", caption = "Pregnancy duration (days) by source")
Pregnancy duration (days) by source
source n mean median sd min q25 q75 max
algorithm 35 250.514285714286 224 277.22568544828 21 147 280 1749
pet 33 210.454545454545 260 97.4211511008681 15 140 280 377

Summary statistics of episode length (end − start) for all algorithm episodes and all PET episodes.

Duration matched summary

dm <- sr_table(res, "duration_matched_summary", "source")
if (!is.null(dm) && nrow(dm) > 0) {
  kable(dm, format = "html", caption = "Duration (matched pairs only)")
} else {
  cat("No duration matched summary (no matched pairs).\n")
}
Duration (matched pairs only)
source n mean median sd min q25 q75 max
algorithm 27 196 147 99.401284622561 21 147 280 380
pet 27 193.962962962963 150 100.511568978777 15 138.5 279.5 377

Duration statistics for the matched episodes only (algorithm vs PET).


Files written to exportFolder

File Content
pet_comparison_summarised_result.csv All comparison metrics in SummarisedResult format (episode counts, protocol summary, person overlap, Venn counts, time overlap, confusion 2x2, PPV/sensitivity, date-difference summary, outcome accuracy, outcome by year, duration summaries). Includes the run settings for traceability. Use omopgenerics::importSummarisedResult() to read it and visOmopResults::visOmopTable() to display it.
log.txt Run log (appended if file already exists).

Using your own PET table

If your CDM already has a PET table (e.g. omop_cmbd.pregnancy_episode), call:

comparePregnancyIdentifierWithPET(
  cdm = your_cdm,
  outputFolder = "/path/to/pipeline/output",
  exportFolder = "/path/to/comparison/results",
  petSchema = "omop_cmbd",
  petTable = "pregnancy_episode",
  minOverlapDays = 1L,
  removeWithinSourceOverlaps = FALSE
)
res <- omopgenerics::importSummarisedResult(file.path("/path/to/comparison/results", "pet_comparison_summarised_result.csv"))

Ensure the PET table has at least: person_id, pregnancy_start_date, pregnancy_end_date, pregnancy_outcome (concept_id). Gestational length in days is computed in the database from the start and end dates.
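
Before running the comparison on your own data, a quick column check can catch a misnamed PET table early. The helper below is a hypothetical sketch that works on a plain character vector of column names; how you obtain those names depends on your database connection.

```r
# Columns the comparison expects in the PET table (from the list above)
required_pet_cols <- c("person_id", "pregnancy_start_date",
                       "pregnancy_end_date", "pregnancy_outcome")

# Hypothetical helper: which required columns are missing?
missing_pet_columns <- function(col_names) {
  setdiff(required_pet_cols, col_names)
}

# Example: a table that lacks the outcome column
missing_cols <- missing_pet_columns(
  c("person_id", "pregnancy_start_date", "pregnancy_end_date", "gestational_length")
)
if (length(missing_cols) > 0) {
  message("PET table is missing: ", paste(missing_cols, collapse = ", "))
}
```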

# Clean up temp dirs (optional; disconnect from cdm when done in your session)
unlink(outputDir, recursive = TRUE)
unlink(exportFolder, recursive = TRUE)