Tidy your summarised result object

`<summarised_result>` format

The <summarised_result> format is a standard output defined in omopgenerics. Because the output is standardised, multiple functions can export results in the same format and further functionality can be built on top of it, as shown in the tables and plots vignettes. However, this standard output can sometimes be hard to manipulate for custom analyses. visOmopResults contains tools to tidy a <summarised_result> object, which are covered in this vignette.

Tidy `<summarised_result>`

visOmopResults defines the tidy method for <summarised_result> objects. This method does the following:

1. Split group, strata, and additional pairs into separate columns:

The <summarised_result> object has the following pair columns: group_name-group_level, strata_name-strata_level, and additional_name-additional_level. These pairs use the &&& separator to combine multiple fields. For example, to combine cohort_name and age_group in the group_name-group_level pair: group_name = "cohort_name &&& age_group" and group_level = "my_cohort &&& <40". By default, if no aggregation is produced in the group_name-group_level pair, group_name = "overall" and group_level = "overall".

ORIGINAL FORMAT:

group_name	group_level
cohort_name	acetaminophen
cohort_name &&& sex	acetaminophen &&& Female
sex &&& age_group	Male &&& <40

The tidy format puts each of these fields in its own column. This makes the data easier to manipulate, but the output is no longer standardised, as each <summarised_result> object will have a different number of columns with different names. Missing values are filled with the “overall” label.

TIDY FORMAT:

cohort_name	sex	age_group
acetaminophen	overall	overall
acetaminophen	Female	overall
overall	Male	<40
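To make the splitting step concrete, here is a minimal base R sketch that reproduces the toy table above by hand. It is only an illustration of the idea, not the package's implementation; the object names (x, names_split, levels_split, tidy_x) are made up for this example.

```r
# Toy pair columns in the original format (values taken from the table above).
x <- data.frame(
  group_name  = c("cohort_name", "cohort_name &&& sex", "sex &&& age_group"),
  group_level = c("acetaminophen", "acetaminophen &&& Female", "Male &&& <40"),
  stringsAsFactors = FALSE
)

# Split both columns on the " &&& " separator.
names_split  <- strsplit(x$group_name,  " &&& ", fixed = TRUE)
levels_split <- strsplit(x$group_level, " &&& ", fixed = TRUE)

# One output column per distinct field, in order of first appearance.
fields <- unique(unlist(names_split))

# Start with every cell set to "overall", then fill in the values of each row.
m <- matrix(
  "overall",
  nrow = nrow(x), ncol = length(fields),
  dimnames = list(NULL, fields)
)
for (i in seq_len(nrow(x))) {
  m[i, names_split[[i]]] <- levels_split[[i]]
}

tidy_x <- as.data.frame(m, stringsAsFactors = FALSE)
tidy_x
```

Fields that a row does not mention keep the “overall” label, which is the behaviour described above.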

2. Add settings of the `<summarised_result>` object as columns:

Each <summarised_result> object has a settings attribute that links the ‘result_id’ column to each distinct set of settings. The columns ‘result_type’, ‘package_name’ and ‘package_version’ are always present in settings, but there may be extra parameters depending on how the object was created. In the <summarised_result> format we need to use the settings() function to see those variables:

ORIGINAL FORMAT:

settings:

result_id	my_setting	package_name
1	TRUE	visOmopResults
2	FALSE	visOmopResults

<summarised_result>:

result_id	cdm_name	...	additional_name
1	omop	...	overall
...	...	...	...
2	omop	...	overall
...	...	...	...

In the tidy format the settings are added as columns, so their values are repeated across rows (there is only one row per result_id in settings, whereas there can be many rows per result_id in the <summarised_result> object). The ‘result_id’ column is dropped as it no longer provides any information. Again we lose standardisation (different objects carry different settings), but we gain flexibility:

TIDY FORMAT:

cdm_name	...	additional_name	my_setting	package_name
omop	...	overall	TRUE	visOmopResults
...	...	...	...	...
omop	...	overall	FALSE	visOmopResults
...	...	...	...	...
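Conceptually, this step is a join between the results and their settings on ‘result_id’. The base R sketch below mimics it on toy data; it is an illustration under that assumption rather than the package's own code, and settings_tbl, res and tidy_res are hypothetical names.

```r
# Toy settings table: one row per result_id.
settings_tbl <- data.frame(
  result_id    = c(1L, 2L),
  my_setting   = c(TRUE, FALSE),
  package_name = "visOmopResults",
  stringsAsFactors = FALSE
)

# Toy results: several rows per result_id.
res <- data.frame(
  result_id       = c(1L, 1L, 2L, 2L),
  cdm_name        = "omop",
  additional_name = "overall",
  stringsAsFactors = FALSE
)

# Join the settings onto the results; each setting value is repeated
# for every row that shares the same result_id.
tidy_res <- merge(res, settings_tbl, by = "result_id")

# Once the settings are columns, result_id carries no extra information.
tidy_res$result_id <- NULL
tidy_res
```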

3. Pivot estimates as columns:

In the <summarised_result> format estimates are displayed in 3 columns:

‘estimate_name’ indicates the name of the estimate.
‘estimate_type’ indicates the type of the estimate (needed because all values are stored as character). Possible values are: numeric, integer, date, character, proportion, percentage, logical.
‘estimate_value’ contains the value of the estimate as <character>.

ORIGINAL FORMAT:

variable_name	estimate_name	estimate_type	estimate_value
number individuals	count	integer	100
age	mean	numeric	50.3
age	sd	numeric	20.7

In the tidy format we pivot the estimates, creating a new column for each ‘estimate_name’ value. Each column is cast to its ‘estimate_type’. If an estimate_name has more than one estimate_type, the column is not cast and is kept as character (a warning is thrown). Missing data are filled with NA.

TIDY FORMAT:

variable_name	count	mean	sd
number individuals	100	NA	NA
age	NA	50.3	20.7
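The pivot-and-cast logic can be sketched in base R on the toy table above. This is a simplified illustration, not the package's implementation: it only casts the integer and numeric types, and the names est, vars and tidy_est are hypothetical.

```r
# Toy estimates in the original long format (values from the table above).
est <- data.frame(
  variable_name  = c("number individuals", "age", "age"),
  estimate_name  = c("count", "mean", "sd"),
  estimate_type  = c("integer", "numeric", "numeric"),
  estimate_value = c("100", "50.3", "20.7"),
  stringsAsFactors = FALSE
)

# One row per variable, one column per estimate name.
vars <- unique(est$variable_name)
tidy_est <- data.frame(variable_name = vars, stringsAsFactors = FALSE)

for (nm in unique(est$estimate_name)) {
  rows <- est[est$estimate_name == nm, ]
  # Look up the value for each variable; variables without it get NA.
  value <- rows$estimate_value[match(vars, rows$variable_name)]
  type <- unique(rows$estimate_type)
  # Cast only when the estimate has a single, known type;
  # otherwise keep the values as character.
  if (length(type) == 1 && type == "integer") {
    value <- as.integer(value)
  } else if (length(type) == 1 && type == "numeric") {
    value <- as.numeric(value)
  }
  tidy_est[[nm]] <- value
}
tidy_est
```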

Example

Let’s see a simple example with some toy data:

library(visOmopResults)
result <- mockSummarisedResult()
result |>
  tidy()
#> # A tibble: 72 × 13
#>    cdm_name cohort_name age_group sex     variable_name   variable_level   count
#>    <chr>    <chr>       <chr>     <chr>   <chr>           <chr>            <int>
#>  1 mock     cohort1     overall   overall number subjects NA             9200055
#>  2 mock     cohort1     <40       Male    number subjects NA             4007202
#>  3 mock     cohort1     >=40      Male    number subjects NA             2131727
#>  4 mock     cohort1     <40       Female  number subjects NA             6717668
#>  5 mock     cohort1     >=40      Female  number subjects NA              586141
#>  6 mock     cohort1     overall   Male    number subjects NA             9970691
#>  7 mock     cohort1     overall   Female  number subjects NA             1490355
#>  8 mock     cohort1     <40       overall number subjects NA             5185566
#>  9 mock     cohort1     >=40      overall number subjects NA             8461201
#> 10 mock     cohort2     overall   overall number subjects NA             7182697
#> # ℹ 62 more rows
#> # ℹ 6 more variables: mean <dbl>, sd <dbl>, percentage <dbl>,
#> #   result_type <chr>, package_name <chr>, package_version <chr>
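Once the result is tidy, it can be manipulated like any other tibble. As a small sketch, assuming dplyr is installed and that the mock data contain the columns shown in the output above, we can keep only the unstratified rows and a few estimate columns:

```r
library(visOmopResults)

result <- mockSummarisedResult()

# Keep the rows with no stratification and a handful of columns.
overall_rows <- result |>
  tidy() |>
  dplyr::filter(sex == "overall", age_group == "overall") |>
  dplyr::select(cohort_name, variable_name, variable_level, count, mean, sd)

overall_rows
```

The same filter on the original format would require matching against the combined strata_name and strata_level strings.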

Customise your tidy summarised_result

Several functions are available to customise the tidy version of the <summarised_result> object. The main one is tidySummarisedResult().

With this function we can choose which of the pairs are split:

splitGroup = TRUE splits the pair group_name-group_level columns.
splitStrata = TRUE splits the pair strata_name-strata_level columns.
splitAdditional = TRUE splits the pair additional_name-additional_level columns.

The settings columns to add to the <summarised_result> object can be chosen with the settingsColumns argument.

The pivotEstimatesBy argument determines which variables are used to pivot the estimates. There are four options:

NULL/character() to not pivot anything.
c("estimate_name") to pivot only estimate_name.
c("variable_level", "estimate_name") to pivot estimate_name and variable_level.
c("variable_name", "variable_level", "estimate_name") to pivot estimate_name, variable_level and variable_name.

Note that ‘variable_level’ can contain NA values; these are ignored when naming the new columns.

With its default values, tidySummarisedResult() is nearly equivalent to the tidy method (the difference is that ‘result_id’ is not dropped by tidySummarisedResult()).

tidySummarisedResult(result)
#> # A tibble: 72 × 14
#>    result_id cdm_name cohort_name age_group sex     variable_name variable_level
#>        <int> <chr>    <chr>       <chr>     <chr>   <chr>         <chr>         
#>  1         1 mock     cohort1     overall   overall number subje… NA            
#>  2         1 mock     cohort1     <40       Male    number subje… NA            
#>  3         1 mock     cohort1     >=40      Male    number subje… NA            
#>  4         1 mock     cohort1     <40       Female  number subje… NA            
#>  5         1 mock     cohort1     >=40      Female  number subje… NA            
#>  6         1 mock     cohort1     overall   Male    number subje… NA            
#>  7         1 mock     cohort1     overall   Female  number subje… NA            
#>  8         1 mock     cohort1     <40       overall number subje… NA            
#>  9         1 mock     cohort1     >=40      overall number subje… NA            
#> 10         1 mock     cohort2     overall   overall number subje… NA            
#> # ℹ 62 more rows
#> # ℹ 7 more variables: count <int>, mean <dbl>, sd <dbl>, percentage <dbl>,
#> #   result_type <chr>, package_name <chr>, package_version <chr>

We can also customise this behaviour:

result |>
  tidySummarisedResult(
    splitAdditional = FALSE,
    settingsColumns = "package_name",
    pivotEstimatesBy = c("variable_level", "estimate_name")
  )
#> # A tibble: 54 × 16
#>    result_id cdm_name cohort_name age_group sex    variable_name additional_name
#>        <int> <chr>    <chr>       <chr>     <chr>  <chr>         <chr>          
#>  1         1 mock     cohort1     overall   overa… number subje… overall        
#>  2         1 mock     cohort1     <40       Male   number subje… overall        
#>  3         1 mock     cohort1     >=40      Male   number subje… overall        
#>  4         1 mock     cohort1     <40       Female number subje… overall        
#>  5         1 mock     cohort1     >=40      Female number subje… overall        
#>  6         1 mock     cohort1     overall   Male   number subje… overall        
#>  7         1 mock     cohort1     overall   Female number subje… overall        
#>  8         1 mock     cohort1     <40       overa… number subje… overall        
#>  9         1 mock     cohort1     >=40      overa… number subje… overall        
#> 10         1 mock     cohort2     overall   overa… number subje… overall        
#> # ℹ 44 more rows
#> # ℹ 9 more variables: additional_level <chr>, count <int>, mean <dbl>,
#> #   sd <dbl>, Amoxiciline_count <int>, Amoxiciline_percentage <dbl>,
#> #   Ibuprofen_count <int>, Ibuprofen_percentage <dbl>, package_name <chr>

Each of these options also has its own dedicated function:

Split

The split functions are also provided independently:

splitGroup() only splits the pair group_name-group_level columns.
splitStrata() only splits the pair strata_name-strata_level columns.
splitAdditional() only splits the pair additional_name-additional_level columns.

There is also splitAll(), which splits any x_name-x_level pair found in the data.
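As a short sketch using the mock data from the example above, we can call the split functions with their defaults and compare column names to see what changed. The object names group_split and all_split are just for illustration.

```r
library(visOmopResults)

result <- mockSummarisedResult()

# Split a single pair and leave the others untouched.
group_split <- splitGroup(result)

# Split every x_name-x_level pair found in the data.
all_split <- splitAll(result)

# Pair columns that were replaced when splitting the group pair.
setdiff(colnames(result), colnames(group_split))

# New columns created when splitting all pairs.
setdiff(colnames(all_split), colnames(result))
```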

Pivot estimates

pivotEstimates() can be used to pivot the variables that we are interested in.
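A minimal sketch on the mock data, assuming pivotEstimates() takes the same pivotEstimatesBy argument as tidySummarisedResult() and pivots by estimate_name when it is called with defaults (argument names may differ between package versions):

```r
library(visOmopResults)

result <- mockSummarisedResult()

# Assumed default: one column per estimate_name.
by_estimate <- pivotEstimates(result)

# Combine variable_level and estimate_name in the new column names.
by_level <- pivotEstimates(
  result,
  pivotEstimatesBy = c("variable_level", "estimate_name")
)

colnames(by_estimate)
colnames(by_level)
```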

Add settings

addSettings() is used to add the settings that we want as new columns to our <summarised_result> object.
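As a minimal sketch, we can call addSettings() with its defaults on the mock data and check which columns were added. Which settings are added by default, and the name of the argument used to select them, may depend on the installed version, so only the default call is shown here.

```r
library(visOmopResults)

result <- mockSummarisedResult()

# Add the settings as columns using the default behaviour.
with_settings <- addSettings(result)

# Columns that were not in the original object.
setdiff(colnames(with_settings), colnames(result))
```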
