Introduction
Dealing with a summarised_result object can be difficult. The visOmopResults package contains some functionality that helps with this process:
- filterSettings: to filter the summarised_result object using the settings() attribute.
- filterGroup: to filter the summarised_result object using the group_name-group_level tidy columns.
- filterStrata: to filter the summarised_result object using the strata_name-strata_level tidy columns.
- filterAdditional: to filter the summarised_result object using the additional_name-additional_level tidy columns.
In this vignette we will also cover two types of utility functions:
- unite* type functions: to join multiple columns in the name-level structure.
- *Columns type functions: to identify the columns that are contained in a name-level structure.
Now we will see some examples.
filterSettings
For this example we will use some mock data.
library(visOmopResults)
library(dplyr, warn.conflicts = FALSE)
Let’s generate two sets of results:
result1 <- mockSummarisedResult()
result2 <- mockSummarisedResult()
We can change the settings of the second set of results to simulate results that come from a different package and a different set of results:
result2 <- result2 |>
  omopgenerics::newSummarisedResult(settings = tibble(
    result_id = 1L,
    result_type = "second_mock_result",
    package_name = "omopgenerics",
    package_version = "1.0.0",
    my_parameter = TRUE
  ))
We can now merge both results into a single summarised_result object:
result <- bind(result1, result2)
settings(result)
#> # A tibble: 2 × 5
#> result_id result_type package_name package_version my_parameter
#> <int> <chr> <chr> <chr> <lgl>
#> 1 1 mock_summarised_result visOmopResults 0.4.1 NA
#> 2 2 second_mock_result omopgenerics 1.0.0 TRUE
result
#> # A tibble: 252 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 mock cohort_name cohort1 overall overall
#> 2 1 mock cohort_name cohort1 age_group &&& sex <40 &&& Male
#> 3 1 mock cohort_name cohort1 age_group &&& sex >=40 &&& Male
#> 4 1 mock cohort_name cohort1 age_group &&& sex <40 &&& Female
#> 5 1 mock cohort_name cohort1 age_group &&& sex >=40 &&& Female
#> 6 1 mock cohort_name cohort1 sex Male
#> 7 1 mock cohort_name cohort1 sex Female
#> 8 1 mock cohort_name cohort1 age_group <40
#> 9 1 mock cohort_name cohort1 age_group >=40
#> 10 1 mock cohort_name cohort2 overall overall
#> # ℹ 242 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
You could add the settings with addSettings(), then filter, and finally remove the added columns. Let’s see an example where we subset to the results that have my_parameter == TRUE:
resultMyParam <- result |>
  addSettings(settingsColumns = "my_parameter") |>
  filter(my_parameter == TRUE) |>
  select(!"my_parameter")
resultMyParam
#> # A tibble: 126 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 2 mock cohort_name cohort1 overall overall
#> 2 2 mock cohort_name cohort1 age_group &&& sex <40 &&& Male
#> 3 2 mock cohort_name cohort1 age_group &&& sex >=40 &&& Male
#> 4 2 mock cohort_name cohort1 age_group &&& sex <40 &&& Female
#> 5 2 mock cohort_name cohort1 age_group &&& sex >=40 &&& Female
#> 6 2 mock cohort_name cohort1 sex Male
#> 7 2 mock cohort_name cohort1 sex Female
#> 8 2 mock cohort_name cohort1 age_group <40
#> 9 2 mock cohort_name cohort1 age_group >=40
#> 10 2 mock cohort_name cohort2 overall overall
#> # ℹ 116 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
This approach has some problems:
- It is not efficient.
- We have to use three different functions.
- The settings attribute still contains both sets:
settings(resultMyParam)
#> # A tibble: 2 × 5
#> result_id result_type package_name package_version my_parameter
#> <int> <chr> <chr> <chr> <lgl>
#> 1 1 mock_summarised_result visOmopResults 0.4.1 NA
#> 2 2 second_mock_result omopgenerics 1.0.0 TRUE
We can get the same result, and solve all three problems, by using filterSettings():
resultMyParam <- result |>
  filterSettings(my_parameter == TRUE)
resultMyParam
#> # A tibble: 126 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 2 mock cohort_name cohort1 overall overall
#> 2 2 mock cohort_name cohort1 age_group &&& sex <40 &&& Male
#> 3 2 mock cohort_name cohort1 age_group &&& sex >=40 &&& Male
#> 4 2 mock cohort_name cohort1 age_group &&& sex <40 &&& Female
#> 5 2 mock cohort_name cohort1 age_group &&& sex >=40 &&& Female
#> 6 2 mock cohort_name cohort1 sex Male
#> 7 2 mock cohort_name cohort1 sex Female
#> 8 2 mock cohort_name cohort1 age_group <40
#> 9 2 mock cohort_name cohort1 age_group >=40
#> 10 2 mock cohort_name cohort2 overall overall
#> # ℹ 116 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
settings(resultMyParam)
#> # A tibble: 1 × 5
#> result_id result_type package_name package_version my_parameter
#> <int> <chr> <chr> <chr> <lgl>
#> 1 2 second_mock_result omopgenerics 1.0.0 TRUE
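To make the idea concrete, here is a minimal base R sketch of the three steps that a settings-based filter has to perform: filter the small settings table, keep only the result rows with a surviving result_id, and store the reduced settings with the output. The data frames toy_settings and toy_result are hypothetical stand-ins, and this is an illustration of the concept, not the implementation used by filterSettings().

```r
# Toy stand-ins for a result table and its settings (hypothetical data).
toy_settings <- data.frame(
  result_id = c(1L, 2L),
  result_type = c("mock_summarised_result", "second_mock_result"),
  my_parameter = c(NA, TRUE)
)
toy_result <- data.frame(
  result_id = c(1L, 1L, 2L, 2L),
  estimate_value = c("10", "20", "30", "40")
)

# Step 1: filter the settings table, treating NA as "no match".
keep <- !is.na(toy_settings$my_parameter) & toy_settings$my_parameter == TRUE
filtered_settings <- toy_settings[keep, , drop = FALSE]

# Step 2: keep only the result rows whose result_id survived step 1.
in_settings <- toy_result$result_id %in% filtered_settings$result_id
filtered_result <- toy_result[in_settings, , drop = FALSE]

# Step 3: store the reduced settings alongside the reduced result.
attr(filtered_result, "settings") <- filtered_settings

print(filtered_result)
print(attr(filtered_result, "settings"))
```

Because the condition is evaluated on the settings table, which has one row per result_id, no settings column needs to be joined onto every row of the result.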
filterStrata
Using the same mock data, we can try to filter the rows that contain data related to ‘Female’. The problem with the strata_name-strata_level representation is that it is difficult to filter the “Female” rows directly:
result |>
  select(strata_name, strata_level) |>
  distinct()
#> # A tibble: 9 × 2
#> strata_name strata_level
#> <chr> <chr>
#> 1 overall overall
#> 2 age_group &&& sex <40 &&& Male
#> 3 age_group &&& sex >=40 &&& Male
#> 4 age_group &&& sex <40 &&& Female
#> 5 age_group &&& sex >=40 &&& Female
#> 6 sex Male
#> 7 sex Female
#> 8 age_group <40
#> 9 age_group >=40
One option is to use splitStrata(), filter(), and then uniteStrata() again:
result |>
  splitStrata() |>
  filter(sex == "Female") |>
  uniteStrata(c("age_group", "sex"))
#> # A tibble: 84 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 mock cohort_name cohort1 age_group &&& sex <40 &&& Female
#> 2 1 mock cohort_name cohort1 age_group &&& sex >=40 &&& Female
#> 3 1 mock cohort_name cohort1 sex Female
#> 4 1 mock cohort_name cohort2 age_group &&& sex <40 &&& Female
#> 5 1 mock cohort_name cohort2 age_group &&& sex >=40 &&& Female
#> 6 1 mock cohort_name cohort2 sex Female
#> 7 1 mock cohort_name cohort1 age_group &&& sex <40 &&& Female
#> 8 1 mock cohort_name cohort1 age_group &&& sex >=40 &&& Female
#> 9 1 mock cohort_name cohort1 sex Female
#> 10 1 mock cohort_name cohort2 age_group &&& sex <40 &&& Female
#> # ℹ 74 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
The problems with this approach are:
- It is extremely inefficient (all the rows must be split and united back).
- You need to know which are the strata columns (you could potentially use strataColumns()).
- You have to use multiple functions.
We could do exactly the same with the function filterStrata():
result |>
  filterStrata(sex == "Female")
#> # A tibble: 84 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 mock cohort_name cohort1 age_group &&& sex <40 &&& Female
#> 2 1 mock cohort_name cohort1 age_group &&& sex >=40 &&& Female
#> 3 1 mock cohort_name cohort1 sex Female
#> 4 1 mock cohort_name cohort2 age_group &&& sex <40 &&& Female
#> 5 1 mock cohort_name cohort2 age_group &&& sex >=40 &&& Female
#> 6 1 mock cohort_name cohort2 sex Female
#> 7 1 mock cohort_name cohort1 age_group &&& sex <40 &&& Female
#> 8 1 mock cohort_name cohort1 age_group &&& sex >=40 &&& Female
#> 9 1 mock cohort_name cohort1 sex Female
#> 10 1 mock cohort_name cohort2 age_group &&& sex <40 &&& Female
#> # ℹ 74 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
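As an illustration of why a name-level filter does not need to split the whole table, the following base R sketch looks up the level of one column inside each strata_name-strata_level pair and filters on it. The helper level_of() and the strata data frame are hypothetical names introduced for this example; it is a sketch of the idea under those assumptions, not the code behind filterStrata().

```r
# Hypothetical strata pairs, mirroring some of the distinct values shown earlier.
strata <- data.frame(
  strata_name = c("overall", "age_group &&& sex", "age_group &&& sex",
                  "sex", "sex", "age_group"),
  strata_level = c("overall", "<40 &&& Male", "<40 &&& Female",
                   "Male", "Female", "<40")
)

# Return the level recorded for `column` in one name-level pair,
# or NA when that pair does not include the column.
level_of <- function(name, level, column) {
  names_split <- strsplit(name, " &&& ", fixed = TRUE)[[1]]
  levels_split <- strsplit(level, " &&& ", fixed = TRUE)[[1]]
  idx <- match(column, names_split)
  if (is.na(idx)) NA_character_ else levels_split[idx]
}

sex_level <- mapply(
  level_of, strata$strata_name, strata$strata_level,
  MoreArgs = list(column = "sex"), USE.NAMES = FALSE
)

# Keep the rows where sex is "Female", alone or combined with age_group.
female <- strata[!is.na(sex_level) & sex_level == "Female", ]
print(female)
```

Only the distinct name-level pairs need to be inspected, and the strata_name and strata_level columns of the matching rows are left untouched.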
filterGroup() and filterAdditional() work in exactly the same way as filterStrata(), but with their analogous columns.
A nice feature is that if you filter by a column or setting that does not exist, the output will be a warning plus the return of emptySummarisedResult(), which can be quite helpful on some occasions:
result |>
  filterSettings(setting_that_does_not_exist == 1)
#> Warning: ! Variable filtering does not exist, returning empty result:
#> ℹ In argument: `setting_that_does_not_exist == 1`.
#> # A tibble: 0 × 13
#> # ℹ 13 variables: result_id <int>, cdm_name <chr>, group_name <chr>,
#> # group_level <chr>, strata_name <chr>, strata_level <chr>,
#> # variable_name <chr>, variable_level <chr>, estimate_name <chr>,
#> # estimate_type <chr>, estimate_value <chr>, additional_name <chr>,
#> # additional_level <chr>
unite functions
In the previous section we mentioned the function uniteStrata() without explaining what it does, so let’s cover it with a couple of examples. The uniteGroup(), uniteStrata() and uniteAdditional() functions are the opposite of the split*() type functions:
result |>
  splitStrata()
#> # A tibble: 252 × 13
#> result_id cdm_name group_name group_level age_group sex variable_name
#> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 mock cohort_name cohort1 overall overall number subjects
#> 2 1 mock cohort_name cohort1 <40 Male number subjects
#> 3 1 mock cohort_name cohort1 >=40 Male number subjects
#> 4 1 mock cohort_name cohort1 <40 Female number subjects
#> 5 1 mock cohort_name cohort1 >=40 Female number subjects
#> 6 1 mock cohort_name cohort1 overall Male number subjects
#> 7 1 mock cohort_name cohort1 overall Female number subjects
#> 8 1 mock cohort_name cohort1 <40 overall number subjects
#> 9 1 mock cohort_name cohort1 >=40 overall number subjects
#> 10 1 mock cohort_name cohort2 overall overall number subjects
#> # ℹ 242 more rows
#> # ℹ 6 more variables: variable_level <chr>, estimate_name <chr>,
#> # estimate_type <chr>, estimate_value <chr>, additional_name <chr>,
#> # additional_level <chr>
result |>
  splitStrata() |>
  uniteStrata(c("age_group", "sex"))
#> # A tibble: 252 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 mock cohort_name cohort1 overall overall
#> 2 1 mock cohort_name cohort1 age_group &&& sex <40 &&& Male
#> 3 1 mock cohort_name cohort1 age_group &&& sex >=40 &&& Male
#> 4 1 mock cohort_name cohort1 age_group &&& sex <40 &&& Female
#> 5 1 mock cohort_name cohort1 age_group &&& sex >=40 &&& Female
#> 6 1 mock cohort_name cohort1 sex Male
#> 7 1 mock cohort_name cohort1 sex Female
#> 8 1 mock cohort_name cohort1 age_group <40
#> 9 1 mock cohort_name cohort1 age_group >=40
#> 10 1 mock cohort_name cohort2 overall overall
#> # ℹ 242 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
Note that missing values will not be included (if all values are missing, overall-overall will be used). For example:
age_group | sex | year |
---|---|---|
NA | NA | NA |
NA | Male | NA |
<40 | Female | 2010 |
>40 | NA | 2011 |
With uniteStrata(cols = c("age_group", "sex", "year")) the output would be:
strata_name | strata_level |
---|---|
overall | overall |
sex | Male |
age_group &&& sex &&& year | <40 &&& Female &&& 2010 |
age_group &&& year | >40 &&& 2011 |
Note that if we split again, year will be a character vector instead of an integer. Conserving the type may be possible in future releases.
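The uniting rule and the loss of type can be reproduced with a short base R sketch. The helper unite_row() and the wide data frame are hypothetical and written only for this illustration; the sketch mimics the behaviour described in the tables, and is not how uniteStrata() is implemented.

```r
# The example table, with year stored as an integer.
wide <- data.frame(
  age_group = c(NA, NA, "<40", ">40"),
  sex = c(NA, "Male", "Female", NA),
  year = c(NA, NA, 2010L, 2011L)
)
cols <- c("age_group", "sex", "year")

# Build one name-level pair from a row, skipping missing values.
unite_row <- function(values, cols) {
  present <- !is.na(values)
  if (!any(present)) {
    return(c(name = "overall", level = "overall"))
  }
  c(
    name = paste(cols[present], collapse = " &&& "),
    level = paste(values[present], collapse = " &&& ")
  )
}

# Every column is converted to character before pasting;
# this is the step where the integer type of `year` is lost.
as_text <- vapply(wide[cols], as.character, character(nrow(wide)))
pairs <- t(apply(as_text, 1, unite_row, cols = cols))
united <- data.frame(
  strata_name = unname(pairs[, "name"]),
  strata_level = unname(pairs[, "level"])
)
print(united)

# Splitting the third level again gives back text, not an integer.
year_back <- strsplit(united$strata_level[3], " &&& ", fixed = TRUE)[[1]][3]
class(year_back)
```

Since the levels are pasted into a single character column, converting year back to an integer after splitting is left to the user, for example with as.integer().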
uniteGroup() and uniteAdditional() work in exactly the same way as uniteStrata(), but with their analogous columns.
Columns
When splitting and tidying a summarised_result, it is not always obvious which settings are available to add, or how many columns will be generated when splitting group. That’s why visOmopResults provides some helper functions:
- settingsColumns(): gives you the setting names that are available in a summarised_result object.
- groupColumns(): gives you the new columns that will be generated when splitting the group_name-group_level pair into different columns.
- strataColumns(): gives you the new columns that will be generated when splitting the strata_name-strata_level pair into different columns.
- additionalColumns(): gives you the new columns that will be generated when splitting the additional_name-additional_level pair into different columns.
- tidyColumns(): gives you the columns that the object will have if you tidy it (tidy(result)). This function is very useful to know which columns can be included in plot and table functions.
Let’s see the different values with our example mock data set:
settingsColumns(result)
#> [1] "result_type" "package_name" "package_version" "my_parameter"
groupColumns(result)
#> [1] "cohort_name"
strataColumns(result)
#> [1] "age_group" "sex"
additionalColumns(result)
#> character(0)
tidyColumns(result)
#> [1] "cdm_name" "cohort_name" "age_group" "sex"
#> [5] "variable_name" "variable_level" "count" "mean"
#> [9] "sd" "percentage" "result_type" "package_name"
#> [13] "package_version" "my_parameter"
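For intuition, the columns reported for a name-level pair can be derived from the name column alone. The base R sketch below does this for a hypothetical vector of strata_name values, dropping the "overall" placeholder; it is an assumption about how such a helper could work, not the implementation of strataColumns().

```r
# Hypothetical distinct values of a strata_name column.
strata_name <- c("overall", "age_group &&& sex", "sex", "age_group")

# Split every name on the separator and collect the individual column names.
parts <- unlist(strsplit(strata_name, " &&& ", fixed = TRUE))

# "overall" marks the absence of stratification, so it is not a column.
strata_cols <- setdiff(unique(parts), "overall")
print(strata_cols)
```

On this toy input the sketch returns age_group and sex, the same two columns that strataColumns(result) reports for the mock data.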