Choose subset of data with larger/smaller mean value
Arguments
- data
dataframe. Dataframe from which subset(s) should be chosen.
- val
character. Name of the column in data containing the values whose mean determines which level of grp_inner to choose.
- grp_outer
character. Name of the column in data indicating a group of entries for which only one level of grp_inner should be chosen.
- grp_inner
character. Name of the column in data indicating the sub-groups within grp_outer. Only one level of grp_inner will be chosen per level of grp_outer.
- sel
'smaller' or 'larger'. If sel == 'smaller', the level of grp_inner chosen for each level of grp_outer is the one with the smallest mean of val among all levels of grp_inner within that level of grp_outer. If sel == 'larger', the level with the largest mean is chosen instead. No default.
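The selection rule described by the arguments can be sketched in base R as follows. This is only an illustration of the logic, not the package's implementation: the function name pick_inner_by_mean and the toy data frame are hypothetical, and missing values in the grouping columns are not handled.

```r
# Hypothetical helper (not part of the package): for each level of grp_outer,
# keep only the rows whose grp_inner level has the smallest/largest mean of val.
pick_inner_by_mean <- function(data, val, grp_outer, grp_inner, sel) {
  sel <- match.arg(sel, c("smaller", "larger"))
  pick <- if (sel == "smaller") which.min else which.max
  keep <- logical(nrow(data))
  for (lvl in unique(data[[grp_outer]])) {
    in_outer <- data[[grp_outer]] == lvl
    # Mean of val for each level of grp_inner within this level of grp_outer
    inner_means <- tapply(data[[val]][in_outer], data[[grp_inner]][in_outer], mean)
    chosen <- names(inner_means)[pick(inner_means)]
    keep <- keep | (in_outer & data[[grp_inner]] == chosen)
  }
  data[keep, , drop = FALSE]
}

# Toy data (made up for illustration): "a" holds small values, "a_1" large ones
toy <- data.frame(
  pid = c("id1", "id1", "id2", "id2"),
  type = c("a", "a_1", "a", "a_1"),
  type_base = "a",
  resp = c(4, 80000, 5, 380000)
)
res_small <- pick_inner_by_mean(toy, "resp", "type_base", "type", "smaller")
res_large <- pick_inner_by_mean(toy, "resp", "type_base", "type", "larger")
print(res_small)
print(res_large)
```

With sel = "smaller" the sketch keeps the rows of type "a", and with sel = "larger" the rows of type "a_1".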
Details
This function was written for the situation where the abundances of various types of cells are available (e.g. CD4 T cells expressing IFNg), and where both frequencies and counts are provided but it is not indicated which column pertains to frequencies and which to counts. Note that this function will not work reliably when the response is a frequency (rather than a proportion) and the denominator cell count is not consistently much higher than 100, because the counts are then not guaranteed to have the larger mean.
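The caveat about denominators can be illustrated with a small made-up calculation. It assumes that "frequency" means a percentage on the 0 to 100 scale; all numbers below are hypothetical.

```r
# Hypothetical percent frequencies for three samples
freq_pct <- c(2, 5, 3)

# Denominators of at most 100 cells: counts do not exceed the percentages,
# so the mean no longer separates counts from frequencies as intended
denom_small <- c(50, 80, 100)
count_small <- freq_pct / 100 * denom_small
small_fails <- mean(count_small) < mean(freq_pct)

# Denominators far above 100 cells: counts have the clearly larger mean
denom_large <- c(5000, 8000, 10000)
count_large <- freq_pct / 100 * denom_large
large_works <- mean(count_large) > mean(freq_pct)

print(c(small_fails = small_fails, large_works = large_works))
```

In the first case, choosing the group with the smaller mean would pick the counts rather than the frequencies.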
Examples
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
library(tibble)
library(purrr)
library(stringr)
set.seed(3)
test_tbl <- tibble(
  pid = rep(c("id1", "id2", "id1", "id2"), each = 2),
  type = c(paste0(rep(c("a", "b"), each = 4), rep(c("", "_1"), 4)))
) %>%
  mutate(type_base = stringr::str_remove(type, "_1")) %>%
  group_by(pid, type_base) %>%
  mutate(resp = purrr::map_dbl(type_base, function(x) {
    round(rnorm(1, 5 + stringr::str_detect(x, "b") * 3), 2)
  })[1]) %>%
  ungroup() %>%
  mutate(resp = ifelse(str_detect(type, "_1"), rep(runif(4, 1e4, 1e5), each = 2), 1) * resp)
test_tbl
#> # A tibble: 8 × 4
#>   pid   type  type_base      resp
#>   <chr> <chr> <chr>         <dbl>
#> 1 id1   a     a              4.04
#> 2 id1   a_1   a          80923.
#> 3 id2   a     a              5.2
#> 4 id2   a_1   a         381326.
#> 5 id1   b     b              8.26
#> 6 id1   b_1   b         749793.
#> 7 id2   b     b              8.09
#> 8 id2   b_1   b         284573.
filter_based_on_mean_grp_value(
  data = test_tbl, val = "resp", grp_outer = "type_base",
  grp_inner = "type", sel = "smaller"
)
#> # A tibble: 4 × 4
#>   pid   type  type_base  resp
#>   <chr> <chr> <chr>     <dbl>
#> 1 id1   a     a          4.04
#> 2 id2   a     a          5.2
#> 3 id1   b     b          8.26
#> 4 id2   b     b          8.09