Convert a Data Frame Between 'Database' and 'Rvec' Formats
Source: R/collapse_to_rvec.R (collapse_to_rvec.Rd)
collapse_to_rvec() converts a data frame from a 'database' format to an 'rvec' format. expand_from_rvec() does the opposite, converting a data frame from an 'rvec' format to a 'database' format.
Usage
collapse_to_rvec(data, draw = draw, values = value, by = NULL, type = NULL)
# S3 method for class 'data.frame'
collapse_to_rvec(data, draw = draw, values = value, by = NULL, type = NULL)
# S3 method for class 'grouped_df'
collapse_to_rvec(data, draw = draw, values = value, by = NULL, type = NULL)
expand_from_rvec(data, draw = "draw")
# S3 method for class 'data.frame'
expand_from_rvec(data, draw = "draw")
# S3 method for class 'grouped_df'
expand_from_rvec(data, draw = "draw")
Arguments
- data
A data frame, possibly grouped.
- draw
<tidyselect> The variable that uniquely identifies random draws within each combination of values for the 'by' variables. Must be quoted for expand_from_rvec().
- values
<tidyselect> One or more variables in data that hold measurements.
- by
<tidyselect> Variables used to stratify or cross-classify the data. See Details.
- type
String specifying the class of rvec to use for each variable. Optional. See Details.
Details
In database format, each row represents one random draw. The data frame contains a 'draw' variable that distinguishes different draws within the same combination of 'by' variables. In rvec format, each row represents one combination of 'by' variables, and multiple draws are stored in an rvec. See below for examples.
by argument
The by argument is used to specify stratifying variables. For instance, if by includes sex and age, then the data frame produced by collapse_to_rvec() has a separate row for each combination of sex and age.
If data is a grouped data frame, then the grouping variables take precedence over by.
If no value for by is provided, and data is not a grouped data frame, then collapse_to_rvec() assumes that all variables in data that are not included in values and draw should be included in by.
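The default behaviour of by can be illustrated with a small sketch. The data below are made up for illustration (the region column and the pay figures are hypothetical), and the sketch assumes the rvec and dplyr packages are installed. When by is omitted and the data are ungrouped, both occupation and region should be treated as 'by' variables, so the result should match the call where by is given explicitly.

```r
library(rvec)
library(dplyr)

## hypothetical data: 2 occupations x 2 regions x 3 draws
data_db2 <- tibble(
  occupation = rep(c("Statistician", "Banker"), each = 6),
  region     = rep(rep(c("North", "South"), each = 3), times = 2),
  sim        = rep(1:3, times = 4),
  pay        = c(100, 80, 105, 90, 85, 95,
                 400, 350, 420, 380, 360, 410)
)

## no 'by' supplied, data not grouped:
## every variable other than 'sim' and 'pay' is used as a 'by' variable
res_default <- data_db2 |>
  collapse_to_rvec(draw = sim,
                   values = pay)

## the same stratification, spelled out
res_by <- data_db2 |>
  collapse_to_rvec(draw = sim,
                   values = pay,
                   by = c(occupation, region))

## both results have one row per occupation-region combination
nrow(res_default)
nrow(res_by)
```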
type argument
By default, collapse_to_rvec() calls function rvec() on each values variable in data. rvec() chooses the class of the output (i.e. rvec_chr, rvec_dbl, rvec_int, or rvec_lgl) depending on the input. Types can instead be specified in advance, using the type argument. type is a string, each character of which specifies the class of the corresponding values variable. The characters have the following meanings:
- "c": rvec_chr
- "d": rvec_dbl
- "i": rvec_int
- "l": rvec_lgl
- "?": Depends on inputs.
The codes for type are modified from ones used by the readr package.
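As a rough sketch of how a multi-character type string lines up with several values variables, the example below uses made-up data with a hypothetical hours column alongside pay. It assumes rvec and dplyr are installed. The first character, "i", is meant to apply to pay and the second, "d", to hours.

```r
library(rvec)
library(dplyr)

## hypothetical data with two measurement variables
data_db3 <- tribble(
  ~occupation,    ~sim, ~pay, ~hours,
  "Statistician", 1,    100,  38.5,
  "Statistician", 2,    80,   40,
  "Statistician", 3,    105,  37.5,
  "Banker",       1,    400,  50,
  "Banker",       2,    350,  52.5,
  "Banker",       3,    420,  48
)

## one character per values variable:
## "i" for 'pay' (whole numbers), "d" for 'hours'
data_rv3 <- data_db3 |>
  collapse_to_rvec(draw = sim,
                   values = c(pay, hours),
                   type = "id")

## inspect the classes chosen for each rvec
class(data_rv3$pay)
class(data_rv3$hours)
```

Using "?" in place of either character would leave the choice of class for that variable to rvec().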
See also
- rvec() to construct a single rvec.
- as_list_col() to convert an rvec to a list variable.
- dplyr::group_vars() gives the names of the grouping variables in a grouped data frame.
collapse_to_rvec() and expand_from_rvec() are analogous to tidyr::nest() and tidyr::unnest(), though collapse_to_rvec() and expand_from_rvec() move values into and out of rvecs, while tidyr::nest() and tidyr::unnest() move them in and out of data frames. (tidyr::nest() and tidyr::unnest() are also a lot more flexible.)
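The analogy with tidyr can be made concrete with a side-by-side sketch. It assumes the rvec, dplyr, and tidyr packages are installed, and the column name draws given to tidyr::nest() is an arbitrary choice made for this illustration.

```r
library(rvec)
library(dplyr)
library(tidyr)

data_db4 <- tribble(
  ~occupation,    ~sim, ~pay,
  "Statistician", 1,    100,
  "Statistician", 2,    80,
  "Statistician", 3,    105,
  "Banker",       1,    400,
  "Banker",       2,    350,
  "Banker",       3,    420
)

## tidyr: draws are moved into a list column of data frames
nested <- data_db4 |>
  nest(draws = c(sim, pay))
unnested <- nested |>
  unnest(draws)

## rvec: draws are moved into an rvec
collapsed <- data_db4 |>
  collapse_to_rvec(draw = sim,
                   values = pay)
expanded <- collapsed |>
  expand_from_rvec(draw = "sim")

## in both cases: one row per occupation when packed,
## one row per draw when unpacked
c(nrow(nested), nrow(unnested))
c(nrow(collapsed), nrow(expanded))
```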
Examples
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
data_db <- tribble(
  ~occupation,    ~sim, ~pay,
  "Statistician", 1,    100,
  "Statistician", 2,    80,
  "Statistician", 3,    105,
  "Banker",       1,    400,
  "Banker",       2,    350,
  "Banker",       3,    420
)
## database format to rvec format
data_rv <- data_db |>
  collapse_to_rvec(draw = sim,
                   values = pay)
data_rv
#> # A tibble: 2 × 2
#>   occupation   pay
#>   <chr>        <rdbl<3>>
#> 1 Statistician 100,80,105
#> 2 Banker       400,350,420
## rvec format to database format
data_rv |>
  expand_from_rvec()
#> # A tibble: 6 × 3
#>   occupation    draw   pay
#>   <chr>        <int> <dbl>
#> 1 Statistician     1   100
#> 2 Statistician     2    80
#> 3 Statistician     3   105
#> 4 Banker           1   400
#> 5 Banker           2   350
#> 6 Banker           3   420
## provide a name for the draw variable
data_rv |>
  expand_from_rvec(draw = "sim")
#> # A tibble: 6 × 3
#>   occupation     sim   pay
#>   <chr>        <int> <dbl>
#> 1 Statistician     1   100
#> 2 Statistician     2    80
#> 3 Statistician     3   105
#> 4 Banker           1   400
#> 5 Banker           2   350
#> 6 Banker           3   420
## specify that rvec variable
## must be rvec_int
data_rv <- data_db |>
  collapse_to_rvec(draw = sim,
                   values = pay,
                   type = "i")
## specify stratifying variable explicitly,
## using 'by' argument
data_db |>
  collapse_to_rvec(draw = sim,
                   values = pay,
                   by = occupation)
#> # A tibble: 2 × 2
#>   occupation   pay
#>   <chr>        <rdbl<3>>
#> 1 Statistician 100,80,105
#> 2 Banker       400,350,420
## specify stratifying variable explicitly,
## using 'group_by'
library(dplyr)
data_db |>
  group_by(occupation) |>
  collapse_to_rvec(draw = sim,
                   values = pay)
#> # A tibble: 2 × 2
#> # Groups:   occupation [2]
#>   occupation   pay
#>   <chr>        <rdbl<3>>
#> 1 Statistician 100,80,105
#> 2 Banker       400,350,420