Use a fitted model to create replicate datasets, typically as a way of checking a model.
Arguments
- x
A fitted model, typically created by calling mod_pois(), mod_binom(), or mod_norm(), and then fit().
- condition_on
Parameters to condition on. Either "expected" or "fitted". See details.
- n
Number of replicate datasets to create. Default is 19.
Value
A tibble with the following structure:
.replicate | data |
"Original" | Original data supplied to mod_pois(), mod_binom(), or mod_norm() |
"Replicate 1" | Simulated data. |
"Replicate 2" | Simulated data. |
... | ... |
"Replicate <n>" | Simulated data. |
Details
Use n draws from the posterior distribution for model parameters to generate n simulated datasets. If the model is working well, these simulated datasets should look similar to the actual dataset.
The condition_on argument
With Poisson and binomial models that include dispersion terms (which is the default), there are two options for constructing replicate data.
- When condition_on is "fitted", the replicate data is created by (i) drawing values from the posterior distribution for rates or probabilities (the \(\gamma_i\) defined in mod_pois() and mod_binom()), and (ii) conditional on these rates or probabilities, drawing values for the outcome variable.
- When condition_on is "expected", the replicate data is created by (i) drawing values from the posterior distribution for the hyper-parameters governing the rates or probabilities (the \(\mu_i\) and \(\xi\) defined in mod_pois() and mod_binom()), then (ii) conditional on these hyper-parameters, drawing values for the rates or probabilities, and finally (iii) conditional on these rates or probabilities, drawing values for the outcome variable.
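The two schemes can be sketched in base R for a Poisson model. This is only an illustration under stated assumptions, not the package's internal code: the numbers are made up, each vector stands in for a single posterior draw, and the gamma distribution with shape \(1/\xi\) and rate \(1/(\xi \mu_i)\) is an assumed parametrisation of the dispersion term. The names mu, xi, gamma_draw, and exposure are hypothetical.

```r
set.seed(1)

# Hypothetical values standing in for ONE posterior draw (illustration only)
mu         <- c(0.010, 0.020, 0.050)  # expected rates, mu_i
xi         <- 0.5                     # dispersion, xi
gamma_draw <- c(0.012, 0.018, 0.047)  # fitted rates, gamma_i
exposure   <- c(1000, 2000, 500)      # exposure for each cell

# condition_on = "fitted":
# (ii) draw outcomes conditional on the drawn rates gamma_i
y_fitted <- rpois(n = length(gamma_draw), lambda = gamma_draw * exposure)

# condition_on = "expected":
# (ii) draw fresh rates conditional on the hyper-parameters
#      (assumed gamma parametrisation: mean mu_i, variance xi * mu_i^2)
gamma_new <- rgamma(n = length(mu), shape = 1 / xi, rate = 1 / (xi * mu))
# (iii) draw outcomes conditional on the fresh rates
y_expected <- rpois(n = length(gamma_new), lambda = gamma_new * exposure)
```

Because the "expected" scheme redraws the rates, y_expected does not inherit information that the original data contributed to gamma_draw.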
The default for condition_on is "expected".
The "expected" option provides a more severe test for a model than the "fitted" option, since "fitted" values are weighted averages of the "expected" values and the original data.
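The weighted-average claim can be made concrete for the Poisson case. The derivation below is a sketch that assumes a gamma-Poisson structure with exposure \(w_i\) and treats \(\mu_i\) and \(\xi\) as known; the symbols \(w_i\) and \(\lambda_i\) are introduced here for illustration and may not match the notation used in mod_pois().

```latex
\begin{align*}
y_i \mid \gamma_i &\sim \text{Poisson}(\gamma_i w_i), \qquad
\gamma_i \mid \mu_i, \xi \sim \text{Gamma}\!\left(\xi^{-1},\, (\xi \mu_i)^{-1}\right) \\
\gamma_i \mid y_i, \mu_i, \xi &\sim \text{Gamma}\!\left(\xi^{-1} + y_i,\; (\xi \mu_i)^{-1} + w_i\right) \\
E[\gamma_i \mid y_i, \mu_i, \xi]
  &= \frac{\xi^{-1} + y_i}{(\xi \mu_i)^{-1} + w_i}
   = \lambda_i \, \mu_i + (1 - \lambda_i) \, \frac{y_i}{w_i},
  \qquad \lambda_i = \frac{1}{1 + \xi \mu_i w_i}
\end{align*}
```

Under these assumptions the fitted rate lies between the expected rate \(\mu_i\) and the direct estimate \(y_i / w_i\), so replicate data conditioned on fitted values is pulled towards the original data.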
As described in mod_norm(), normal models have a different structure from Poisson and binomial models, and the distinction between "fitted" and "expected" does not apply.
Data models for outcomes
If a data model has been provided for the outcome variable, then creation of replicate data will include a step where errors are added to outcomes. For instance, if an rr3 data model is used, then replicate_data() rounds the outcomes to base 3.
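As a rough illustration of rounding to base 3, the sketch below assumes the usual random-rounding rule: multiples of 3 are left unchanged, and other values go to the nearest multiple of 3 with probability 2/3 and to the next nearest with probability 1/3. The helper rr3_sketch() is hypothetical and is not part of the package.

```r
# Hypothetical helper, for illustration only (not the package's implementation)
rr3_sketch <- function(x) {
  remainder <- x %% 3
  down <- x - remainder   # largest multiple of 3 not exceeding x
  up   <- down + 3        # next multiple of 3 above
  # Probability of rounding up: 1/3 when remainder is 1, 2/3 when remainder is 2
  p_up <- remainder / 3
  u <- runif(length(x))
  # Values already divisible by 3 are returned unchanged
  ifelse(remainder == 0, x, ifelse(u < p_up, up, down))
}

set.seed(1)
simulated <- c(0, 1, 2, 3, 4, 5, 6, 12)  # made-up simulated outcomes
rounded <- rr3_sketch(simulated)
```

Every element of rounded is a multiple of 3 and differs from the corresponding input by at most 2.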
See also
- mod_pois(), mod_binom(), mod_norm() Create model.
- fit() Fit model.
- report_sim() Simulation study of model.
Examples
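## fit a Poisson model
mod <- mod_pois(injuries ~ age:sex + ethnicity + year,
                data = nzl_injuries,
                exposure = 1) |>
  fit()

## create the original data plus 19 replicates (the default)
rep_data <- mod |>
  replicate_data()

## compare total injuries across the original and replicate datasets
library(dplyr)
rep_data |>
  group_by(.replicate) |>
  count(wt = injuries)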
#> # A tibble: 20 × 2
#> # Groups: .replicate [20]
#> .replicate n
#> <fct> <dbl>
#> 1 Original 21588
#> 2 Replicate 1 20871
#> 3 Replicate 2 21848
#> 4 Replicate 3 21526
#> 5 Replicate 4 21872
#> 6 Replicate 5 21328
#> 7 Replicate 6 20551
#> 8 Replicate 7 20810
#> 9 Replicate 8 21680
#> 10 Replicate 9 22176
#> 11 Replicate 10 22164
#> 12 Replicate 11 20902
#> 13 Replicate 12 21068
#> 14 Replicate 13 20127
#> 15 Replicate 14 20715
#> 16 Replicate 15 21828
#> 17 Replicate 16 20979
#> 18 Replicate 17 21442
#> 19 Replicate 18 21761
#> 20 Replicate 19 21866

## when the overall model includes an rr3 data model,
## replicate data are rounded to base 3
mod_pois(injuries ~ age:sex + ethnicity + year,
         data = nzl_injuries,
         exposure = popn) |>
  set_datamod_outcome_rr3() |>
  fit() |>
  replicate_data()
#> # A tibble: 18,240 × 7
#> .replicate age sex ethnicity year injuries popn
#> <fct> <fct> <chr> <chr> <int> <dbl> <int>
#> 1 Original 0-4 Female Maori 2000 12 35830
#> 2 Original 5-9 Female Maori 2000 6 35120
#> 3 Original 10-14 Female Maori 2000 3 32830
#> 4 Original 15-19 Female Maori 2000 6 27130
#> 5 Original 20-24 Female Maori 2000 6 24380
#> 6 Original 25-29 Female Maori 2000 6 24160
#> 7 Original 30-34 Female Maori 2000 12 22560
#> 8 Original 35-39 Female Maori 2000 3 22230
#> 9 Original 40-44 Female Maori 2000 6 18130
#> 10 Original 45-49 Female Maori 2000 6 13770
#> # ℹ 18,230 more rows