Specify a data model for the outcome in a Poisson model, where the outcome is subject to overcount.
Arguments
- mod
An object of class "bage_mod_pois", created with mod_pois().
- rate
The prior for the overcoverage rate. A data frame with a variable called "mean", a variable called "disp", and, optionally, one or more 'by' variables.
Details
The overcount data model assumes that reported values for the outcome overstate the actual values. The reported values might be affected by double-counting, for instance, or might include some people or events that are not in the target population.
The rate argument
The rate argument specifies a prior distribution for the overcoverage rate. rate is a data frame with a variable called "mean", a variable called "disp", and, optionally, one or more 'by' variables.
For instance, a rate with a mean of 0.05 for females and 0.03 for males, and with a larger disp for males than for females, implies that the reported value for the outcome is expected to overstate the true value by about 5% for females and about 3% for males, with greater uncertainty for males than for females.
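A minimal sketch of a rate data frame matching the description above. The means follow from the percentages quoted in the text; the disp values are illustrative assumptions, chosen only so that the prior for males is more dispersed than the prior for females.

```r
## hypothetical prior for the overcoverage rate, with 'sex' as the 'by' variable
rate <- data.frame(sex = c("Female", "Male"),
                   mean = c(0.05, 0.03),  # expected overcount of about 5% and 3%
                   disp = c(0.1, 0.2))    # assumed values: larger 'disp' means more uncertainty
rate
```

A data frame like this would then be passed as the rate argument of set_datamod_overcount().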
Mathematical details
The model for the observed outcome is
$$y_i^{\text{obs}} = y_i^{\text{true}} + \epsilon_i$$
$$\epsilon_i \sim \text{Poisson}(\kappa_{g[i]} \gamma_i w_i)$$
$$\kappa_g \sim \text{Gamma}(1/d_g, 1/(d_g m_g))$$
where
- \(y_i^{\text{obs}}\) is the observed outcome for cell \(i\);
- \(y_i^{\text{true}}\) is the true outcome for cell \(i\);
- \(\epsilon_i\) is the overcount in cell \(i\);
- \(\gamma_i\) is the rate for cell \(i\);
- \(w_i\) is the exposure for cell \(i\);
- \(\kappa_{g[i]}\) is the overcoverage rate for cell \(i\);
- \(m_g\) is the expected value for \(\kappa_g\) (specified via rate); and
- \(d_g\) is the dispersion for \(\kappa_g\) (specified via rate).
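The model above can be simulated in base R to see how the pieces fit together. This is a sketch under stated assumptions, not code from bage: it reads the Gamma distribution as a shape-rate parameterization (which gives mean \(m_g\) and variance \(d_g m_g^2\), consistent with \(m_g\) being the expected value), it uses a single group, and it draws the true outcome from a plain Poisson distribution. All numeric values are hypothetical.

```r
## simulate the overcount data model for a single group (base R only)
set.seed(0)
n <- 10000             # number of cells (hypothetical)
m <- 0.05              # prior mean for kappa
d <- 0.2               # prior dispersion for kappa
gamma <- rep(0.01, n)  # rate for each cell (hypothetical)
w <- rep(5000, n)      # exposure for each cell (hypothetical)

## prior draws: shape = 1/d and rate = 1/(d * m) give mean m and variance d * m^2
kappa_draws <- rgamma(100000, shape = 1 / d, rate = 1 / (d * m))
c(mean = mean(kappa_draws), var = var(kappa_draws), d_m2 = d * m^2)

## one realisation of the data model
kappa <- rgamma(1, shape = 1 / d, rate = 1 / (d * m))
y_true <- rpois(n, lambda = gamma * w)           # true outcome (assumed Poisson)
epsilon <- rpois(n, lambda = kappa * gamma * w)  # overcount
y_obs <- y_true + epsilon                        # reported outcome

## total overcount relative to the true total, compared with kappa
c(kappa = kappa, ratio = sum(epsilon) / sum(y_true))
```

With many cells, the ratio of total overcount to the true total should sit close to the drawn value of kappa, which is why kappa can be read as a proportional overstatement.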
See also
- mod_pois() Specify a Poisson model
- augment() Original data plus estimated values, including estimates of true value for the outcome variable
- components() Estimated values for model parameters, including inclusion probabilities and overcount rates
- set_datamod_undercount() An undercount-only data model
- set_datamod_miscount() An undercount-and-overcount data model
- datamods All data models implemented in bage
- confidential Confidentialization procedures modeled in bage
- Mathematical Details vignette
Examples
## specify 'rate'
rate <- data.frame(sex = c("Female", "Male"),
                   mean = c(0.1, 0.13),
                   disp = c(0.2, 0.2))
## specify model
mod <- mod_pois(divorces ~ age * sex + time,
                data = nzl_divorces,
                exposure = population) |>
  set_datamod_overcount(rate)
mod
#>
#> ------ Unfitted Poisson model ------
#>
#> divorces ~ age * sex + time
#>
#> exposure: population
#> data model: overcount
#>
#>         term  prior along n_par n_par_free
#>  (Intercept) NFix()     -     1          1
#>          age   RW()   age    11         11
#>          sex NFix()     -     2          2
#>         time   RW()  time    11         11
#>      age:sex   RW()   age    22         22
#>
#> disp: mean = 1
#>
#>  n_draw var_time var_age var_sexgender
#>    1000     time     age           sex
#>
## fit model
mod <- mod |>
  fit()
#> Building log-posterior function...
#> Finding maximum...
#> Drawing values for hyper-parameters...
mod
#>
#> ------ Fitted Poisson model ------
#>
#> divorces ~ age * sex + time
#>
#> exposure: population
#> data model: overcount
#>
#>         term  prior along n_par n_par_free std_dev
#>  (Intercept) NFix()     -     1          1       -
#>          age   RW()   age    11         11    2.00
#>          sex NFix()     -     2          2    0.36
#>         time   RW()  time    11         11    0.13
#>      age:sex   RW()   age    22         22    0.33
#>
#> disp: mean = 1
#>
#>  n_draw var_time var_age var_sexgender optimizer
#>    1000     time     age           sex    nlminb
#>
#>  time_total time_max time_draw iter converged                  message
#>        0.31     0.16      0.13   17      TRUE relative convergence (4)
#>
## original data, plus imputed values for outcome
mod |>
  augment()
#> ℹ Adding variable `.divorces` with true values for `divorces`.
#> # A tibble: 242 × 9
#>    age   sex     time divorces    .divorces population .observed
#>    <fct> <chr>  <int>    <dbl> <rdbl<1000>>      <dbl>     <dbl>
#>  1 15-19 Female  2011        0     0 (0, 0)     154460 0
#>  2 15-19 Female  2012        6     6 (4, 6)     153060 0.0000392
#>  3 15-19 Female  2013        3     3 (2, 3)     152250 0.0000197
#>  4 15-19 Female  2014        3     3 (1, 3)     152020 0.0000197
#>  5 15-19 Female  2015        3     3 (1, 3)     152970 0.0000196
#>  6 15-19 Female  2016        3     3 (2, 3)     154170 0.0000195
#>  7 15-19 Female  2017        6     6 (4, 6)     154450 0.0000388
#>  8 15-19 Female  2018        0     0 (0, 0)     154170 0
#>  9 15-19 Female  2019        3     3 (2, 3)     154760 0.0000194
#> 10 15-19 Female  2020        0     0 (0, 0)     154480 0
#> # ℹ 232 more rows
#> # ℹ 2 more variables: .fitted <rdbl<1000>>, .expected <rdbl<1000>>
## parameter estimates
library(dplyr)
mod |>
  components() |>
  filter(term == "datamod")
#> # A tibble: 2 × 4
#>   term    component level              .fitted
#>   <chr>   <chr>     <chr>         <rdbl<1000>>
#> 1 datamod rate      Female 0.091 (0.033, 0.19)
#> 2 datamod rate      Male    0.11 (0.044, 0.25)
## the data have in fact been confidentialized,
## so we account for that, in addition
## to accounting for overcoverage
mod <- mod |>
  set_confidential_rr3() |>
  fit()
#> Building log-posterior function...
#> Finding maximum...
#> Drawing values for hyper-parameters...
mod
#>
#> ------ Fitted Poisson model ------
#>
#> divorces ~ age * sex + time
#>
#> exposure: population
#> data model: overcount
#> confidentialization: rr3
#>
#>         term  prior along n_par n_par_free std_dev
#>  (Intercept) NFix()     -     1          1       -
#>          age   RW()   age    11         11    1.93
#>          sex NFix()     -     2          2    0.32
#>         time   RW()  time    11         11    0.13
#>      age:sex   RW()   age    22         22    0.30
#>
#> disp: mean = 1
#>
#>  n_draw var_time var_age var_sexgender optimizer
#>    1000     time     age           sex    nlminb
#>
#>  time_total time_max time_draw iter converged                  message
#>        0.75     0.40      0.30   19      TRUE relative convergence (4)
#>