One commonly requested feature is to be able to run a post hoc Markov chain Monte Carlo analysis based on the results of a frequentist fit. This is often a reasonable shortcut for computing confidence intervals and p-values that allow for finite-sized samples rather than relying on asymptotic sampling distributions. This vignette shows examples of such an analysis. Some caveats:
- As with any MCMC analysis, it is the user's responsibility to check mixing and convergence of the chains (e.g. using the coda package) before drawing conclusions. tmbstan does some checking automatically, so it produces warnings where the DIY sampler below does not.
- The first MCMC sampler illustrated below is conceptually simple (Metropolis with a multivariate normal candidate distribution). Users who want to do MCMC sampling with less code or on a regular basis should consider the tmbstan package, which does more efficient hybrid/Hamiltonian Monte Carlo sampling and can take full advantage of glmmTMB's ability to provide gradients of the log-posterior with respect to the parameters. More information on tmbstan and the methods available for use on the samples can be found on GitHub (e.g. methods(class="stanfit")). It is safe to skip the Metropolis-Hastings (DIY) section of the salamander example below if the user prefers to use tmbstan.
In general, you can plug the objective function from glmmTMB (i.e., fitted_model$obj$fn) into any general-purpose MCMC package (e.g. adaptMCMC, MCMCpack (using MCMCmetrop1R()), ensemblemcmc, …). However, most of these algorithms do not take advantage of glmmTMB's automatic computation of gradients; for that, we will demonstrate the use of tmbstan, which uses glmmTMB's objective and gradient functions in conjunction with the sampling algorithms from Stan.
Load packages:
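A minimal sketch of this step (the exact set of packages loaded in the original is an assumption; glmmTMB is required, and coda and ggplot2 are used by code later in the vignette):

library(glmmTMB)
library(coda)     ## MCMC diagnostics for the DIY sampler
library(ggplot2)  ## plotting posterior distributions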
Fit basic model:
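A sketch of the fit assumed in the rest of this section, reconstructed from the parameter names that appear in the output below (intercept and mined effects in both the conditional and zero-inflation models, plus a site-level random intercept); the original call may differ in details:

## Salamanders data set ships with glmmTMB
fm1 <- glmmTMB(count ~ mined + (1 | site),
               ziformula = ~ mined,
               family = poisson,
               data = Salamanders)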
This is a basic block Metropolis sampler, based on Florian Hartig’s code here.
##' @param start starting value
##' @param V variance-covariance matrix of the multivariate normal candidate distribution
##' @param logpost_fun function returning the log-posterior (here, the log-likelihood)
##' @param iterations total number of iterations
##' @param nsamp number of samples to store
##' @param burnin number of initial samples to discard
##' @param thin thinning interval
##' @param tune tuning parameters; expand/contract V
##' @param seed random-number seed
run_MCMC <- function(start,
                     V,
                     logpost_fun,
                     iterations = 10000,
                     nsamp = 1000,
                     burnin = iterations/2,
                     thin = floor((iterations - burnin)/nsamp),
                     tune = NULL,
                     seed = NULL) {
    ## initialize
    if (!is.null(seed)) set.seed(seed)
    if (!is.null(tune)) {
        ## scale the candidate covariance matrix by the tuning factor(s)
        tunesq <- if (length(tune) == 1) tune^2 else outer(tune, tune)
        V <- V * tunesq
    }
    chain <- matrix(NA, nsamp + 1, length(start))
    chain[1, ] <- cur_par <- start
    postval <- logpost_fun(cur_par)
    j <- 1
    for (i in 1:iterations) {
        ## multivariate normal proposal centred on the current parameters
        proposal <- MASS::mvrnorm(1, mu = cur_par, Sigma = V)
        newpostval <- logpost_fun(proposal)
        ## Metropolis accept/reject step
        accept_prob <- exp(newpostval - postval)
        if (runif(1) < accept_prob) {
            cur_par <- proposal
            postval <- newpostval
        }
        ## store a thinned subset of the post-burn-in samples
        if ((i > burnin) && (i %% thin == 1)) {
            chain[j + 1, ] <- cur_par
            j <- j + 1
        }
    }
    chain <- na.omit(chain)
    colnames(chain) <- names(start)
    chain <- coda::mcmc(chain)
    return(chain)
}
Set up for MCMC: define scaled log-posterior function (in this case the log-likelihood function); extract coefficients and variance-covariance matrices as starting points.
## FIXME: is there a better way for user to extract full coefs?
rawcoef <- with(fm1$obj$env, last.par[-random])
names(rawcoef) <- make.names(names(rawcoef), unique = TRUE)
## log-likelihood function
## (run_MCMC wants *positive* log-lik)
logpost_fun <- function(x) -fm1$obj$fn(x)
## check definitions
stopifnot(all.equal(c(logpost_fun(rawcoef)),
                    c(logLik(fm1)),
                    tolerance = 1e-7))
V <- vcov(fm1, full = TRUE)
Run the chain:
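A sketch of the call (the seed, and therefore the exact timing quoted below, is an illustrative assumption):

t1 <- system.time(m1 <- run_MCMC(start = rawcoef,
                                 V = V,
                                 logpost_fun = logpost_fun,
                                 seed = 101))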
(running this chain takes 7.6 seconds)
Add more informative names and transform the random-effect standard deviation parameter back to the SD scale (see vignette on covariance structures and parameters):
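A sketch of this step, assuming the renaming and back-transformation shown here produced the effective-size output below (the fixef() components supply the conditional and zero-inflation names; the last column is the log-SD of the site random effect):

colnames(m1) <- c(names(fixef(fm1)$cond),
                  paste0("zi~", names(fixef(fm1)$zi)),
                  "sd_site")
## back-transform the log-SD column to the SD scale
m1[, "sd_site"] <- exp(m1[, "sd_site"])
## check effective sample sizes
print(coda::effectiveSize(m1), digits = 3)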
## (Intercept) minedno zi~(Intercept) zi~minedno sd_site
## 262 243 258 259 249
These effective sizes are probably still too small. In a real analysis we would stop and make sure we had addressed the mixing/convergence problems before proceeding; for this simple sampler, some of our choices would be to (1) simply run the chain for longer; (2) tune the candidate distribution (e.g. by using tune to scale some parameters, or perhaps by switching to a multivariate Student t distribution [see the mvtnorm package]); (3) add regularizing priors.
Ignoring the problems and proceeding, we can compute column-wise quantiles or highest posterior density intervals (coda::HPDinterval) to get confidence intervals. Below we plot the posterior distributions, omitting the intercept because it's on a very different scale.
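A sketch of these computations (assuming m1 is the coda::mcmc object constructed above and the intercept is its first column):

## 95% highest posterior density intervals
coda::HPDinterval(m1)
## posterior distributions, omitting the intercept (column 1)
m1_long <- reshape2::melt(as.matrix(m1)[, -1])
ggplot(m1_long, aes(x = value)) +
    geom_histogram(bins = 50) +
    facet_wrap(~ Var2, scales = "free")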
The tmbstan package allows direct, simple access to a hybrid/Hamiltonian Monte Carlo algorithm for sampling from a TMB object; the $obj component of a glmmTMB fit is such an object. (To run this example you'll need to install the tmbstan package and its dependencies.)
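A sketch of the sampling call described below (tmbstan's defaults give four chains; the seed, and therefore the exact timing quoted, is an illustrative assumption):

library(tmbstan)
t2 <- system.time(m2 <- tmbstan(fm1$obj, seed = 101))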
Running this command, which creates 4 chains, takes 5.6 seconds, with no parallelization. If you're going to do this a lot, you can use the argument cores=n for running chains in parallel. The argument is sent to rstan::sampling, so you should look at the help file ?rstan::sampling. Running bayestestR::diagnostic_posterior() on the fit gives the following results:
Parameter | ESS | MCSE | Rhat |
---|---|---|---|
beta[1] | 736 | 0.010 | 1.001 |
beta[2] | 709 | 0.011 | 1.001 |
betazi[1] | 998 | 0.008 | 1.001 |
betazi[2] | 1137 | 0.008 | 1.001 |
theta | 556 | 0.015 | 1.012 |
A trace plot (rstan::traceplot(m2, pars=c("beta","betazi","theta"))):
Pairs plot (pairs(m2, pars = c("beta", "betazi"), gap = 0)):
Now using the (now) classic sleepstudy data set included with the lme4 package:
data("sleepstudy", package = "lme4")
fm2 <- glmmTMB(Reaction ~ Days + (Days | Subject), data = sleepstudy)
t3 <- system.time(m3 <- tmbstan(fm2$obj, seed = 101))
Running this chain produces many warnings about divergent transitions, low effective sample size, etc.; the diagnostic table confirms this.
Diagnostics:
Parameter | ESS | MCSE | Rhat |
---|---|---|---|
beta[1] | 11 | 2.339 | 1.138 |
beta[2] | 14 | 0.473 | 1.164 |
betadisp | 8 | 0.026 | 1.201 |
theta[1] | 18 | 0.074 | 1.157 |
theta[2] | 23 | 0.056 | 1.143 |
theta[3] | 3 | 205.252 | 2.073 |
The trace plot confirms that one of the chains (chain 1) is problematic:
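For example (a sketch; the particular set of parameters plotted is an assumption):

rstan::traceplot(m3, pars = c("beta", "betadisp", "theta"))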
One way to address these problems is to add bounds that prevent the chains from going outside a range of ±5 standard errors from the MLE (we originally used ±3 standard errors, but those bounds were being hit too frequently; if possible, we want bounds that prevent crazy behaviour but do not otherwise constrain the chains …)
sdrsum <- summary(fm2$sdr)
par_est <- sdrsum[, "Estimate"]
par_sd <- sdrsum[, "Std. Error"]
t4 <- system.time(m4 <- tmbstan(fm2$obj,
                                lower = par_est - 5*par_sd,
                                upper = par_est + 5*par_sd,
                                seed = 101))
The diagnostics and the trace plot look much better now …
Parameter | ESS | MCSE | Rhat |
---|---|---|---|
beta[1] | 2782 | 0.078 | 1.000 |
beta[2] | 514 | 0.070 | 1.006 |
betadisp | 2410 | 0.001 | 1.001 |
theta[1] | 445 | 0.021 | 1.004 |
theta[2] | 1275 | 0.006 | 1.002 |
theta[3] | 1184 | 0.014 | 1.002 |
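For example, regenerating the trace plot for the bounded fit (a sketch; the parameter selection is an assumption):

rstan::traceplot(m4, pars = c("beta", "betadisp", "theta"))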
The last step (for now) is to extract the sampled values, give them interpretable names, and transform the correlation parameter back to a more interpretable scale (see the “Covariance structures” vignette):
samples4 <- as.data.frame(extract(m4, pars = c("beta", "betadisp", "theta")))
colnames(samples4) <- c(names(fixef(fm2)$cond),
                        "log(sigma)",
                        "log(sd_Intercept)", "log(sd_Days)", "cor")
## back-transform the scaled correlation parameter (get_cor() is described
## in the covariance-structures vignette)
samples4$cor <- sapply(samples4$cor, get_cor)
m4_long <- reshape2::melt(as.matrix(samples4))
ggplot(m4_long, aes(x = value)) +
    geom_histogram(bins = 50) +
    facet_wrap(~ Var2, scales = "free")
A more principled, properly Bayesian way to handle this problem would be to add sensible regularizing priors. (To be done/tested; as of this writing, priors can be set on fixed effects and on random effect standard deviations, but not on random effect correlations …)
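For reference, a heavily hedged sketch of what such a prior specification might look like, assuming a glmmTMB version recent enough to support the priors argument (see the glmmTMB priors vignette for the actual specification format; the particular prior used here is purely illustrative):

## illustrative only: a weak regularizing prior on the fixed-effect coefficients
reg_prior <- data.frame(prior = "normal(0,3)",
                        class = "fixef")
fm2_prior <- update(fm2, priors = reg_prior)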