Post-hoc MCMC with glmmTMB

One commonly requested feature is the ability to run a post-hoc Markov chain Monte Carlo analysis based on the results of a frequentist fit. This is often a reasonable shortcut for computing confidence intervals and p-values that account for finite sample sizes, rather than relying on asymptotic sampling distributions. This vignette shows an example of such an analysis. Some caveats:

  • when using such a “pseudo-Bayesian” approach, be aware that using a scaled likelihood (implicit, improper priors) can often cause problems, especially when the model is poorly constrained by the data
  • in particular, models with poorly constrained random effects (singular or nearly singular) are likely to give bad results
  • as shown below, even models that are well-behaved for frequentist fitting may need stronger priors to give well-behaved MCMC results
  • as with all MCMC analysis, it is the user’s responsibility to check for proper mixing and convergence of the chains before drawing conclusions
  • the first MCMC sampler illustrated below is simple (Metropolis with a multivariate normal candidate distribution). Users who want to do MCMC sampling on a regular basis should consider the tmbstan package, which does much more efficient hybrid/Hamiltonian Monte Carlo sampling.

Load packages:

library(glmmTMB)
library(coda)     ## MCMC utilities
library(reshape2) ## for melt()
## graphics
library(lattice)
library(ggplot2); theme_set(theme_bw())

Fit basic model:

data("sleepstudy",package="lme4")
fm1 <- glmmTMB(Reaction ~ Days + (Days|Subject),
               sleepstudy)

Set up for MCMC: define a scaled log-posterior function (in this case just the log-likelihood function); extract the coefficients as a starting point and the full variance-covariance matrix for the candidate distribution.

## FIXME: is there a better way for user to extract full coefs?
rawcoef <- with(fm1$obj$env,last.par[-random])
names(rawcoef) <- make.names(names(rawcoef),unique=TRUE)
## log-posterior function (here just the log-likelihood):
## obj$fn returns the *negative* log-likelihood, so we negate it
logpost_fun <- function(x) -fm1$obj$fn(x)
## check definitions
stopifnot(all.equal(c(logpost_fun(rawcoef)),
                    c(logLik(fm1)),
                    tolerance = 1e-7))
V <- vcov(fm1,full=TRUE)

This is a basic block Metropolis sampler, based on code from Florian Hartig's blog post on Metropolis MCMC in R.

##' @param start starting parameter values
##' @param V variance-covariance matrix of the MVN candidate distribution
##' @param logpost_fun log-posterior (here, log-likelihood) function
##' @param iterations total number of iterations
##' @param nsamp number of samples to store
##' @param burnin number of initial samples to discard
##' @param thin thinning interval
##' @param tune tuning parameters; expand/contract V
##' @param seed random-number seed
run_MCMC <- function(start,
                     V,   
                     logpost_fun,
                     iterations = 10000,
                     nsamp = 1000,
                     burnin = iterations/2,
                     thin = floor((iterations-burnin)/nsamp),
                     tune = NULL,
                     seed=NULL
                     ) {
    ## initialize
    if (!is.null(seed)) set.seed(seed)
    if (!is.null(tune)) {
        ## scale the candidate distribution by the tuning parameter(s)
        tunesq <- if (length(tune)==1) tune^2 else outer(tune,tune)
        V <- V*tunesq
    }
    chain <- matrix(NA, nsamp+1, length(start))
    chain[1,] <- cur_par <- start
    postval <- logpost_fun(cur_par)
    j <- 1
    for (i in 1:iterations) {
        ## propose new parameters from the MVN candidate distribution
        proposal <- MASS::mvrnorm(1, mu=cur_par, Sigma=V)
        newpostval <- logpost_fun(proposal)
        ## Metropolis acceptance probability
        accept_prob <- exp(newpostval - postval)
        if (runif(1) < accept_prob) {
            cur_par <- proposal
            postval <- newpostval
        }
        ## store every `thin`-th sample after burn-in
        if ((i>burnin) && (i %% thin == 1)) {
            chain[j+1,] <- cur_par
            j <- j + 1
        }
    }
    chain <- na.omit(chain)
    colnames(chain) <- names(start)
    chain <- coda::mcmc(chain)
    return(chain)
}

Run the chain:

t1 <- system.time(m1 <- run_MCMC(start=rawcoef,
                                 V=V, logpost_fun=logpost_fun,
                                 seed=1001))

(running this chain takes 4.5 seconds)

Add more informative names and transform correlation parameter (see vignette on covariance structures and parameters):

colnames(m1) <- c(names(fixef(fm1)[[1]]),
                  "log(sigma)",
                  c("log(sd_Intercept)","log(sd_Days)","cor"))
m1[,"cor"] <- sapply(m1[,"cor"], get_cor)
xyplot(m1,layout=c(2,3),asp="fill")

The trace plots are poor, especially for the correlation parameter; the small effective sample sizes confirm this, as would any other convergence diagnostics we ran.

print(effectiveSize(m1),digits=3)
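
Other coda diagnostics can be applied in the same way; for example (a sketch; note that with a single chain we cannot use gelman.diag, which requires multiple chains):

## Geweke diagnostic: z-scores comparing means of the first 10%
## and last 50% of each parameter's chain
coda::geweke.diag(m1)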

In a real analysis we would stop and fix the mixing/convergence problems before proceeding. For this simple sampler, our options would include: (1) running the chain for longer; (2) tuning the candidate distribution (e.g. using the tune argument to scale some parameters, or switching to a multivariate Student-t candidate distribution [see the mvtnorm package]); (3) adding regularizing priors. A sketch of option (2) is shown below.
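
For example, a tuned re-run might look like the following sketch (the tuning values and the name m1_tuned are hypothetical; in practice the values would be chosen by trial and error, based on acceptance rates and trace plots):

## hypothetical tuning values: leave the fixed-effect and dispersion
## parameters alone, shrink the candidate distribution for the three
## variance-covariance parameters (log-SDs and correlation)
m1_tuned <- run_MCMC(start = rawcoef, V = V, logpost_fun = logpost_fun,
                     iterations = 20000,
                     tune = c(1, 1, 1, 0.5, 0.5, 0.3),
                     seed = 1001)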

Ignoring the problems and proceeding, we can compute column-wise quantiles or highest posterior density intervals (coda::HPDinterval) to get confidence intervals, and plot the posterior distributions, omitting the intercept because it is on a very different scale from the other parameters.
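
A minimal sketch (this is where the reshape2 and ggplot2 packages loaded above come in; m1_long is an assumed intermediate name):

## quantile-based and highest-posterior-density confidence intervals
t(apply(m1, 2, quantile, probs = c(0.025, 0.5, 0.975)))
HPDinterval(m1)
## posterior distributions, dropping the intercept (column 1)
m1_long <- reshape2::melt(as.matrix(m1[, -1]))
ggplot(m1_long, aes(x = value)) +
    geom_histogram(bins = 30) +
    facet_wrap(~ Var2, scales = "free")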

tmbstan

The tmbstan package allows direct, simple access to a hybrid/Hamiltonian Monte Carlo algorithm for sampling from a TMB object; the $obj component of a glmmTMB fit is such an object. (To run this example you’ll need to install the tmbstan package and its dependencies.)

## install.packages("tmbstan")
library(tmbstan)
t2 <- system.time(m2 <- tmbstan(fm1$obj))

(running this command, which creates 4 chains, takes 336.1 seconds)

However, there are many indications (warning messages; trace plots) that the correlation parameter needs a more informative prior. (In the trace plots below, the correlation parameter is shown on its unconstrained scale; the actual correlation would be $\theta_3/\sqrt{1+\theta_3^2}$.)
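
A sketch of how the chains could be inspected with the same coda/lattice tools used above (m3 is an assumed name; the conversion includes the log-posterior, lp__, as an extra variable, and parameter names follow the underlying TMB parameter vector, so the correlation parameter appears as the third element of theta):

## convert the stanfit object to a coda mcmc.list and draw trace plots
m3 <- rstan::As.mcmc.list(m2)
xyplot(m3, layout = c(2, 4), asp = "fill")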

To do

  • solve mixing for the correlation parameter
  • a more complex example (e.g. the Owls data)