Modular Functions for Mixed Model Fits

Modular functions for mixed model fits

glFormula(formula, data = NULL, family = gaussian,
            subset, weights, na.action, offset, contrasts = NULL,
            start, mustart, etastart, control = glmerControl(), ...)

Arguments

formula

a two-sided linear formula object describing both the fixed-effects and random-effects parts of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars ("|") separating expressions for design matrices from grouping factors.

data

an optional data frame containing the variables named in formula. By default the variables are taken from the environment from which lmer is called. While data is optional, the package authors strongly recommend its use, especially when later applying methods such as update and drop1 to the fitted model (such methods are not guaranteed to work properly if data is omitted). If data is omitted, variables will be taken from the environment of formula (if specified as a formula) or from the parent frame (if specified as a character vector).

subset

an optional expression indicating the subset of the rows of data that should be used in the fit. This can be a logical vector, or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default.

weights

an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

na.action

a function that indicates what should happen when the data contain NAs. The default action (na.omit, inherited from the 'factory fresh' value of getOption("na.action")) strips any observations with any missing values in any variables.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

control

a list giving

for [g]lFormula:: all options for running the model, see lmerControl;
for mkLmerDevfun,mkGlmerDevfun:: options for the inner optimization step;
for optimizeLmer and optimizeGlmer:: control parameters for nonlinear optimizer (typically inherited from the ... argument to lmerControl).

start

starting values (see lmer; for glFormula, should be just a numeric vector of fixed-effect coefficients)

family

a GLM family; see glm and family.

mustart

optional starting values on the scale of the conditional mean; see glm for details.

etastart

optional starting values on the scale of the unbounded predictor; see glm for details.

...

other potential arguments; for optimizeLmer and optimizeGlmer, these are passed to internal function optwrap, which has relevant parameters calc.derivs and use.last.params (see lmerControl).

Value

lFormula and glFormula return a list containing components:

fr: model frame
X: fixed-effect design matrix
reTrms: list containing information on random effects structure: result of mkReTrms

mkLmerDevfun and mkGlmerDevfun return a function to calculate deviance (or restricted deviance) as a function of the theta (random-effect) parameters. updateGlmerDevfun returns a function to calculate the deviance as a function of a concatenation of theta and beta (fixed-effect) parameters. These deviance functions have an environment containing objects required for their evaluation. CAUTION: The environment of functions returned by mk(Gl|L)merDevfun contains reference class objects (see ReferenceClasses, merPredD-class, lmResp-class), which behave in ways that may surprise many users. For example, if the output of mk(Gl|L)merDevfun is naively copied, then modifications to the original will also appear in the copy (and vice versa). To avoid this behavior one must make a deep copy (see ReferenceClasses for details).

optimizeLmer and optimizeGlmer return the results of an optimization.

Details

These functions make up the internal components of an [gn]lmer fit.

[g]lFormula takes the arguments that would normally be passed to [g]lmer, checking for errors and processing the formula and data input to create a list of objects required to fit a mixed model.
mk(Gl|L)merDevfun takes the output of the previous step (minus the formula component) and creates a deviance function
optimize(Gl|L)mer takes a deviance function and optimizes over theta (or over theta and beta, if stage is set to 2 for optimizeGlmer
updateGlmerDevfun takes the first stage of a GLMM optimization (with nAGQ=0, optimizing over theta only) and produces a second-stage deviance function
mkMerMod takes the environment of a deviance function, the results of an optimization, a list of random-effect terms, a model frame, and a model all and produces a [g]lmerMod object.

Examples

  # \donttest{
  library(lme4)
  ### Fitting a linear mixed model in 4 modularized steps

  ## 1.  Parse the data and formula:
  lmod <- lFormula(Reaction ~ Days + (Days|Subject), sleepstudy)
  names(lmod)
#> [1] "fr"      "X"       "reTrms"  "REML"    "formula" "wmsgs"  
  ## 2.  Create the deviance function to be optimized:
  (devfun <- do.call(mkLmerDevfun, lmod))
#> function (theta) 
#> .Call(lmer_Deviance, pp$ptr(), resp$ptr(), as.double(theta))
#> <environment: 0x558a01d507c0>
  ls(environment(devfun)) # the environment of 'devfun' contains objects
#> [1] "lmer_Deviance" "lower"         "pp"            "resp"         
  # required for its evaluation
  ## 3.  Optimize the deviance function:
  opt <- optimizeLmer(devfun)
  opt[1:3]
#> $par
#> [1] 0.96674177 0.01516906 0.23090995
#> 
#> $fval
#> [1] 1743.628
#> 
#> $feval
#> [1] 43
#> 
  ## 4.  Package up the results:
  mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)
#> Linear mixed model fit by REML ['lmerMod']
#> REML criterion at convergence: 1743.628
#> Random effects:
#>  Groups   Name        Std.Dev. Corr
#>  Subject  (Intercept) 24.741       
#>           Days         5.922   0.07
#>  Residual             25.592       
#> Number of obs: 180, groups:  Subject, 18
#> Fixed Effects:
#> (Intercept)         Days  
#>      251.41        10.47  


  ### Same model in one line
  lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ Days + (Days | Subject)
#>    Data: sleepstudy
#> REML criterion at convergence: 1743.628
#> Random effects:
#>  Groups   Name        Std.Dev. Corr
#>  Subject  (Intercept) 24.741       
#>           Days         5.922   0.07
#>  Residual             25.592       
#> Number of obs: 180, groups:  Subject, 18
#> Fixed Effects:
#> (Intercept)         Days  
#>      251.41        10.47  


  ### Fitting a generalized linear mixed model in six modularized steps

  ## 1.  Parse the data and formula:
  glmod <- glFormula(cbind(incidence, size - incidence) ~ period + (1 | herd),
                     data = cbpp, family = binomial)
  #.... see what've got :
  str(glmod, max=1, give.attr=FALSE)
#> List of 6
#>  $ fr     :'data.frame':	56 obs. of  3 variables:
#>  $ X      : num [1:56, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ reTrms :List of 10
#>  $ family :List of 13
#>  $ formula:Class 'formula'  language cbind(incidence, size - incidence) ~ period + (1 | herd)
#>  $ wmsgs  : chr(0) 
  ## 2.  Create the deviance function for optimizing over theta:
  (devfun <- do.call(mkGlmerDevfun, glmod))
#> function (theta) 
#> {
#>     resp$updateMu(lp0)
#>     pp$setTheta(theta)
#>     p <- pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GHrule(0L), 
#>         compDev = compDev, maxit = maxit, verbose = verbose)
#>     resp$updateWts()
#>     p
#> }
#> <environment: 0x558a17ce1418>
  ls(environment(devfun)) # the environment of devfun contains lots of info
#> [1] "compDev"     "lower"       "lp0"         "maxit"       "pp"         
#> [6] "pwrssUpdate" "resp"        "tolPwrss"    "verbose"    
  ## 3.  Optimize over theta using a rough approximation (i.e. nAGQ = 0):
  (opt <- optimizeGlmer(devfun))
#> parameter estimates: 0.641838555349172 
#> objective: 184.108693002453 
#> number of function evaluations: 18 
  ## 4.  Update the deviance function for optimizing over theta and beta:
  (devfun <- updateGlmerDevfun(devfun, glmod$reTrms))
#> function (pars) 
#> {
#>     resp$setOffset(baseOffset)
#>     resp$updateMu(lp0)
#>     pp$setTheta(as.double(pars[dpars]))
#>     spars <- as.numeric(pars[-dpars])
#>     offset <- if (length(spars) == 0) 
#>         baseOffset
#>     else baseOffset + pp$X %*% spars
#>     resp$setOffset(offset)
#>     p <- pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GQmat, 
#>         compDev = compDev, grpFac = fac, maxit = maxit, verbose = verbose)
#>     resp$updateWts()
#>     p
#> }
#> <environment: 0x558a17ce1418>
  ## 5.  Optimize over theta and beta:
  opt <- optimizeGlmer(devfun, stage=2)
  str(opt, max=1) # seeing what we'got
#> List of 7
#>  $ fval       : num 184
#>  $ par        : num [1:5] 0.642 -1.398 -0.992 -1.128 -1.58
#>  $ convergence: num 0
#>  $ NM.result  : int 3
#>  $ message    : chr "parameter values converged to within tolerance"
#>  $ control    :List of 17
#>  $ feval      : num 285
#>  - attr(*, "optimizer")= chr "Nelder_Mead"
#>  - attr(*, "control")=List of 3
#>  - attr(*, "warnings")= list()
#>  - attr(*, "derivs")=List of 2
  ## 6.  Package up the results:
  (fMod <- mkMerMod(environment(devfun), opt, glmod$reTrms, fr = glmod$fr))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#>       AIC       BIC    logLik -2*log(L)  df.resid 
#>  194.0531  204.1799  -92.0266  184.0531        51 
#> Random effects:
#>  Groups Name        Std.Dev.
#>  herd   (Intercept) 0.6421  
#> Number of obs: 56, groups:  herd, 15
#> Fixed Effects:
#> (Intercept)      period2      period3      period4  
#>     -1.3983      -0.9919      -1.1282      -1.5797  

  ### Same model in one line
  fM <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial)
  all.equal(fMod, fM, check.attributes=FALSE, tolerance = 1e-12)
#> [1] TRUE
  # ----  --  even tolerance = 0  may work
  # }