throbber
JSS
`
`Journal of Statistical Software
`
`December 2007, Volume 23, Issue 7.
`
`http://www.jstatsoft.org/
`
`Generalized Additive Models for Location Scale
`and Shape (GAMLSS) in R
`
`D. Mikis Stasinopoulos
`London Metropolitan University
`
`Robert A. Rigby
`London Metropolitan University
`
`Abstract
`
`GAMLSS is a general framework for fitting regression type models where the distribu-
`tion of the response variable does not have to belong to the exponential family and includes
`highly skew and kurtotic continuous and discrete distribution. GAMLSS allows all the
`parameters of the distribution of the response variable to be modelled as linear/non-linear
`or smooth functions of the explanatory variables. This paper starts by defining the sta-
`tistical framework of GAMLSS, then describes the current implementation of GAMLSS
`in R and finally gives four different data examples to demonstrate how GAMLSS can be
`used for statistical modelling.
`
`Keywords: Box-Cox transformation, centile estimation, cubic smoothing splines, LMS method,
`negative binomial, non-normal, non-parametric, overdispersion, penalized likelihood, skewness
`and kurtosis.
`
`1. What is GAMLSS?
`
`1.1. Introduction
`
`Generalized additive models for location, scale and shape (GAMLSS) are semi-parametric
`regression type models. They are parametric, in that they require a parametric distribu-
`tion assumption for the response variable, and “semi” in the sense that the modelling of
`the parameters of the distribution, as functions of explanatory variables, may involve using
`non-parametric smoothing functions. GAMLSS were introduced by Rigby and Stasinopoulos
`(2001, 2005) and Akantziliotou, Rigby, and Stasinopoulos (2002) as a way of overcoming some
`of the limitations associated with the popular generalized linear models, GLM, and general-
`ized additive models, GAM (see Nelder and Wedderburn 1972; Hastie and Tibshirani 1990,
`respectively).
`
`00001
`
`EX1073
`
`

`

`2
`
`GAMLSS in R
`
`In GAMLSS the exponential family distribution assumption for the response variable (y) is
`relaxed and replaced by a general distribution family, including highly skew and/or kurtotic
`continuous and discrete distributions. The systematic part of the model is expanded to
`allow modelling not only of the mean (or location) but other parameters of the distribution
`of y as, linear and/or non-linear, parametric and/or additive non-parametric functions of
`explanatory variables and/or random effects. Hence GAMLSS is especially suited to modelling
`a response variable which does not follow an exponential family distribution, (e.g., leptokurtic
`or platykurtic and/or positive or negative skew response data, or overdispersed counts) or
`which exhibit heterogeneity, (e.g., where the scale or shape of the distribution of the response
`variable changes with explanatory variables(s)).
`There are several R-packages that can be seen as related to the gamlss packages and to its R
`implementation. The original gam package (Hastie 2006), the recommenced R package mgcv
`(Wood 2001), the general smoothing splines package gss (Gu 2007) and the vector GAM
`package, VGAM (Yee 2007). The first three deal mainly with models for the mean from an
`exponential family distribution. The VGAM package allows the modelling from a variety
`of different distributions (usually up to three parameter ones) and also allows multivariate
`responses.
`The remainder of Section 1 defines the GAMLSS model, available distributions, available
`additive terms and model fitting. Section 2 describes the R gamlss package for fitting the
`GAMLSS model. Section 3 gives four data examples to illustrate GAMLSS modelling.
`
`1.2. The GAMLSS model
`
`A GAMLSS model assumes independent observations yi for i = 1, 2, . . . , n with probability
`(density) function f(yi|θi) conditional on θi = (θ1i, θ2i, θ3i, θ4i) = (µi, σi, νi, τi) a vector of four
`distribution parameters, each of which can be a function to the explanatory variables. We
`shall refer to (µi, σi, νi, τi) as the distribution parameters. The first two population distribution
`parameters µi and σi are usually characterized as location and scale parameters, while the
`remaining parameter(s), if any, are characterized as shape parameters, e.g., skewness and
`kurtosis parameters, although the model may be applied more generally to the parameters of
`any population distribution, and can be generalized to more than four distribution parameters.
`Rigby and Stasinopoulos (2005) define the original formulation of a GAMLSS model as follows.
`Let y(cid:62) = (y1, y2, . . . , yn) be the n length vector of the response variable. Also for k =
`1, 2, 3, 4, let gk(.) be known monotonic link functions relating the distribution parameters to
`explanatory variables by
`
`Jk(cid:88)
`
`Zjkγjk,
`
`(1)
`
`i.e.
`
`gk(θk) = ηk = Xkβk +
`
`g1(µ) = η1 = X1β1 +
`
`g2(σ) = η2 = X2β2 +
`
`j=1
`
`J1(cid:88)
`J2(cid:88)
`
`j=1
`
`j=1
`
`Zj1γj1
`
`Zj2γj2
`
`00002
`
`

`

`Journal of Statistical Software
`
`3
`
`J3(cid:88)
`J4(cid:88)
`
`g3(ν) = η3 = X3β3 +
`
`Zj3γj3
`
`j=1
`
`g4(τ ) = η4 = X4β4 +
`
`Zj4γj4.
`
`j=1
`
`(cid:16)− 1
`
`where µ, σ, ν τ and ηk are vectors of length n, β(cid:62)
`k = (β1k, β2k, . . . , βJ(cid:48)kk) is a parameter
`
`k, Xk is a fixed known design matrix of order n × J(cid:48)
`vector of length J(cid:48)
`k, Zjk is a fixed known
`n × qjk design matrix and γjk is a qjk dimensional random variable which is assumed to
`
`
`jk is the (generalized) inverse of a qjk × qjkbe distributed as γjk ∼ Nqjk(0, G−1jk ), where G−1
`(cid:17)
`symmetric matrix Gjk = Gjk(λjk) which may depend on a vector of hyperparameters λjk,
`and where if Gjk is singular then γjk is understood to have an improper prior density function
`2 γ(cid:62)
`proportional to exp
`.
`jkGjkγjk
`The model in (1) allows the user to model each distribution parameter as a linear function of
`explanatory variables and/or as linear functions of stochastic variables (random effects). Note
`that seldom will all distribution parameters need to be modelled using explanatory variables.
`There are several important sub-models of GAMLSS. For example for readers familiar with
`smoothing, the following GAMLSS sub-model formulation may be more familiar. Let Zjk =
`In, where In is an n × n identity matrix, and γjk = hjk = hjk(xjk) for all combinations of j
`and k in (1), then we have the semi-parametric additive formulation of GAMLSS given by
`Jk(cid:88)
`
`gk(θk) = ηk = Xkβk +
`
`hjk(xjk)
`
`(2)
`
`j=1
`
`where to abbreviate the notation use θk for k = 1, 2, 3, 4 to represent the distribution param-
`eter vectors µ, σ, ν and τ , and where xjk for j = 1, 2, . . . , Jk are also vectors of length n.
`The function hjk is an unknown function of the explanatory variable Xjk and hjk = hjk(xjk)
`is the vector which evaluates the function hjk at xjk. If there are no additive terms in any of
`the distribution parameters we have the simple parametric linear GAMLSS model,
`
`g1(θk) = ηk = Xkβk
`
`(3)
`
`Model (2) can be extended to allow non-linear parametric terms to be included in the model
`for µ, σ, ν and τ, as follows (see Rigby and Stasinopoulos 2006)
`
`Jk(cid:88)
`
`gk(θk) = ηk = hk(Xk, βk) +
`
`hjk(xjk)
`
`(4)
`
`j=1
`
`where hk for k = 1, 2, 3, 4 are non-linear functions and Xk is a known design matrix of order
`n×J
`(cid:48)(cid:48)
`k . We shall refer to the model in (4) as the non-linear semi-parametric additive GAMLSS
`model. If, for k = 1, 2, 3, 4, Jk = 0, that is, if for all distribution parameters we do not have
`additive terms, then model (4) is reduced to a non-linear parametric GAMLSS model.
`
`gk(θk) = ηk = hk(Xk, βk).
`If, in addition, hk(Xk, βk) = X(cid:62)
`k βk for i = 1, 2, . . . , n and k = 1, 2, 3, 4 then (5) reduces to the
`linear parametric model (3). Note that some of the terms in each hk(Xk, βk) may be linear,
`
`(5)
`
`00003
`
`

`

`4
`
`GAMLSS in R
`
`in which case the GAMLSS model is a combination of linear and non-linear parametric terms.
`We shall refer to any combination of models (3) or (5) as a parametric GAMLSS model.
`The parametric vectors βk and the random effects parameters γjk, for j = 1, 2, . . . , Jk and
`k = 1, 2, 3, 4 are estimated within the GAMLSS framework (for fixed values of the smoothing
`hyper-parameters λjk’s) by maximising a penalized likelihood function (cid:96)p given by
`
`p(cid:88)
`
`Jk(cid:88)
`
`k=1
`
`j=1
`
`(cid:96)p = (cid:96) − 1
`2
`
`λjkγ(cid:48)
`jkGjkγjk
`
`(6)
`
`where (cid:96) =(cid:80)n
`i=1 log f(yi|θi) is the log likelihood function. More details on how the penalized
`log likelihood (cid:96)p is maximized are given in Section 1.5. For parametric GAMLSS model
`(3) or (5), (cid:96)p reduces to (cid:96), and the βk for k = 1, 2, 3, 4 are estimated by maximizing the
`likelihood function (cid:96). The available distributions and the different additive terms in the
`current GAMLSS implementation in R are given in Sections 1.3 and 1.4 respectively. The R
`function to fit a GAMLSS model is gamlss() in the package gamlss which will be described
`in more detail in Section 2.
`
`1.3. Available distributions in GAMLSS
`The form of the distribution assumed for the response variable y, f(yi|µi, σi, νi, τi), can be very
`general. The only restriction that the R implementation of GAMLSS has is that the function
`log f(yi|µi, σi, νi, τi) and its first (and optionally expected second and cross) derivatives with
`respect to each of the parameters of θ must be computable. Explicit derivatives are preferable
`but numerical derivatives can be used.
`Table 1 shows a variety of one, two, three and four parameter families of continuous distribu-
`tions implemented in our current software version. Table 2 shows the discrete distributions.
`We shall refer to the distributions in Tables 1 and 2 as the gamlss.family distributions,
`a name to coincide with the R object created by the package gamlss. Johnson, Kotz, and
`Balakrishnan (1994, 1995); Johnson, Kotz, and Kemp (2005) are the classical reference books
`for most of the distributions in Tables 1 and 2. The BCCG distribution in Table 1 is the Box-
`Cox transformation model used by Cole and Green (1992) (also known as the LMS method of
`centile estimation). The BCPE and BCT distributions, described in Rigby and Stasinopou-
`los (2004, 2006) respectively, generalize the BCCG distribution to allow modelling of both
`skewness and kurtosis. For some of the distributions shown in Tables 1 and 2 more that one
`parameterization has been implemented. For example, the two parameter Weibull distribu-
`f(y|µ, σ) = σµyσ−1e−µyσ, denoted as WEI2, or as f(y|µ, σ) = (σ/β) (y/β)σ−1 exp {− (y/β)σ}
`denoted as WEI3, for β = µ/ [Γ(1/σ) + 1]. Note that the second parameterization WEI2 is
`suited to proportional hazard (PH) models. In the WEI3 parameterization, parameter µ is
`equal to the mean of y. The choice of parameterization depends upon the particular prob-
`lem, but some parameterizations are computationally preferable to others in the sense that
`maximization of the likelihood function is easier. This usually happens when the parameters
`µ, σ, ν and τ are orthogonal or almost orthogonal. For interpretation purposes we favour pa-
`rameterizations where the parameter µ is a location parameter (mean, median or mode). The
`specific parameterizations used in the gamlss.family distributions are given in the appendix
`of Stasinopoulos, Rigby, and Akantziliotou (2006).
`
`tion can be parameterized as f(y|µ, σ) =(cid:0)σyσ−1/µσ(cid:1) exp {−(y/µ)σ}, denoted as WEI, or as
`
`00004
`
`

`

`Journal of Statistical Software
`
`5
`
`Distributions
`beta
`beta inflated (at 0)
`beta inflated (at 1)
`beta inflated (at 0 and 1 )
`Box-Cox Cole and Green
`Box-Cox power exponential
`Box-Cox-t
`exponential
`exponential Gaussian
`exponential gen. beta type 2
`gamma
`generalized beta type 1
`generalized beta type 2
`generalized gamma
`generalized inverse Gaussian
`generalized y
`Gumbel
`inverse Gaussian
`Johnson’s SU (µ the mean)
`Johnson’s original SU
`logistic
`log normal
`log normal (Box-Cox)
`NET
`normal
`normal family
`power exponential
`reverse Gumbel
`skew power exponential type 1
`skew power exponential type 2
`skew power exponential type 3
`skew power exponential type 4
`shash
`skew t type 1
`skew t type 2
`skew t type 3
`skew t type 4
`skew t type 5
`t Family
`Weibull
`Weibull (PH)
`Weibull (µ the mean)
`zero adjusted IG
`
`R Name
`BE()
`BEOI()
`BEZI()
`BEINF()
`BCCG()
`BCPE()
`BCT()
`EXP()
`exGAUS()
`EGB2()
`GA()
`GB1()
`GB2()
`GG()
`GIG()
`GT()
`GU()
`IG()
`JSU()
`JSUo()
`LO()
`LOGNO()
`LNO()
`NET()
`NO()
`NOF()
`PE()
`RG()
`SEP1()
`SEP2()
`SEP3()
`SEP4()
`SHASH()
`ST1()
`ST2()
`ST3()
`ST4()
`ST5()
`TF()
`WEI()
`WEI2()
`WEI3()
`ZAIG()
`

`logit
`logit
`logit
`logit
`identity
`identity
`identity
`log
`identity
`identity
`log
`logit
`log
`log
`log
`identity
`identity
`log
`identity
`identity
`identity
`log
`log
`identity
`identity
`identity
`identity
`identity
`identity
`identity
`identity
`identity
`identity
`identity
`identity
`identity
`identity
`identity
`identity
`log
`log
`log
`log
`

`logit
`log
`log
`logit
`log
`log
`log
`-
`log
`identity
`log
`logit
`identity
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`

`-
`logit
`logit
`log
`identity
`identity
`identity
`-
`log
`log
`-
`log
`log
`identity
`identity
`log
`-
`-
`identity
`identity
`-
`-
`fixed
`fixed
`-
`identity
`log
`-
`identity
`identity
`log
`log
`log
`identity
`identity
`log
`log
`identity
`log
`-
`-
`-
`logit
`

`-
`-
`-
`log
`-
`log
`log
`-
`-
`log
`-
`log
`log
`-
`-
`log
`-
`-
`log
`log
`-
`-
`-
`fixed
`-
`-
`-
`-
`log
`log
`log
`log
`log
`log
`log
`log
`log
`log
`-
`-
`-
`-
`-
`
`Table 1: Continuous distributions implemented within the gamlss packages (with default link
`functions).
`
`00005
`
`

`

`6
`
`GAMLSS in R
`
`Distributions
`beta binomial
`binomial
`Delaporte
`Negative Binomial type I
`Negative Binomial type II
`Poisson
`Poisson inverse Gaussian
`Sichel
`Sichel (µ the mean)
`zero inflated poisson
`zero inflated poisson (µ the mean)
`
`R Name
`BB()
`BI()
`DEL()
`NBI()
`NBII()
`PO()
`PIG()
`SI()
`SICHEL()
`ZIP()
`ZIP2()
`

`logit
`logit
`log
`log
`log
`log
`log
`log
`log
`log
`log
`

`log
`-
`log
`log
`log
`-
`log
`log
`log
`logit
`logit
`

`-
`-
`logit
`-
`-
`-
`-
`identity
`identity
`-
`-
`
`Table 2: Discrete distributions implemented within the gamlss packages (with default link
`functions).
`
`All of the distributions in Tables 1 and 2 have d, p, q and r functions corresponding respec-
`tively to the probability density function (pdf), the cumulative distribution function (cdf),
`the quantiles (i.e., inverse cdf) and random value generating functions. For example, the
`gamma distribution has the functions dGA, pGA, qGA and rGA. In addition each distribution
`has a fitting function which helps the fitting procedure by providing link functions, first and
`(exact of approximate) expected second derivatives, starting values etc. All fitting functions
`have as arguments the link functions for the distribution parameters. For example, the fitting
`function for the gamma distribution is called GA with arguments mu.link and sigma.link.
`The default link functions for all gamlss.family distributions are shown in columns 3–6 of
`Tables 1 and 2. The function show.link() can be used to identify which are the available
`links for the distribution parameter within each of the gamlss.family. Available link func-
`tions can be the usual glm() link functions plus logshifted, logitshifted and own. The
`own option allows the user to define his/her own link function, for an example see the help
`file on the function make.link.gamlss().
`There are several ways to extend the gamlss.family distributions. This can be achieved by
`• creating a new gamlss.family distribution
`• truncating an existing gamlss.family
`• using a censored version of an existing gamlss.family
`• mixing different gamlss.family distributions to create a new finite mixture distribution.
`
`To create a new gamlss.family distribution is relatively simple, if the pdf function of the
`distribution can be evaluated easily. To do that, find a file of a current gamlss.family
`distribution, (having the same number of distribution parameters) and amend accordingly.
`For more details, on how this can be done, see Stasinopoulos et al. (2006, Section 4.2).
`Truncating existing gamlss.family distributions can be achieved by using the add-on package
`gamlss.tr. The function gen.trun(), within the gamlss.tr package, can take any gamlss.family
`
`00006
`
`

`

`Journal of Statistical Software
`
`7
`
`distribution and generate the d, p, q, r and fitting R functions for the specified truncated dis-
`tribution. The truncation can be left, right or in both tails of the range of the response y
`variable.
`The package gamlss.cens is designed for the situation where the response variable is censored
`or, more generally, it has been observed in an interval form, e.g., (3, 10] an interval from
`3 to 10 (including only the right end point 10). The function gen.cens() will take any
`gamlss.family distribution and create a new function which can fit a response of “interval”
`type. Note that for “interval” response variables the usual likelihood function for independent
`response variables defined as
`
`n(cid:89)
`
`L(θ) =
`
`f(yi|θ)
`
`changes to
`
`i=1
`
`n(cid:89)
`
`i=1
`
`L(θ) =
`
`[F (y2i|θ) − F (y1i|θ)]
`
`where F (y) is the cumulative distribution function and (y1i, y2i) is the observed interval.
`Finite mixtures of gamlss.family distributions can be fitted using the package gamlss.mx.
`A finite mixture of gamlss.family distributions will have the form
`
`(7)
`
`(8)
`
`(9)
`
`K(cid:88)
`is the prior (or mixing) probability of component k, for k = 1, 2, . . . , K. Also (cid:80)K
`
`fY (y|ψ) =
`
`πkfk(y|θk)
`
`k=1
`
`where fk(y|θk) is the probability (density) function of y for component k, and 0 ≤ πk ≤ 1
`k=1 πk = 1
`and ψ = (θ, π) where θ = (θ1, θ2, . . . , θk) and π = (π1, π2, . . . , πK). Any combination
`of (continuous or discrete) gamlss.family distributions can be used. The model in this
`case is fitted using the EM algorithm. The component probability (density) functions may
`have different parameters (fitted using the function gamlssMX()) or may have parameters in
`common (fitted using the function gamlssNP()). In the former case, the mixing probabilities
`may also be modelled using explanatory variables and the finite mixture may have a zero
`component (e.g., zero inflated negative binomial etc.). Both functions gamlssMX()) and
`gamlssNP() are in the add on package gamlss.mx.
`
`1.4. Available additive terms in GAMLSS
`
`Equation (1) allows the user to model all the distribution parameters µ, σ, ν and τ as
`linear parametric and/or non-linear parametric and/or non-parametric (smooth) function of
`the explanatory variables and/or random effects terms. For modelling linear functions the
`Wilkinson and Rogers (1973) notation as applied for model formulae in the S language by
`Chambers and Hastie (1992) can be used.
`It is the model formulae notation used in the
`fit of linear models, lm(), and generalized lineal models, glm(), see for example Venables
`and Ripley (2002, Section 6.2). For fitting non-linear or non-parametric (smooth) functions
`or random effects terms, an additive term function has to be fitted. Parametric non-linear
`models can be also fitted using the function nlgamlss() of the add-on package gamlss.nl.
`
`00007
`
`

`

`8
`
`GAMLSS in R
`
`Additive terms
`Cubic splines
`Varying coefficient
`Penalized splines
`loess
`Fractional polynomials
`Power polynomials
`non-linear fit
`Random effects
`Random effects
`Random coefficient
`
`R Name
`cs()
`vc()
`ps()
`lo()
`fp()
`pp()
`nl()
`random()
`ra()
`rc()
`
`Table 3: Additive terms implemented within the gamlss packages.
`
`(cid:104)
`
`λ(cid:82) ∞
`
`(cid:105)2
`
`Table 3 shows the additive term functions implemented in the current R implementation of
`GAMLSS. Note that all available additive terms names are stored in the list .gamlss.sm.list.
`The cubic spline function cs() is based on the smooth.spline() function of R and can be
`used for univariate smoothing. Cubic splines are covered extensively in the literature (Reinsch
`1967; Green and Silverman 1994; Hastie and Tibshirani 1990, Chapter 2). They assume in
`model (2) that the functions h(t) are arbitrary twice continuously differentiable functions
`and we maximize a penalized log likelihood, given by (cid:96) subject to penalty terms of the form
`(cid:48)(cid:48)(t)h
`dt. The solution for the maximizing functions h(t) are all natural cubic splines,
`−∞
`
`and hence can be expressed as linear combinations of their natural cubic spline basis functions
`(de Boor 1978). In cs() each distinct x-value is a knot.
`The varying coefficient terms were introduced by Hastie and Tibshirani (1993) to accommo-
`date a special type of interaction between explanatory variables. This interaction takes the
`form of β(r)x, that is the linear coefficient of the explanatory variable x is changing smoothly
`according to another explanatory variable r. In some applications r will be time. In general
`r should be a continuous variable, while x can be either continuous or categorical. In the
`current GAMLSS implementation x has to be continuous or a two level factor with levels 0
`and 1.
`Penalized splines were introduced by Eilers and Marx (1996). Penalized Splines (or P-splines)
`are piecewise polynomials defined by B-spline basis functions in the explanatory variable,
`where the coefficients of the basis functions are penalized to guarantee sufficient smoothness
`(see Eilers and Marx 1996). More precisely consider the model θ = Z(x)γ where θ can be
`any distributional parameter in a GAMLSS model, Z(x) is n × q basis design matrix for the
`explanatory variable x defined at q-different knots mostly within the range of x and γ is a
`q × 1 vector of coefficients which have some stochastic restrictions imposed by the fact that
`Dγ ∼ N(0, λ−1I) or equivalently by γ ∼ N(0, λ−1K−) where K = D(cid:62)D. The matrix D is a
`(q− r)× q matrix giving rth differences of the q-dimensional vector γ. So to define a penalized
`spline we need:
`i) q the number of knots in the x-axis defined by argument ps.intervals
`(and of course where to put them; ps() uses equal spaces in the x-axis), ii) the degree of
`the piecewise polynomial used in the B-spline basis so we can define X, defined by argument
`degree iii) r the order of differences in the D matrix indicating the type of the penalty
`imposed in the the coefficients of the B-spline basis functions, defined by argument order
`
`00008
`
`

`

`Journal of Statistical Software
`
`9
`
`and iv) the amount of smoothing required defined either by the desired equivalent degrees
`of freedom defined by argument df (or alternative by the smoothing parameter defined by
`argument lambda). The ps() function in gamlss is based on an S-PLUS function of Marx
`(2003).
`The function lo() allows the user to use a loess fit in a GAMLSS formula. A loess fit is
`a polynomial (surface) curve determined by one or more explanatory (continuous) variables,
`which are fitted locally (see Cleveland, Grosse, and Shyu 1993). The implementation of the
`lo() function is very similar to the function with the same name in the S-PLUS implemen-
`tation of gam. However gamlss lo() function uses the R loess() function as its engine and
`this creates some minor differences between the two lo() even when the same model is fitted.
`lo() is the only function currently available in gamlss which allows smoothing in more than
`one explanatory (continuous) variables.
`The fp() function is an implementation of the fractional polynomials introduced by Royston
`and Altman (1994). The functions involved in fp() and bfp() are loosely based on the
`fractional polynomials function fracpoly() for S-PLUS given by Ambler (1999). The function
`bfp generates the correct design matrix for fitting a power polynomial of the type b0 + b1xp1 +
`b2xp2 + ... + bkxpk. For given powers p1, p2, ..., pk, given as the argument powers in bfp(),
`the function can be used to fit power polynomials in the same way as the functions poly() or
`bs() of the package splines are used to fit orthogonal or piecewise polynomials respectively.
`The function fp() (which uses bfp()) works as an additive smoother term in gamlss. It is
`used to fit the best fractional polynomials among a specific set of power values. Its argument
`npoly determines whether one, two or three fractional polynomials should used in the fitting.
`For a fixed number npoly the algorithm looks for the best fitting fractional polynomials in
`the list c(-2, -1, -0.5, 0, 0.5, 1, 2, 3). Note that npoly=3 is rather slow since it fits
`all possible 3-way combinations at each backfitting iteration.
`The power polynomial function pp() is an experimental function and is designed for the
`situation in which the model is in the form b0 + b1xp1 + b2xp2 with powers p1, p2 to be
`estimated non-linearly by the data. Initial values for the non-linear parameters p1, p2 have to
`be supplied.
`The function nl() exists in the add-on package gamlss.nl designed for fitting non-linear para-
`metric models within GAMLSS. It provides a way of fitting non-linear terms together with
`linear or smoothing terns in the same model. The function takes a non-linear object, (cre-
`ated by the function nl.obs), and uses the R nlm() function within the backfitting cycle of
`gamlss(). The success of this procedure depends on the starting values of the non-linear
`parameters (which must be provided by the user). No starting values are required for the
`other, e.g., linear terms, of the model.
`The function random() allows the fitted values for a factor (categorical) predictor to be shrunk
`towards the overall mean, where the amount of shrinking depends either on the parameter
`λ, or on the equivalent degrees of freedom (df). This function is similar to the random()
`function in the gam package of Hastie (2006) documented in Chambers and Hastie (1992).
`The function ra() is similar to the function random() but its fitting procedure is based on
`augmented least squares, a fact that makes ra() more general, but also slower to fit, than
`random(). The random coefficient function rc() is experimental. Note that the “random
`effects” functions, random(), ra() and rc() are used to estimate the random effect γ’s given
`the hyperparameters λ’s.
`In order to obtain estimates for the hyperparameters, methods
`
`00009
`
`

`

`10
`
`GAMLSS in R
`
`discussed in Rigby and Stasinopoulos (2005, Appendix A) can be used. Alternatively, for
`models only requiring a single random effect in one distribution parameter only, the function
`gamlssNP() of the package gamlss.mx, which uses Gaussian quadrature, can be used.
`The gamlss() function uses the same type of additive backfitting algorithm implemented
`in the gam() function of the R package gam (Hastie 2006). Note that the function gam()
`implementation in the R recommended package mgcv (Wood 2001) does not use backfitting.
`The reason that we use backfitting here that it is easier to extend the algorithm so new
`additive terms can be included.
`Each new additive term in the gamlss() requires two new functions. The first one, (the one
`that is seen by the user) is the one which defines the additive term and sets the additional
`required design matrices for the linear part of the model. The names of the existing additive
`functions are shown in the second column of Table 3. For example cs(x) defines a cubic
`smoothing spline function for the continuous explanatory variable x. It is used during the
`definition of the design matrix for the appropriate distribution parameter and it adds a linear
`term for x in the design matrix. The second function is the one that actually performs the
`additive backfitting algorithm. This function is called gamlss.name() where the name is one
`of the names in column two of Table 3. For example the function gamlss.cs() performs the
`backfitting for cubic splines. New additive terms can be implemented by defining those two
`functions and adding the new names in the .gamlss.sm.list list.
`The general policy when backfitting is used in gamlss() is to include the linear part of
`an additive term in the appropriate linear term design matrix. For example, in the cubic
`spline function cs() the explanatory variable say x is put in the linear design matrix of the
`appropriate distribution parameter and the smoothing function is fitted as a deviation from
`this linear part. This is equivalent of fitting a modified backfitting algorithm, see Hastie
`and Tibshirani (1990). In other additive functions where the linear part is not needed (or
`defined) a column on zeros is put in the design matrix. For example, this is the case when
`the fractional polynomials additive term fp() is used.
`If the user wishes to create a new additive term, care should be taken on how the degrees of
`freedom of the model are defined. The degrees of freedom for the (smoothing) additive terms
`are usually taken to be the extra degrees of freedom on top of the linear fit. For example to
`fit a single smoothing cubic spline term for say x with 5 total degrees of freedom, cs(x,df=3)
`should be used since already 2 degrees of freedom have been used for the fitting of the constant
`and the linear part of the explanatory variable x. This is different from the s() function of
`the gam package which uses s(x,df=4), assuming that only the constant term has been fitted
`separately. After a GAMLSS model containing additive (smoothing) terms is used to fit a
`specific distribution parameter the following components are (usually) saved for further use.
`In the output below replace mu with sigma, nu or "tau" if a distribution parameter other
`that mu is involved.
`
`mu.s: a matrix, each column containing the fitted values of the smoothers used to model the
`specific parameter. For example given a fitted model say mod1, then mod1$mu.s would
`access the additive terms fitted for mu.
`
`mu.var: a matrix containing the estimated variances of the smoothers.
`
`mu.df: a vector containing the extra degrees of freedom used to fit the smoothers.
`
`00010
`
`

`

`Journal of Statistical Software
`
`11
`
`mu.lambda: a vector containing the smoothing parameters (or random effects hyperparame-
`ters).
`
`mu.coefSmo: a list containing coefficients or other components from the additive smooth
`fitting.
`
`1.5. The GAMLSS algorithms
`
`There are two basic algorithms used for maximizing the penalized likelihood given in (6).
`The first, the CG algorithm, is a generalization of the Cole and Green (1992) algorithm—and
`uses the first and (expected or approximated) second and cross derivatives of the likelihood
`function with respect to the distribution parameters θ = (µ, σ, ν, τ) for a four parameter
`distribution. Note that we have dropped the subscripts here to simplify the notation. However
`for many population probability (density) functions, f(y|θ), the parameters θ are information
`orthogonal (since the expected values of the cross derivatives of the likelihood function are
`zero), e.g., location and scale models and dispersion family models, or approximately so. In
`this case the simpler RS algorithm, which is a generalization of the algorithm used by Rigby
`and Stasinopoulos (1996a,b) for fitting mean and dispersion additive models (MADAM), and
`does not use the cross derivatives, is more suited. The parameters θ = (µ, σ) are fully
`information orthogonal for distributions NBI, GA, IG, LO and NO only in Table 1. Nevertheless,
`the RS algorithm has been successfully used for fitting all distributions in Tables 1 and 2,
`although occasionally it can be slow to converge. Note also that the RS algorithm is not a
`special case of the CG algorithm.
`The object of the algorithms is to maximize the penalized likelihood function (cid:96)p, given by (6),
`for fixed hyperparameters λ. For fully parametric models, (3) or (5), the algorithms maximize
`the likelihood function (cid:96). The algorithms are implemented in the option method in the function
`gamlss() where a combination of both algorithms is also allowed. The major advantages of
`the algorithms are i) the modular fitting procedure (allowing different model diagnostics
`for each distribution parameter); ii) easy addition of extra distributions; iii) easy addition
`of extra additive terms; and iv) easily found starting values, requiring initial values for the
`θ = (µ, σ, ν, τ) rather than for the β parameters. The algorithms have generally been found to
`be stable and fast using very simple starting values (e.g., constants) for the θ parameters. The
`function nlgamlss() in the package gamlss.nl provides a third algorithm for fitting parametric
`linear or non-linear GAMLSS models as in equations (3) or (5) respectively. However the
`algorithm needs starting values for all the β parameters, rather than θ = (µ, σ, ν, τ), which
`can be difficult for the user to choose.
`Clearly, for a specific data set and model, the (penalized) likelihood can potentially have
`multiple local maxima. This is investigated using different starting values and has generally
`not been found to be a problem in the data sets analyzed, possibly due to the relatively large
`sample sizes used.
`Singularities in the likelihood function similar to the ones reported by Crisp and Burridge
`(1994) can potentially occur in specific cases within the GAMLSS framework, especially when
`the sample size is small. The problem can be alleviated by appropriate restrictions on the
`scale parameter (penalizing it for going close to zero).
`
`0

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket