`
`Journal of Statistical Software
`
`December 2007, Volume 23, Issue 7.
`
`http://www.jstatsoft.org/
`
`Generalized Additive Models for Location Scale
`and Shape (GAMLSS) in R
`
`D. Mikis Stasinopoulos
`London Metropolitan University
`
`Robert A. Rigby
`London Metropolitan University
`
`Abstract
`
`GAMLSS is a general framework for fitting regression type models where the distribu-
`tion of the response variable does not have to belong to the exponential family and includes
highly skew and kurtotic continuous and discrete distributions. GAMLSS allows all the
`parameters of the distribution of the response variable to be modelled as linear/non-linear
`or smooth functions of the explanatory variables. This paper starts by defining the sta-
`tistical framework of GAMLSS, then describes the current implementation of GAMLSS
`in R and finally gives four different data examples to demonstrate how GAMLSS can be
`used for statistical modelling.
`
`Keywords: Box-Cox transformation, centile estimation, cubic smoothing splines, LMS method,
`negative binomial, non-normal, non-parametric, overdispersion, penalized likelihood, skewness
`and kurtosis.
`
`1. What is GAMLSS?
`
`1.1. Introduction
`
`Generalized additive models for location, scale and shape (GAMLSS) are semi-parametric
`regression type models. They are parametric, in that they require a parametric distribu-
`tion assumption for the response variable, and “semi” in the sense that the modelling of
`the parameters of the distribution, as functions of explanatory variables, may involve using
`non-parametric smoothing functions. GAMLSS were introduced by Rigby and Stasinopoulos
`(2001, 2005) and Akantziliotou, Rigby, and Stasinopoulos (2002) as a way of overcoming some
`of the limitations associated with the popular generalized linear models, GLM, and general-
`ized additive models, GAM (see Nelder and Wedderburn 1972; Hastie and Tibshirani 1990,
`respectively).
`
`
`In GAMLSS the exponential family distribution assumption for the response variable (y) is
`relaxed and replaced by a general distribution family, including highly skew and/or kurtotic
`continuous and discrete distributions. The systematic part of the model is expanded to
`allow modelling not only of the mean (or location) but other parameters of the distribution
`of y as, linear and/or non-linear, parametric and/or additive non-parametric functions of
`explanatory variables and/or random effects. Hence GAMLSS is especially suited to modelling
a response variable which does not follow an exponential family distribution (e.g., leptokurtic
or platykurtic and/or positively or negatively skewed response data, or overdispersed counts) or
which exhibits heterogeneity (e.g., where the scale or shape of the distribution of the response
variable changes with explanatory variable(s)).
There are several R packages that can be seen as related to the gamlss package and to its R
implementation: the original gam package (Hastie 2006), the recommended R package mgcv
(Wood 2001), the general smoothing splines package gss (Gu 2007) and the vector GAM
package VGAM (Yee 2007). The first three deal mainly with models for the mean from an
exponential family distribution. The VGAM package allows modelling of a variety of different
distributions (usually up to three-parameter ones) and also allows multivariate responses.
`The remainder of Section 1 defines the GAMLSS model, available distributions, available
`additive terms and model fitting. Section 2 describes the R gamlss package for fitting the
`GAMLSS model. Section 3 gives four data examples to illustrate GAMLSS modelling.
`
`1.2. The GAMLSS model
`
`A GAMLSS model assumes independent observations yi for i = 1, 2, . . . , n with probability
(density) function f(yi|θi) conditional on θi = (θ1i, θ2i, θ3i, θ4i) = (µi, σi, νi, τi), a vector of four
distribution parameters, each of which can be a function of the explanatory variables. We
`shall refer to (µi, σi, νi, τi) as the distribution parameters. The first two population distribution
`parameters µi and σi are usually characterized as location and scale parameters, while the
`remaining parameter(s), if any, are characterized as shape parameters, e.g., skewness and
`kurtosis parameters, although the model may be applied more generally to the parameters of
`any population distribution, and can be generalized to more than four distribution parameters.
`Rigby and Stasinopoulos (2005) define the original formulation of a GAMLSS model as follows.
Let $y^\top = (y_1, y_2, \ldots, y_n)$ be the $n$ length vector of the response variable. Also for
$k = 1, 2, 3, 4$, let $g_k(\cdot)$ be known monotonic link functions relating the distribution
parameters to explanatory variables by

$$
g_k(\theta_k) = \eta_k = X_k \beta_k + \sum_{j=1}^{J_k} Z_{jk} \gamma_{jk}, \qquad (1)
$$

i.e.,

$$
g_1(\mu) = \eta_1 = X_1 \beta_1 + \sum_{j=1}^{J_1} Z_{j1} \gamma_{j1}
$$
$$
g_2(\sigma) = \eta_2 = X_2 \beta_2 + \sum_{j=1}^{J_2} Z_{j2} \gamma_{j2}
$$
$$
g_3(\nu) = \eta_3 = X_3 \beta_3 + \sum_{j=1}^{J_3} Z_{j3} \gamma_{j3}
$$
$$
g_4(\tau) = \eta_4 = X_4 \beta_4 + \sum_{j=1}^{J_4} Z_{j4} \gamma_{j4},
$$

where $\mu$, $\sigma$, $\nu$, $\tau$ and $\eta_k$ are vectors of length $n$,
$\beta_k^\top = (\beta_{1k}, \beta_{2k}, \ldots, \beta_{J'_k k})$ is a parameter vector of length $J'_k$,
$X_k$ is a fixed known design matrix of order $n \times J'_k$, $Z_{jk}$ is a fixed known
$n \times q_{jk}$ design matrix and $\gamma_{jk}$ is a $q_{jk}$ dimensional random variable which is
assumed to be distributed as $\gamma_{jk} \sim N_{q_{jk}}(0, G_{jk}^{-1})$, where $G_{jk}^{-1}$ is the
(generalized) inverse of a $q_{jk} \times q_{jk}$ symmetric matrix $G_{jk} = G_{jk}(\lambda_{jk})$ which
may depend on a vector of hyperparameters $\lambda_{jk}$, and where if $G_{jk}$ is singular then
$\gamma_{jk}$ is understood to have an improper prior density function proportional to
$\exp\left(-\tfrac{1}{2} \gamma_{jk}^\top G_{jk} \gamma_{jk}\right)$.
`The model in (1) allows the user to model each distribution parameter as a linear function of
`explanatory variables and/or as linear functions of stochastic variables (random effects). Note
`that seldom will all distribution parameters need to be modelled using explanatory variables.
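As a concrete illustration (not taken from the paper; the data are simulated and the gamlss() fitting function is described in Section 2), the following minimal R sketch models µ and σ as functions of covariates while ν and τ are left as constants:

library(gamlss)

set.seed(1)
da <- data.frame(x1 = runif(500), x2 = runif(500))
da$y <- rBCT(500, mu = exp(1 + 0.3 * da$x1 + 0.2 * da$x2),
             sigma = exp(-2 + 0.5 * da$x1), nu = 1, tau = 10)

# mu depends on x1 and x2, sigma on x1 only; nu and tau are fitted as intercepts
m1 <- gamlss(y ~ x1 + x2, sigma.formula = ~ x1, family = BCT, data = da)
summary(m1)
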
There are several important sub-models of GAMLSS. For example, for readers familiar with
smoothing, the following GAMLSS sub-model formulation may be more familiar. Let $Z_{jk} = I_n$,
where $I_n$ is an $n \times n$ identity matrix, and $\gamma_{jk} = h_{jk} = h_{jk}(x_{jk})$ for all
combinations of $j$ and $k$ in (1); then we have the semi-parametric additive formulation of
GAMLSS given by

$$
g_k(\theta_k) = \eta_k = X_k \beta_k + \sum_{j=1}^{J_k} h_{jk}(x_{jk}) \qquad (2)
$$

where, to abbreviate the notation, we use $\theta_k$ for $k = 1, 2, 3, 4$ to represent the distribution
parameter vectors $\mu$, $\sigma$, $\nu$ and $\tau$, and where $x_{jk}$ for $j = 1, 2, \ldots, J_k$ are
also vectors of length $n$. The function $h_{jk}$ is an unknown function of the explanatory variable
$x_{jk}$ and $h_{jk} = h_{jk}(x_{jk})$ is the vector which evaluates the function $h_{jk}$ at $x_{jk}$.
If there are no additive terms in any of the distribution parameters we have the simple parametric
linear GAMLSS model,

$$
g_k(\theta_k) = \eta_k = X_k \beta_k. \qquad (3)
$$

Model (2) can be extended to allow non-linear parametric terms to be included in the model for
$\mu$, $\sigma$, $\nu$ and $\tau$, as follows (see Rigby and Stasinopoulos 2006):

$$
g_k(\theta_k) = \eta_k = h_k(X_k, \beta_k) + \sum_{j=1}^{J_k} h_{jk}(x_{jk}) \qquad (4)
$$

where $h_k$ for $k = 1, 2, 3, 4$ are non-linear functions and $X_k$ is a known design matrix of order
$n \times J''_k$. We shall refer to the model in (4) as the non-linear semi-parametric additive GAMLSS
model. If, for $k = 1, 2, 3, 4$, $J_k = 0$, that is, if for all distribution parameters we do not have
additive terms, then model (4) is reduced to a non-linear parametric GAMLSS model,

$$
g_k(\theta_k) = \eta_k = h_k(X_k, \beta_k). \qquad (5)
$$

If, in addition, $h_k(X_k, \beta_k) = X_k \beta_k$ for $k = 1, 2, 3, 4$, then (5) reduces to the
linear parametric model (3). Note that some of the terms in each $h_k(X_k, \beta_k)$ may be linear,
`
`
`in which case the GAMLSS model is a combination of linear and non-linear parametric terms.
`We shall refer to any combination of models (3) or (5) as a parametric GAMLSS model.
The parametric vectors $\beta_k$ and the random effects parameters $\gamma_{jk}$, for $j = 1, 2, \ldots, J_k$ and
$k = 1, 2, 3, 4$ are estimated within the GAMLSS framework (for fixed values of the smoothing
hyperparameters $\lambda_{jk}$) by maximising a penalized likelihood function $\ell_p$ given by

$$
\ell_p = \ell - \frac{1}{2} \sum_{k=1}^{p} \sum_{j=1}^{J_k} \lambda_{jk} \gamma_{jk}^\top G_{jk} \gamma_{jk} \qquad (6)
$$

where $\ell = \sum_{i=1}^{n} \log f(y_i|\theta_i)$ is the log likelihood function. More details on how the penalized
log likelihood $\ell_p$ is maximized are given in Section 1.5. For a parametric GAMLSS model,
(3) or (5), $\ell_p$ reduces to $\ell$, and the $\beta_k$ for $k = 1, 2, 3, 4$ are estimated by maximizing the
likelihood function $\ell$. The available distributions and the different additive terms in the
`current GAMLSS implementation in R are given in Sections 1.3 and 1.4 respectively. The R
`function to fit a GAMLSS model is gamlss() in the package gamlss which will be described
`in more detail in Section 2.
`
`1.3. Available distributions in GAMLSS
`The form of the distribution assumed for the response variable y, f(yi|µi, σi, νi, τi), can be very
`general. The only restriction that the R implementation of GAMLSS has is that the function
`log f(yi|µi, σi, νi, τi) and its first (and optionally expected second and cross) derivatives with
`respect to each of the parameters of θ must be computable. Explicit derivatives are preferable
`but numerical derivatives can be used.
`Table 1 shows a variety of one, two, three and four parameter families of continuous distribu-
`tions implemented in our current software version. Table 2 shows the discrete distributions.
`We shall refer to the distributions in Tables 1 and 2 as the gamlss.family distributions,
`a name to coincide with the R object created by the package gamlss. Johnson, Kotz, and
`Balakrishnan (1994, 1995); Johnson, Kotz, and Kemp (2005) are the classical reference books
`for most of the distributions in Tables 1 and 2. The BCCG distribution in Table 1 is the Box-
`Cox transformation model used by Cole and Green (1992) (also known as the LMS method of
`centile estimation). The BCPE and BCT distributions, described in Rigby and Stasinopou-
`los (2004, 2006) respectively, generalize the BCCG distribution to allow modelling of both
skewness and kurtosis. For some of the distributions shown in Tables 1 and 2 more than one
parameterization has been implemented. For example, the two parameter Weibull distribution
can be parameterized as $f(y|\mu, \sigma) = \left(\sigma y^{\sigma-1}/\mu^{\sigma}\right) \exp\{-(y/\mu)^{\sigma}\}$, denoted as WEI, or as
$f(y|\mu, \sigma) = \sigma \mu y^{\sigma-1} e^{-\mu y^{\sigma}}$, denoted as WEI2, or as
$f(y|\mu, \sigma) = (\sigma/\beta)(y/\beta)^{\sigma-1} \exp\{-(y/\beta)^{\sigma}\}$,
denoted as WEI3, for $\beta = \mu/\Gamma\{(1/\sigma) + 1\}$. Note that the second parameterization WEI2 is
suited to proportional hazard (PH) models. In the WEI3 parameterization, parameter µ is
equal to the mean of y. The choice of parameterization depends upon the particular prob-
lem, but some parameterizations are computationally preferable to others in the sense that
maximization of the likelihood function is easier. This usually happens when the parameters
µ, σ, ν and τ are orthogonal or almost orthogonal. For interpretation purposes we favour pa-
rameterizations where the parameter µ is a location parameter (mean, median or mode). The
specific parameterizations used in the gamlss.family distributions are given in the appendix
of Stasinopoulos, Rigby, and Akantziliotou (2006).
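As an illustrative (simulated, not from the paper) comparison of two of these parameterizations, the family is simply switched through the family argument; the fits are reparameterizations of each other, but the fitted µ has a different meaning:

library(gamlss)

set.seed(1)
da <- data.frame(y = rweibull(200, shape = 2, scale = 5))

mW  <- gamlss(y ~ 1, family = WEI,  data = da)   # mu is a scale-type parameter
mW3 <- gamlss(y ~ 1, family = WEI3, data = da)   # mu is the mean of y

fitted(mW,  "mu")[1]   # fitted scale parameter
fitted(mW3, "mu")[1]   # approximately mean(da$y)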
`
`
Distributions                   R Name    µ         σ         ν         τ
beta                            BE()      logit     logit     -         -
beta inflated (at 0)            BEOI()    logit     log       logit     -
beta inflated (at 1)            BEZI()    logit     log       logit     -
beta inflated (at 0 and 1)      BEINF()   logit     logit     log       log
Box-Cox Cole and Green          BCCG()    identity  log       identity  -
Box-Cox power exponential       BCPE()    identity  log       identity  log
Box-Cox-t                       BCT()     identity  log       identity  log
exponential                     EXP()     log       -         -         -
exponential Gaussian            exGAUS()  identity  log       log       -
exponential gen. beta type 2    EGB2()    identity  identity  log       log
gamma                           GA()      log       log       -         -
generalized beta type 1         GB1()     logit     logit     log       log
generalized beta type 2         GB2()     log       identity  log       log
generalized gamma               GG()      log       log       identity  -
generalized inverse Gaussian    GIG()     log       log       identity  -
generalized t                   GT()      identity  log       log       log
Gumbel                          GU()      identity  log       -         -
inverse Gaussian                IG()      log       log       -         -
Johnson’s SU (µ the mean)       JSU()     identity  log       identity  log
Johnson’s original SU           JSUo()    identity  log       identity  log
logistic                        LO()      identity  log       -         -
log normal                      LOGNO()   log       log       -         -
log normal (Box-Cox)            LNO()     log       log       fixed     -
NET                             NET()     identity  log       fixed     fixed
normal                          NO()      identity  log       -         -
normal family                   NOF()     identity  log       identity  -
power exponential               PE()      identity  log       log       -
reverse Gumbel                  RG()      identity  log       -         -
skew power exponential type 1   SEP1()    identity  log       identity  log
skew power exponential type 2   SEP2()    identity  log       identity  log
skew power exponential type 3   SEP3()    identity  log       log       log
skew power exponential type 4   SEP4()    identity  log       log       log
shash                           SHASH()   identity  log       log       log
skew t type 1                   ST1()     identity  log       identity  log
skew t type 2                   ST2()     identity  log       identity  log
skew t type 3                   ST3()     identity  log       log       log
skew t type 4                   ST4()     identity  log       log       log
skew t type 5                   ST5()     identity  log       identity  log
t Family                        TF()      identity  log       log       -
Weibull                         WEI()     log       log       -         -
Weibull (PH)                    WEI2()    log       log       -         -
Weibull (µ the mean)            WEI3()    log       log       -         -
zero adjusted IG                ZAIG()    log       log       logit     -
`
`Table 1: Continuous distributions implemented within the gamlss packages (with default link
`functions).
`
`
Distributions                        R Name    µ      σ      ν
beta binomial                        BB()      logit  log    -
binomial                             BI()      logit  -      -
Delaporte                            DEL()     log    log    logit
Negative Binomial type I             NBI()     log    log    -
Negative Binomial type II            NBII()    log    log    -
Poisson                              PO()      log    -      -
Poisson inverse Gaussian             PIG()     log    log    -
Sichel                               SI()      log    log    identity
Sichel (µ the mean)                  SICHEL()  log    log    identity
zero inflated Poisson                ZIP()     log    logit  -
zero inflated Poisson (µ the mean)   ZIP2()    log    logit  -
`
`Table 2: Discrete distributions implemented within the gamlss packages (with default link
`functions).
`
`All of the distributions in Tables 1 and 2 have d, p, q and r functions corresponding respec-
`tively to the probability density function (pdf), the cumulative distribution function (cdf),
`the quantiles (i.e., inverse cdf) and random value generating functions. For example, the
`gamma distribution has the functions dGA, pGA, qGA and rGA. In addition each distribution
`has a fitting function which helps the fitting procedure by providing link functions, first and
(exact or approximate) expected second derivatives, starting values etc. All fitting functions
`have as arguments the link functions for the distribution parameters. For example, the fitting
`function for the gamma distribution is called GA with arguments mu.link and sigma.link.
`The default link functions for all gamlss.family distributions are shown in columns 3–6 of
Tables 1 and 2. The function show.link() can be used to identify the available links for each
distribution parameter within a gamlss.family. The available link functions are the usual
glm() link functions plus logshifted, logitshifted and own. The own option allows the
user to define his/her own link function; for an example see the help file of the function
make.link.gamlss().
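For example, a brief sketch of the gamma distribution functions and of querying or changing its links (the output values are not reproduced here; see ?show.link for the exact calling conventions):

library(gamlss)

dGA(2, mu = 3, sigma = 0.5)      # pdf of GA(mu = 3, sigma = 0.5) at y = 2
pGA(2, mu = 3, sigma = 0.5)      # cdf at y = 2
qGA(0.5, mu = 3, sigma = 0.5)    # median
rGA(5, mu = 3, sigma = 0.5)      # five random values

show.link(GA)                    # available links for mu and sigma
GA(mu.link = "identity")         # gamma family object with a non-default mu link
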
`There are several ways to extend the gamlss.family distributions. This can be achieved by
`• creating a new gamlss.family distribution
`• truncating an existing gamlss.family
`• using a censored version of an existing gamlss.family
`• mixing different gamlss.family distributions to create a new finite mixture distribution.
`
Creating a new gamlss.family distribution is relatively simple if the pdf of the distribution
can be evaluated easily. To do that, take the file of an existing gamlss.family distribution
(having the same number of distribution parameters) and amend it accordingly.
For more details on how this can be done, see Stasinopoulos et al. (2006, Section 4.2).
`Truncating existing gamlss.family distributions can be achieved by using the add-on package
`gamlss.tr. The function gen.trun(), within the gamlss.tr package, can take any gamlss.family
`
`
`distribution and generate the d, p, q, r and fitting R functions for the specified truncated dis-
tribution. The truncation can be left, right or in both tails of the range of the response
variable y.
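A hedged sketch of gen.trun() follows; the argument names and the naming convention of the generated functions are as described in the gamlss.tr documentation, so treat the details as indicative and check ?gen.trun before use:

library(gamlss)
library(gamlss.tr)

# generate a zero-(left-)truncated negative binomial type I family; this should create
# dNBItr(), pNBItr(), qNBItr(), rNBItr() and the fitting function NBItr() (names assumed)
gen.trun(par = 0, family = "NBI", name = "tr", type = "left")

# hypothetical positive-count response y and covariate x in a data frame da
# mtr <- gamlss(y ~ x, family = NBItr, data = da)
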
`The package gamlss.cens is designed for the situation where the response variable is censored
`or, more generally, it has been observed in an interval form, e.g., (3, 10] an interval from
`3 to 10 (including only the right end point 10). The function gen.cens() will take any
`gamlss.family distribution and create a new function which can fit a response of “interval”
`type. Note that for “interval” response variables the usual likelihood function for independent
response variables, defined as

$$
L(\theta) = \prod_{i=1}^{n} f(y_i|\theta), \qquad (7)
$$

changes to

$$
L(\theta) = \prod_{i=1}^{n} \left[ F(y_{2i}|\theta) - F(y_{1i}|\theta) \right], \qquad (8)
$$

where $F(y)$ is the cumulative distribution function and $(y_{1i}, y_{2i})$ is the observed interval.
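A heavily hedged sketch of the intended workflow is shown below; the exact gen.cens() arguments, the name of the generated family and the use of Surv() from the survival package to encode the interval are assumptions to be checked against the gamlss.cens documentation:

library(gamlss.cens)
library(survival)

# generate an interval-censored version of the Weibull family
gen.cens(WEI, type = "interval")

# hypothetical data: each response known only to lie in the interval (lo, hi]
# mic <- gamlss(Surv(lo, hi, type = "interval2") ~ x,
#               family = WEIic,   # name of the generated family (assumed)
#               data = da)
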
Finite mixtures of gamlss.family distributions can be fitted using the package gamlss.mx.
A finite mixture of gamlss.family distributions will have the form

$$
f_Y(y|\psi) = \sum_{k=1}^{K} \pi_k f_k(y|\theta_k) \qquad (9)
$$

where $f_k(y|\theta_k)$ is the probability (density) function of $y$ for component $k$, and
$0 \leq \pi_k \leq 1$ is the prior (or mixing) probability of component $k$, for $k = 1, 2, \ldots, K$.
Also $\sum_{k=1}^{K} \pi_k = 1$ and $\psi = (\theta, \pi)$ where $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$
and $\pi = (\pi_1, \pi_2, \ldots, \pi_K)$. Any combination
`of (continuous or discrete) gamlss.family distributions can be used. The model in this
`case is fitted using the EM algorithm. The component probability (density) functions may
`have different parameters (fitted using the function gamlssMX()) or may have parameters in
`common (fitted using the function gamlssNP()). In the former case, the mixing probabilities
`may also be modelled using explanatory variables and the finite mixture may have a zero
component (e.g., zero inflated negative binomial etc.). Both functions gamlssMX() and
gamlssNP() are in the add-on package gamlss.mx.
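For example, a sketch (simulated data; the argument names K and family are as in the gamlss.mx documentation) of a two-component normal mixture fitted with gamlssMX():

library(gamlss.mx)

set.seed(1)
da <- data.frame(y = c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 5, sd = 1)))

# two-component normal mixture; each component has its own mu and sigma
mmx <- gamlssMX(y ~ 1, family = NO, K = 2, data = da)
mmx    # the print/summary methods show the component fits and estimated mixing probabilities
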
`
`1.4. Available additive terms in GAMLSS
`
`Equation (1) allows the user to model all the distribution parameters µ, σ, ν and τ as
linear parametric and/or non-linear parametric and/or non-parametric (smooth) functions of
`the explanatory variables and/or random effects terms. For modelling linear functions the
`Wilkinson and Rogers (1973) notation as applied for model formulae in the S language by
`Chambers and Hastie (1992) can be used.
It is the model formula notation used in the
fitting of linear models, lm(), and generalized linear models, glm(); see for example Venables
`and Ripley (2002, Section 6.2). For fitting non-linear or non-parametric (smooth) functions
or random effects terms, an additive term function has to be used. Parametric non-linear
models can also be fitted using the function nlgamlss() of the add-on package gamlss.nl.
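For example, a minimal sketch (simulated data, not from the paper) mixing a linear term and cubic smoothing splines across the distribution parameters of a negative binomial model:

library(gamlss)

set.seed(1)
da <- data.frame(x1 = runif(200), x2 = runif(200))
da$y <- rNBI(200, mu = exp(1 + da$x1 + sin(2 * pi * da$x2)), sigma = 0.5)

# mu: linear in x1 plus a cubic smoothing spline in x2; sigma: smooth in x1
mod <- gamlss(y ~ x1 + cs(x2, df = 3), sigma.formula = ~ cs(x1),
              family = NBI, data = da)
summary(mod)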
`
`
Additive terms           R Name
Cubic splines            cs()
Varying coefficient      vc()
Penalized splines        ps()
loess                    lo()
Fractional polynomials   fp()
Power polynomials        pp()
non-linear fit           nl()
Random effects           random()
Random effects           ra()
Random coefficient       rc()
`
`Table 3: Additive terms implemented within the gamlss packages.
`
Table 3 shows the additive term functions implemented in the current R implementation of
GAMLSS. Note that all available additive term names are stored in the list .gamlss.sm.list.
The cubic spline function cs() is based on the smooth.spline() function of R and can be
used for univariate smoothing. Cubic splines are covered extensively in the literature (Reinsch
1967; Green and Silverman 1994; Hastie and Tibshirani 1990, Chapter 2). They assume in
model (2) that the functions $h(t)$ are arbitrary twice continuously differentiable functions
and we maximize a penalized log likelihood, given by $\ell$ subject to penalty terms of the form
$\lambda \int_{-\infty}^{\infty} \left[ h''(t) \right]^2 dt$. The solutions for the maximizing functions $h(t)$
are all natural cubic splines, and hence can be expressed as linear combinations of their natural
cubic spline basis functions (de Boor 1978). In cs() each distinct x-value is a knot.
`The varying coefficient terms were introduced by Hastie and Tibshirani (1993) to accommo-
`date a special type of interaction between explanatory variables. This interaction takes the
`form of β(r)x, that is the linear coefficient of the explanatory variable x is changing smoothly
`according to another explanatory variable r. In some applications r will be time. In general
`r should be a continuous variable, while x can be either continuous or categorical. In the
`current GAMLSS implementation x has to be continuous or a two level factor with levels 0
`and 1.
`Penalized splines were introduced by Eilers and Marx (1996). Penalized Splines (or P-splines)
`are piecewise polynomials defined by B-spline basis functions in the explanatory variable,
`where the coefficients of the basis functions are penalized to guarantee sufficient smoothness
(see Eilers and Marx 1996). More precisely, consider the model $\theta = Z(x)\gamma$ where $\theta$ can be
any distribution parameter in a GAMLSS model, $Z(x)$ is an $n \times q$ basis design matrix for the
explanatory variable $x$ defined at $q$ different knots mostly within the range of $x$, and $\gamma$ is a
$q \times 1$ vector of coefficients which have some stochastic restrictions imposed by the fact that
$D\gamma \sim N(0, \lambda^{-1}I)$, or equivalently by $\gamma \sim N(0, \lambda^{-1}K^{-})$ where $K = D^\top D$. The matrix $D$ is a
$(q-r) \times q$ matrix giving $r$th differences of the $q$-dimensional vector $\gamma$. So to define a penalized
spline we need: i) $q$, the number of knots in the x-axis, defined by argument ps.intervals
(and of course where to put them; ps() uses equal spaces in the x-axis); ii) the degree of
the piecewise polynomial used in the B-spline basis (so that the basis matrix can be constructed),
defined by argument degree; iii) $r$, the order of differences in the $D$ matrix indicating the type
of penalty imposed on the coefficients of the B-spline basis functions, defined by argument order;
and iv) the amount of smoothing required, defined either by the desired equivalent degrees
of freedom (argument df) or alternatively by the smoothing parameter (argument lambda).
The ps() function in gamlss is based on an S-PLUS function of Marx (2003).
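A sketch of a P-spline term using the arguments just described (the chosen values and the simulated data are purely illustrative; default argument values may differ between versions):

library(gamlss)

set.seed(1)
da <- data.frame(x = seq(0, 1, length = 200))
da$y <- rNO(200, mu = sin(2 * pi * da$x), sigma = 0.2)

# 20 knot intervals, cubic (degree 3) B-spline basis, second-order difference
# penalty and 3 extra degrees of freedom for the smoother
mps <- gamlss(y ~ ps(x, df = 3, ps.intervals = 20, degree = 3, order = 2),
              family = NO, data = da)
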
`The function lo() allows the user to use a loess fit in a GAMLSS formula. A loess fit is
`a polynomial (surface) curve determined by one or more explanatory (continuous) variables,
`which are fitted locally (see Cleveland, Grosse, and Shyu 1993). The implementation of the
`lo() function is very similar to the function with the same name in the S-PLUS implemen-
tation of gam. However, the gamlss lo() function uses the R loess() function as its engine and
this creates some minor differences between the two lo() functions even when the same model
is fitted. lo() is the only function currently available in gamlss which allows smoothing in
more than one explanatory (continuous) variable.
`The fp() function is an implementation of the fractional polynomials introduced by Royston
`and Altman (1994). The functions involved in fp() and bfp() are loosely based on the
`fractional polynomials function fracpoly() for S-PLUS given by Ambler (1999). The function
bfp() generates the correct design matrix for fitting a power polynomial of the type
$b_0 + b_1 x^{p_1} + b_2 x^{p_2} + \ldots + b_k x^{p_k}$. For given powers $p_1, p_2, \ldots, p_k$, given as the argument powers in bfp(),
`the function can be used to fit power polynomials in the same way as the functions poly() or
`bs() of the package splines are used to fit orthogonal or piecewise polynomials respectively.
`The function fp() (which uses bfp()) works as an additive smoother term in gamlss. It is
`used to fit the best fractional polynomials among a specific set of power values. Its argument
npoly determines whether one, two or three fractional polynomials should be used in the fitting.
`For a fixed number npoly the algorithm looks for the best fitting fractional polynomials in
`the list c(-2, -1, -0.5, 0, 0.5, 1, 2, 3). Note that npoly=3 is rather slow since it fits
`all possible 3-way combinations at each backfitting iteration.
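A brief sketch of a fractional polynomial term (simulated data; npoly as described above):

library(gamlss)

set.seed(1)
da <- data.frame(x = runif(200, 1, 10))
da$y <- rGA(200, mu = exp(1 + 2 / sqrt(da$x)), sigma = 0.3)

# search for the best pair of fractional polynomial powers of x in the model for mu
mfp <- gamlss(y ~ fp(x, npoly = 2), family = GA, data = da)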
`The power polynomial function pp() is an experimental function and is designed for the
situation in which the model is of the form $b_0 + b_1 x^{p_1} + b_2 x^{p_2}$, with powers $p_1$ and $p_2$ to be
estimated non-linearly from the data. Initial values for the non-linear parameters $p_1$, $p_2$ have to
`be supplied.
`The function nl() exists in the add-on package gamlss.nl designed for fitting non-linear para-
`metric models within GAMLSS. It provides a way of fitting non-linear terms together with
`linear or smoothing terns in the same model. The function takes a non-linear object, (cre-
`ated by the function nl.obs), and uses the R nlm() function within the backfitting cycle of
`gamlss(). The success of this procedure depends on the starting values of the non-linear
`parameters (which must be provided by the user). No starting values are required for the
`other, e.g., linear terms, of the model.
`The function random() allows the fitted values for a factor (categorical) predictor to be shrunk
`towards the overall mean, where the amount of shrinking depends either on the parameter
`λ, or on the equivalent degrees of freedom (df). This function is similar to the random()
`function in the gam package of Hastie (2006) documented in Chambers and Hastie (1992).
`The function ra() is similar to the function random() but its fitting procedure is based on
`augmented least squares, a fact that makes ra() more general, but also slower to fit, than
`random(). The random coefficient function rc() is experimental. Note that the “random
`effects” functions, random(), ra() and rc() are used to estimate the random effect γ’s given
`the hyperparameters λ’s.
`In order to obtain estimates for the hyperparameters, methods
`
`
`discussed in Rigby and Stasinopoulos (2005, Appendix A) can be used. Alternatively, for
`models only requiring a single random effect in one distribution parameter only, the function
`gamlssNP() of the package gamlss.mx, which uses Gaussian quadrature, can be used.
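For example, a hedged sketch of a random intercept for a factor using random(), with the amount of shrinkage controlled through df (the simulated data and the chosen df are purely illustrative):

library(gamlss)

set.seed(1)
da <- data.frame(subject = factor(rep(1:10, each = 20)), x = runif(200))
da$y <- rNO(200, mu = 1 + 0.5 * da$x + rep(rnorm(10, sd = 0.5), each = 20), sigma = 0.3)

# shrink the subject effects towards the overall mean (given df, or equivalently lambda)
mre <- gamlss(y ~ x + random(subject, df = 5), family = NO, data = da)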
`The gamlss() function uses the same type of additive backfitting algorithm implemented
in the gam() function of the R package gam (Hastie 2006). Note that the gam()
implementation in the recommended R package mgcv (Wood 2001) does not use backfitting.
The reason we use backfitting here is that it makes it easier to extend the algorithm so that
new additive terms can be included.
Each new additive term in gamlss() requires two new functions. The first (the one
seen by the user) defines the additive term and sets up the additional design matrices
required for the linear part of the model. The names of the existing additive
`functions are shown in the second column of Table 3. For example cs(x) defines a cubic
`smoothing spline function for the continuous explanatory variable x. It is used during the
`definition of the design matrix for the appropriate distribution parameter and it adds a linear
`term for x in the design matrix. The second function is the one that actually performs the
`additive backfitting algorithm. This function is called gamlss.name() where the name is one
`of the names in column two of Table 3. For example the function gamlss.cs() performs the
`backfitting for cubic splines. New additive terms can be implemented by defining those two
`functions and adding the new names in the .gamlss.sm.list list.
`The general policy when backfitting is used in gamlss() is to include the linear part of
`an additive term in the appropriate linear term design matrix. For example, in the cubic
`spline function cs() the explanatory variable say x is put in the linear design matrix of the
`appropriate distribution parameter and the smoothing function is fitted as a deviation from
this linear part. This is equivalent to fitting a modified backfitting algorithm, see Hastie
and Tibshirani (1990). In other additive functions where the linear part is not needed (or
defined) a column of zeros is put in the design matrix. For example, this is the case when
`the fractional polynomials additive term fp() is used.
`If the user wishes to create a new additive term, care should be taken on how the degrees of
`freedom of the model are defined. The degrees of freedom for the (smoothing) additive terms
`are usually taken to be the extra degrees of freedom on top of the linear fit. For example to
`fit a single smoothing cubic spline term for say x with 5 total degrees of freedom, cs(x,df=3)
`should be used since already 2 degrees of freedom have been used for the fitting of the constant
`and the linear part of the explanatory variable x. This is different from the s() function of
`the gam package which uses s(x,df=4), assuming that only the constant term has been fitted
`separately. After a GAMLSS model containing additive (smoothing) terms is used to fit a
`specific distribution parameter the following components are (usually) saved for further use.
In the output below replace mu with sigma, nu or tau if a distribution parameter other
than mu is involved.
`
`mu.s: a matrix, each column containing the fitted values of the smoothers used to model the
`specific parameter. For example given a fitted model say mod1, then mod1$mu.s would
`access the additive terms fitted for mu.
`
`mu.var: a matrix containing the estimated variances of the smoothers.
`
`mu.df: a vector containing the extra degrees of freedom used to fit the smoothers.
`
`
`mu.lambda: a vector containing the smoothing parameters (or random effects hyperparame-
`ters).
`
`mu.coefSmo: a list containing coefficients or other components from the additive smooth
`fitting.
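For example (a small simulated illustration, continuing the naming used above):

library(gamlss)

set.seed(1)
da <- data.frame(x = runif(200))
da$y <- rNO(200, mu = sin(2 * pi * da$x), sigma = 0.2)

mod1 <- gamlss(y ~ cs(x, df = 3), family = NO, data = da)

head(mod1$mu.s)   # fitted contribution of the smoother for mu
mod1$mu.df        # extra degrees of freedom used by the smoother
mod1$mu.lambda    # smoothing parameter (or random-effects hyperparameter)
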
`
`1.5. The GAMLSS algorithms
`
`There are two basic algorithms used for maximizing the penalized likelihood given in (6).
The first, the CG algorithm, is a generalization of the Cole and Green (1992) algorithm and
`uses the first and (expected or approximated) second and cross derivatives of the likelihood
`function with respect to the distribution parameters θ = (µ, σ, ν, τ) for a four parameter
`distribution. Note that we have dropped the subscripts here to simplify the notation. However
`for many population probability (density) functions, f(y|θ), the parameters θ are information
`orthogonal (since the expected values of the cross derivatives of the likelihood function are
`zero), e.g., location and scale models and dispersion family models, or approximately so. In
this case the simpler RS algorithm, which is a generalization of the algorithm used by Rigby
and Stasinopoulos (1996a,b) for fitting mean and dispersion additive models (MADAM) and
which does not use the cross derivatives, is more suited. The parameters θ = (µ, σ) are fully
`information orthogonal for distributions NBI, GA, IG, LO and NO only in Table 1. Nevertheless,
`the RS algorithm has been successfully used for fitting all distributions in Tables 1 and 2,
`although occasionally it can be slow to converge. Note also that the RS algorithm is not a
`special case of the CG algorithm.
`The object of the algorithms is to maximize the penalized likelihood function (cid:96)p, given by (6),
`for fixed hyperparameters λ. For fully parametric models, (3) or (5), the algorithms maximize
`the likelihood function (cid:96). The algorithms are implemented in the option method in the function
`gamlss() where a combination of both algorithms is also allowed. The major advantages of
`the algorithms are i) the modular fitting procedure (allowing different model diagnostics
`for each distribution parameter); ii) easy addition of extra distributions; iii) easy addition
`of extra additive terms; and iv) easily found starting values, requiring initial values for the
`θ = (µ, σ, ν, τ) rather than for the β parameters. The algorithms have generally been found to
`be stable and fast using very simple starting values (e.g., constants) for the θ parameters. The
`function nlgamlss() in the package gamlss.nl provides a third algorithm for fitting parametric
`linear or non-linear GAMLSS models as in equations (3) or (5) respectively. However the
`algorithm needs starting values for all the β parameters, rather than θ = (µ, σ, ν, τ), which
`can be difficult for the user to choose.
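For example, a sketch of choosing the algorithm through the method argument (RS() and CG() as described above; the mixed() call for combining the two algorithms is an assumption to be checked in the gamlss() help file):

library(gamlss)

set.seed(1)
da <- data.frame(x = runif(500))
da$y <- rBCT(500, mu = 2 + da$x, sigma = 0.1, nu = 1, tau = 5)

m.rs <- gamlss(y ~ x, family = BCT, data = da, method = RS())           # RS (the default)
m.cg <- gamlss(y ~ x, family = BCT, data = da, method = CG())           # CG
m.mx <- gamlss(y ~ x, family = BCT, data = da, method = mixed(10, 10))  # RS then CG (assumed)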
`Clearly, for a specific data set and model, the (penalized) likelihood can potentially have
`multiple local maxima. This is investigated using different starting values and has generally
`not been found to be a problem in the data sets analyzed, possibly due to the relatively large
`sample sizes used.
`Singularities in the likelihood function similar to the ones reported by Crisp and Burridge
`(1994) can potentially occur in specific cases within the GAMLSS framework, especially when
`the sample size is small. The problem can be alleviated by appropriate restrictions on the
`scale parameter (penalizing it for going close to zero).
`