`
`Journal of Statistical Software
`
`December 2007, Volume 23, Issue 7.
`
`http://www.jstatsoft.org/
`
`Generalized Additive Models for Location Scale
`and Shape (GAMLSS) in R
`
`D. Mikis Stasinopoulos
`London Metropolitan University
`
`Robert A. Rigby
`London Metropolitan University
`
`Abstract
`
`GAMLSS is a general framework for fitting regression type models where the distribu-
`tion of the response variable does not have to belong to the exponential family and includes
highly skew and kurtotic continuous and discrete distributions. GAMLSS allows all the
`parameters of the distribution of the response variable to be modelled as linear/non-linear
`or smooth functions of the explanatory variables. This paper starts by defining the sta-
`tistical framework of GAMLSS, then describes the current implementation of GAMLSS
`in R and finally gives four different data examples to demonstrate how GAMLSS can be
`used for statistical modelling.
`
`Keywords: Box-Cox transformation, centile estimation, cubic smoothing splines, LMS method,
`negative binomial, non-normal, non-parametric, overdispersion, penalized likelihood, skewness
`and kurtosis.
`
`1. What is GAMLSS?
`
`1.1. Introduction
`
`Generalized additive models for location, scale and shape (GAMLSS) are semi-parametric
`regression type models. They are parametric, in that they require a parametric distribu-
`tion assumption for the response variable, and “semi” in the sense that the modelling of
`the parameters of the distribution, as functions of explanatory variables, may involve using
`non-parametric smoothing functions. GAMLSS were introduced by Rigby and Stasinopoulos
`(2001, 2005) and Akantziliotou, Rigby, and Stasinopoulos (2002) as a way of overcoming some
`of the limitations associated with the popular generalized linear models, GLM, and general-
`ized additive models, GAM (see Nelder and Wedderburn 1972; Hastie and Tibshirani 1990,
`respectively).
`
`
`In GAMLSS the exponential family distribution assumption for the response variable (y) is
`relaxed and replaced by a general distribution family, including highly skew and/or kurtotic
`continuous and discrete distributions. The systematic part of the model is expanded to
`allow modelling not only of the mean (or location) but other parameters of the distribution
`of y as, linear and/or non-linear, parametric and/or additive non-parametric functions of
`explanatory variables and/or random effects. Hence GAMLSS is especially suited to modelling
a response variable which does not follow an exponential family distribution (e.g., leptokurtic
or platykurtic and/or positively or negatively skewed response data, or overdispersed counts) or
which exhibits heterogeneity (e.g., where the scale or shape of the distribution of the response
variable changes with explanatory variable(s)).
There are several R packages that can be seen as related to the gamlss package and to its R
implementation: the original gam package (Hastie 2006), the recommended R package mgcv
(Wood 2001), the general smoothing splines package gss (Gu 2007) and the vector GAM
package VGAM (Yee 2007). The first three deal mainly with models for the mean from an
exponential family distribution. The VGAM package allows modelling of a variety of different
distributions (usually up to three-parameter ones) and also allows multivariate responses.
`The remainder of Section 1 defines the GAMLSS model, available distributions, available
`additive terms and model fitting. Section 2 describes the R gamlss package for fitting the
`GAMLSS model. Section 3 gives four data examples to illustrate GAMLSS modelling.
`
`1.2. The GAMLSS model
`
`A GAMLSS model assumes independent observations yi for i = 1, 2, . . . , n with probability
(density) function f(yi|θi) conditional on θi = (θ1i, θ2i, θ3i, θ4i) = (µi, σi, νi, τi), a vector of four
distribution parameters, each of which can be a function of the explanatory variables. We
`shall refer to (µi, σi, νi, τi) as the distribution parameters. The first two population distribution
`parameters µi and σi are usually characterized as location and scale parameters, while the
`remaining parameter(s), if any, are characterized as shape parameters, e.g., skewness and
`kurtosis parameters, although the model may be applied more generally to the parameters of
`any population distribution, and can be generalized to more than four distribution parameters.
`Rigby and Stasinopoulos (2005) define the original formulation of a GAMLSS model as follows.
Let $y^\top = (y_1, y_2, \ldots, y_n)$ be the $n$ length vector of the response variable. Also for
$k = 1, 2, 3, 4$, let $g_k(\cdot)$ be known monotonic link functions relating the distribution
parameters to explanatory variables by

$$
g_k(\theta_k) = \eta_k = X_k \beta_k + \sum_{j=1}^{J_k} Z_{jk} \gamma_{jk}, \qquad (1)
$$

i.e.,

$$
g_1(\mu) = \eta_1 = X_1 \beta_1 + \sum_{j=1}^{J_1} Z_{j1} \gamma_{j1}
$$
$$
g_2(\sigma) = \eta_2 = X_2 \beta_2 + \sum_{j=1}^{J_2} Z_{j2} \gamma_{j2}
$$
$$
g_3(\nu) = \eta_3 = X_3 \beta_3 + \sum_{j=1}^{J_3} Z_{j3} \gamma_{j3}
$$
$$
g_4(\tau) = \eta_4 = X_4 \beta_4 + \sum_{j=1}^{J_4} Z_{j4} \gamma_{j4},
$$

where $\mu$, $\sigma$, $\nu$, $\tau$ and $\eta_k$ are vectors of length $n$,
$\beta_k^\top = (\beta_{1k}, \beta_{2k}, \ldots, \beta_{J'_k k})$ is a parameter vector of length $J'_k$,
$X_k$ is a fixed known design matrix of order $n \times J'_k$, $Z_{jk}$ is a fixed known
$n \times q_{jk}$ design matrix and $\gamma_{jk}$ is a $q_{jk}$ dimensional random variable which is
assumed to be distributed as $\gamma_{jk} \sim N_{q_{jk}}(0, G_{jk}^{-1})$, where $G_{jk}^{-1}$ is the
(generalized) inverse of a $q_{jk} \times q_{jk}$ symmetric matrix $G_{jk} = G_{jk}(\lambda_{jk})$ which
may depend on a vector of hyperparameters $\lambda_{jk}$, and where if $G_{jk}$ is singular then
$\gamma_{jk}$ is understood to have an improper prior density function proportional to
$\exp\left(-\tfrac{1}{2} \gamma_{jk}^\top G_{jk} \gamma_{jk}\right)$.
`The model in (1) allows the user to model each distribution parameter as a linear function of
`explanatory variables and/or as linear functions of stochastic variables (random effects). Note
`that seldom will all distribution parameters need to be modelled using explanatory variables.
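As a concrete illustration (not taken from the paper; the data are simulated and the gamlss() fitting function is described in Section 2), the following minimal R sketch models µ and σ as functions of covariates while ν and τ are left as constants:

library(gamlss)

set.seed(1)
da <- data.frame(x1 = runif(500), x2 = runif(500))
da$y <- rBCT(500, mu = exp(1 + 0.3 * da$x1 + 0.2 * da$x2),
             sigma = exp(-2 + 0.5 * da$x1), nu = 1, tau = 10)

# mu depends on x1 and x2, sigma on x1 only; nu and tau are fitted as intercepts
m1 <- gamlss(y ~ x1 + x2, sigma.formula = ~ x1, family = BCT, data = da)
summary(m1)
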
There are several important sub-models of GAMLSS. For example, for readers familiar with
smoothing, the following GAMLSS sub-model formulation may be more familiar. Let $Z_{jk} = I_n$,
where $I_n$ is an $n \times n$ identity matrix, and $\gamma_{jk} = h_{jk} = h_{jk}(x_{jk})$ for all
combinations of $j$ and $k$ in (1); then we have the semi-parametric additive formulation of
GAMLSS given by

$$
g_k(\theta_k) = \eta_k = X_k \beta_k + \sum_{j=1}^{J_k} h_{jk}(x_{jk}) \qquad (2)
$$

where, to abbreviate the notation, we use $\theta_k$ for $k = 1, 2, 3, 4$ to represent the distribution
parameter vectors $\mu$, $\sigma$, $\nu$ and $\tau$, and where $x_{jk}$ for $j = 1, 2, \ldots, J_k$ are
also vectors of length $n$. The function $h_{jk}$ is an unknown function of the explanatory variable
$x_{jk}$ and $h_{jk} = h_{jk}(x_{jk})$ is the vector which evaluates the function $h_{jk}$ at $x_{jk}$.
If there are no additive terms in any of the distribution parameters we have the simple parametric
linear GAMLSS model,

$$
g_k(\theta_k) = \eta_k = X_k \beta_k. \qquad (3)
$$

Model (2) can be extended to allow non-linear parametric terms to be included in the model for
$\mu$, $\sigma$, $\nu$ and $\tau$, as follows (see Rigby and Stasinopoulos 2006):

$$
g_k(\theta_k) = \eta_k = h_k(X_k, \beta_k) + \sum_{j=1}^{J_k} h_{jk}(x_{jk}) \qquad (4)
$$

where $h_k$ for $k = 1, 2, 3, 4$ are non-linear functions and $X_k$ is a known design matrix of order
$n \times J''_k$. We shall refer to the model in (4) as the non-linear semi-parametric additive GAMLSS
model. If, for $k = 1, 2, 3, 4$, $J_k = 0$, that is, if for all distribution parameters we do not have
additive terms, then model (4) is reduced to a non-linear parametric GAMLSS model,

$$
g_k(\theta_k) = \eta_k = h_k(X_k, \beta_k). \qquad (5)
$$

If, in addition, $h_k(X_k, \beta_k) = X_k \beta_k$ for $k = 1, 2, 3, 4$, then (5) reduces to the
linear parametric model (3). Note that some of the terms in each $h_k(X_k, \beta_k)$ may be linear,
`
`
`in which case the GAMLSS model is a combination of linear and non-linear parametric terms.
`We shall refer to any combination of models (3) or (5) as a parametric GAMLSS model.
The parametric vectors $\beta_k$ and the random effects parameters $\gamma_{jk}$, for $j = 1, 2, \ldots, J_k$ and
$k = 1, 2, 3, 4$ are estimated within the GAMLSS framework (for fixed values of the smoothing
hyperparameters $\lambda_{jk}$) by maximising a penalized likelihood function $\ell_p$ given by

$$
\ell_p = \ell - \frac{1}{2} \sum_{k=1}^{p} \sum_{j=1}^{J_k} \lambda_{jk} \gamma_{jk}^\top G_{jk} \gamma_{jk} \qquad (6)
$$

where $\ell = \sum_{i=1}^{n} \log f(y_i|\theta_i)$ is the log likelihood function. More details on how the penalized
log likelihood $\ell_p$ is maximized are given in Section 1.5. For a parametric GAMLSS model,
(3) or (5), $\ell_p$ reduces to $\ell$, and the $\beta_k$ for $k = 1, 2, 3, 4$ are estimated by maximizing the
likelihood function $\ell$. The available distributions and the different additive terms in the
`current GAMLSS implementation in R are given in Sections 1.3 and 1.4 respectively. The R
`function to fit a GAMLSS model is gamlss() in the package gamlss which will be described
`in more detail in Section 2.
`
`1.3. Available distributions in GAMLSS
`The form of the distribution assumed for the response variable y, f(yi|µi, σi, νi, τi), can be very
`general. The only restriction that the R implementation of GAMLSS has is that the function
`log f(yi|µi, σi, νi, τi) and its first (and optionally expected second and cross) derivatives with
`respect to each of the parameters of θ must be computable. Explicit derivatives are preferable
`but numerical derivatives can be used.
`Table 1 shows a variety of one, two, three and four parameter families of continuous distribu-
`tions implemented in our current software version. Table 2 shows the discrete distributions.
`We shall refer to the distributions in Tables 1 and 2 as the gamlss.family distributions,
`a name to coincide with the R object created by the package gamlss. Johnson, Kotz, and
`Balakrishnan (1994, 1995); Johnson, Kotz, and Kemp (2005) are the classical reference books
`for most of the distributions in Tables 1 and 2. The BCCG distribution in Table 1 is the Box-
`Cox transformation model used by Cole and Green (1992) (also known as the LMS method of
`centile estimation). The BCPE and BCT distributions, described in Rigby and Stasinopou-
`los (2004, 2006) respectively, generalize the BCCG distribution to allow modelling of both
skewness and kurtosis. For some of the distributions shown in Tables 1 and 2 more than one
parameterization has been implemented. For example, the two parameter Weibull distribution
can be parameterized as $f(y|\mu, \sigma) = \left(\sigma y^{\sigma-1}/\mu^{\sigma}\right) \exp\{-(y/\mu)^{\sigma}\}$, denoted as WEI, or as
$f(y|\mu, \sigma) = \sigma \mu y^{\sigma-1} e^{-\mu y^{\sigma}}$, denoted as WEI2, or as
$f(y|\mu, \sigma) = (\sigma/\beta)(y/\beta)^{\sigma-1} \exp\{-(y/\beta)^{\sigma}\}$,
denoted as WEI3, for $\beta = \mu/\Gamma\{(1/\sigma) + 1\}$. Note that the second parameterization WEI2 is
suited to proportional hazard (PH) models. In the WEI3 parameterization, parameter µ is
equal to the mean of y. The choice of parameterization depends upon the particular prob-
lem, but some parameterizations are computationally preferable to others in the sense that
maximization of the likelihood function is easier. This usually happens when the parameters
µ, σ, ν and τ are orthogonal or almost orthogonal. For interpretation purposes we favour pa-
rameterizations where the parameter µ is a location parameter (mean, median or mode). The
specific parameterizations used in the gamlss.family distributions are given in the appendix
of Stasinopoulos, Rigby, and Akantziliotou (2006).
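As an illustrative (simulated, not from the paper) comparison of two of these parameterizations, the family is simply switched through the family argument; the fits are reparameterizations of each other, but the fitted µ has a different meaning:

library(gamlss)

set.seed(1)
da <- data.frame(y = rweibull(200, shape = 2, scale = 5))

mW  <- gamlss(y ~ 1, family = WEI,  data = da)   # mu is a scale-type parameter
mW3 <- gamlss(y ~ 1, family = WEI3, data = da)   # mu is the mean of y

fitted(mW,  "mu")[1]   # fitted scale parameter
fitted(mW3, "mu")[1]   # approximately mean(da$y)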
`
`
Distributions                   R Name    µ         σ         ν         τ
beta                            BE()      logit     logit     -         -
beta inflated (at 0)            BEOI()    logit     log       logit     -
beta inflated (at 1)            BEZI()    logit     log       logit     -
beta inflated (at 0 and 1)      BEINF()   logit     logit     log       log
Box-Cox Cole and Green          BCCG()    identity  log       identity  -
Box-Cox power exponential       BCPE()    identity  log       identity  log
Box-Cox-t                       BCT()     identity  log       identity  log
exponential                     EXP()     log       -         -         -
exponential Gaussian            exGAUS()  identity  log       log       -
exponential gen. beta type 2    EGB2()    identity  identity  log       log
gamma                           GA()      log       log       -         -
generalized beta type 1         GB1()     logit     logit     log       log
generalized beta type 2         GB2()     log       identity  log       log
generalized gamma               GG()      log       log       identity  -
generalized inverse Gaussian    GIG()     log       log       identity  -
generalized t                   GT()      identity  log       log       log
Gumbel                          GU()      identity  log       -         -
inverse Gaussian                IG()      log       log       -         -
Johnson’s SU (µ the mean)       JSU()     identity  log       identity  log
Johnson’s original SU           JSUo()    identity  log       identity  log
logistic                        LO()      identity  log       -         -
log normal                      LOGNO()   log       log       -         -
log normal (Box-Cox)            LNO()     log       log       fixed     -
NET                             NET()     identity  log       fixed     fixed
normal                          NO()      identity  log       -         -
normal family                   NOF()     identity  log       identity  -
power exponential               PE()      identity  log       log       -
reverse Gumbel                  RG()      identity  log       -         -
skew power exponential type 1   SEP1()    identity  log       identity  log
skew power exponential type 2   SEP2()    identity  log       identity  log
skew power exponential type 3   SEP3()    identity  log       log       log
skew power exponential type 4   SEP4()    identity  log       log       log
shash                           SHASH()   identity  log       log       log
skew t type 1                   ST1()     identity  log       identity  log
skew t type 2                   ST2()     identity  log       identity  log
skew t type 3                   ST3()     identity  log       log       log
skew t type 4                   ST4()     identity  log       log       log
skew t type 5                   ST5()     identity  log       identity  log
t Family                        TF()      identity  log       log       -
Weibull                         WEI()     log       log       -         -
Weibull (PH)                    WEI2()    log       log       -         -
Weibull (µ the mean)            WEI3()    log       log       -         -
zero adjusted IG                ZAIG()    log       log       logit     -
`
`Table 1: Continuous distributions implemented within the gamlss packages (with default link
`functions).
`
`
Distributions                        R Name    µ      σ      ν
beta binomial                        BB()      logit  log    -
binomial                             BI()      logit  -      -
Delaporte                            DEL()     log    log    logit
Negative Binomial type I             NBI()     log    log    -
Negative Binomial type II            NBII()    log    log    -
Poisson                              PO()      log    -      -
Poisson inverse Gaussian             PIG()     log    log    -
Sichel                               SI()      log    log    identity
Sichel (µ the mean)                  SICHEL()  log    log    identity
zero inflated Poisson                ZIP()     log    logit  -
zero inflated Poisson (µ the mean)   ZIP2()    log    logit  -
`
`Table 2: Discrete distributions implemented within the gamlss packages (with default link
`functions).
`
`All of the distributions in Tables 1 and 2 have d, p, q and r functions corresponding respec-
`tively to the probability density function (pdf), the cumulative distribution function (cdf),
`the quantiles (i.e., inverse cdf) and random value generating functions. For example, the
`gamma distribution has the functions dGA, pGA, qGA and rGA. In addition each distribution
`has a fitting function which helps the fitting procedure by providing link functions, first and
(exact or approximate) expected second derivatives, starting values etc. All fitting functions
`have as arguments the link functions for the distribution parameters. For example, the fitting
`function for the gamma distribution is called GA with arguments mu.link and sigma.link.
`The default link functions for all gamlss.family distributions are shown in columns 3–6 of
Tables 1 and 2. The function show.link() can be used to identify the available links for each
distribution parameter within a gamlss.family. The available link functions are the usual
glm() link functions plus logshifted, logitshifted and own. The own option allows the
user to define his/her own link function; for an example see the help file of the function
make.link.gamlss().
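For example, a brief sketch of the gamma distribution functions and of querying or changing its links (the output values are not reproduced here; see ?show.link for the exact calling conventions):

library(gamlss)

dGA(2, mu = 3, sigma = 0.5)      # pdf of GA(mu = 3, sigma = 0.5) at y = 2
pGA(2, mu = 3, sigma = 0.5)      # cdf at y = 2
qGA(0.5, mu = 3, sigma = 0.5)    # median
rGA(5, mu = 3, sigma = 0.5)      # five random values

show.link(GA)                    # available links for mu and sigma
GA(mu.link = "identity")         # gamma family object with a non-default mu link
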
`There are several ways to extend the gamlss.family distributions. This can be achieved by
`• creating a new gamlss.family distribution
`• truncating an existing gamlss.family
`• using a censored version of an existing gamlss.family
`• mixing different gamlss.family distributions to create a new finite mixture distribution.
`
Creating a new gamlss.family distribution is relatively simple if the pdf of the distribution
can be evaluated easily. To do that, take the file of an existing gamlss.family distribution
(having the same number of distribution parameters) and amend it accordingly.
For more details on how this can be done, see Stasinopoulos et al. (2006, Section 4.2).
`Truncating existing gamlss.family distributions can be achieved by using the add-on package
`gamlss.tr. The function gen.trun(), within the gamlss.tr package, can take any gamlss.family
`
`
`distribution and generate the d, p, q, r and fitting R functions for the specified truncated dis-
tribution. The truncation can be left, right or in both tails of the range of the response
variable y.
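A hedged sketch of gen.trun() follows; the argument names and the naming convention of the generated functions are as described in the gamlss.tr documentation, so treat the details as indicative and check ?gen.trun before use:

library(gamlss)
library(gamlss.tr)

# generate a zero-(left-)truncated negative binomial type I family; this should create
# dNBItr(), pNBItr(), qNBItr(), rNBItr() and the fitting function NBItr() (names assumed)
gen.trun(par = 0, family = "NBI", name = "tr", type = "left")

# hypothetical positive-count response y and covariate x in a data frame da
# mtr <- gamlss(y ~ x, family = NBItr, data = da)
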
`The package gamlss.cens is designed for the situation where the response variable is censored
`or, more generally, it has been observed in an interval form, e.g., (3, 10] an interval from
`3 to 10 (including only the right end point 10). The function gen.cens() will take any
`gamlss.family distribution and create a new function which can fit a response of “interval”
`type. Note that for “interval” response variables the usual likelihood function for independent
response variables, defined as

$$
L(\theta) = \prod_{i=1}^{n} f(y_i|\theta), \qquad (7)
$$

changes to

$$
L(\theta) = \prod_{i=1}^{n} \left[ F(y_{2i}|\theta) - F(y_{1i}|\theta) \right], \qquad (8)
$$

where $F(y)$ is the cumulative distribution function and $(y_{1i}, y_{2i})$ is the observed interval.
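A heavily hedged sketch of the intended workflow is shown below; the exact gen.cens() arguments, the name of the generated family and the use of Surv() from the survival package to encode the interval are assumptions to be checked against the gamlss.cens documentation:

library(gamlss.cens)
library(survival)

# generate an interval-censored version of the Weibull family
gen.cens(WEI, type = "interval")

# hypothetical data: each response known only to lie in the interval (lo, hi]
# mic <- gamlss(Surv(lo, hi, type = "interval2") ~ x,
#               family = WEIic,   # name of the generated family (assumed)
#               data = da)
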
Finite mixtures of gamlss.family distributions can be fitted using the package gamlss.mx.
A finite mixture of gamlss.family distributions will have the form

$$
f_Y(y|\psi) = \sum_{k=1}^{K} \pi_k f_k(y|\theta_k) \qquad (9)
$$

where $f_k(y|\theta_k)$ is the probability (density) function of $y$ for component $k$, and
$0 \leq \pi_k \leq 1$ is the prior (or mixing) probability of component $k$, for $k = 1, 2, \ldots, K$.
Also $\sum_{k=1}^{K} \pi_k = 1$ and $\psi = (\theta, \pi)$ where $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$
and $\pi = (\pi_1, \pi_2, \ldots, \pi_K)$. Any combination
`of (continuous or discrete) gamlss.family distributions can be used. The model in this
`case is fitted using the EM algorithm. The component probability (density) functions may
`have different parameters (fitted using the function gamlssMX()) or may have parameters in
`common (fitted using the function gamlssNP()). In the former case, the mixing probabilities
`may also be modelled using explanatory variables and the finite mixture may have a zero
component (e.g., zero inflated negative binomial etc.). Both functions gamlssMX() and
gamlssNP() are in the add-on package gamlss.mx.
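For example, a sketch (simulated data; the argument names K and family are as in the gamlss.mx documentation) of a two-component normal mixture fitted with gamlssMX():

library(gamlss.mx)

set.seed(1)
da <- data.frame(y = c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 5, sd = 1)))

# two-component normal mixture; each component has its own mu and sigma
mmx <- gamlssMX(y ~ 1, family = NO, K = 2, data = da)
mmx    # the print/summary methods show the component fits and estimated mixing probabilities
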
`
`1.4. Available additive terms in GAMLSS
`
`Equation (1) allows the user to model all the distribution parameters µ, σ, ν and τ as
linear parametric and/or non-linear parametric and/or non-parametric (smooth) functions of
`the explanatory variables and/or random effects terms. For modelling linear functions the
`Wilkinson and Rogers (1973) notation as applied for model formulae in the S language by
`Chambers and Hastie (1992) can be used.
It is the model formula notation used in the
fitting of linear models, lm(), and generalized linear models, glm(); see for example Venables
`and Ripley (2002, Section 6.2). For fitting non-linear or non-parametric (smooth) functions
or random effects terms, an additive term function has to be used. Parametric non-linear
models can also be fitted using the function nlgamlss() of the add-on package gamlss.nl.
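For example, a minimal sketch (simulated data, not from the paper) mixing a linear term and cubic smoothing splines across the distribution parameters of a negative binomial model:

library(gamlss)

set.seed(1)
da <- data.frame(x1 = runif(200), x2 = runif(200))
da$y <- rNBI(200, mu = exp(1 + da$x1 + sin(2 * pi * da$x2)), sigma = 0.5)

# mu: linear in x1 plus a cubic smoothing spline in x2; sigma: smooth in x1
mod <- gamlss(y ~ x1 + cs(x2, df = 3), sigma.formula = ~ cs(x1),
              family = NBI, data = da)
summary(mod)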
`
`
Additive terms           R Name
Cubic splines            cs()
Varying coefficient      vc()
Penalized splines        ps()
loess                    lo()
Fractional polynomials   fp()
Power polynomials        pp()
non-linear fit           nl()
Random effects           random()
Random effects           ra()
Random coefficient       rc()
`
`Table 3: Additive terms implemented within the gamlss packages.
`
Table 3 shows the additive term functions implemented in the current R implementation of
GAMLSS. Note that all available additive term names are stored in the list .gamlss.sm.list.
The cubic spline function cs() is based on the smooth.spline() function of R and can be
used for univariate smoothing. Cubic splines are covered extensively in the literature (Reinsch
1967; Green and Silverman 1994; Hastie and Tibshirani 1990, Chapter 2). They assume in
model (2) that the functions $h(t)$ are arbitrary twice continuously differentiable functions
and we maximize a penalized log likelihood, given by $\ell$ subject to penalty terms of the form
$\lambda \int_{-\infty}^{\infty} \left[ h''(t) \right]^2 dt$. The solutions for the maximizing functions $h(t)$
are all natural cubic splines, and hence can be expressed as linear combinations of their natural
cubic spline basis functions (de Boor 1978). In cs() each distinct x-value is a knot.
`The varying coefficient terms were introduced by Hastie and Tibshirani (1993) to accommo-
`date a special type of interaction between explanatory variables. This interaction takes the
`form of β(r)x, that is the linear coefficient of the explanatory variable x is changing smoothly
`according to another explanatory variable r. In some applications r will be time. In general
`r should be a continuous variable, while x can be either continuous or categorical. In the
`current GAMLSS implementation x has to be continuous or a two level factor with levels 0
`and 1.
`Penalized splines were introduced by Eilers and Marx (1996). Penalized Splines (or P-splines)
`are piecewise polynomials defined by B-spline basis functions in the explanatory variable,
`where the coefficients of the basis functions are penalized to guarantee sufficient smoothness
(see Eilers and Marx 1996). More precisely, consider the model $\theta = Z(x)\gamma$ where $\theta$ can be
any distribution parameter in a GAMLSS model, $Z(x)$ is an $n \times q$ basis design matrix for the
explanatory variable $x$ defined at $q$ different knots mostly within the range of $x$, and $\gamma$ is a
$q \times 1$ vector of coefficients which have some stochastic restrictions imposed by the fact that
$D\gamma \sim N(0, \lambda^{-1}I)$, or equivalently by $\gamma \sim N(0, \lambda^{-1}K^{-})$ where $K = D^\top D$. The matrix $D$ is a
$(q-r) \times q$ matrix giving $r$th differences of the $q$-dimensional vector $\gamma$. So to define a penalized
spline we need: i) $q$, the number of knots in the x-axis, defined by argument ps.intervals
(and of course where to put them; ps() uses equal spaces in the x-axis); ii) the degree of
the piecewise polynomial used in the B-spline basis (so that the basis matrix can be constructed),
defined by argument degree; iii) $r$, the order of differences in the $D$ matrix indicating the type
of penalty imposed on the coefficients of the B-spline basis functions, defined by argument order;
and iv) the amount of smoothing required, defined either by the desired equivalent degrees
of freedom (argument df) or alternatively by the smoothing parameter (argument lambda).
The ps() function in gamlss is based on an S-PLUS function of Marx (2003).
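A sketch of a P-spline term using the arguments just described (the chosen values and the simulated data are purely illustrative; default argument values may differ between versions):

library(gamlss)

set.seed(1)
da <- data.frame(x = seq(0, 1, length = 200))
da$y <- rNO(200, mu = sin(2 * pi * da$x), sigma = 0.2)

# 20 knot intervals, cubic (degree 3) B-spline basis, second-order difference
# penalty and 3 extra degrees of freedom for the smoother
mps <- gamlss(y ~ ps(x, df = 3, ps.intervals = 20, degree = 3, order = 2),
              family = NO, data = da)
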
`The function lo() allows the user to use a loess fit in a GAMLSS formula. A loess fit is
`a polynomial (surface) curve determined by one or more explanatory (continuous) variables,
`which are fitted locally (see Cleveland, Grosse, and Shyu 1993). The implementation of the
`lo() function is very similar to the function with the same name in the S-PLUS implemen-
tation of gam. However, the gamlss lo() function uses the R loess() function as its engine and
this creates some minor differences between the two lo() functions even when the same model
is fitted. lo() is the only function currently available in gamlss which allows smoothing in
more than one explanatory (continuous) variable.
`The fp() function is an implementation of the fractional polynomials introduced by Royston
`and Altman (1994). The functions involved in fp() and bfp() are loosely based on the
`fractional polynomials function fracpoly() for S-PLUS given by Ambler (1999). The function
bfp() generates the correct design matrix for fitting a power polynomial of the type
$b_0 + b_1 x^{p_1} + b_2 x^{p_2} + \ldots + b_k x^{p_k}$. For given powers $p_1, p_2, \ldots, p_k$, given as the argument powers in bfp(),
`the function can be used to fit power polynomials in the same way as the functions poly() or
`bs() of the package splines are used to fit orthogonal or piecewise polynomials respectively.
`The function fp() (which uses bfp()) works as an additive smoother term in gamlss. It is
`used to fit the best fractional polynomials among a specific set of power values. Its argument
npoly determines whether one, two or three fractional polynomials should be used in the fitting.
`For a fixed number npoly the algorithm looks for the best fitting fractional polynomials in
`the list c(-2, -1, -0.5, 0, 0.5, 1, 2, 3). Note that npoly=3 is rather slow since it fits
`all possible 3-way combinations at each backfitting iteration.
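A brief sketch of a fractional polynomial term (simulated data; npoly as described above):

library(gamlss)

set.seed(1)
da <- data.frame(x = runif(200, 1, 10))
da$y <- rGA(200, mu = exp(1 + 2 / sqrt(da$x)), sigma = 0.3)

# search for the best pair of fractional polynomial powers of x in the model for mu
mfp <- gamlss(y ~ fp(x, npoly = 2), family = GA, data = da)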
`The power polynomial function pp() is an experimental function and is designed for the
situation in which the model is of the form $b_0 + b_1 x^{p_1} + b_2 x^{p_2}$, with powers $p_1$ and $p_2$ to be
estimated non-linearly from the data. Initial values for the non-linear parameters $p_1$, $p_2$ have to
`be supplied.
`The function nl() exists in the add-on package gamlss.nl designed for fitting non-linear para-
`metric models within GAMLSS. It provides a way of fitting non-linear terms together with
`linear or smoothing terns in the same model. The function takes a non-linear object, (cre-
`ated by the function nl.obs), and uses the R nlm() function within the backfitting cycle of
`gamlss(). The success of this procedure depends on the starting values of the non-linear
`parameters (which must be provided by the user). No starting values are required for the
`other, e.g., linear terms, of the model.
`The function random() allows the fitted values for a factor (categorical) predictor to be shrunk
`towards the overall mean, where the amount of shrinking depends either on the parameter
`λ, or on the equivalent degrees of freedom (df). This function is similar to the random()
`function in the gam package of Hastie (2006) documented in Chambers and Hastie (1992).
`The function ra() is similar to the function random() but its fitting procedure is based on
`augmented least squares, a fact that makes ra() more general, but also slower to fit, than
`random(). The random coefficient function rc() is experimental. Note that the “random
`effects” functions, random(), ra() and rc() are used to estimate the random effect γ’s given
`the hyperparameters λ’s.
`In order to obtain estimates for the hyperparameters, methods
`
`
`discussed in Rigby and Stasinopoulos (2005, Appendix A) can be used. Alternatively, for
`models only requiring a single random effect in one distribution parameter only, the function
`gamlssNP() of the package gamlss.mx, which uses Gaussian quadrature, can be used.
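For example, a hedged sketch of a random intercept for a factor using random(), with the amount of shrinkage controlled through df (the simulated data and the chosen df are purely illustrative):

library(gamlss)

set.seed(1)
da <- data.frame(subject = factor(rep(1:10, each = 20)), x = runif(200))
da$y <- rNO(200, mu = 1 + 0.5 * da$x + rep(rnorm(10, sd = 0.5), each = 20), sigma = 0.3)

# shrink the subject effects towards the overall mean (given df, or equivalently lambda)
mre <- gamlss(y ~ x + random(subject, df = 5), family = NO, data = da)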
`The gamlss() function uses the same type of additive backfitting algorithm implemented
in the gam() function of the R package gam (Hastie 2006). Note that the gam()
implementation in the recommended R package mgcv (Wood 2001) does not use backfitting.
The reason we use backfitting here is that it makes it easier to extend the algorithm so that
new additive terms can be included.
Each new additive term in gamlss() requires two new functions. The first (the one
seen by the user) defines the additive term and sets up the additional design matrices
required for the linear part of the model. The names of the existing additive
`functions are shown in the second column of Table 3. For example cs(x) defines a cubic
`smoothing spline function for the continuous explanatory variable x. It is used during the
`definition of the design matrix for the appropriate distribution parameter and it adds a linear
`term for x in the design matrix. The second function is the one that actually performs the
`additive backfitting algorithm. This function is called gamlss.name() where the name is one
`of the names in column two of Table 3. For example the function gamlss.cs() performs the
`backfitting for cubic splines. New additive terms can be implemented by defining those two
`functions and adding the new names in the .gamlss.sm.list list.
`The general policy when backfitting is used in gamlss() is to include the linear part of
`an additive term in the appropriate linear term design matrix. For example, in the cubic
`spline function cs() the explanatory variable say x is put in the linear design matrix of the
`appropriate distribution parameter and the smoothing function is fitted as a deviation from
this linear part. This is equivalent to fitting a modified backfitting algorithm, see Hastie
and Tibshirani (1990). In other additive functions where the linear part is not needed (or
defined) a column of zeros is put in the design matrix. For example, this is the case when
`the fractional polynomials additive term fp() is used.
`If the user wishes to create a new additive term, care should be taken on how the degrees of
`freedom of the model are defined. The degrees of freedom for the (smoothing) additive terms
`are usually taken to be the extra degrees of freedom on top of the linear fit. For example to
`fit a single smoothing cubic spline term for say x with 5 total degrees of freedom, cs(x,df=3)
`should be used since already 2 degrees of freedom have been used for the fitting of the constant
`and the linear part of the explanatory variable x. This is different from the s() function of
`the gam package which uses s(x,df=4), assuming that only the constant term has been fitted
`separately. After a GAMLSS model containing additive (smoothing) terms is used to fit a
`specific distribution parameter the following components are (usually) saved for further use.
In the output below replace mu with sigma, nu or tau if a distribution parameter other
than mu is involved.
`
`mu.s: a matrix, each column containing the fitted values of the smoothers used to model the
`specific parameter. For example given a fitted model say mod1, then mod1$mu.s would
`access the additive terms fitted for mu.
`
`mu.var: a matrix containing the estimated variances of the smoothers.
`
`mu.df: a vector containing the extra degrees of freedom used to fit the smoothers.
`
`
`mu.lambda: a vector containing the smoothing parameters (or random effects hyperparame-
`ters).
`
`mu.coefSmo: a list containing coefficients or other components from the additive smooth
`fitting.
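For example (a small simulated illustration, continuing the naming used above):

library(gamlss)

set.seed(1)
da <- data.frame(x = runif(200))
da$y <- rNO(200, mu = sin(2 * pi * da$x), sigma = 0.2)

mod1 <- gamlss(y ~ cs(x, df = 3), family = NO, data = da)

head(mod1$mu.s)   # fitted contribution of the smoother for mu
mod1$mu.df        # extra degrees of freedom used by the smoother
mod1$mu.lambda    # smoothing parameter (or random-effects hyperparameter)
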
`
`1.5. The GAMLSS algorithms
`
`There are two basic algorithms used for maximizing the penalized likelihood given in (6).
The first, the CG algorithm, is a generalization of the Cole and Green (1992) algorithm and
`uses the first and (expected or approximated) second and cross derivatives of the likelihood
`function with respect to the distribution parameters θ = (µ, σ, ν, τ) for a four parameter
`distribution. Note that we have dropped the subscripts here to simplify the notation. However
`for many population probability (density) functions, f(y|θ), the parameters θ are information
`orthogonal (since the expected values of the cross derivatives of the likelihood function are
`zero), e.g., location and scale models and dispersion family models, or approximately so. In
this case the simpler RS algorithm, which is a generalization of the algorithm used by Rigby
and Stasinopoulos (1996a,b) for fitting mean and dispersion additive models (MADAM) and
which does not use the cross derivatives, is more suited. The parameters θ = (µ, σ) are fully
`information orthogonal for distributions NBI, GA, IG, LO and NO only in Table 1. Nevertheless,
`the RS algorithm has been successfully used for fitting all distributions in Tables 1 and 2,
`although occasionally it can be slow to converge. Note also that the RS algorithm is not a
`special case of the CG algorithm.
`The object of the algorithms is to maximize the penalized likelihood function (cid:96)p, given by (6),
`for fixed hyperparameters λ. For fully parametric models, (3) or (5), the algorithms maximize
`the likelihood function (cid:96). The algorithms are implemented in the option method in the function
`gamlss() where a combination of both algorithms is also allowed. The major advantages of
`the algorithms are i) the modular fitting procedure (allowing different model diagnostics
`for each distribution parameter); ii) easy addition of extra distributions; iii) easy addition
`of extra additive terms; and iv) easily found starting values, requiring initial values for the
`θ = (µ, σ, ν, τ) rather than for the β parameters. The algorithms have generally been found to
`be stable and fast using very simple starting values (e.g., constants) for the θ parameters. The
`function nlgamlss() in the package gamlss.nl provides a third algorithm for fitting parametric
`linear or non-linear GAMLSS models as in equations (3) or (5) respectively. However the
`algorithm needs starting values for all the β parameters, rather than θ = (µ, σ, ν, τ), which
`can be difficult for the user to choose.
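For example, a sketch of choosing the algorithm through the method argument (RS() and CG() as described above; the mixed() call for combining the two algorithms is an assumption to be checked in the gamlss() help file):

library(gamlss)

set.seed(1)
da <- data.frame(x = runif(500))
da$y <- rBCT(500, mu = 2 + da$x, sigma = 0.1, nu = 1, tau = 5)

m.rs <- gamlss(y ~ x, family = BCT, data = da, method = RS())           # RS (the default)
m.cg <- gamlss(y ~ x, family = BCT, data = da, method = CG())           # CG
m.mx <- gamlss(y ~ x, family = BCT, data = da, method = mixed(10, 10))  # RS then CG (assumed)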
`Clearly, for a specific data set and model, the (penalized) likelihood can potentially have
`multiple local maxima. This is investigated using different starting values and has generally
`not been found to be a problem in the data sets analyzed, possibly due to the relatively large
`sample sizes used.
`Singularities in the likelihood function similar to the ones reported by Crisp and Burridge
`(1994) can potentially occur in specific cases within the GAMLSS framework, especially when
`the sample size is small. The problem can be alleviated by appropriate restrictions on the
`scale parameter (penalizing it for going close to zero).
`