`PROPERTY TAX JOURNAL
`Vol. 6 - No. 34- September 1987
`
Editors
Richard R. Almy
Annie Aubrey

Editorial Board
Barry Murphy, Chairman
Barbara A. Alig
Gerald E. Daigle
George C. Keyes
James P. Maley, Jr.
Philip J. Waterman
`
The Property Tax Journal (ISSN 0731-0285) is published quarterly by the
International Association of Assessing Officers with editorial offices at
1313 East 60th Street, Chicago, Illinois 60637-9990. Copyright © 1987
`by the International Association of Assessing Officers, all rights reserved.
`The statements made or views expressed in the Property Tax Journal are
`those of the authors and do not necessarily reflect the viewpoint or policies
`of the International Association of Assessing Officers.
`
`Indexed in PAIS Bulletin, published by the Public Affairs
`Information Service, Inc., New York, New York.
`
Subscription rates:

One-year subscriptions:
  Members residing in the U.S.            $20
  Members not residing in the U.S.        $24
  Nonmembers residing in the U.S.         $24
  Nonmembers not residing in the U.S.     $28

Two-year subscriptions:
  Members residing in the U.S.            $36
  Members not residing in the U.S.        $44
  Nonmembers residing in the U.S.         $44
  Nonmembers not residing in the U.S.     $52
`
`Additional copies $12 (IAAO members, $10)
`
This publication is available in microform.
UNIVERSITY MICROFILMS INTERNATIONAL
300 North Zeeb Road, Dept. P.F., Ann Arbor, MI 48106
`
This material may be protected by copyright law (Title 17 U.S. Code).
`
`Alternative Modeling Techniques in
`Computer-Assisted Mass Appraisal*
`
David L. Jensen

David L. Jensen is research data analyst, Sigma Systems Technology,
Inc., Buffalo, New York.
`
`
`
As experience has been gained in the use of multiple linear regression based techniques in computer-assisted mass appraisal, several problems have become apparent: (1) the generated models for similar jurisdictions are usually inconsistent with one another, (2) the generated models are often inconsistent with known market influences, prior expectations, and the more conventional appraisal techniques, (3) the generated models often lack consequential descriptors or are structurally unstable, (4) the generated models are derived exclusively from the available sales with no provision for incorporating prior information, (5) the generated models are often interpretationally too complex to be easily explained or readily defended, (6) the generated models are not always easily decomposable into separate land and building value components, (7) the generated models cannot be readily updated on an annual basis without introducing oftentimes wild value fluctuations, and (8) the modelings require substantial technical expertise. These problems cannot be ignored, nor can they be easily circumvented.

This paper discusses the proper formulation of market models and then, within this framework, presents several modeling alternatives including (1) stepwise (with forced predictor insertion), constrained, ridge, and Bayesian regression options within the multiple linear regression technique, (2) non-linear regression analysis with constrained and Bayesian regression options using both derivative-based and derivative-free algorithms, (3) adaptive estimation (feedback) with constrained modeling options, and (4) iterative correlative estimation. Each of these modeling alternatives can be implemented in computer-assisted mass appraisal to preclude many, if not most, of the problems experienced in the past. Theoretical comparisons of these techniques are made based on their respective mathematical and statistical properties to facilitate selection of the better alternatives for further study and eventual implementation.

*This study was done in part under contract for the New York State Division of Equalization and Assessment. The article was extracted from the more comprehensive project report entitled “Valuation Methodologies Investigation” (Calspan Report No. 6994-8-5) prepared under contract for the New York State Division of Equalization and Assessment in December 1982. It has been adapted for general technical review. Copyright David L. Jensen, 1984.
`
`Introduction
`
`In computer-assisted mass appraisal, conventional multiple linear regres-
`sion analysis is often used to functionally relate recorded sales prices to
`various property descriptors by best fitting a linear multivariate model of
the general form:

$$\hat{Y} = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_k X_k \tag{1}$$

(where $\hat{Y}$ represents the estimated sale price and $X_1, X_2, \ldots, X_k$ represent
the k property descriptors respectively) to a set of n available sales records.
The k+1 coefficients of the model ($a_0, a_1, a_2, \ldots, a_k$) are determined
such that the sum of the squared deviations of the predicted values ($\hat{Y}$)
from the recorded values ($Y$) of the sale prices (the residual errors) is
minimized (the least squares solution) or the chance that these estimators
are the most probable is maximized (the maximum likelihood solution).
`The resulting model is then used directly or indirectly to estimate the
`sales prices (as indicators of the fair market values) of all of the properties,
`whether recently sold or not, based on their respective property charac-
`teristics or to compute the various “cost to cure" adjustments necessary
`to correct the actual sales prices of selected recent comparable sales for
the property characteristic differences between the subject property being
valued and each comparable sale (in order to ascertain what each comparable
property would have sold for had it been identical to the subject,
in a physical descriptive sense, and had it sold on the valuation date rather
than on the actual sale date).
`
`The usefulness of these mathematical and statistical modeling tech-
`niques for predicting the fair market values (in terms of localized, up-to-
`date, unbiased, and equitable sale price estimates) of residential improved
`urban and suburban real properties has been repeatedly demonstrated
`with remarkable success in several very different areas of the country over
`the last several years.
`
`However, as experience has been gained in the use of these multiple
linear regression based techniques, several problems have become
apparent:
`
`1. The generated models for similar, possibly adjoining, jurisdictions (for
`example, neighborhoods and towns) are usually inconsistent with one
`another.
`
`2. The generated models are often inconsistent with known market influ-
`ences (that is, current purchase and installation costs, building costs,
`labor rates, interest or inflation rates, and so on), with prior expecta-
`tions (based on past appraisal experience), and with other appraisal
`techniques such as the depreciated replacement or reproduction cost
`and the capitalization of income approaches to valuation.
`3. The generated models are often incomplete (that is, lacking terms con-
`taining consequential descriptors known to affect value) or are struc-
`turally unstable (that is, adversely affected by the addition or deletion
`of just a few sales).
`4. The generated models are derived exclusively from the available sales
`with no provision made to incorporate prior information or experience.
`5. The generated models are often too complex to be easily explained or
`readily defended.
6. The generated models cannot always be easily decomposed into separate
reliable land and building improvement value components.
7. The generated models cannot be readily updated on an annual basis
without introducing wild value fluctuations.
8. The modelings require substantial mathematical and statistical modeling
expertise as well as some appraisal experience in their initial setup
and subsequent evaluations and revisions, so they can only be performed
by highly trained technical personnel.
`
These problems, which plague even experienced analysts, cannot be ignored
nor easily circumvented. However, there are several modeling alternatives
to conventional multiple linear regression analysis that can be
implemented in computer-assisted mass appraisal to preclude many, if
not most, of the problems experienced in the past.
This paper introduces the mathematical and statistical foundations of
`the more prevalent modeling alternatives and presents theoretical com-
`parisons to facilitate selection of the better alternatives for further study
`and eventual implementation in computer-assisted mass appraisal.
`
`Market Model Formulations
`
The generation of an effective market model depends both on its mathematical formulation and on its statistical fit to the available sales. Its mathematical formulation should properly reflect the actual economic influences within the real-world market, and its statistical fit should capture all of the significant trends within the available sales.
`
`Additive vs. Multiplicative Models
`
`The multiple linear regression model assumes strict additivity (that is,
that the parcel descriptors, each multiplied by its per-unit value, can be
`simply added together to get the total market value of the property in
`question) as in equation (1).
Although the value contributions of the land areas or frontages, land
improvements such as the lineal feet of curb and gutter, finished and
unfinished building areas, and building enhancements such as fireplaces,
central air conditioning, porches, patios, decks, and pools, may indeed be
additive, other descriptors, such as location, accessibility, quality of view,
extent of landscaping, severity of traffic flow, and the building's style,
`grade, condition, and age, as well as inflation, are usually multiplicative,
`reflecting simple or compounding percentage-type adjustments.
`Strictly multiplicative models can be accommodated in a multiple linear
`regression analysis by simply applying a logarithmic transformation. For
`example, if the logarithm of the dependent variable (sale price) is fit to a
`linear combination of the property descriptors:
`
$$\log_b \hat{Y} = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_k X_k, \tag{2}$$
`
`then the result is mathematically equivalent to the multiplicative model:
`
$$\hat{Y} = b^{a_0}\, b^{a_1 X_1}\, b^{a_2 X_2} \cdots b^{a_k X_k}. \tag{3}$$
`
`An individual term in this model represents a compounding percentage
`adjustment since:
`
$$(1 + r_i)^{X_i} = c_i^{X_i} = (b^{a_i})^{X_i} = b^{a_i X_i}, \tag{4}$$

(where $r_i$ is the percentage change compounded over the number of units
of $X_i$).
`Similarly, if the logarithm of the dependent variable (sale price) is fit to
`a linear combination of the logarithms of the property descriptors:
$$\log_b \hat{Y} = a_0 + a_1 \log_b X_1 + a_2 \log_b X_2 + \cdots + a_k \log_b X_k, \tag{5}$$
`then the result is mathematically equivalent to the multiplicative model:
`
$$\hat{Y} = b^{a_0} X_1^{a_1} X_2^{a_2} \cdots X_k^{a_k}, \tag{6}$$
`
`whose individual terms are representative of exponential scalings.
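
As a minimal sketch of how such a transformed model might be fit, the following Python fragment regresses the logarithm of price on one untransformed descriptor and one log-transformed descriptor by ordinary least squares, then converts the fitted coefficients back to a compounding rate and an exponent. The data values, the variable names, and the choice of natural logarithms (b = e) are assumptions made for illustration; none of them comes from the study.

```python
import numpy as np

# Synthetic, noise-free sales (hypothetical numbers, for illustration only).
age = np.array([1.0, 5.0, 10.0, 20.0, 40.0])               # X1: age in years
area = np.array([900.0, 1200.0, 1500.0, 2000.0, 2600.0])   # X2: floor area
price = 50.0 * (0.99 ** age) * area ** 0.9                 # multiplicative generator

# Fit ln(Y) = a0 + a1*X1 + a2*ln(X2): X1 enters as a compounding
# adjustment, X2 as an exponential scaling.
design = np.column_stack([np.ones_like(age), age, np.log(area)])
coef, *_ = np.linalg.lstsq(design, np.log(price), rcond=None)
a0, a1, a2 = coef

base = np.exp(a0)          # constant multiplier b**a0
rate = np.exp(a1) - 1.0    # compounding rate r per unit of X1, from 1 + r = b**a1
elasticity = a2            # exponent applied to X2
```

Because the synthetic prices contain no noise, the fit reproduces the generating values to within rounding: a base of about 50, a rate of about -1 percent per year, and an exponent of about 0.9.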
`Using logarithmic transformations, then, the practitioner can accom-
`modate either type of multiplicative model or even a combination of the
`two such as:
`
$$\hat{Y} = b^{a_0}\, b^{a_1 X_1} X_2^{a_2} \cdots. \tag{7}$$
`
`However, the logarithmic transformation is ineffective for accommodating
`mixed or hybrid additive-multiplicative models such as:
`
$$\hat{Y} = b^{a_0}\, b^{a_1 X_1} + a_2 X_2 + \cdots, \tag{8}$$

which are not intrinsically linear in their parameters. Such mixed, or
`hybrid, models are more realistic in terms of actual market influences
`because some descriptors, such as inflation and natural aging, contribute
non-offsetting compounding multiplicative effects; others, such as location,
accessibility, quality of view, extent of landscaping, severity of traffic
`flow, and the building’s style, grade, and condition or depreciation, are
`usually handled as simple percentage adjustments; and the majority of
`individual improvements, such as fireplaces, central air conditioning,
`porches, patios, decks, and swimming pools, tend to contribute additive
`dollar amounts.
`
`Fortunately, however, such hybrid additive-multiplicative models can
`be reformulated into a strictly linear form (in the coefficients), to enable
`multiple linear regression analysis to be used. The reformulation requires
`expressing each simple percentage adjustment by the multiplier:
`
$$(1 + r_i X_i) \tag{9}$$

(where $r_i$ represents the percentage change in decimal form per unit change
in $X_i$) and approximating each compounding effect with the same type of
multiplier:

$$(1 + r_i)^{X_i} \approx (1 + r_i X_i), \tag{10}$$

which is valid for small $r_i$. Then each multiplicative expression in turn
`can be expanded algebraically into a strictly additive set of terms:
`
$$
\begin{aligned}
(1 + r_i X_i)\,\hat{Y} &= (1 + r_i X_i)(a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_k X_k) \\
&= a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_k X_k \\
&\quad + r_i a_0 X_i + r_i a_1 X_1 X_i + r_i a_2 X_2 X_i + \cdots + r_i a_k X_k X_i \\
&= a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_k X_k \\
&\quad + a_{k+1} X_i + a_{k+2} X_1 X_i + a_{k+3} X_2 X_i + \cdots + a_{2k+1} X_k X_i,
\end{aligned}
\tag{11}
$$
`
`enabling multiple linear regression analysis to be used for determining
`the various coefficients.
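
The expansion can be illustrated with a short Python sketch. It generates synthetic prices from a two-descriptor additive model qualified by a single simple percentage adjustment, builds the expanded design matrix containing the cross-product columns, and fits it by ordinary least squares. All numeric rates and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
x1 = rng.uniform(1.0, 3.0, n)    # e.g. floor area in thousands of square feet
x2 = rng.uniform(0.0, 3.0, n)    # e.g. number of fireplaces
xi = rng.uniform(0.0, 10.0, n)   # qualifier, e.g. a condition score

a0, a1, a2, r = 20000.0, 45000.0, 3000.0, 0.02   # hypothetical rates
price = (1.0 + r * xi) * (a0 + a1 * x1 + a2 * x2)

# Expanded, strictly additive design: 1, X1, X2, Xi, X1*Xi, X2*Xi
design = np.column_stack([np.ones(n), x1, x2, xi, x1 * xi, x2 * xi])
coef, *_ = np.linalg.lstsq(design, price, rcond=None)

# Each cross-product coefficient equals r times its additive counterpart,
# so the adjustment rate can be recovered as a ratio.
r_from_constant = coef[3] / coef[0]
r_from_x1 = coef[4] / coef[1]
```

With these noise-free data both ratios return the generating rate of 0.02. With real sales the ratios would generally differ from one another, because an ordinary least squares fit treats every expanded coefficient as a free parameter and does not enforce a shared rate.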
`
`Market Models
`
`In a mass appraisal application, the market model should be formulated
`as the sum of the basic land value (BLV) qualified by a set of land adjust-
`ments (LA), the basic building value (BBV) qualified by a set of building
adjustments (BA), and the values of other improvements (OV) not needing
`separate qualifying adjustments, with the sum being qualified by a set of
`global adjustments (GA):
`
$$\hat{Y} = \mathrm{GA}\,[\mathrm{LA}(\mathrm{BLV}) + \mathrm{BA}(\mathrm{BBV}) + \mathrm{OV}]. \tag{12}$$
`
The basic land value is usually represented by a land constant ($a_0$) and the sum of a per unit (that is, per front foot, per square foot, or per acre) rate ($a_i$) times the land size in those units ($X_i$) for each section of land having a different use within a parcel (for example, the primary building lot, excess acreage and undeveloped land, large garden acres or orchards, various crop-producing lands and barnyards, forest land, waste land, and muck land depending upon the property type under consideration), with optional additional terms for other land value contributions such as the lineal feet of curb and gutter:

$$\mathrm{BLV} = a_0 + a_i X_i + a_{i+1} X_{i+1} + \cdots. \tag{13}$$
`
`For primary building lots, the value per square foot can be expected to
decrease with increasing lot size because such lots command a fairly constant
market value in a residential neighborhood regardless of size. Any
`land not required for the residence is limited in size and use (because of
`normal zoning restrictions), so a buyer will not pay a proportionately higher
`price for it. Therefore, as lot size increases, its value will not increase
proportionately, but will be prorated over more square feet. Two common
transgenerations that exhibit progressively decreasing per unit values with
increasing size are the square root and the logarithm. In a regression
`model, these two transgenerations differ primarily in the degree of cur-
`vilinearity they explain. Consequently, the use of either the logarithm or
`an appropriate root of the lot size (instead of the lot size itself) in the basic
`land value expression may improve the modeling substantially, but only
`if the lot sizes vary over a wide range. If they don’t, as is usual in a blocked
`residential setting or subdivision, then the degree of curvilinearity is se-
`verely limited, rendering the transgenerations ineffective.
Similarly, the value of a property can also be expected to increase with increasing frontage, up to a point. After that, little if any additional utility (such as parking space) or benefit (such as extent of view or privacy) is realized from the excess frontage. Therefore, as the frontage increases, the value per front foot usually decreases because the fairly constant land value is prorated over more front feet. Again, the use of either the logarithm or an appropriate root of the lot frontage (instead of the lot frontage itself) in the basic land value expression may improve the modeling, but only if the lot frontages vary over a wide range.
`Even with a wide range of lot sizes or frontages, the transgenerations,
`which are difficult to explain, can be avoided by dividing the lot size or
`frontage into separate components (such as area under an acre or frontage
`under a hundred feet, and area or frontage in excess of these limits), each
`with a separate per unit rate. Hence, the formulation of the basic land
`value need not be mathematically subjective nor complex.
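
The component split described above can be sketched in a few lines of Python. The function names, the one-acre limit expressed in square feet, and the rates used in the example are illustrative assumptions only.

```python
def split_size(size, limit):
    """Split a lot size (or frontage) into the part up to `limit` and the excess."""
    base = min(size, limit)
    excess = max(size - limit, 0.0)
    return base, excess


def land_value(size, limit, constant, base_rate, excess_rate):
    """Basic land value with one per unit rate below the limit and another above it."""
    base, excess = split_size(size, limit)
    return constant + base_rate * base + excess_rate * excess


# Hypothetical example: 50,000 sq ft lot, one-acre (43,560 sq ft) limit,
# $5,000 land constant, $0.50 per sq ft up to the limit, $0.10 beyond it.
example_value = land_value(50000.0, 43560.0, 5000.0, 0.50, 0.10)
```

Each component then enters the basic land value expression as its own descriptor with its own rate, so the fitted per unit value can fall for the excess area without any logarithm or root having to be explained.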
The land adjustments consist of a set of percentage qualifiers for such influences as general accessibility, quality of view, extent of landscaping, and severity of traffic flow, for residential properties; for farm or crop-producing lands, topography, soil rating, and availability of water; and for either, general accessibility:

$$\mathrm{LA} = (1 + r_j X_j)(1 + r_{j+1} X_{j+1}) \cdots. \tag{14}$$

`
The basic building value is represented by a building constant ($b_0$) and the sum of the per unit (per square foot) rates ($b_k$) times the finished and unfinished areas of the main floor, upper floors, basement, attic, open and enclosed porches, and so on, with additional terms for other main building value contributions such as the number of baths and bedrooms; the existence of a den, family room, utility room, dining room, or extra kitchen; and the capacity of attached or basement garages and attached carports:

$$\mathrm{BBV} = b_0 + b_k X_k + b_{k+1} X_{k+1} + b_{k+2} X_{k+2} + \cdots. \tag{15}$$
`
`The building adjustments consist of a set of percentage qualifiers for
`such influences as building style (for example, ranch, raised ranch, split
level, Cape Cod, Colonial, Victorian, or contemporary), grade, interior and/or
exterior condition, types of exterior wall and roofing, and effective age
(sometimes in terms of the year built or the year last remodeled):

$$\mathrm{BA} = (1 + r_\ell X_\ell)(1 + r_{\ell+1} X_{\ell+1})(1 + r_{\ell+2} X_{\ell+2}) \cdots. \tag{16}$$

The values of other improvements (OV) are represented by the sum of the per unit rates ($c_m$) times the floor areas or capacities of detached garages, carports, sheds, barns, and so on, the sizes or existence of patios, decks, swimming pools, tennis courts, silos, and so on, and the lineal feet of fencing:

$$\mathrm{OV} = c_m X_m + c_{m+1} X_{m+1} + c_{m+2} X_{m+2} + \cdots. \tag{17}$$
`
The global adjustments consist of a set of percentage qualifiers for such influences as location or neighborhood (which, if properly delineated, account for the effects of proximity to schools, police and fire protection, hospitals, major employment centers, central business districts or major malls and shopping plazas, major recreational facilities such as beaches, golf courses, and parks, and for the effects of socioeconomic, ethnic, and cultural differences) and inflation (in terms of the date of sale or valuation):

$$\mathrm{GA} = (1 + r_n X_n)(1 + r_{n+1} X_{n+1}) \cdots. \tag{18}$$
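
To make the structure of equations (12) through (18) concrete, the following Python sketch evaluates the market model for a single hypothetical parcel. Every rate, score, and size in it is invented for illustration, and the helper names `qualifier` and `market_value` are not from the study.

```python
from math import prod


def qualifier(rates, scores):
    """Product of simple percentage multipliers (1 + r*X), as in (14), (16), (18)."""
    return prod(1.0 + r * x for r, x in zip(rates, scores))


def market_value(blv, la, bbv, ba, ov, ga):
    """Equation (12): GA * [LA*BLV + BA*BBV + OV]."""
    return ga * (la * blv + ba * bbv + ov)


# Hypothetical parcel.
blv = 5000.0 + 1.5 * 10000.0               # land constant + rate * lot square feet
la = qualifier([0.05], [1.0])              # +5% for a view score of 1
bbv = 10000.0 + 40.0 * 1500.0              # building constant + rate * floor area
ba = qualifier([0.10, -0.01], [1.0, 10.0]) # +10% grade, -1% per year of age
ov = 20.0 * 400.0                          # detached garage, rate * floor area
ga = qualifier([0.03], [2.0])              # +3% per neighborhood step, 2 steps

value = market_value(blv, la, bbv, ba, ov, ga)
```

For these invented inputs the land component is 1.05 × 20,000, the building component is 0.99 × 70,000, and the global qualifier of 1.06 applies to their sum plus the 8,000 of other improvements, giving 104,198.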
`
`The specific property descriptors included in the actual market model
formulation will depend, of course, on the property type being valued, on
`what descriptors are available within the data base, and on which ones
`are actually supported by a sufficient number of representative sales to
`enable the rates to be determined reliably. Nevertheless, when formulated
`in this general way, the model will be consistent both with the actual
`mechanisms affecting value in the real-world market (assuming that the
`land and building values are independent of one another) and with the
`more traditional appraisal techniques (which have been based on a fa-
`miliarity with the real-world market). Therefore, if the computed coeffi-
`cients (the per unit rates and percentage adjustments) are in line with
`normal expectations or known market influences, then the model should
`be reasonably explainable and defensible.
`Since the basic land, building, and other improvement values are linear
`(additive) expressions and the land, building, and global adjustments are
`multiplicative, as formulated, the market model is a hybrid additive-mul-
`tiplicative model that cannot be implemented in a multiple linear regres-
`sion analysis. However, through an extensive set of algebraic expansions,
`the model can be “linearized” in its coefficients so that it can be fit by
`conventional multiple linear regression techniques.
`
`Market Modeling Alternatives
`
`In order to determine the various per unit rates within the market model,
`the model must be fit to the verified recent sales from the local jurisdiction.
Four different curve fitting or curve tracking procedures are available for
this purpose: (1) multiple linear regression analysis, (2) non-linear regression
analysis, (3) Longini and Carbone's adaptive estimation (feedback)
procedure, and (4) Carlson's iterative correlative estimation procedure.
`
`Multiple Linear Regression Analysis
`
`Multiple linear regression analysis can be implemented to fit a multivar-
`iate linear model of the form,
`
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k, \tag{19}$$
`
`to the available sales by determining estimates of the k+1 coefficients
such that (1) the sum of the squared deviations of the predicted values
`from the actual sales prices (the residual errors) is minimized (the least
`squares solution), or (2) the chance that these estimators are the most
`probable is maximized (the maximum likelihood solution). In matrix form,
the solution vector ($\hat{B}$), consisting of the k regression coefficients ($b_1, b_2, \ldots, b_k$), is computed as:

$$\hat{B} = (X'X)^{-1} X'Y, \tag{20}$$

where X is the n × k matrix of property descriptors (adjusted for their respective means) and Y is the n × 1 vector of recorded sales prices (adjusted for their mean). The intercept ($b_0$) is computed as:

$$b_0 = \bar{Y} - \sum_{i=1}^{k} b_i \bar{X}_i. \tag{21}$$

The variance/covariance matrix of the solution vector is computed as:

$$V(\hat{B}) = (X'X)^{-1} \hat{\sigma}^2, \tag{22}$$

where the residual variance ($\sigma^2$) is estimated by the computed mean square error:

$$\hat{\sigma}^2 = [Y'Y - Y'X(X'X)^{-1}X'Y]/(n - k - 1). \tag{23}$$
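
Equations (20) through (23) translate almost directly into matrix code. The Python sketch below is one possible rendering, assuming NumPy and a hypothetical function name `fit_ols`. It uses an explicit matrix inverse only to mirror the notation; a production fit would more likely use a numerically safer solver.

```python
import numpy as np


def fit_ols(X, y):
    """Least squares fit following equations (20)-(23), using mean-adjusted data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # adjust for the respective means

    xtx_inv = np.linalg.inv(Xc.T @ Xc)
    b = xtx_inv @ Xc.T @ yc                  # equation (20)
    b0 = y_mean - b @ x_mean                 # equation (21)
    resid = yc - Xc @ b
    sigma2 = resid @ resid / (n - k - 1)     # equation (23), as a residual sum of squares
    cov_b = xtx_inv * sigma2                 # equation (22)
    return b0, b, cov_b, sigma2
```

The square roots of the diagonal of `cov_b` give the standard errors of the coefficients, which are the quantities on which the significance tests of the stepwise alternatives rest.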
`
`Stepwise Alternatives
In a multiple linear regression analysis, a modified Gauss-Jordan matrix
inversion procedure known as the “sweep” method can be implemented
to obtain the solution by either:
`
1. forward selection regression, which attempts to build the “best” model by a one-at-a-time repetitive selection and subsequent insertion into the model of the most statistically significant candidate predictor not already included in the model at that step, terminating either when all of the significant candidate predictors have been entered (that is, when insignificance has been reached) or when all of the candidate predictors, regardless of their statistical significance, have been entered;

2. backward elimination regression, which attempts to converge to the “best” model by beginning with the full model (containing all of the candidate predictors with the exception of those that would introduce effective singularity) and then repetitively eliminating from the model the least statistically significant predictor at that step, terminating either when all of the insignificant candidate predictors have been deleted (that is, when significance has been reached) or when no more predictors remain in the model;

3. forward stepwise regression, which attempts to converge to the “best” model by applying forward selection regression modified to eliminate in each step any predictor statistically significant at the time it was inserted into the model, but which has since become statistically insignificant (in the presence of other predictors inserted later in the process) before proceeding with the next variable insertion, terminating only when no further additions or deletions can be made based on the respective significance statistics;

4. backward stepwise regression, which attempts to converge to the “best” model by applying backward elimination regression modified to reinsert in each step any candidate predictor statistically insignificant at the time it was deleted, but which has since become statistically significant and important to the regression (in the absence of other predictors deleted later in the process) before proceeding with the next variable deletion, terminating only when no further additions or deletions can be made based on significance statistics;

5. user-specified stepwise regression, which performs a sequence of user-specified pivots (that is, a set of one-at-a-time predictor insertions or deletions) wherein a pivot using the index of a candidate predictor not in the model at that step causes that variable to be inserted, and a pivot using the index of a predictor in the model at that step causes that variable to be deleted; or

6. nonstepwise (full) regression, which computes the full model (containing all of the candidate predictors, regardless of their statistical significances, excluding only those having zero variance and those that would introduce effective singularity).

`
The various stepwise alternatives, except possibly the user-specified one,
`each attempt to isolate the “best" subset model by ensuring that all of the
`candidate predictors included in the model are statistically significant
`and, therefore, important in the estimation, and all of those not in the
`model are statistically insignificant and, therefore, either extraneous or
`redundant to the regression. The forward selection and backward elimi-
`nation alternatives are usually used only in an exploratory sense and,
`hence, are often run to completion. (When they are, the step summary
`statistics can usually be compared to determine when the “best” subset
`model was achieved.) However, the directional procedures can each con-
`verge to a different solution. Usually, the results of the various stepwise
`alternatives will be mutually consistent with one another only when the
`respective candidate predictors are essentially uncorrelated.
Since both the forward and backward stepwise approaches cope with the possibility that the statistical significances of the various candidate predictors might change in subsequent stages of the process (after they have been initially inserted into or deleted from the model), it can be assumed that they would be better alternatives than their single directional counterparts. Furthermore, since the backward approaches consider “catalytic effects” (wherein multiple predictors are statistically insignificant when considered individually but highly significant when considered jointly), it can be assumed that they would be more reliable than their forward proceeding counterparts. Therefore, it can be inferred that backward stepwise regression is the best of the four directional alternatives.
`
`Although the backward stepwise approach is generally the best alter-
`native, there are times when one of the other approaches may yield better
`results, or at least adequate or comparable results, with less computa-
`tional effort. The forward and backward stepwise approaches both at-
`tempt to maximize the predictive power of the model with a minimum
`number of terms, which, although optimal from a purely statistical point
of view, may not be optimal from a more political point of view.
`Politically, mass appraisal models should incorporate as many candi-
`date predictors as possible so that as much of the property description as
`is practical is taken into account in the appraisal process. It is usually
`not possible to convince a disgruntled taxpayer, a board of review, or the
`courts that certain property descriptors (particularly those they deem con-
`sequential) need not be considered in the valuation process.
`Although politically advisable, in actual practice it is usually not possible
`to retain all of the candidate predictors in a multiple linear regression
`analysis. When the hybrid additive-multiplicative market model is alge-
`braically expanded into a form strictly linear in its coefficients, the overall
`size of the model is increased immensely. It is not unusual for the ex-
`panded models to contain well over a hundred, and sometimes several
`hundred, predictors, most of which are transgenerations involving the
`cross products of subsets of the original descriptors.
`These large expanded models cannot be handled by multiple linear
`regression techniques for several reasons. (1) Even the largest multiple
`linear regression software packages have restrictions on the maximum
`number of candidate predictors that they can handle. Few allow more than
`a hundred variables in the regression because anything more becomes
totally unmanageable both in terms of the internal storage required (to
accommodate the sums of squares/sums of cross products or simple linear
correlation coefficient matrix) and the computational error generated (which
is compounded in the iterative matrix inversion process). (2) There seldom
exists a sufficient number and balance of available sales to support the
reliable computation of such a large number of regression coefficients. (3)
The higher-order cross products tend to be highly correlated with their
lower-order counterparts regardless of what descriptors are involved, and
this extreme multicollinearity usually causes intolerable instabilities, if
not outright absurdities, in the computed regression coefficients.
`Therefore, in a multiple linear regression application, the higher-order
`interactions (the three or more variable cross product transgenerations)
`usually have to be eliminated as candidate predictors from the “linearized"
`expanded market model. Then, if sufficient sales exist, a full regression
`can be attempted. However, because of the likelihood of inadequate sales
to support some of the candidate predictors, highly skewed variables, and/or
remnant multicollinearity (multicollinearity not eliminated by discarding
the higher-order interactions), a stepwise regression analysis, with
automatic variable screening to eliminate systematically further problem
candidates, is usually advisable.
`
`Forced Predictor Insertions
`
`When a stepwise regression is implemented, certain politically important
`variables, such as lot size (land area) in a residential application, may not
`come into the model at all either because of being prescreened out (by the
`automatic variable screening algorithm) or because of insignificance at-
`tributable to either limited variabilities or consequential multicollineari-
`ties. Therefore, it may be desirable, if not absolutely necessary, to force
`certain key variables (such as the lot size and the main floor and upper
`floor areas) into the model regardless of their statistical properties. This
`cannot always be done by adjusting the significance criteria (the insertion
`and deletion F levels), so it might require a special option for forced pre-
`dictor insertions not available in most regression packages. Such an op-
`tion would permit even the stepwise models to be made more politically
`tenable and structurally consistent with the real-world market by ensur-
`ing that certain key property descriptors are represented in the appraisal
`process even though statistically they might be redundant in, or extra-
`neous to, the regression. Unlike the full regression models, however, these
models enable a prespecified subset of the candidate predictors to be
unconditionally taken into account in the appraisal process, while use of any
`other candidate is decided by its statistical significance.
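
A forced insertion option of this kind might look like the following Python sketch of forward selection. It is a simplified illustration, not the sweep-based procedure described earlier: it refits each trial model from scratch, judges candidates by a partial F statistic, and treats the insertion level of 4.0, the function names, and the synthetic data as assumptions.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares for an intercept plus the listed columns of X."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def forward_selection(X, y, forced=(), f_to_enter=4.0):
    """Forward selection; columns in `forced` enter first regardless of significance."""
    n, k = X.shape
    selected = list(forced)
    remaining = [c for c in range(k) if c not in selected]
    current = rss(X, y, selected)
    while remaining:
        best_f, best_col, best_rss = -1.0, None, None
        for c in remaining:
            trial = rss(X, y, selected + [c])
            dof = n - len(selected) - 2   # sales minus intercept and predictors, c included
            f_stat = (current - trial) / max(trial / dof, 1e-300)
            if f_stat > best_f:
                best_f, best_col, best_rss = f_stat, c, trial
        if best_f < f_to_enter:
            break                          # no remaining candidate is significant
        selected.append(best_col)
        remaining.remove(best_col)
        current = best_rss
    return selected


# Synthetic sales: columns 0 and 2 drive price; column 1 is pure noise
# standing in for a descriptor that must be retained regardless.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = 5.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(scale=0.1, size=60)

free_model = forward_selection(X, y)
forced_model = forward_selection(X, y, forced=[1])
```

In the forced run, column 1 is carried in the model from the start and the two genuinely influential columns are then selected on their significance, which is the behavior the option is meant to guarantee.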
`
`Constrained Regression
`
`In a multiple linear regression analysis, the computed coefficients may
`not conform to expectation or known market influences, particularly when
`variables are forced into the model when they would not otherwise have
`entered. This can be due to the effects of inadequate sales (wherein the
coefficients are attempting to explain the individual contributions of just
`a few sales rather than the average contributions of the market influences
`in question), highly-skewed variables, extreme biasing outliers, low vari-
`ance, or an improperly formulated model. (For a comprehensive explana-
`tion of these problem effects, see Jensen 1980.) However, it is usually
`because of multicollinearity.
`
If two variables ($X_i$ and $X_j$) in a multiple linear regression model:

$$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_i X_i + \cdots + b_j X_j + \cdots + b_k X_k \tag{24}$$

are linearly correlated:

$$X_i = a_0 + a_1 X_j, \tag{25}$$

then the linear equivalency can be substituted for the ith predictor in the
regression model to obtain:

$$
\begin{aligned}
\hat{Y} &= b_0 + b_1 X_1 + \cdots + b_i (a_0 + a_1 X_j) + \cdots + b_j X_j + \cdots + b_k X_k \\
&= (b_0 + b_i a_0) + b_1 X_1 + \cdots + b_i a_1 X_j + \cdots + b_j X_j + \cdots + b_k X_k \\
&= b_0' + b_1 X_1 + \cdots + b_i' X_j + \cdots + b_j X_j + \cdots + b_k X_k,
\end{aligned}
\tag{26}
$$
`
`which exposes a redund