`Liberty Mutual v. Progressive
`CBM2012-00002
`Page 00001
`
`
`
© The Journal of Risk and Insurance, 1995, Vol. 62, No. 3, 447-482

Fuzzy Techniques of Pattern Recognition in Risk and Claim Classification

Krzysztof M. Ostaszewski
Richard A. Derrig
`
ABSTRACT

Applications of fuzzy set theory to property-liability and life insurance have emerged in the last few years through the work of Lemaire (1990), Cummins and Derrig (1993, 1994), and Ostaszewski (1993). This article continues that line of research by providing an overview of fuzzy pattern recognition techniques and using them in clustering for risk and claims classification. The classic clustering problem of grouping towns into rating territories (DuMouchel, 1983; Conger, 1987) is revisited using these fuzzy methods and 1987 through 1990 Massachusetts automobile insurance data. The new problem of classifying claims in terms of suspected fraud is also addressed using these same fuzzy methods and data drawn from a study of 1989 bodily injury liability claims in Massachusetts.
`
Introduction

In 1961, Ellsberg presented the following paradox. An experiment was designed with two urns, each containing 100 balls, of which the first one was known to contain 50 red balls and 50 black balls, while no further information was given about the contents of the other urn. If asked to bet on the color of a ball drawn from one of the urns, most people were found indifferent as to which color they would choose, no matter whether the ball was drawn from the first or the second urn. On the other hand, Ellsberg found that if people were asked which urn they would prefer to use for betting on either color, they consistently favored the first urn (no matter what color they were asked to bet on).

Richard A. Derrig is Senior Vice President of the Automobile Insurers Bureau of Massachusetts and Vice President-Research for the Insurance Fraud Bureau of Massachusetts. Krzysztof M. Ostaszewski is Associate Professor of Mathematics and Actuarial Program Director at the University of Louisville.
Krzysztof Ostaszewski has worked on this project at the University of Louisville with financial support from the Actuarial Education and Research Fund, and this support from AERF is gratefully acknowledged. The authors thank Jeff Strong and Robert Roesch of the Automobile Insurers Bureau for invaluable help in programming and performing calculations involved in this project, Herbert I. Weisberg for suggesting the fuzzy clustering of fraud assessment data, Ruy Cardoso for helpful comments on an early draft, Julie Jannuzzi for production of the document, and one anonymous reviewer.

What seems to be present in this experiment is the participants' perception of uncertainty. When we say "uncertainty," the usual association is with "probability." The Ellsberg paradox illustrates that some other form of uncertainty can indeed exist. Probability theory provides no basis for the outcome of the Ellsberg experiment.
Klir and Folger (1988) analyze the semantic context of the term "uncertain" and arrive at the conclusion that there are two main types of uncertainty, captured by the terms "vagueness" and "ambiguity." Vagueness is associated with the difficulty of making sharp or precise distinctions among objects. "Ambiguity" is caused by situations where the choice between two or more alternatives is unspecified. The basic set of axioms of probability theory, originating from Kolmogorov, rests on the assumption that the outcome of a random event can be observed and identified with precision. Any vagueness of observation is considered negligible, or not significant to the construction of the theoretical model. Yet one cannot escape the conclusion that forms of uncertainty represented by vagueness of observations, human perceptions, and interpretations are missing from probabilistic models, which equate uncertainty with randomness (i.e., a sophisticated version of ambiguity).
Several reasons may exist for wanting to search for models of a form of uncertainty other than randomness. One is that vagueness is unavoidable. Given imprecision of natural language or human perception of the phenomena observed, vagueness becomes a major factor in any attempt to model or predict the course of events. But there is more. When the phenomena observed become so complex that exact measurement involving all features considered significant would be impossible, or longer than economically feasible for study, mathematical precision is often abandoned in favor of more workable simple, but vague, "common sense" models. Thus, complexity of the problem may be another cause of vagueness.
These reasons were the driving force behind the development of the fuzzy set theory (FST). This field of applied mathematics has become a dynamic research and applications field, with success stories ranging from a fuzzy logic rice cooker to an artificial intelligence in control of Japan's Sendai subway system. The main idea of fuzzy set theory is to propose a model of uncertainty different from that given by probability, precisely because a different form of uncertainty is being modeled.
Fuzzy set theory was created in Zadeh's (1965) historic article. To present this basic idea, recall that a characteristic function of a subset E of a universe of discourse U is defined as

$$\chi_E(x) = \begin{cases} 1 & \text{if } x \in E, \\ 0 & \text{if } x \notin E. \end{cases}$$

In other words, the characteristic function describes the membership of an element x in a set E. It equals one if x is a member of E, and zero otherwise.
`
Zadeh challenged the idea that membership in all sets behaves in the manner described above. One example would be the set of "tall people." We consistently talk about the set of "tall people," yet understand that the concept used is not precise. A person who is 5'11" is tall only to a certain degree, and yet such a person is not "not tall." Zadeh writes,

The notion of fuzzy set provides a convenient point of departure for the construction of a conceptual framework which parallels in many respects the framework used in the case of ordinary sets, but is more general than the latter and, potentially, may prove to have a much wider scope of applicability, particularly in the fields of pattern classification and information processing. Essentially, such a framework provides a natural way of dealing with problems in which the source of imprecision is the absence of sharply defined criteria of class membership rather than the presence of random variables.

`
In the fuzzy set theory, membership of an element in a set is described by the membership function of the set. If U is the universe of discourse, and E is a fuzzy subset of U, the membership function $\mu_E: U \to [0,1]$ assigns to every element x of U its degree of membership $\mu_E(x)$ in E. We write either $(E, \mu_E)$ or $\tilde{E}$ for that fuzzy set, to distinguish from the standard set notation E. The membership function is a generalization of the characteristic function of an ordinary set. Ordinary sets are termed crisp sets in fuzzy sets theory. They are considered a special case: a fuzzy set is crisp if, and only if, its membership function does not have fractional values.
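
The contrast between a characteristic function and a membership function can be illustrated with the "tall people" example. The following is only a sketch: the function names, the 72-inch crisp threshold, and the linear ramp between 66 and 76 inches are assumptions chosen for illustration and do not come from Zadeh (1965) or from our data.

```python
def crisp_tall(height_in, threshold_in=72.0):
    """Characteristic function of a crisp set: 1 for members, 0 otherwise."""
    return 1.0 if height_in >= threshold_in else 0.0


def fuzzy_tall(height_in, low_in=66.0, high_in=76.0):
    """Membership function with values in [0, 1], linear between two assumed cutoffs."""
    if height_in <= low_in:
        return 0.0
    if height_in >= high_in:
        return 1.0
    return (height_in - low_in) / (high_in - low_in)


def is_crisp(memberships):
    """A fuzzy set is crisp if and only if no membership value is fractional."""
    return all(value in (0.0, 1.0) for value in memberships)


heights = [64.0, 71.0, 78.0]  # 71 inches corresponds to 5'11"
crisp_values = [crisp_tall(h) for h in heights]
fuzzy_values = [fuzzy_tall(h) for h in heights]
print(crisp_values)
print(fuzzy_values)
```

Under these assumed cutoffs, a person of 71 inches is excluded from the crisp set but belongs to the fuzzy set with degree 0.5, which is the sense in which such a person is tall "only to a certain degree."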
On the basis of this definition, one then develops such concepts as set theoretic operations on fuzzy sets (union, intersection, etc.), as well as the notions of fuzzy numbers, fuzzy relations, fuzzy arithmetic, and approximate reasoning (known popularly as "fuzzy logic"). Pattern recognition, or the search for structure in data, provided the early impetus for developing FST because of the fundamental involvement of human perception (Dubois and Prade, 1980) and the inadequacy of standard mathematics to deal with complex and ill-defined systems (Bezdek and Pal, 1992). The formal development began with Zadeh (1965) introducing the principal concepts of FST. A complete presentation of FST is provided in Zimmerman (1991).
The first recognition of FST applicability to the problem of insurance underwriting is due to DeWit (1982). Lemaire (1990) sets out a more extensive agenda for FST in insurance theory, most notably in the financial aspects of the business. Under the auspices of the Society of Actuaries, Ostaszewski (1993) assembled a large number of possible applications of fuzzy set theory in actuarial science. His presentation includes such areas as economics of risk, time value of money, individual and collective models of risk, classification, assumptions, conservatism, and adjustment. Cummins and Derrig (1993, 1994) complement that work by exploring applications of fuzzy sets to property-liability insurance forecasting and pricing problems.
Here, we present a method of fuzzy pattern recognition for risk and claims classification. We apply fuzzy pattern recognition to two problems in Massachusetts private passenger automobile insurance: defining rating territories and classifying claims with regard to their suspected fraud content. Dubois and Prade (1980), Bezdek (1981), and Kandel (1982) provide overviews of fuzzy techniques in pattern recognition. Zimmerman (1991) and Bezdek and Pal (1992) provide other valuable references on the subject.
The concept of a fuzzy set and the mathematical algorithms needed to implement classification using fuzzy techniques are described in the next section. Grouping towns in Massachusetts into rating territories for risk classification purposes is viewed as a fuzzy clustering problem because many towns can be strongly related to two or more territories, thereby creating a border problem: to which of several related territories should a town be assigned? We also explore the influence of geographical proximity on the resulting fuzzy territories and classification of claims by their suspected fraudulent content. A final section summarizes and provides some alternative and future directions for FST in risk and claims classification problems.
`
Algorithms for Fuzzy Classification
Lemaire (1990) and Ostaszewski (1993) point out that insurance risk classification often resorts either to vague methods, as in the case of using multiple ill-defined personal criteria to identify good risks to underwrite, or methods that are excessively precise, as in the case of a person who fails to classify as a preferred risk for life insurance application because his or her body weight exceeds the stated limit by half a pound. Kandel (1982), writing from a different perspective, says, "In a very fundamental way, the intimate relation between the theory of fuzzy sets and the theory of pattern recognition and classification rests on the fact that most real-world classes are fuzzy in nature." This is exactly the reason that we propose to utilize the methodology of fuzzy clustering in territorial classification and to extend that method to classifying claims for suspected fraud.
Kandel (1982) classifies various techniques of fuzzy pattern recognition. Syntactic techniques apply when the pattern sought is related to the formal structure of the language. Semantic techniques apply to those producing fuzzy partitions of data sets. According to Bezdek and Pal (1992), the first choice faced by a pattern recognition system designer is that of process description. The designer may choose from among syntactic, numerical, contextual, rule-based, hybrid, and fuzzy process descriptions. Feature analysis is the next design step, in which data (generally given in the form of a data vector containing information about the analyzed objects) may be subjected to preprocessing, displays, and extraction. Next, semantic clustering algorithms, generating actual structures in data, are identified. Finally, the designer addresses cluster validity and optimality.
We use a fuzzy pattern recognition technique given by Bezdek (1981). In the classification of Bezdek and Pal (1992), it can be described as a numerical process description, fuzzy c-means iterative semantic algorithm. Because the data we analyze are in the form of numerical vectors (i.e., vectors in a euclidean space), with the number of clusters sought predetermined, we consider the fuzzy c-means technique most appropriate. Bezdek et al. (1987) discuss the convergence properties of the algorithm.
The task is to divide n objects, where n is a natural number, each represented by a vector in a p-dimensional euclidean space,

$$x_1, x_2, \ldots, x_n$$

(coordinates of the vectors are known as features), into c, $2 \le c \le n$, categorically homogeneous subsets called clusters. The objects belonging to the same cluster should be similar, and the objects in different clusters should be as dissimilar as possible. The number of clusters, c, is specified in advance. If the membership function of objects in clusters takes on fractional values, then we have fuzzy clusters. The process is called clustering.
Any clustering method must answer two fundamental questions: which properties of the data set should be used, and in which way should they be used to identify clusters. Once the algorithm meeting those two conditions is specified, there are, of course, more technical questions, such as whether the algorithm is effective for all possible sets of data, as well as the question of validity of clusters (see Kandel, 1982, and Bezdek and Pal, 1992, for a discussion of this problem).
Risk classification seeks to distinguish risks for the purposes of rating and underwriting. In claims processing, the purpose is to identify claims suspected of fraud for special processing and route nonsuspicious claims through normal adjusting channels. Insurance risks and claims are both described here by certain data patterns. The pattern recognition algorithm does the "detective work" of finding clusters of similar risks and claims.
Let the data set be

$$X = \{x_1, x_2, \ldots, x_n\}.$$

X is assumed to be a finite subset of a p-dimensional euclidean space $\mathbb{R}^p$. Each

$$x_k = (x_{k1}, x_{k2}, \ldots, x_{kp}), \quad k = 1, 2, \ldots, n,$$

is called a feature vector, while each $x_{kj}$, where $j = 1, 2, \ldots, p$, is the jth feature of the vector $x_k$.
A partition of the data set X into fuzzy clusters is described by the set of membership functions of the clusters (note that such a description could also apply to crisp clusters, with the membership function meaning simply the characteristic function). The clusters are denoted by $S_1, S_2, \ldots, S_c$ with the corresponding membership functions $\mu_{S_1}, \mu_{S_2}, \ldots, \mu_{S_c}$. In other words, we will construct c clusters that are fuzzy sets.
A c × n matrix containing the values of the membership functions of the fuzzy clusters

$$U = [\mu_{S_i}(x_k)]_{i=1,2,\ldots,c;\ k=1,2,\ldots,n}$$

is a fuzzy c-partition if it satisfies the following conditions:
`

$$\sum_{i=1}^{c} \mu_{S_i}(x_k) = 1 \quad \text{for each } k = 1, 2, \ldots, n, \tag{1}$$

$$0 \le \sum_{k=1}^{n} \mu_{S_i}(x_k) \le n \quad \text{for each } i = 1, 2, \ldots, c. \tag{2}$$
`
Condition (1) says that each feature vector $x_k$ has its total membership value of one divided among all clusters, and condition (2) states that the sum of membership degrees of feature vectors in a given cluster does not exceed the total number of feature vectors.
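
The two partition conditions can be checked mechanically. The sketch below is illustrative only: the function name `is_fuzzy_c_partition`, the list-of-rows layout (one row per cluster, one column per feature vector), and the numerical tolerance are our assumptions.

```python
def is_fuzzy_c_partition(U, tol=1e-9):
    """Check conditions (1) and (2) for a c x n membership matrix U.

    U is a list of c rows (clusters), each holding n membership values.
    """
    c = len(U)
    n = len(U[0])
    # Every membership degree must lie in [0, 1].
    if any(not (0.0 <= U[i][k] <= 1.0) for i in range(c) for k in range(n)):
        return False
    # Condition (1): the memberships of each feature vector sum to one.
    for k in range(n):
        if abs(sum(U[i][k] for i in range(c)) - 1.0) > tol:
            return False
    # Condition (2): each cluster's total membership lies between 0 and n.
    for i in range(c):
        row_sum = sum(U[i])
        if row_sum < -tol or row_sum > n + tol:
            return False
    return True


# Three clusters, three feature vectors; the middle vector is shared.
U_ok = [[1.0, 0.32, 0.0],
        [0.0, 0.22, 0.0],
        [0.0, 0.46, 1.0]]
# The first column sums to 1.1, so condition (1) fails.
U_bad = [[0.5, 0.5],
         [0.6, 0.5]]
print(is_fuzzy_c_partition(U_ok), is_fuzzy_c_partition(U_bad))
```

Because every entry lies in [0, 1], condition (2) follows automatically once the entries are validated; it is kept in the sketch only to mirror the definition.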
Given the above definition, let us now present the fuzzy c-means algorithm of Bezdek (1981), also used in Ostaszewski (1993). The iterative algorithm consists of four steps; we add a fifth step to make the result operational. The first step sets out a working definition of distance between feature vectors (the vector norm) and an initial starting partition. The second step identifies the center of each cluster in the partition. The third step recalculates the membership functions of the partition as normalized distances from the step 2 centers. The fourth step checks the distance between successive partitions to determine if the iteration procedure should be stopped. The fifth step discards small membership values (below some predetermined $\alpha$, $0 < \alpha < 1$) to make the partition operational. The five formal steps follow.
`
Step 1

Choose c, an integer between two and n, as the number of clusters into which the data is partitioned. Choose a positive parameter m, and a symmetric, positive-definite p × p matrix G. Define the vector norm $\|\cdot\|_G$, using the transpose operator T, by

$$\|x\|_G = \sqrt{x^T G x}. \tag{3}$$

Set the iteration counting parameter $\ell$ equal to zero, and choose the initial fuzzy partition

$$U^{(0)} = [\mu_{S_i}^{(0)}(x_k)].$$

Choose a parameter $\varepsilon > 0$ (this number will indicate when to stop the iteration process).
Note that the columns of the fuzzy partition matrix, numbered one through n, correspond to data vectors, and each column gives degrees of membership of the data point in clusters one through c. The matrix norm $\|\cdot\|_G$ is suitably chosen in such a way that two data vectors with great similarities are relatively close to each other, while dissimilar data are set apart. Although no perfect measure of such relationship exists, we can adjust the scale of $x_k$ coordinates by introducing appropriate diagonal entries, and any known correlations of coordinates can be represented in the nondiagonal entries. The size of the matrix G corresponds to the number of coordinates in data vectors.
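
As a minimal sketch of how the matrix G rescales distances, the quadratic-form norm $\sqrt{x^T G x}$ can be computed directly. The helper names `g_norm` and `g_distance` and the sample matrices are hypothetical and serve only to illustrate the role of the diagonal entries.

```python
import math


def g_norm(x, G):
    """Return sqrt(x^T G x) for a vector x and a symmetric positive-definite matrix G."""
    p = len(x)
    quad = sum(x[r] * G[r][s] * x[s] for r in range(p) for s in range(p))
    return math.sqrt(quad)


def g_distance(x, y, G):
    """Distance between two feature vectors measured in the G-norm."""
    diff = [a - b for a, b in zip(x, y)]
    return g_norm(diff, G)


identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to the ordinary euclidean norm
rescaled = [[4.0, 0.0], [0.0, 1.0]]   # the first coordinate counts twice as much

print(g_norm([3.0, 4.0], identity))
print(g_distance([1.0, 1.0], [0.0, 1.0], rescaled))
```

With the identity matrix the norm is the familiar euclidean length; a diagonal entry of 4 doubles the contribution of that coordinate to the distance.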
The main idea of the algorithm is to produce reasonable centers for clusters of data, and then group data vectors around cluster centers which are reasonably close to them. Unlike in standard crisp algorithms, fractional cluster membership is allowed, which gives us flexibility to adjust for any otherwise desirable phenomena.
`
`Step 2
`
Calculate the fuzzy cluster centers $\{v_i^{(\ell)}\}_{i=1,2,\ldots,c}$ by the following formula:

$$v_i^{(\ell)} = \frac{\sum_{k=1}^{n} \left(\mu_{S_i}^{(\ell)}(x_k)\right)^m x_k}{\sum_{k=1}^{n} \left(\mu_{S_i}^{(\ell)}(x_k)\right)^m} \tag{4}$$

for $i = 1, 2, \ldots, c$.
The cluster centers are merely weighted averages of data vectors. Weights are given by the mth powers of the membership degree. Bezdek et al. (1987) discuss the influence of the scaling factor m, as well as convergence of the resulting algorithm.

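
The center calculation of Step 2 is a weighted average and can be sketched as follows. The function name `cluster_centers` and the toy data are assumptions for illustration; the sketch also assumes that every cluster has at least one positive membership value, so that the denominator is nonzero.

```python
def cluster_centers(X, U, m=2.0):
    """Weighted averages of the data vectors, with weights equal to the
    mth powers of the membership degrees.

    X: list of n feature vectors (each a list of p numbers).
    U: c x n membership matrix (list of c rows).
    """
    c, n, p = len(U), len(X), len(X[0])
    centers = []
    for i in range(c):
        weights = [U[i][k] ** m for k in range(n)]
        total = sum(weights)  # assumed positive (no empty cluster)
        centers.append(
            [sum(weights[k] * X[k][j] for k in range(n)) / total for j in range(p)]
        )
    return centers


# Two one-dimensional points; the second is shared equally by both clusters.
X = [[0.0], [4.0]]
U = [[1.0, 0.5],
     [0.0, 0.5]]
centers = cluster_centers(X, U, m=2.0)
print(centers)
```

In this toy case the first center is pulled only slightly toward the shared point, because a membership of 0.5 enters with weight 0.25 when m = 2.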
`Step 3
Calculate the new partition (i.e., membership matrix)

$$U^{(\ell+1)} = [\mu_{S_i}^{(\ell+1)}(x_k)]_{1 \le i \le c,\ 1 \le k \le n},$$

where

$$\mu_{S_i}^{(\ell+1)}(x_k) = \frac{\left(\dfrac{1}{\|x_k - v_i^{(\ell)}\|_G^2}\right)^{\frac{1}{m-1}}}{\sum_{j=1}^{c} \left(\dfrac{1}{\|x_k - v_j^{(\ell)}\|_G^2}\right)^{\frac{1}{m-1}}}, \tag{5}$$

where $i = 1, 2, \ldots, c$, and $k = 1, 2, \ldots, n$.
If $x_k = v_i^{(\ell)}$, however, formula (5) cannot be used. In that case we set

$$\mu_{S_j}^{(\ell+1)}(x_k) = \begin{cases} 1 & \text{if } j = i, \\ 0 & \text{if } j \ne i, \end{cases} \qquad j = 1, 2, \ldots, c. \tag{6}$$
This step of the algorithm carries us from the previous membership matrix (numbered $\ell$) to the next one (numbered $\ell + 1$). One can interpret formula (5) as follows: if the vector norm measures the similarity of two data vectors, the (m−1)st root of its reciprocal is a form of measure of dissimilarity, and formula (5) assigns a new membership degree by relating the dissimilarity with a given cluster center to the "total dissimilarity present." Formula (5) is, however, a result of a longer optimization procedure discussed further by Bezdek et al. (1987).
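
The membership update of Step 3 can be sketched in a few lines. This is an illustration under stated assumptions rather than the program used for our calculations: it takes G to be the identity matrix (squared euclidean distances), uses the standard fuzzy c-means form with exponent 1/(m − 1) applied to reciprocal squared distances, and the name `update_memberships` is hypothetical.

```python
def update_memberships(X, centers, m=2.0):
    """Recompute the c x n membership matrix from the current cluster centers.

    Assumes G is the identity, so distances are squared euclidean distances.
    """
    c, n = len(centers), len(X)
    U = [[0.0] * n for _ in range(c)]
    for k in range(n):
        d2 = [sum((a - b) ** 2 for a, b in zip(X[k], centers[i])) for i in range(c)]
        coincident = [i for i in range(c) if d2[i] == 0.0]
        if coincident:
            # Special case: the data vector equals a center, so the general
            # formula would divide by zero; give that cluster full membership.
            U[coincident[0]][k] = 1.0
            continue
        inverse = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
        total = sum(inverse)
        for i in range(c):
            U[i][k] = inverse[i] / total
    return U


X = [[0.0], [1.0], [4.0]]
centers = [[0.0], [4.0]]
U = update_memberships(X, centers, m=2.0)
print(U)
```

Here the point at 1.0 lies at squared distances 1 and 9 from the two centers, so with m = 2 it receives memberships 0.9 and 0.1, while the two points that coincide with a center are assigned to it fully.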
`
`Step 4
By using the natural matrix norm, or the extension of $\|\cdot\|_G$ to the matrix norm, or by choosing a different matrix norm more suitable to the problem, calculate

$$\Delta = \|U^{(\ell+1)} - U^{(\ell)}\|.$$

If $\Delta > \varepsilon$, repeat steps 2, 3, and 4. Otherwise, stop at some iteration count $\ell^*$. This "stopping procedure" is a standard numerical analysis technique: if yet another iteration does not change much, the result is the best possible. Clearly, the procedure rests on the assumption of the algorithm's convergence, but luckily the proof of that convergence exists, by Bezdek et al. (1987).
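
Steps 2 through 4 can be combined into one loop. The sketch below is a simplified illustration, not the implementation behind our results: it assumes G is the identity, measures the change between successive partitions by the largest absolute entry difference (one possible choice of matrix norm), caps the number of iterations with a hypothetical `max_iter` parameter, and assumes no cluster ever becomes empty.

```python
def fuzzy_c_means(X, U, m=2.0, eps=0.05, max_iter=100):
    """Iterate steps 2-4 from an initial c x n partition U.

    Returns the final partition, the last cluster centers, and the
    number of iterations performed.
    """
    c, n, p = len(U), len(X), len(X[0])
    for iteration in range(1, max_iter + 1):
        # Step 2: cluster centers as weighted averages.
        centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            centers.append(
                [sum(w[k] * X[k][j] for k in range(n)) / sum(w) for j in range(p)]
            )
        # Step 3: new memberships from reciprocal squared distances.
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d2 = [sum((X[k][j] - centers[i][j]) ** 2 for j in range(p))
                  for i in range(c)]
            if 0.0 in d2:
                new_U[d2.index(0.0)][k] = 1.0
            else:
                inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
                for i in range(c):
                    new_U[i][k] = inv[i] / sum(inv)
        # Step 4: distance between successive partitions (max-entry norm).
        delta = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if delta <= eps:
            break
    return U, centers, iteration


X = [[0.0], [1.0], [9.0], [10.0]]
U0 = [[1.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 1.0]]
U_final, centers, iterations = fuzzy_c_means(X, U0)
print(iterations, centers)
```

For this well-separated toy data the crisp starting partition is already nearly stable, so the stopping rule is met after a single pass.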
`
`Step 5
`
The final fuzzy matrix $U^{(\ell^*)}$ is structured for operational use by means of the normalized $\alpha$-cut, for some $0 < \alpha < 1$. Quite simply, all membership function values less than $\alpha$ are replaced with zero and the function is renormalized (sums to one) to preserve partition condition (1). For small $\alpha$, the resulting partition is still fuzzy; for large $\alpha$ (or max-cuts, where the largest membership value is set equal to one and all others are zero), the resulting partitions are likely to be crisp.
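
A normalized α-cut can be sketched as below. The function name `normalized_alpha_cut` and the sample matrix are hypothetical. The fallback used when every value in a column falls below α (keeping only the largest membership, i.e., a max-cut) is our assumption for the sketch, since the text does not prescribe a rule for that case.

```python
def normalized_alpha_cut(U, alpha=0.2):
    """Zero out memberships below alpha and renormalize each column to sum to one."""
    c, n = len(U), len(U[0])
    result = [[0.0] * n for _ in range(c)]
    for k in range(n):
        column = [U[i][k] for i in range(c)]
        kept = [value if value >= alpha else 0.0 for value in column]
        if sum(kept) == 0.0:
            # Assumed fallback (a max-cut): keep only the largest membership.
            kept = [0.0] * c
            kept[column.index(max(column))] = 1.0
        total = sum(kept)
        for i in range(c):
            result[i][k] = kept[i] / total
    return result


# Hypothetical memberships of two feature vectors in three clusters.
U = [[0.50, 0.90],
     [0.35, 0.06],
     [0.15, 0.04]]
cut = normalized_alpha_cut(U, alpha=0.2)
print(cut)
```

In the first column the value 0.15 is discarded and the remaining two are rescaled by 1/0.85; in the second column only one value survives, so that vector becomes a crisp member of the first cluster.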
`
Automobile Rating Territories in Massachusetts
As Conger (1987) points out,
`
In Massachusetts, the past ten years have witnessed the evolution of an increasingly sophisticated system of methodologies for determining the definitions of rating territories for private passenger automobile insurance. In contrast to territory schemes in other states, which tend to group geographically contiguous towns, these Massachusetts methodologies have had as their goal the grouping of towns with similar expected losses per exposure, regardless of the geographic contiguity or non-contiguity of the grouped towns.

`
Note the ambiguous nature of "similar expected losses," a decidedly fuzzy concept.
The methodology used for territorial rating results in a final combined five-coverage pure premium index for each of the 360 towns (or, more precisely, 350 towns and ten areas into which Boston is divided for automobile rating purposes). A complete description of the empirical Bayes procedure for determining the biennial individual and combined coverage town indices from four years of data is given in DuMouchel (1983). The indices, which are numbers relatively close to 1 representing expected losses in relation to those of the entire state expected losses, are then ordered and territories are created by partitioning that linear ordering. Because frequent switches from one territory to another are undesirable but inevitable, numerous restrictions on moving towns from one territory to another exist in actual regulatory practice. Once territory clusters are set for a rating year, five individual coverage rates are determined using that single clustering, one which may or may not be appropriate for each coverage, but which is assumed to be equitable overall.
Such difficulties and imprecisions in groupings warrant an investigation of fuzzy clustering. Resulting fuzzy clusters would be much more flexible, because a town belonging partially to two or more territories could be assigned to one of them if regulatory limitations dictate unique assignments of towns to territories. Although stability of territory assignment is desirable and convenient, the system of clustering towns into territories should meet the standard responsiveness criterion for risk classification. Towns have an incentive to reduce their relative loss costs by maintaining their roads, safety engineering, and law enforcement, if those actions bring about lower premiums. When the system is not responsive, or slow to respond, the incentives can be diminished or lost.
The pure premium indices are calculated for the following coverages for all 350 towns: bodily injury liability (A-1 and B), personal injury protection (A-2), property damage liability (PDL), collision, comprehensive, and a sixth category comprising the five individual coverages combined. We use those values as the coordinates of vectors $x_k$, $k = 1, 2, 3, \ldots, 350$, representing the towns in the data space. This implies that we treat the data space as six-dimensional, as six parameters are used to describe towns. In our calculations, we use either the five coverage indices (five-dimensional vectors) or the combined index (one-dimensional vectors) but not both. The data for the 1993 indices (based on the 1987 through 1990 data) for towns in Bristol County are given in Appendix A. Data for all 350 towns and Boston are available in Automobile Insurers Bureau (1992) or from the authors. We begin by illustrating the algorithm for a manageable set of towns: the twenty towns of Bristol County, Massachusetts.
`
1 In general, the partitioning is accomplished by grouping towns within five to six percent intervals on either side of the statewide average index of one.

`
The Bristol County Algorithm
The initial clustering for Bristol County is the indicated 1993 territory assignment groupings relabeled one to five. The initial five-coverage partition matrix is

$$U^{(0)} = [\mu_{S_i}^{(0)}(x_k)]_{1 \le i \le 5,\ 1 \le k \le 20},$$

where $\mu_{S_i}^{(0)}(x_k)$ represents the membership of town $x_k$ in cluster $S_i$; it equals one if the town is in the territory, or zero if it is not.
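
The initial crisp partition matrix can be built directly from the territory labels. The sketch below uses the initial cluster labels of the twenty Bristol County towns as read from the Initial Cluster column of Table 1 (towns in increasing combined index order); the function name `crisp_partition` is hypothetical.

```python
# Initial cluster (relabeled territory) of each of the 20 towns, in the
# order Mansfield, North Attleborough, ..., Fall River, New Bedford.
initial_cluster = [1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5]


def crisp_partition(labels, c):
    """Return the c x n matrix whose (i, k) entry is 1 if item k carries
    label i (numbered from 1) and 0 otherwise."""
    return [[1.0 if label == i else 0.0 for label in labels]
            for i in range(1, c + 1)]


U0 = crisp_partition(initial_cluster, 5)
cluster_sizes = [sum(row) for row in U0]
print(cluster_sizes)
```

Each column of this matrix contains a single one, so the crisp starting partition satisfies condition (1) trivially.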
We also set the stopping parameter $\varepsilon = 0.05$, and $m = 2$. The initial cluster centers are calculated as

$$v_i^{(0)} = \frac{\sum_{k=1}^{20} \left(\mu_{S_i}^{(0)}(x_k)\right)^2 x_k}{\sum_{k=1}^{20} \left(\mu_{S_i}^{(0)}(x_k)\right)^2} \tag{7}$$

for $i = 1, 2, \ldots, 5$. We proceed to evaluate the new partition matrix

$$U^{(1)} = [\mu_{S_i}^{(1)}(x_k)]_{1 \le i \le 5,\ 1 \le k \le 20},$$

where

$$\mu_{S_i}^{(1)}(x_k) = \frac{\left(\sum_{p=1}^{5} g_{pp} \left(x_{kp} - v_{ip}^{(0)}\right)^2\right)^{-1}}{\sum_{j=1}^{5} \left(\sum_{p=1}^{5} g_{pp} \left(x_{kp} - v_{jp}^{(0)}\right)^2\right)^{-1}}, \tag{8}$$

where the subscript p refers to one of the five pure premium coordinates of a town, $i = 1, 2, \ldots, 5$, $k = 1, 2, \ldots, 20$, and $g_{pp}$ are weights representing the distribution of losses across coverage.
If $x_k = v_i^{(0)}$, however, formula (8) cannot be used. In that case, we set

$$\mu_{S_j}^{(1)}(x_k) = \begin{cases} 1 & \text{if } j = i, \\ 0 & \text{if } j \ne i, \end{cases} \qquad k = 1, 2, \ldots, 20, \quad i = 1, 2, \ldots, 5. \tag{9}$$

2 For illustrative purposes, the town of Fairhaven, which was assigned to 1993 Territory 9, is included with those towns in Territory 8. Fall River is included with New Bedford. Actual 1993 rating territories are subject to judgmental adjustments and capping and are not always those shown here.

3 The coverage weight distribution, using 1990 exposures times four-year pure premiums, is $(g_{ii}) = (0.2229, 0.1109, 0.2048, 0.3210, 0.1404)$; $g_{ij} = 0$ if $i \ne j$, $1 \le i, j \le 5$.
`
Now we calculate the distance between the initial partition matrix $U^{(0)}$ and the new partition matrix $U^{(1)}$, by taking the simple matrix norm

$$\Delta = \|U^{(1)} - U^{(0)}\|. \tag{10}$$

If $\Delta < \varepsilon = 0.05$, the process is stopped. Otherwise, the iterative algorithm continues. The results of the calculation, with an $\alpha$-cut of 0.2, are presented in Table 1.
`
Table 1
Fuzzy Town Cluster Membership Values for Bristol County, Massachusetts

                       Initial          Membership Values
Town Name              Cluster   μ_S1   μ_S2   μ_S3   μ_S4   μ_S5   Sum
Mansfield                 1      1      0      0      0      0      1
North Attleborough        1      1      0      0      0      0      1
Dighton                   2      0.32   0.22   0.46   0      0      1
Rehoboth                  2      0.40   0.23   0.38   0      0      1
Norton                    2      0.58   0      0.42   0      0      1
Freetown                  2      0      1      0      0      0      1
Berkley                   2      0      1      0      0      0      1
Raynham                   2      0      0      1      0      0      1
Seekonk                   3      0.25   0      0.43   0.32   0      1
Easton                    3      0      0      1      0      0      1
Attleboro                 3      0      0      1      0      0      1
Dartmouth                 3      0      0      1      0      0      1
Somerset                  4      0      0      0      1      0      1
Swansea                   4      0      0      0      1      0      1
Taunton                   4      0      0      0.37   0.63   0      1
Westport                  4      0      0.37   0.30   0.33   0      1
Acushnet                  4      0      0      0      1      0      1
Fairhaven                 4      0      0      0      1      0      1
Fall River                5      0      0      0      0      1      1
New Bedford               5      0      0      0      0      1      1
Sum                              3.54   2.82   6.35   5.29   2      20

Note: C-means fuzzy clustering algorithm, with five-coverage data pattern, ninth iteration stopping parameter 0.0499 < 0.05, α-cut = 0.2, no geographical variables.
Figures 1 and 2 display the results of the transition from initial territory clusters to final fuzzy clusters. Figure 1 displays the 20 Bristol County towns grouped into their initial clusters in increasing combined index order. For example, Town 1 (Mansfield) has the lowest combined index value (0.8018) and is in the lowest ranked territory, while Town 20 (New Bedford) has the highest index (1.2977) and is in the highest ranked territory.

Figure 1
Initial Territorial Town Clustering by Combined Index Territory for Bristol County, Massachusetts

Figure 2
Fuzzy Town Clustering by Five Coverage Indices for Bristol County, Massachusetts

`
Figure 2 shows the fuzzy clustering results that provide for the incorporation of five-dimensional data (individual coverages), as well as the fractional assignments (fuzziness) to the clusters. With fuzzy clustering, towns tend to become associated with nearby clusters as well as with their "home" cluster. Town 8 (Raynham) becomes associated with fuzzy cluster 3 and has little association (less than 0.2) with its original home cluster 2. Town 5 (Norton) with home cluster 2 splits into fuzzy clusters 1 and 3. These movements are typical of fuzzy clustering results.
`
`Geographical Proximity
We also perform a calculation adding two more features for each town: its geographical coordinates divided by the coordinates of the town with the largest Massachusetts coordinates, Nantucket (the division is performed to adjust the scale and to match the other features, which are all close to one). By performing the algorithm on these vectors, including geographical coordinates, we increase the chance of arriving at clusters that are not only actuarially similar, but relatively close geographically.
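
The scaling of the geographical coordinates can be sketched as follows. The fourth root of the ratio to the largest coordinate is the transformation described in the accompanying footnote; everything else here is hypothetical, including the function names, the coordinate values, and the five pure premium indices, none of which are taken from our data.

```python
def scaled_coordinate(value, largest, root=4):
    """Ratio of a coordinate to the largest coordinate, raised to the power
    1/root so that the result is compressed toward one."""
    return (value / largest) ** (1.0 / root)


def add_location(features, x, y, x_max, y_max):
    """Append the two scaled geographical coordinates to a feature vector."""
    return list(features) + [scaled_coordinate(x, x_max),
                             scaled_coordinate(y, y_max)]


# Hypothetical five-coverage indices and map coordinates for one town.
coverage_indices = [0.80, 0.85, 0.90, 0.95, 1.00]
town_vector = add_location(coverage_indices, x=40.0, y=12.0,
                           x_max=160.0, y_max=48.0)
print(len(town_vector), town_vector[5:])
```

The resulting vector has seven features, and the town with the largest coordinates receives scaled values of exactly one, which keeps the location features on a scale comparable to that of the pure premium indices.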
This calculation is performed in the same manner as before, but with seven feature variables. We show results for pure premium data weighted 50 percent and geographical variables 50 percent, but any relative weighting scheme can be used to reflect the modeler's preference for geographic dependence of territories. The 50/50 results are presented in Table 2. Note that a full 50 percent weight on the two geographical coordinates produced only slight differences from the five variable centers shown in Table 1. Recall that other states use geographical proximity as an important factor in determining rating territories (DuMouchel, 1983, p. 76). A map of Bristol County is shown in Appendix B.
After the inclusion of geographical location, all towns retained nearly identical membership values within each of the five clusters. A comparison of the cluster centers shown in Table 3, with and without the geographical variables, reveals how little effect the geographic variables had on the pure premium cluster centers. Either the geographical variable is already accounted for in the five-coverage pattern or it is relatively weak in relation to the pure premium patterns, at least for this data set.

`
4 In this application, the fourth root of the ratio is used to bound the geographical coordinates between 0.49 and one, making them more comparable in scale to the pure premium indices. This is equivalent to applying a dilation operator to the simple coordinate comparison to Nantucket in order to produce comparable fuzzy membership values (see Lemaire, 1990, p. 44, and Cummins and Derrig, 1993, p. 452).
5 Although the membership values for Dighton appear to be quite different in Tables 1 and 2, the actual pre-0.20-cut values for Dighton are 0.205 and 0.199.
6 The inclusion of location variables to cluster is not necessarily limited to geography. Brockett, Xia, and Derrig (1995) provide two-dimensional location variables that are topographically faithful neural networks for the fraud data discussed below.
`
Table 2
Fuzzy Town Cluster Membership Values with Geographical Variables for Bristol County, Massachusetts

Columns: Town Name, Initial Cluster, Membership Values (μ_S1, μ_S2, μ_S3, μ_S4, μ_S5), Sum
`0 "1_-
`'0
`0'_
`0
`'1;
`" 1
`'
`Mansfield
`0
`'
`l"
`0
`' 0
`0
`l
`North Attleborough
`1
`. 0.
`.17
`0_
`0.61
`0.20
`0.39
`Dighton
`2
`0
`'1. ‘
`0
`_ 0.40
`0.22
`0.38
`Rehoboth
`2
`0*
`1.
`--0.40‘- 0----'
`-- ‘0
`--
`0.60
`1161-166
`2
`--0
`1'."
`0"
`0'"
`~"1
`0 ‘
`Freetév't'rn-
`2
`__ 0
`l"
`O-
`0' 0-
`1"
`0 '
`Berkley
`2
`*0 -'
`1'
`-
`1
`' =0
`'
`0'
`0.
`‘Raynhatn-
`2
`" 0'
`1i:-
`- 0244-“
`030'
`0‘
`-' 0.25
`Seekonk
`3
`0- "
`, 1
`'1 -
`"0-
`0 .
`'0
`Easto‘n -
`3
`‘0'”
`'1 '
`0
`"
`31
`.03
`-' 0 '-
`O ‘
`Attleboro
`3
`'0':
`"1
`0'.= 1 ~0'--
`1'0 -'
`Damnmim _'
`3f
`fit)
`1
`0‘
`=0 '
`1
`' 0'-
`somerset:
`-
`'21-?
`'0’
`-
`'0'
`~“—=1- 0 , 1
`-- 0
`swat-mes
`4
`.
`'-~-0.‘
`0:38 -
`0.62_
`0 ,
`‘1-2
`20*
`Taunton
`4
`0.311 0.28
`0.35 '
`0 1
`'1‘
`0
`WeStPOI’t'
`4
`'_'-0‘
`- 0.
`j
`'21;
`=
`10‘”
`2'11
`-. 0 -
`Abushnet':
`4
`‘ 0 .
`'0 1'
`'1
`"-0--
`1-
`0‘-
`'Fairhav'en
`4
`'0 --
`"0-
`0‘
`" l
`1_
`"0 I
`Fall River
`5
`‘.
`.0
`O
`0 '
`1';
`Z T '
`I 0"
`