JOURNAL CLUSTERING USING A BIBLIOGRAPHIC COUPLING METHOD
`
HENRY G. SMALL and MICHAEL E. D. KOENIG
Institute for Scientific Information, 325 Chestnut Street, Philadelphia, PA 19106, U.S.A.
`
Abstract--The classification of journal titles into fields or specialties is a problem of practical importance in
library and information science. An algorithm is described which accomplishes such a classification using
the single-link clustering technique and a novel application of the method of bibliographic coupling. The
novelty consists in the use of two-step bibliographic coupling linkages, rather than the usual one-step linkages.
This modification of the similarity measure leads to a marked improvement in the performance of single-link
clustering in the formation of field or specialty clusters of journals. Results of an experiment using this
algorithm are reported which grouped 890 journals into 168 clusters. This scope is an improvement of nearly an
order of magnitude over previous journal clustering experiments. The results are evaluated by comparison with
an independently derived manual classification of the same journal set. The generally good agreement indicates
that this method of journal clustering will have significant practical utility for journal classification.
`
`INTRODUCTION
The concept of algorithmically clustering or categorizing journals has aroused the interest of
many members of the information science community. As CARPENTER and NARIN[1] point out,
most work in the area seems to have been motivated by a combination of aesthetic and
practical considerations. The aesthetic considerations include the challenge of doing algorithmically
what has been a very non-trivial task intellectually: the classification of journals. The
task is an almost pure problem in numerical taxonomy, that of partitioning a population on the
basis of shared characteristics.
`On the practical side the outcome of journal clustering can have various applications. The
`categories reveal the pattern, the mosaic of scholarly activity. An analysis over time would
`reveal shifts in that pattern, as journals entered or departed from clusters, and as clusters
`themselves emerged, merged, separated and disappeared. Such observations would have
`relevance for sociology, information science, and science policy. Clusters thus derived could
also be used to analyze and promote the rationalization of journal coverage by secondary
services. The DISISS (Design of Information Systems in the Social Sciences) project has
proposed such an application[2]. Furthermore, journal cluster patterns would be useful for
`analyzing and validating thesauri, classification schemes, and indexing schemes.
A number of previous studies have described attempts to cluster journals. In their seminal
work of 1967, XHIGNESSE and OSGOOD[3] examined the journal-to-journal citation patterns within
a group of 21 psychology journals to obtain a similarity matrix. This was accomplished by
means of Shepard’s algorithm[4] which assigns distances between journals in n-dimensional
space, keeping n as small as possible while preserving the rank orders of citation frequencies
between journals. Nine of the 21 journals were assigned to three overlapping clusters,
determined by the journals’ proximity to each other in n-space. The multidimensionality of this
approach limits it to relatively small numbers of journals.
PARKER, PAISLEY and GARRETT[5], later in 1967, undertook an analysis of 17 journals in the
field of communication research. The measure of relatedness between journals was a form of
co-citation: the frequency of co-occurrence of citations to journals within articles in the 17
source journals. (The term co-citation, more recently introduced[6], refers to a measure of
relatedness between articles, defined as the frequency with which two articles are cited together
by other articles.) Some 68 journals were cited frequently enough to be analyzed, of which
approx. 30-35 were grouped into some 8-11 clusters (the exact number varies for each of the
four time periods studied). A criticism, as pointed out in the DISISS study described below, is
the lack of any attempt to normalize for the level of citations. Without normalization the
procedure almost inevitably links highly cited journals. The technique is, however, capable of
providing “affiliates” as well as “members” for each cluster, but without normalization, the
`affiliates tend to be the most highly cited journals that are members of the most strongly linked
`clusters.
Large scale attempts at clustering journals using citation relationships were not possible
until the advent of the Science Citation Index® (SCI) database (compiled by the Institute for
Scientific Information). Particularly important was Garfield’s reformatting of the SCI to show
journal-to-journal citation patterns[7], which revealed the existence of very strong direct citation
linkages among journals. This work culminated in the publication of Journal Citation
Reports®[8], which is an index of these journal-to-journal citation patterns. CARPENTER and
NARIN[1] used these data to look at three disciplines: physics, chemistry and molecular biology.
For each discipline the individual journals were manually pre-selected, and a separate
journal-to-journal citation matrix was prepared. A “hill climbing” algorithm was used which for each
attempt requires the number of clusters to be predetermined, as the algorithm creates no new
clusters and rarely eliminates any. A measure of cluster quality is then used to determine which
level of clusters has the “best” fit. In this study, nine different combinations of journal
similarity measures and cluster quality measures were used and then combined to produce the
final results. Each of the three disciplines, ranging in size from 81 to 106 journals, was clustered
into 11 or 12 clusters, with 5-16 journals remaining unclustered, and the clusters produced had a
high degree of face validity.
A pilot study to explore the feasibility of clustering social science journals was undertaken
by the DISISS (Design of Information Systems in the Social Sciences) project at the University
of Bath in the U.K. in the early 1970s[2,9]. Citation data were obtained from 17 source
journals. Again, a journal-to-journal citing matrix was used as the basic data form. The
clustering algorithm, called SCICON, operates on the basis of calculating the root mean square
distance from members in n-dimensional space (n being the number of variables, in this case
the 17 citing journals) to the center of gravity of each cluster and uses a “run-in” technique of
starting with a large number of clusters and then reducing the number one at a time, examining
at each step whether a better fit is accomplished by moving any journal to another cluster. The
result of this technique used on 115 cited journals was three clusters: psychology (34 members),
economics (21 members) and amorphous (60 members). Many of the smaller clusters produced
during the run-in, when the number of clusters was higher, were meaningful, however.
`The work described above, although useful and frequently imaginative, has been limited in
its scope. The largest number of journals clustered at one time is barely more than a
hundred, a very small portion of the universe of journals. The constraint on size appears to
`originate not from the lack of data, but from the sheer impracticality of processing the matrices
`and multidimensional arrays inherent in the techniques used, when any significant number of
`journals is to be considered.
`
`METHOD
The procedure used in this experiment is a novel combination of some standard methods
known to bibliometricians and numerical taxonomists. First, we use the well known technique
of bibliographic coupling to derive the basic journal-to-journal associations[10]. Co-citation
could equally well have been used in place of bibliographic coupling, but for computational reasons,
bibliographic coupling was the more convenient association measure. For our purposes,
bibliographic coupling is defined as the citing of the same document by two journals.
(Conventionally, bibliographic coupling is defined as the citing of the same document by two later
documents.) The strength of bibliographic coupling (BC) is the number of identical, distinct
documents cited by the two journals. This strength of coupling is normalized to compensate for
the size effects of the two journals by dividing the bibliographic coupling strength by the sum of
the number of references made by the two journals.
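The normalized coupling just defined can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalized_bc`, the dictionary layout and the toy data are invented here and are not taken from the original programs.

```python
from itertools import combinations


def normalized_bc(cited_docs, n_refs):
    """Return {(journal_a, journal_b): NBC} for pairs sharing at least one cited document.

    cited_docs: journal -> set of distinct documents it cites (assumed layout)
    n_refs:     journal -> total number of references made by that journal
    """
    nbc = {}
    for a, b in combinations(sorted(cited_docs), 2):
        # BC strength: number of identical, distinct documents cited by both journals
        bc = len(cited_docs[a] & cited_docs[b])
        if bc:
            # Normalize by the sum of the references made by the two journals
            nbc[(a, b)] = bc / (n_refs[a] + n_refs[b])
    return nbc


# Toy data, invented purely for illustration
cited_docs = {"J1": {"d1", "d2", "d3"}, "J2": {"d2", "d3", "d4"}, "J3": {"d5"}}
n_refs = {"J1": 40, "J2": 60, "J3": 10}
nbc = normalized_bc(cited_docs, n_refs)
print(nbc)  # {('J1', 'J2'): 0.02}
```

Comparing every pair of journals directly is quadratic in the number of journals; it is used here only because it mirrors the definition most plainly.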
The second procedure used is single-link clustering. This mode of clustering has been
described elsewhere[11]. We have used the fact that single-link clustering is equivalent to the
application of a threshold on the item-to-item proximity measure. In our experiment, the
method of single-link clustering was implemented in the following way: A file of journal-to-journal
pairs with their appropriate coefficients of association is used as input. A threshold
value of the journal-to-journal association is set and a journal is selected as a starting point. All
journals linked to this selected journal at or above the prescribed level of association are
located in the file and assigned to the cluster. Each of these journals is then used as a starting
point and all journals linked to them are assigned to the cluster. The cluster is complete when
no “new” journals can be added to the cluster. The program then proceeds to the next, unclustered
journal, and attempts to create another cluster. After all journals have been examined and assigned
to clusters, the program terminates.
The smallest cluster created using our procedure is a two member (two journal) cluster since
journals not linked to at least one other journal at the prescribed level of association are not
searched. The clusters are created at a particular level of the journal-to-journal association and
we have no way of knowing what level is “optimum” except by inspection of the results and
comparison with results obtained using other procedures. In general, a level is sought in which
no very large cluster exists (greater than 100 journals), realizing that such a level, while
appropriate for some areas or disciplines, may not be appropriate for others.
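The clustering procedure described above amounts to finding connected components among the links that survive the threshold. The following is a minimal sketch of that idea, assuming the pair file is available as `(journal, journal, strength)` tuples; the function name and the sample data are hypothetical.

```python
from collections import defaultdict, deque


def single_link_clusters(pairs, threshold):
    """pairs: iterable of (journal_a, journal_b, strength). Returns a list of sorted clusters."""
    neighbours = defaultdict(set)
    for a, b, s in pairs:
        if s >= threshold:              # keep links at or above the prescribed level
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for start in sorted(neighbours):    # journals with no qualifying link never appear here
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:                    # grow the cluster until no "new" journal can be added
            j = queue.popleft()
            members.append(j)
            for k in neighbours[j]:
                if k not in seen:
                    seen.add(k)
                    queue.append(k)
        clusters.append(sorted(members))
    return clusters


# Invented sample links
pairs = [("A", "B", 0.5), ("B", "C", 0.45), ("C", "D", 0.1), ("D", "E", 0.6), ("F", "A", 0.05)]
clusters = single_link_clusters(pairs, threshold=0.4)
print(clusters)  # [['A', 'B', 'C'], ['D', 'E']]
```

Journal F has no link at or above the threshold and so is never searched, which is why no one-journal cluster can arise.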
A novel feature of our journal clustering system is the use of paths of “length two” between
journals to determine the basic association measure used in clustering. Before we define what
we mean by this, we can clarify our motivation by describing an earlier experiment which was
not successful. We began with the file of journal pairs which were linked by normalized
bibliographic coupling (NBC) described above. We then set a minimum threshold for NBC and
extracted all journal pairs at or above this threshold. This gave a file of “strong”
journal-to-journal linkages. The problem which we encountered was that we could not obtain a
satisfactory set of single-link clusters using the NBC measure. The journals tended to chain
together forming very large and loosely linked clusters. It is well known that the single-link
algorithm has a tendency to form clusters of this kind, and this tendency, combined with the
strongly interdisciplinary character of journal relationships, created enormous chains of journals
which resisted fragmentation when the level of NBC was raised. Eventually, when the journals
finally did break up into reasonably small clusters at a very high level of NBC, too few of the
journals remained in the clusters to consider the experiment a success.
As a result of this experience, we decided to modify our basic journal-to-journal measure.
We had noticed that the chaining of journals to create gigantic clusters in the previous
experiment was very often due to only a few links from a large or strongly interdisciplinary
journal linking one journal “clump” to another. Our problem, then, was to enhance the
“clumpiness” of the network so that inter-clump linkages could be “submerged” below some
threshold value.
The method we chose was to determine the number of paths of “length two” between
journals. For example, suppose we take some arbitrary starting journal. It is linked with a
number of other journals with an NBC strength at or above some threshold. These journals are,
in turn, linked to other journals at or above this threshold. Now we select a second arbitrary
journal and find all the distinct paths which lead from it to the starting journal but which pass
through other (third) journals as intermediate steps. These are the paths of “length two”
between the two journals. For every pair of journals, then, there is some number of two-step
paths (including zero) which connect them. It is also clear that the number of such paths for any
pair of journals is limited to the lesser number of paths of “length one” which originate from
one or the other journal. For example, if journal A has five links to other journals and journal
B has ten links, the number of two-step paths leading from A to B cannot exceed five. Hence, we
can normalize the two-step paths as shown in Fig. 1. This normalization provides a new measure of
journal-to-journal association (normalized two-step bibliographic coupling: NTSBC) which has
the property of varying from zero to one.
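A compact way to count two-step paths is to visit each journal as a potential intermediate step and pair up its neighbours. The sketch below is one way this could be done in Python; the function name `ntsbc` and the journal labels (`M1`, `X1`, `Y1`, ...) are assumptions made for the example. The sample network is built to match the counts quoted for Fig. 1 (eight one-step paths from J1, six from J2, three two-step paths between them).

```python
from collections import defaultdict
from itertools import combinations


def ntsbc(strong_pairs):
    """strong_pairs: iterable of (a, b) links already at or above the NBC threshold.

    Returns {(a, b): NTSBC} for every pair joined by at least one two-step path.
    """
    neighbours = defaultdict(set)
    for a, b in strong_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    two_step = defaultdict(int)
    for linked in neighbours.values():
        # Every pair of journals linked to the same intermediate journal
        # contributes one path of length two.
        for a, b in combinations(sorted(linked), 2):
            two_step[(a, b)] += 1

    # two-step paths / (one-step paths from a + one-step paths from b - two-step paths)
    return {
        (a, b): n / (len(neighbours[a]) + len(neighbours[b]) - n)
        for (a, b), n in two_step.items()
    }


# Hypothetical network reproducing the counts of Fig. 1
strong_pairs = (
    [("J1", "J2")]
    + [("J1", m) for m in ("M1", "M2", "M3")]
    + [("J2", m) for m in ("M1", "M2", "M3")]
    + [("J1", x) for x in ("X1", "X2", "X3", "X4")]
    + [("J2", y) for y in ("Y1", "Y2")]
)
scores = ntsbc(strong_pairs)
print(round(scores[("J1", "J2")], 3))  # 0.273
```

Because the number of two-step paths cannot exceed the smaller of the two one-step counts, every value returned lies between zero and one.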
It is also easy to see intuitively why this should enhance the linkages between journals in a
“clump” and thus provide a better clustering than was obtained with the simple NBC. Suppose
we have two clumps which are joined by only a few links. The number of two-step paths
between journals within a clump will be high, while the number of two-step paths between
journals in different clumps will be low. Hence, when a threshold on the two-step linkage
measure is applied, the within-clump ties will remain and the between-clump ties will tend to be
broken.
Fig. 1. Illustration of normalized two-step bibliographic coupling. Journals J1 and J2 are linked by three
two-step paths. J1 has a total of eight one-step paths leading from it and J2 has a total of six. The
normalized two-step bibliographic coupling (NTSBC) is calculated as follows:

$$\mathrm{NTSBC} = \frac{\text{No. two-step } 1\text{-}2}{\text{No. one-step } 1 + \text{No. one-step } 2 - \text{No. two-step } 1\text{-}2} = \frac{3}{8 + 6 - 3} = 0.273.$$

It should be noted that there was a direct connection (a one-step path) between J1 and J2 in
Fig. 1. The inclusion of this direct link actually weakens the normalized measure from what it
would be if the link did not exist. It does so by making the denominator of the NTSBC formula
larger. In other words, the strength of linkage between two journals connected by some number
of two-step paths will be less if there is a one-step path between the two journals than if there is
not. This seeming contradiction could be easily removed if we adopt the simple rule that every
one-step path counts as one two-step path in our calculation of NTSBC. It is unlikely, however,
that this refinement will have much impact on the results of the clustering since all directly
linked journals experience the same disadvantage and few journal pairs having frequent
two-step links fail to be directly linked as well. In any event, we do not want to give undue
weight to the one-step paths since they are responsible for the chaining effects observed earlier.
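As a numerical illustration of the effect of the direct link, the Fig. 1 counts can be reworked for three cases. Only the first line uses values stated in the text. The second assumes the direct link is simply removed, so that each journal has one fewer one-step path. The third is one possible reading of the proposed rule, in which the direct link is counted as a fourth two-step path in both numerator and denominator.

$$
\begin{aligned}
\text{direct link present (Fig. 1):}\quad & \mathrm{NTSBC} = \frac{3}{8 + 6 - 3} = \frac{3}{11} \approx 0.273 \\
\text{direct link absent (assumed):}\quad & \mathrm{NTSBC} = \frac{3}{7 + 5 - 3} = \frac{3}{9} \approx 0.333 \\
\text{direct link counted as a two-step path (assumed reading):}\quad & \mathrm{NTSBC} = \frac{3 + 1}{8 + 6 - (3 + 1)} = \frac{4}{10} = 0.400
\end{aligned}
$$

Under this reading a directly linked pair would score higher than the same pair without the link, which removes the seeming contradiction.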
Let us now review the method. We begin with an annual Science Citation Index cumulation and
determine the bibliographic coupling strength for all pairs of source journals in this file. This
BC strength is normalized by dividing by the sum of the number of references made by each
journal during the year in question. A threshold is set on the normalized bibliographic coupling
(NBC) and all journal pairs satisfying this threshold are selected. With this restricted file, the
number of two-step paths between all pairs of journals is determined. This number is
normalized by dividing by the sum of the number of one-step paths emanating from each
journal minus the number of two-step paths (see Fig. 1). The normalized two-step bibliographic
coupling (NTSBC) is used as input to the single-link clustering routine. Clusters of journals are
obtained at a specified, but arbitrary level of the NTSBC.
`
`CLUSTER FILE STATISTICS
`Before discussing the specific journal clusters obtained, we will describe the statistical
`characteristics of the initial and intermediate files (see Table 1). As noted above, an annual SCI
`cumulation is used as the database, which in this experiment was the 1974 file (items 1 and 2 in
`Table 1). From this file we created a special file listing each document cited by two or more
`distinct source journals, and the journals citing it. If a document is cited more than once by a
`certain journal, it is nevertheless counted as though it were only a single citation. This reduces
`the number of records in the file by about 50% (item 3). (The documents cited by only one
`journal are dropped since they do not contribute to BC.)
`The next step is to form all combinations of source journals which cite a given document,
`i.e. form all the bibliographic couplings in the file. There were almost seven million such
`couplings (item 4), which reduces to about 400,000 distinct pairs when identical journal pairs are
`gathered together and all pairs occurring only once are dropped. Each journal pair with its
`attached BC strength is then normalized by dividing by the sum of the number of references
`made by the pair of journals during 1974. A threshold of 0.01 was set on this NBC to eliminate
`weak linkages between journals (items 7 and 8). It is on this reduced file that the two-step paths
`are determined. This is done by forming pairs of journals which are linked to a common journal
(item 9). (This step is facilitated by the presence in the file of journal pairs in both the
`“forward” and “backward” versions, e.g. both AB and BA appear.) Again, identical pairs are
`gathered together and the frequency of two-step paths is attached to the pairs (items 10 and 11).
`The second normalization (according to Fig. 1) is carried out, and these data are input to the
`single-link clustering program. A threshold of 0.4 on the NTSBC resulted in 168 clusters
`containing a total of 890 journals, with an average cluster size of 5.3 journals per cluster. (The
`minimum cluster size is two journals and the largest cluster obtained at this level contains 96
`journals.)
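The file-building steps described above (collapsing repeated citations, dropping documents cited by a single journal, forming all journal pairs for each document, and gathering identical pairs) can be sketched as follows. The record layout, the function name `coupling_counts` and the sample records are assumptions made for illustration; the original was a batch file-processing job whose details are not given.

```python
from collections import Counter
from itertools import combinations


def coupling_counts(citations):
    """citations: iterable of (citing_journal, cited_document) records, duplicates allowed.

    Returns a Counter mapping (journal_a, journal_b) to its BC strength.
    """
    citers = {}
    for journal, doc in citations:
        # A document cited several times by one journal counts only once
        citers.setdefault(doc, set()).add(journal)

    bc = Counter()
    for journals in citers.values():
        if len(journals) >= 2:          # documents cited by one journal add no coupling
            # Form all combinations of source journals which cite this document
            bc.update(combinations(sorted(journals), 2))
    return bc


# Invented sample records
citations = [
    ("A", "d1"), ("A", "d1"), ("B", "d1"), ("C", "d1"),
    ("A", "d2"), ("B", "d2"),
    ("C", "d3"),
]
bc = coupling_counts(citations)
strong = {pair: n for pair, n in bc.items() if n > 1}   # drop pairs occurring only once
print(strong)  # {('A', 'B'): 2}
```

Working from the cited document outward avoids comparing every pair of journals, which matters when the input runs to millions of citation records.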
Table 1. File statistics for journal clustering

 1. 1974 SCI source journals with references                  2376
 2. 1974 SCI citations                                   5,168,119
 3. Citations to documents cited two or more times by
    distinct source journals                             2,478,207
 4. Source journal pairs (bibliographic couplings)       6,839,380
 5. Distinct source journal pairs                          705,167
 6. Journals in pairs at BC strength greater than 1           2359
 7. Distinct journal pairs at NBC greater than 0.01           8044
 8. Journals in pairs at NBC greater than 0.01                1679
 9. Total two-step paths between journals                  159,171
10. Distinct journal pairs connected by two-step paths      45,180
11. Journals linked by two-step paths                         1586
12. Distinct journal pairs at 0.4 NTSBC                       2071
13. Journals clustered at 0.4 NTSBC                            890
14. Clusters formed at 0.4 NTSBC                               168
15. Mean journals per cluster at 0.4 NTSBC                     5.3
16. Journals in largest cluster at 0.4 NTSBC                    96

We contrast this clustering outcome with one obtained in our previous unsuccessful attempt
using NBC directly as input to single-link clustering. For a threshold of 0.025 NBC, which
represented the most successful NBC results obtained, 119 clusters resulted containing a total
of 747 journals, with an average cluster size of 6.3 journals per cluster. This larger mean cluster
size was due to the largest cluster which contained 297 journals, constituting nearly 40% of the
journals clustered. By contrast, for the clusters obtained at 0.4 NTSBC, the largest cluster of 96
journals constituted only about 11% of the journals clustered. It is clear, then, that by using a
two-step linkage measure the degree of chaining has been substantially reduced and the
“clumpiness” of the journal network increased.
`Other clustering levels of the NTSBC were also tried and it appears that the critical level at
`which a transition occurs from a highly chained and enormous cluster to a group of subject or
`discipline oriented clusters is between 0.2 and 0.3 NTSBC. At 0.2 there were only 40 clusters
`with the largest cluster containing 1276 journals, nearly the entire journal set. At 0.3 NTSBC a
`radical change occurred. We obtained 153 clusters with the largest cluster containing 360
`journals. At level 0.4 we have increased the number of clusters by only 15 but the largest
`cluster declined in size nearly 75%.
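The search for a critical level can be sketched as a sweep over candidate thresholds, recording the number of clusters and the size of the largest cluster at each one. The version below uses a small union-find structure rather than a file scan; the function name `sweep` and the sample links are hypothetical, and the output has no connection to the figures reported above.

```python
def sweep(pairs, thresholds):
    """For each threshold return (number of clusters, size of the largest cluster).

    pairs: iterable of (journal_a, journal_b, strength), e.g. NTSBC values.
    """
    pairs = list(pairs)
    summary = {}
    for t in thresholds:
        parent = {}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving keeps the trees shallow
                x = parent[x]
            return x

        for a, b, s in pairs:
            if s >= t:                          # only links at or above the level count
                parent.setdefault(a, a)
                parent.setdefault(b, b)
                parent[find(a)] = find(b)       # merge the two clusters

        sizes = {}
        for j in parent:
            root = find(j)
            sizes[root] = sizes.get(root, 0) + 1
        summary[t] = (len(sizes), max(sizes.values(), default=0))
    return summary


# Invented links: raising the level from 0.3 to 0.4 breaks the chain A-B-C-D in two
pairs = [("A", "B", 0.9), ("B", "C", 0.35), ("C", "D", 0.8), ("E", "F", 0.25)]
summary = sweep(pairs, [0.2, 0.3, 0.4])
print(summary)  # {0.2: (2, 4), 0.3: (1, 4), 0.4: (2, 2)}
```

A sudden drop in the second number between adjacent levels is the kind of transition described above.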
`The existence of a “critical point” in the clustering level where there is a sudden breaking
`up of the largest cluster is also found in experiments clustering highly cited documents rather
than journals[12]. Whatever this may mean, it is clear that no one level of clustering is optimal
`for all scientific fields or specialties. Ideally one should adopt a variable level approach to seek
`out the best possible representation for a given area by varying the level up or down. This
`means that a way must be found of evaluating the quality of a cluster that is independent of the
`clustering methodology. This is a familiar situation in cluster analysis since it is generally
`recognized that adequate tests of cluster significance have not yet been developed and reliance
`on other means for evaluating results is necessary
`(e.g. their utility or agreement with
`classifications derived by other means). In the discussion of the clusters at level 0.4 NTSBC,
which follows, we use two modes of “validating” the results. First, the classification obtained
`automatically is compared with one which was obtained manually and quite independently.
`Second, qualitative evaluations of some of the groupings of journals based on our understand-
`ing of the current state of the scientific subject matters involved are made.
`
IPM Vol. 13, No. 5-C
`
`EVALUATION OF JOURNAL CLUSTERS
`We selected the level 0.4 NTSBC clusters for detailed examination because the largest
`cluster at this level contained 96 journals and was not so large as to suggest a completely
`meaningless journal grouping (the distribution of cluster size was the least skewed at this level).
`As we pointed out, as the level is raised, the size of the largest cluster decreases dramatically.
`This transition from macro-clusters
`to micro-clusters probably corresponds
`to the point at
`which an interdisciplinary chaining of journals breaks up and disciplinary or specialty groupings
are formed. The cluster containing 96 journals seems to be such a disciplinary grouping in the
`biomedical field centered around cancer research. A closer look at the clusters obtained at the
`0.4 level will provide some idea of what can be expected at other levels, but not too much
`significance should be placed on this particular level.
Of the 168 clusters obtained at this level, 79 contained three or more journals, and 89
clusters contained only two journals each. (Clusters of one journal do not emerge from our
clustering procedure because the basic input record is the journal pair.) The 89 two-journal
clusters are not considered in the following discussion, but we should attempt to explain their
significance. Like other bibliometric data, the distribution of cluster sizes (that is, the number of
clusters containing two journals, three journals, four journals, etc.) is very skewed and
`approximately hyperbolic. This is true at any clustering level selected except the very lowest
`where all journals are in a single gigantic cluster. Thus, there are many small clusters and few
`large ones (21 clusters have 10 or more journals and 58 have from three to nine journals at level
`0.4). At lower levels of clustering, the small clusters may join up with one another or with a
`larger cluster. There are three possible interpretations of very small clusters: they may be
`genuinely isolated groupings; they may be tips of larger groupings which emerge at lower
`clustering levels; or they may be fragments of larger clusters which join up with the larger
`clusters at lower levels.
`For the purpose of comparison with the 79 clusters containing three or more journals at
`level 0.4, an independently derived journal classification was used. This classification appears as
Table A-3 in Narin’s Evaluative Bibliometrics[13]. The Table lists Narin and co-workers’
manual classification of the 1973 source journal list of the Science Citation Index. Roughly 2000
journals are classified into nine major headings (fields) and 106 subheadings (subfields). Two
points should be noted about the comparison of our journal clusters with Narin’s manual
classification. First, Narin has classified nearly all source journals in the 1973 source list (over
2000 journals), while our clusters at level 0.4 with three or more journals comprise only 701
journals. Hence, we would not expect to find all journals Narin includes under a subheading in
our clusters. Second, since our clustering experiment was done using the 1974 Science Citation
Index, a few journals which were dropped or added to the Science Citation Index coverage
`since 1973 do not match up when Narin’s classification is compared with our clusters. The
`number of such cases is, however, small in relation to the number of journals in either file.
`One way of evaluating the match between these two classifications is to count the number of
`journals shared by one of our clusters and the Narin subheading to which it is most strongly
`related. This overlap is expressed as a fraction of the number of journals in our cluster and not
`as a fraction of the number of journals in the related Narin subheading because of the greater
`comprehensivity of the latter. This measure reflects, in effect, the dispersion of our cluster over
`Narin’s subheadings. For example, one of our clusters may contain journals which Narin has
`placed into several different subheadings, although usually there will be a single subheading
`which has the greatest overlap with our cluster. The fraction of journals in this cluster which
`falls in the single most closely related subheading will measure the degree to which the cluster’s
`journals are dispersed over Narin’s subheadings: the larger the fraction, the less the dispersion.
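The overlap measure described above can be sketched as follows, assuming the manual classification is available as a mapping from journal to a single subheading. The function name `best_overlap` and the placeholder journal codes are invented for the example; the 8-and-2 split mirrors the counts reported for cluster No. 19.

```python
from collections import Counter


def best_overlap(cluster, manual):
    """cluster: iterable of journals; manual: journal -> subheading (unclassified journals absent).

    Returns (best-matching subheading, fraction of the cluster's journals under it).
    """
    cluster = list(cluster)
    counts = Counter(manual[j] for j in cluster if j in manual)
    if not counts:
        return None, 0.0                # no journal in the cluster was classified manually
    subheading, shared = counts.most_common(1)[0]
    # The fraction is taken over the cluster, not over the (larger) subheading
    return subheading, shared / len(cluster)


# Placeholder journals: eight under one subheading, two under another
manual = {f"J{i:02d}": "obstetrics and gynecology" for i in range(1, 9)}
manual.update({"J09": "fertility", "J10": "fertility"})
cluster = [f"J{i:02d}" for i in range(1, 11)]
result = best_overlap(cluster, manual)
print(result)  # ('obstetrics and gynecology', 0.8)
```

A cluster whose journals were all added after the manual list was compiled returns a fraction of zero, as in the six unmatched three-journal clusters.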
`Figure 2 shows the distribution of fractions of shared or overlapping journals for each of the
`79 clusters with the Narin subheading with which it has the largest overlap. Six small clusters
`(each having 3 journals) have fractions equal to zero because they did not match with any of
`Narin’s subheadings. Most of the journals in these clusters were not classified by Narin because
`they were newly added to the Science Citation Index
`coverage in 1974. The figure shows that 22
`of the 79 clusters had fractional overlaps with Narin’s subheadings of from 0.91 to 1.0. Of the
`22 clusters, 20 were perfect matches, i.e. all journals in the cluster appear under a single
subheading. Thirty-seven (47%) clusters had fractional overlaps of 0.75 or better, and 56 (71%)
had overlaps of 0.5 or better.

Fig. 2. Fractional overlaps between journal clusters and subheadings in Narin’s classification scheme.
(Only clusters with 3 or more journals are considered.)

An example of a “good” match between a cluster and a subheading in the manual
classification is cluster No. 1, a group of 10 journals (see Table 2). The fact that this group of
journals comes out as cluster No. 1 has no significance except that the first journal in the
cluster, A GRAEFES A, is early in the alphabet and its pairs were the first to be selected by the
computer. (ISI standard 11 character journal abbreviations are used throughout. For full titles
see Ref. 14.) The subheading in Narin’s scheme which matches this cluster is titled “ophthalmology”
and contains 18 journals. All 10 of the journals in cluster No. 1 appear under Narin’s
“ophthalmology” subheading. Narin’s scheme lists eight additional journals which do not
appear in cluster No. 1 or among any of the other clusters at level 0.4 NTSBC. It remains to be
determined whether lowering the various levels used in creating the clustered file would result
in adding these journals to the cluster.

Table 2. Match between cluster No. 1 and Narin’s “ophthalmology” subheading

Cluster No. 1

A GRAEFES A
ACT OPHTH K

“ophthalmology”

A GRAEFES A
ACT OPHTH K
BR J PHYS O
CAN J OPHTH
DOC OPHTHAL
KLIN MONATS
OPHTHAL RES
OPHTHALMOLA

Cluster No. 19 which contains 10 journals is an example of a cluster which matches with
more than one of Narin’s subheadings. This cluster corresponds to two of Narin’s subheadings,
one titled “obstetrics and gynecology” containing 12 journals and another called “fertility”
containing five journals (see Table 3). Eight of cluster No. 19’s 10 journals overlap with the
“obstetrics and gynecology” subheading, and two of the cluster’s journals overlap with the
“fertility” subheading. This is an instance where the manual classification and the citation-based
clustering disagree on how journals should be grouped. Despite the fact that this cluster
is not a “good” match to a particular subheading, it does have a high face validity. The
clustering suggests that due to the commonality of the literature cited, the “obstetrics and
gynecology” journals should perhaps be merged with the “fertility” group.
To see how the cluster grouped these journals, actual linkages among the ten journals were
drawn (Fig. 3). We see that of the two journals Narin placed in the “fertility” subheading one,
FERT STERIL, was linked to the group only through CONTRACEPT, which was the other
journal placed in the “fertility” subheading. CONTRACEPT, on the other hand, was strongly
linked to the remainder of the cluster which Narin had classified under “obstetrics and
gynecology.”

Table 3. Match between cluster No. 19 and Narin’s “obstetrics and gynecology” and “fertility” subheadings

Cluster No. 19

ACT OBST SC
CONTRACEPT
REV F GY OB

“obstetrics and gynecology”

ACT OBST SC
ARCH GYNAK
AUST NZ J O
FORTSC GEB
GYNAKOLOGE
GYNECOL INV
J OBSTET GY
OBSTET GYN
REV F GY OB

“fertility”

BIOL REPROD
CONTRACEPT
FERT STERIL
INT J FERT
J REPR FERT

Fig. 3. Cluster No. 19 at level 0.4 NTSBC.

Table 4. Match between cluster No. 5 and Narin’s subheadings

Subheadings                                   Cluster No. 5 (19)

Cell Biology, Cytology and Histology (28)     CELL TIS RE
                                              CYTOBIOLOG
                                              CYTOBIOS
                                              HISTOCHEM
                                              HISTOCHEMIS
                                              J CELL SCI
                                              J HIST CYTO
                                              J ULTRA RES
                                              TISSUE CELL
                                              Z ZELL MIKR
Anatomy and Morphology (9)                    ACT ANATOM
                                              AM J ANAT
                                              ANAT REC
                                              J ANAT
                                              J MORPH
Microscopy (8)                                J MICROSCOP
Neurology and Neurosurgery (45)               J NEUROCYT
General Biomedical Research (82)              PHI T ROY B
Embryology (8)                                Z ANAT ENTW

An example of a “not-so-good” match with Narin’s scheme is cluster No. 5 which contains
19 journals. As shown in Table 4, cluster No. 5 contains journals which appear in six of Narin’s
subheadings. The cluster has the largest overlap with the subheading titled “cell biology,
`cytology and histology”, which includes 10 of the cluster’s 19 journals. Another subheading,
`“anatomy and morphology” contains five of the cluster’s journals, and four additional
`subheadings each contain one of the cluster’s journals. The most notable feature of the cluster
`is that it joins together the classical fields of anatomy and morphology with the more modern
`fields of cell biology, cytology and histology. The cluste