JOURNAL CLUSTERING USING A BIBLIOGRAPHIC COUPLING METHOD

HENRY G. SMALL and MICHAEL E. D. KOENIG
Institute for Scientific Information, 325 Chestnut Street, Philadelphia, PA 19106, U.S.A.

Abstract: The classification of journal titles into fields or specialties is a problem of practical importance in
library and information science. An algorithm is described which accomplishes such a classification using
the single-link clustering technique and a novel application of the method of bibliographic coupling. The
novelty consists in the use of two-step bibliographic coupling linkages, rather than the usual one-step linkages.
This modification of the similarity measure leads to a marked improvement in the performance of single-link
clustering in the formation of field or specialty clusters of journals. Results of an experiment using this
algorithm are reported which grouped 890 journals into 168 clusters. This scope is an improvement of nearly an
order of magnitude over previous journal clustering experiments. The results are evaluated by comparison with
an independently derived manual classification of the same journal set. The generally good agreement indicates
that this method of journal clustering will have significant practical utility for journal classification.
`
INTRODUCTION
The concept of algorithmically clustering or categorizing journals has aroused the interest of
many members of the information science community. As CARPENTER and NARIN[1] point out,
most work in the area seems to have been motivated by a combination of aesthetic and
practical considerations. The aesthetic considerations include the challenge of doing
algorithmically what has been a very non-trivial task intellectually: the classification of journals. The
task is an almost pure problem in numerical taxonomy, that of partitioning a population on the
basis of shared characteristics.
On the practical side, the outcome of journal clustering can have various applications. The
categories reveal the pattern, the mosaic of scholarly activity. An analysis over time would
reveal shifts in that pattern, as journals entered or departed from clusters, and as clusters
themselves emerged, merged, separated and disappeared. Such observations would have
relevance for sociology, information science, and science policy. Clusters thus derived could
also be used to analyze and promote the rationalization of journal coverage by secondary
services. The DISISS (Design of Information Systems in the Social Sciences) project has
proposed such an application[2]. Furthermore, journal cluster patterns would be useful for
analyzing and validating thesauri, classification schemes, and indexing schemes.
A number of previous studies have described attempts to cluster journals. In their seminal
work of 1967, XHIGNESSE and OSGOOD[3] examined the journal-to-journal citation patterns within
a group of 21 psychology journals to obtain a similarity matrix. This was accomplished by
means of Shepard’s algorithm[4], which assigns distances between journals in n-dimensional
space, keeping n as small as possible while preserving the rank orders of citation frequencies
between journals. Nine of the 21 journals were assigned to three overlapping clusters,
determined by the journals’ proximity to each other in n-space. The multidimensionality of this
approach limits it to relatively small numbers of journals.
PARKER, PAISLEY and GARRETT[5], later in 1967, undertook an analysis of 17 journals in the
field of communication research. The measure of relatedness between journals was a form of
co-citation: the frequency of co-occurrence of citations to journals within articles in the 17
source journals. (The term co-citation, more recently introduced[6], refers to a measure of
relatedness between articles, defined as the frequency with which two articles are cited together
by other articles.) Some 68 journals were cited frequently enough to be analyzed, of which
approx. 30-35 were grouped into some 8-11 clusters (the exact number varies for each of the
four time periods studied). A criticism, as pointed out in the DISISS study described below, is
the lack of any attempt to normalize for the level of citations. Without normalization the
procedure almost inevitably links highly cited journals. The technique is, however, capable of
`
`Facebook Ex. 1013
`
providing “affiliates” as well as “members” for each cluster, but without normalization, the
affiliates tend to be the most highly cited journals that are members of the most strongly linked
clusters.
Large scale attempts at clustering journals using citation relationships were not possible
until the advent of the Science Citation Index® (SCI) database (compiled by the Institute for
Scientific Information). Particularly important was Garfield’s reformatting of the SCI to show
journal-to-journal citation patterns[7], which revealed the existence of very strong direct citation
linkages among journals. This work culminated in the publication of Journal Citation
Reports®[8], which is an index of these journal-to-journal citation patterns. CARPENTER and
NARIN[1] used these data to look at three disciplines: physics, chemistry and molecular biology.
For each discipline the individual journals were manually pre-selected, and a separate
journal-to-journal citation matrix was prepared. A “hill climbing” algorithm was used which for each
attempt requires the number of clusters to be predetermined, as the algorithm creates no new
clusters and rarely eliminates any. A measure of cluster quality is then used to determine which
level of clusters has the “best” fit. In this study, nine different combinations of journal
similarity measures and cluster quality measures were used and then combined to produce the
final results. Each of the three disciplines, ranging in size from 81 to 106 journals, was clustered
into 11 or 12 clusters, with 5-16 journals remaining unclustered, and the clusters produced had a
high degree of face validity.
A pilot study to explore the feasibility of clustering social science journals was undertaken
by the DISISS (Design of Information Systems in the Social Sciences) project at the University
of Bath in the U.K. in the early 1970s[2,9]. Citation data were obtained from 17 source
journals. Again, a journal-to-journal citing matrix was used as the basic data form. The
clustering algorithm, called SCICON, operates on the basis of calculating the root mean square
distance from members in n-dimensional space (n being the number of variables, in this case
the 17 citing journals) to the center of gravity of each cluster. It uses a “run-in” technique of
starting with a large number of clusters and then reducing the number one at a time, examining
at each step whether a better fit is accomplished by moving any journal to another cluster. The
result of this technique used on 115 cited journals was three clusters: psychology (34 members),
economics (21 members) and amorphous (60 members). Many of the smaller clusters produced
during the run-in, when the number of clusters was higher, were meaningful, however.
The work described above, although useful and frequently imaginative, has been limited in
its scope. The largest number of journals clustered at one time is barely more than a
hundred, a very small portion of the universe of journals. The constraint on size appears to
originate not from the lack of data, but from the sheer impracticality of processing the matrices
and multidimensional arrays inherent in the techniques used, when any significant number of
journals is to be considered.
`
METHOD
The procedure used in this experiment is a novel combination of some standard methods
known to bibliometricians and numerical taxonomists. First, we use the well known technique
of bibliographic coupling to derive the basic journal-to-journal associations[10]. Co-citation
could have been used equally well, but for computational reasons
bibliographic coupling was the more convenient association measure. For our purposes,
bibliographic coupling is defined as the citing of the same document by two journals.
(Conventionally, bibliographic coupling is defined as the citing of the same document by two later
documents.) The strength of bibliographic coupling (BC) is the number of identical, distinct
documents cited by the two journals. This strength of coupling is normalized to compensate for
the size effects of the two journals by dividing the bibliographic coupling strength by the sum of
the number of references made by the two journals.
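The normalization just described can be illustrated with a minimal sketch. The function name, the input layout (one `(journal, cited_document)` tuple per reference) and the decision to count repeated references toward each journal's reference total are assumptions made for illustration; the paper does not specify these details.

```python
from collections import defaultdict
from itertools import combinations


def normalized_bibliographic_coupling(citations):
    """citations: iterable of (journal, cited_document) tuples for one year.

    Returns {(journal_a, journal_b): NBC} with journal_a < journal_b.
    """
    # Total references made by each journal (assumption: repeats are counted).
    references = defaultdict(int)
    # Distinct journals citing each document (repeat citations collapse).
    citing = defaultdict(set)
    for journal, document in citations:
        references[journal] += 1
        citing[document].add(journal)

    # BC strength: number of distinct documents cited by both journals.
    bc = defaultdict(int)
    for journals in citing.values():
        for a, b in combinations(sorted(journals), 2):
            bc[(a, b)] += 1

    # Normalize by the sum of the references made by the two journals.
    return {
        (a, b): strength / (references[a] + references[b])
        for (a, b), strength in bc.items()
    }
```

With hypothetical journals A, B and C, where A and B share two cited documents and make three references each, the sketch gives NBC(A, B) = 2 / (3 + 3).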
The second procedure used is single-link clustering. This mode of clustering has been
described elsewhere[11]. We have used the fact that single-link clustering is equivalent to the
application of a threshold on the item-to-item proximity measure. In our experiment, the
method of single-link clustering was implemented in the following way: A file of
journal-to-journal pairs with their appropriate coefficients of association is used as input. A threshold
value of the journal-to-journal association is set and a journal is selected as a starting point. All
`
journals linked to this selected journal at or above the prescribed level of association are
located in the file and assigned to the cluster. Each of these journals is then used as a starting
point and all journals linked to them are assigned to the cluster. The cluster is complete when
no “new” journals can be added to the cluster.
The program then proceeds to the next, unclustered journal, and attempts to create another
cluster. After all journals have been examined and assigned to clusters, the program terminates.
The smallest cluster created using our procedure is a two member (two journal) cluster since
journals not linked to at least one other journal at the prescribed level of association are not
searched. The clusters are created at a particular level of the journal-to-journal association and
we have no way of knowing what level is “optimum” except by inspection of the results and
comparison with results obtained using other procedures. In general, a level is sought in which
no very large cluster exists (greater than 100 journals), realizing that such a level, while
appropriate for some areas or disciplines, may not be appropriate for others.
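The threshold-based procedure described above can be sketched as a search for connected groups in the thresholded pair file. This is a minimal illustration, not the program used in the experiment; the function name and the dictionary input format are assumptions.

```python
from collections import defaultdict


def single_link_clusters(pairs, threshold):
    """pairs: {(journal_a, journal_b): association}. Returns a list of sets."""
    # Keep only links at or above the prescribed level of association.
    neighbours = defaultdict(set)
    for (a, b), strength in pairs.items():
        if strength >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, assigned = [], set()
    for start in sorted(neighbours):      # proceed to the next unclustered journal
        if start in assigned:
            continue
        cluster, frontier = {start}, [start]
        while frontier:                   # complete when no new journal can be added
            journal = frontier.pop()
            for other in neighbours[journal] - cluster:
                cluster.add(other)
                frontier.append(other)
        assigned |= cluster
        clusters.append(cluster)
    return clusters
```

Journals with no link at or above the threshold never enter `neighbours`, so, as in the text, the smallest cluster that can emerge has two members.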
A novel feature of our journal clustering system is the use of paths of “length two” between
journals to determine the basic association measure used in clustering. Before we define what
we mean by this, we can clarify our motivation by describing an earlier experiment which was
not successful. We began with the file of journal pairs which were linked by normalized
bibliographic coupling (NBC) described above. We then set a minimum threshold for NBC and
extracted all journal pairs at or above this threshold. This gave a file of “strong”
journal-to-journal linkages. The problem which we encountered was that we could not obtain a
satisfactory set of single-link clusters using the NBC measure. The journals tended to chain
together, forming very large and loosely linked clusters. It is well known that the single-link
algorithm has a tendency to form clusters of this kind, and this tendency, combined with the
strongly interdisciplinary character of journal relationships, created enormous chains of journals
which resisted fragmentation when the level of NBC was raised. Eventually, when the journals
finally did break up into reasonably small clusters at a very high level of NBC, too few of the
journals remained in the clusters to consider the experiment a success.
As a result of this experience, we decided to modify our basic journal-to-journal measure.
We had noticed that the chaining of journals to create gigantic clusters in the previous
experiment was very often due to only a few links from a large or strongly interdisciplinary
journal linking one journal “clump” to another. Our problem, then, was to enhance the
“clumpiness” of the network so that inter-clump linkages could be “submerged” below some
threshold value.
The method we chose was to determine the number of paths of “length two” between
journals. For example, suppose we take some arbitrary starting journal. It is linked with a
number of other journals with an NBC strength at or above some threshold. These journals are,
in turn, linked to other journals at or above this threshold. Now we select a second arbitrary
journal and find all the distinct paths which lead from it to the starting journal but which pass
through other (third) journals as intermediate steps. These are the paths of “length two”
between the two journals. For every pair of journals, then, there is some number of two-step
paths (including zero) which connect them. It is also clear that the number of such paths for any
pair of journals is limited to the lesser number of paths of “length one” which originate from
one or the other journal. For example, if journal A has five links to other journals and journal
B has ten links, the number of two-step paths leading from A to B cannot exceed five. Hence, we
can normalize the two-step paths as shown in Fig. 1. This normalization provides a new measure of
journal-to-journal association (normalized two-step bibliographic coupling: NTSBC) which has
the property of varying from zero to one.
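The two-step measure can be sketched as follows, assuming the input is the list of journal pairs that already satisfy the NBC threshold. The function name and input format are illustrative assumptions. The normalization follows the formula given with Fig. 1: two-step paths divided by the sum of the one-step paths from each journal, minus the two-step paths.

```python
from collections import defaultdict
from itertools import combinations


def ntsbc(links):
    """links: iterable of (journal_a, journal_b) pairs at or above the NBC threshold.

    Returns {(a, b): NTSBC} for every pair joined by at least one
    two-step path, with a < b.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Each intermediate journal contributes one two-step path to every
    # pair of journals that are both linked to it.
    two_step = defaultdict(int)
    for linked in neighbours.values():
        for a, b in combinations(sorted(linked), 2):
            two_step[(a, b)] += 1

    # One-step paths of a journal = number of its thresholded links.
    return {
        (a, b): n / (len(neighbours[a]) + len(neighbours[b]) - n)
        for (a, b), n in two_step.items()
    }
```

For a network built to match the counts quoted for Fig. 1 (eight one-step paths from one journal, six from the other, three shared intermediates), the sketch yields 3 / (8 + 6 - 3).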
It is also easy to see intuitively why this should enhance the linkages between journals in a
“clump” and thus provide a better clustering than was obtained with the simple NBC. Suppose
we have two clumps which are joined by only a few links. The number of two-step paths
between journals within a clump will be high, while the number of two-step paths between
journals in different clumps will be low. Hence, when a threshold on the two-step linkage
measure is applied, the within-clump ties will remain and the between-clump ties will tend to be
broken.
It should be noted that there was a direct connection (a one-step path) between J1 and J2 in
`
`
Fig. 1. Illustration of normalized two-step bibliographic coupling. Journals J1 and J2 are linked by three
two-step paths. J1 has a total of eight one-step paths leading from it and J2 has a total of six. The
normalized two-step bibliographic coupling (NTSBC) is calculated as follows:

$$\mathrm{NTSBC} = \frac{\text{No. two-step}_{1\text{-}2}}{\text{No. one-step}_{1} + \text{No. one-step}_{2} - \text{No. two-step}_{1\text{-}2}} = \frac{3}{8 + 6 - 3} = 0.273$$
`
Fig. 1. The inclusion of this direct link actually weakens the normalized measure from what it
would be if the link did not exist. It does so by making the denominator of the NTSBC formula
larger. In other words, the strength of linkage between two journals connected by some number
of two-step paths will be less if there is a one-step path between the two journals than if there is
not. This seeming contradiction could be easily removed if we adopt the simple rule that every
one-step path counts as one two-step path in our calculation of NTSBC. It is unlikely, however,
that this refinement will have much impact on the results of the clustering since all directly
linked journals experience the same disadvantage and few journal pairs having frequent
two-step links fail to be directly linked as well. In any event, we do not want to give undue
weight to the one-step paths since they are responsible for the chaining effects observed earlier.
Let us now review the method. We begin with an annual Science Citation Index and
determine the bibliographic coupling strength for all pairs of source journals in this file. This
BC strength is normalized by dividing by the sum of the number of references made by each
journal during the year in question. A threshold is set on the normalized bibliographic coupling
(NBC) and all journal pairs satisfying this threshold are selected. With this restricted file, the
number of two-step paths between all pairs of journals is determined. This number is
normalized by dividing by the sum of the number of one-step paths emanating from each
journal minus the number of two-step paths (see Fig. 1). The normalized two-step bibliographic
coupling (NTSBC) is used as input to the single-link clustering routine. Clusters of journals are
obtained at a specified, but arbitrary, level of the NTSBC.
`
CLUSTER FILE STATISTICS
Before discussing the specific journal clusters obtained, we will describe the statistical
characteristics of the initial and intermediate files (see Table 1). As noted above, an annual SCI
cumulation is used as the database, which in this experiment was the 1974 file (items 1 and 2 in
Table 1). From this file we created a special file listing each document cited by two or more
distinct source journals, and the journals citing it. If a document is cited more than once by a
certain journal, it is nevertheless counted as though it were only a single citation. This reduces
the number of records in the file by about 50% (item 3). (The documents cited by only one
journal are dropped since they do not contribute to BC.)
The next step is to form all combinations of source journals which cite a given document,
i.e. form all the bibliographic couplings in the file. There were almost seven million such
couplings (item 4), which reduces to about 400,000 distinct pairs when identical journal pairs are
gathered together and all pairs occurring only once are dropped. Each journal pair with its
attached BC strength is then normalized by dividing by the sum of the number of references
made by the pair of journals during 1974. A threshold of 0.01 was set on this NBC to eliminate
weak linkages between journals (items 7 and 8). It is on this reduced file that the two-step paths
are determined. This is done by forming pairs of journals which are linked to a common journal
(item 9). (This step is facilitated by the presence in the file of journal pairs in both the
“forward” and “backward” versions, e.g. both AB and BA appear.) Again, identical pairs are
gathered together and the frequency of two-step paths is attached to the pairs (items 10 and 11).
The second normalization (according to Fig. 1) is carried out, and these data are input to the
single-link clustering program. A threshold of 0.4 on the NTSBC resulted in 168 clusters
containing a total of 890 journals, with an average cluster size of 5.3 journals per cluster. (The
minimum cluster size is two journals and the largest cluster obtained at this level contains 96
journals.)
We contrast this clustering outcome with one obtained in our previous unsuccessful attempt
using NBC directly as input to single-link clustering. For a threshold of 0.025 NBC, which
`
Table 1. File statistics for journal clustering

 1. 1974 SCI source journals with references                        2376
 2. 1974 SCI citations                                         5,168,119
 3. Citations to documents cited two or more times by
    distinct source journals                                   2,478,207
 4. Source journal pairs (bibliographic couplings)             6,839,380
 5. Distinct source journal pairs                                705,167
 6. Journals in pairs at BC strength greater than 1                 2359
 7. Distinct journal pairs at NBC greater than 0.01                 8044
 8. Journals in pairs at NBC greater than 0.01                      1679
 9. Total two-step paths between journals                        159,171
10. Distinct journal pairs connected by two-step paths            45,180
11. Journals linked by two-step paths                               1586
12. Distinct journal pairs at 0.4 NTSBC                             2071
13. Journals clustered at 0.4 NTSBC                                  890
14. Clusters formed at 0.4 NTSBC                                     168
15. Mean journals per cluster at 0.4 NTSBC                           5.3
16. Journals in largest cluster at 0.4 NTSBC                          96
`
represented the most successful NBC results obtained, 119 clusters resulted containing a total
of 747 journals, with an average cluster size of 6.3 journals per cluster. This larger mean cluster
size was due to the largest cluster, which contained 297 journals, constituting nearly 40% of the
journals clustered. By contrast, for the clusters obtained at 0.4 NTSBC, the largest cluster of 96
journals constituted only about 11% of the journals clustered. It is clear, then, that by using a
two-step linkage measure the degree of chaining has been substantially reduced and the
“clumpiness” of the journal network increased.
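The summary figures quoted for the two runs follow directly from the reported counts. A short check, using only numbers stated in the text (the helper name is an illustrative assumption):

```python
def cluster_summary(journals_clustered, clusters, largest_cluster):
    """Return (mean journals per cluster, share of journals in the largest cluster)."""
    return journals_clustered / clusters, largest_cluster / journals_clustered


# Counts reported for the two-step measure at 0.4 NTSBC.
ntsbc_mean, ntsbc_share = cluster_summary(890, 168, 96)
# Counts reported for the direct measure at 0.025 NBC.
nbc_mean, nbc_share = cluster_summary(747, 119, 297)

print(f"0.4 NTSBC: mean {ntsbc_mean:.1f}, largest cluster {ntsbc_share:.0%}")
print(f"0.025 NBC: mean {nbc_mean:.1f}, largest cluster {nbc_share:.0%}")
```

The share of journals held by the largest cluster is the more telling figure here, since the mean cluster size alone changes little between the two runs.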
Other clustering levels of the NTSBC were also tried and it appears that the critical level at
which a transition occurs from a highly chained and enormous cluster to a group of subject or
discipline oriented clusters is between 0.2 and 0.3 NTSBC. At 0.2 there were only 40 clusters
with the largest cluster containing 1276 journals, nearly the entire journal set. At 0.3 NTSBC a
radical change occurred. We obtained 153 clusters with the largest cluster containing 360
journals. At level 0.4 we have increased the number of clusters by only 15 but the largest
cluster declined in size nearly 75%.
The existence of a “critical point” in the clustering level where there is a sudden breaking
up of the largest cluster is also found in experiments clustering highly cited documents rather
than journals[12]. Whatever this may mean, it is clear that no one level of clustering is optimal
for all scientific fields or specialties. Ideally one should adopt a variable level approach to seek
out the best possible representation for a given area by varying the level up or down. This
means that a way must be found of evaluating the quality of a cluster that is independent of the
clustering methodology. This is a familiar situation in cluster analysis since it is generally
recognized that adequate tests of cluster significance have not yet been developed and reliance
on other means for evaluating results is necessary (e.g. their utility or agreement with
classifications derived by other means). In the discussion of the clusters at level 0.4 NTSBC,
which follows, we use two modes of “validating” the results. First, the classification obtained
automatically is compared with one which was obtained manually and quite independently.
Second, qualitative evaluations of some of the groupings of journals based on our
understanding of the current state of the scientific subject matters involved are made.
`
IPM Vol. 13, No. 5-C
`
EVALUATION OF JOURNAL CLUSTERS
We selected the level 0.4 NTSBC clusters for detailed examination because the largest
cluster at this level contained 96 journals and was not so large as to suggest a completely
meaningless journal grouping (the distribution of cluster size was the least skewed at this level).
As we pointed out, as the level is raised, the size of the largest cluster decreases dramatically.
This transition from macro-clusters to micro-clusters probably corresponds to the point at
which an interdisciplinary chaining of journals breaks up and disciplinary or specialty groupings
are formed. The cluster containing 96 journals seems to be such a disciplinary grouping in the
biomedical field centered around cancer research. A closer look at the clusters obtained at the
0.4 level will provide some idea of what can be expected at other levels, but not too much
significance should be placed on this particular level.
Of the 168 clusters obtained at this level, 79 contained three or more journals, and 89
clusters contained only two journals each. (Clusters of one journal do not emerge from our
clustering procedure because the basic input record is the journal pair.) The 89 two-journal
clusters are not considered in the following discussion, but we should attempt to explain their
significance. Like other bibliometric data, the distribution of cluster sizes (that is, the number of
clusters containing two journals, three journals, four journals, etc.) is very skewed and
approximately hyperbolic. This is true at any clustering level selected except the very lowest
where all journals are in a single gigantic cluster. Thus, there are many small clusters and few
large ones (21 clusters have 10 or more journals and 58 have from three to nine journals at level
0.4). At lower levels of clustering, the small clusters may join up with one another or with a
larger cluster. There are three possible interpretations of very small clusters: they may be
genuinely isolated groupings; they may be tips of larger groupings which emerge at lower
clustering levels; or they may be fragments of larger clusters which join up with the larger
clusters at lower levels.
For the purpose of comparison with the 79 clusters containing three or more journals at
level 0.4, an independently derived journal classification was used. This classification appears as
Table A-3 in Narin’s Evaluative Bibliometrics[13]. The table lists Narin and co-workers’
manual classification of the 1973 source journal list of the Science Citation Index. Roughly 2000
journals are classified into nine major headings (fields) and 106 subheadings (subfields). Two
points should be noted about the comparison of our journal clusters with Narin’s manual
classification. First, Narin has classified nearly all source journals in the 1973 source list (over
2000 journals), while our clusters at level 0.4 with three or more journals comprise only 701
journals. Hence, we would not expect to find all journals Narin includes under a subheading in
our clusters. Second, since our clustering experiment was done using the 1974 Science Citation
Index, a few journals which were dropped from or added to the Science Citation Index coverage
since 1973 do not match up when Narin’s classification is compared with our clusters. The
number of such cases is, however, small in relation to the number of journals in either file.
One way of evaluating the match between these two classifications is to count the number of
journals shared by one of our clusters and the Narin subheading to which it is most strongly
related. This overlap is expressed as a fraction of the number of journals in our cluster and not
as a fraction of the number of journals in the related Narin subheading because of the greater
comprehensiveness of the latter. This measure reflects, in effect, the dispersion of our cluster over
Narin’s subheadings. For example, one of our clusters may contain journals which Narin has
placed into several different subheadings, although usually there will be a single subheading
which has the greatest overlap with our cluster. The fraction of journals in this cluster which
falls in the single most closely related subheading will measure the degree to which the cluster’s
journals are dispersed over Narin’s subheadings: the larger the fraction, the less the dispersion.
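The overlap measure described above can be expressed in a few lines. This is a minimal sketch; the function name, the set-based inputs and the placeholder journal names in the usage example are assumptions for illustration only.

```python
def best_overlap(cluster, subheadings):
    """cluster: set of journals; subheadings: {name: set of journals}.

    Returns (best matching subheading or None, fraction of the cluster's
    journals that fall under that subheading).
    """
    best_name, best_shared = None, 0
    for name, members in subheadings.items():
        shared = len(cluster & members)
        if shared > best_shared:
            best_name, best_shared = name, shared
    # The denominator is the cluster size, not the subheading size.
    return best_name, best_shared / len(cluster)


# Hypothetical cluster of ten journals split 8/2 over two subheadings.
cluster = {f"j{i}" for i in range(1, 11)}
subheadings = {
    "subheading A": {f"j{i}" for i in range(1, 9)} | {"x1", "x2", "x3", "x4"},
    "subheading B": {"j9", "j10", "y1", "y2", "y3"},
}
name, fraction = best_overlap(cluster, subheadings)
```

A cluster whose journals appear under no subheading gets a fraction of zero and no matching name.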
Figure 2 shows the distribution of fractions of shared or overlapping journals for each of the
79 clusters with the Narin subheading with which it has the largest overlap. Six small clusters
(each having 3 journals) have fractions equal to zero because they did not match with any of
Narin’s subheadings. Most of the journals in these clusters were not classified by Narin because
they were newly added to the Science Citation Index coverage in 1974. The figure shows that 22
of the 79 clusters had fractional overlaps with Narin’s subheadings of from 0.91 to 1.0. Of the
22 clusters, 20 were perfect matches, i.e. all journals in the cluster appear under a single
`
Fig. 2. Fractional overlaps between journal clusters and subheadings in Narin’s classification scheme.
(Only clusters with 3 or more journals are considered.)
`
subheading. Thirty-seven (47%) clusters had fractional overlaps of 0.75 or better, and 56 (71%)
had overlaps of 0.5 or better.
An example of a “good” match between a cluster and a subheading in the manual
classification is cluster No. 1, a group of 10 journals (see Table 2). The fact that this group of
journals comes out as cluster No. 1 has no significance except that the first journal in the
cluster, A GRAEFES A, is early in the alphabet and its pairs were the first to be selected by the
computer. (ISI standard 11 character journal abbreviations are used throughout. For full titles
see Ref. 14.) The subheading in Narin’s scheme which matches this cluster is titled
“ophthalmology” and contains 18 journals. All 10 of the journals in cluster No. 1 appear under Narin’s
“ophthalmology” subheading. Narin’s scheme lists eight additional journals which do not
appear in cluster No. 1 or among any of the other clusters at level 0.4 NTSBC. It remains to be
Table 2. Match between cluster No. 1 and Narin’s “ophthalmology” subheading

Cluster No. 1

A GRAEFES A
ACT OPHTH K

“ophthalmology”

A GRAEFES A
ACT OPHTH K

BR J PHYS O
CAN J OPHTH
DOC OPHTHAL

KLIN MONATS
OPHTHAL RES
OPHTHALMOLA
`
`
determined whether lowering the various levels used in creating the clustered file would result
in adding these journals to the cluster.
Cluster No. 19, which contains 10 journals, is an example of a cluster which matches with
more than one of Narin’s subheadings. This cluster corresponds to two of Narin’s subheadings,
one titled “obstetrics and gynecology” containing 12 journals and another called “fertility”
containing five journals (see Table 3). Eight of cluster No. 19’s 10 journals overlap with the
“obstetrics and gynecology” subheading, and two of the cluster’s journals overlap with the
“fertility” subheading. This is an instance where the manual classification and the
citation-based clustering disagree on how journals should be grouped. Despite the fact that this cluster
is not a “good” match to a particular subheading, it does have a high face validity. The
clustering suggests that due to the commonality of the literature cited, the “obstetrics and
gynecology” journals should perhaps be merged with the “fertility” group.
To see how the cluster grouped these journals, actual linkages among the ten journals were
drawn (Fig. 3). We see that of the two journals Narin placed in the “fertility” subheading, one,
FERT STERIL, was linked to the group only through CONTRACEPT, which was the other
journal placed in the “fertility” subheading. CONTRACEPT, on the other hand, was strongly
linked to the remainder of the cluster which Narin had classified under “obstetrics and
gynecology.”
An example of a “not-so-good” match with Narin’s scheme is cluster No. 5, which contains
19 journals. As shown in Table 4, cluster No. 5 contains journals which appear in six of Narin’s
`
Table 3. Match between cluster No. 19 and Narin’s “obstetrics and gynecology” and “fertility” subheadings

Cluster No. 19

ACT OBST SC

CONTRACEPT

REV F GY OB

“obstetrics and gynecology”

ACT OBST SC

ARCH GYNAK
AUST NZ J O
FORTSC GEB
GYNAKOLOGE
GYNECOL INV
J OBSTET GY

OBSTET GYN
REV F GY OB

“fertility”

BIOL REPROD
CONTRACEPT
FERT STERIL
INT J FERT
J REPR FERT
`
Fig. 3. Cluster No. 19 at level 0.4 NTSBC.
`
`
Table 4. Match between cluster No. 5 and Narin’s subheadings

Subheadings                                   Cluster No. 5 (19)

Cell Biology, Cytology and Histology (28)     CELL TIS RE
                                              CYTOBIOLOG
                                              CYTOBIOS
                                              HISTOCHEM J
                                              HISTOCHEMIS
                                              J CELL SCI
                                              J HIST CYTO
                                              J ULTRA RES
                                              TISSUE CELL
                                              Z ZELL MIKR
Anatomy and Morphology (9)                    ACT ANATOM
                                              AM J ANAT
                                              ANAT REC
                                              J ANAT
                                              J MORPH
Microscopy (8)                                J MICROSCOP
Neurology and Neurosurgery (45)               J NEUROCYT
General Biomedical Research (82)              PHI T ROY B
Embryology (8)                                Z ANAT ENTW
`
subheadings. The cluster has the largest overlap with the subheading titled “cell biology,
cytology and histology”, which includes 10 of the cluster’s 19 journals. Another subheading,
“anatomy and morphology”, contains five of the cluster’s journals, and four additional
subheadings each contain one of the cluster’s journals. The most notable feature of the cluster
is that it joins together the classical fields of anatomy and morphology with the more modern
fields of cell biology, cytology and histology. Th
