Case 4:18-cv-07229-YGR Document 192-6 Filed 04/19/21 Page 1 of 5

EXHIBIT 5

2/23/2021

CiteSeerX - Wikipedia

CiteSeerX

Type of site: Bibliographic database
Owner: Pennsylvania State University College of Information Sciences and Technology
URL: citeseerx.ist.psu.edu (https://citeseerx.ist.psu.edu/)
Registration: Optional
Launched: 2008 / 1997
Current status: Active
Content license: Creative Commons BY-NC-SA license[1]

CiteSeerx (originally called CiteSeer) is a public search engine and digital library for scientific and academic papers, primarily in the fields of computer and information science. CiteSeer is considered a predecessor of academic search tools such as Google Scholar and Microsoft Academic Search. CiteSeer-like engines and archives usually only harvest documents from publicly available websites and do not crawl publisher websites. For this reason, authors whose documents are freely available are more likely to be represented in the index.

CiteSeer's goal is to improve the dissemination of, and access to, academic and scientific literature. As a non-profit service that can be freely used by anyone, it has been considered part of the open access movement that is attempting to change academic and scientific publishing to allow greater access to scientific literature. CiteSeer freely provided Open Archives Initiative metadata of all indexed documents and, when possible, links indexed documents to other sources of metadata such as DBLP and the ACM Portal. To promote open data, CiteSeerx shares its data for non-commercial purposes under a Creative Commons license.[1]

CiteSeer changed its name to ResearchIndex at one point and then changed it back.

Contents

History
  CiteSeer and CiteSeer.IST
  CiteSeerx
Current features
  Automated information extraction
  Focused crawling
  Usage
  Data
Other SeerSuite-based search engines
See also
References
Further reading
External links

History

CiteSeer and CiteSeer.IST

CiteSeer was created by researchers Lee Giles, Kurt Bollacker and Steve Lawrence in 1997 while they were at the NEC Research Institute (now NEC Labs), Princeton, New Jersey, USA. CiteSeer's goal was to actively crawl and harvest academic and scientific documents on the web and use autonomous citation indexing to permit querying by citation or by document, ranking them by citation impact. At one point, it was called ResearchIndex.

CiteSeer became public in 1998 and offered many features that were unavailable in academic search engines at that time. These included:

- Autonomous Citation Indexing automatically created a citation index that can be used for literature search and evaluation.
- Citation statistics and related documents were computed for all articles cited in the database, not just the indexed articles.
- Reference linking allowed browsing of the database using citation links.
- Citation context showed the context of citations to a given paper, allowing a researcher to quickly and easily see what other researchers have to say about an article of interest.
- Related documents were shown using citation and word-based measures, and an active and continuously updated bibliography was shown for each document.

[Exhibit sticker: Exhibit # Hall-Ellis 8, 3-01-2021 JB]
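
As a rough illustration of the features listed above, the following Python sketch builds a toy citation index, ranks cited works by citation count, and pulls out citation contexts. It is not CiteSeer's implementation: the `Paper` record, the helper functions, and the sample data are all hypothetical, and the sketch assumes that references have already been resolved to identifiers, which is the step autonomous citation indexing automated.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record for an indexed paper (not CiteSeer's schema)."""
    paper_id: str
    text: str
    # Maps the marker used in the text (e.g. "[1]") to the cited work's id.
    references: dict = field(default_factory=dict)


def build_citation_index(papers):
    """Map each cited id to the ids of the indexed papers that cite it.

    Cited works get an entry even if they were never indexed themselves,
    so statistics cover every cited article, not just the indexed ones.
    """
    index = defaultdict(set)
    for paper in papers:
        for cited_id in paper.references.values():
            index[cited_id].add(paper.paper_id)
    return index


def citation_counts(index):
    """Rank cited works by number of citing papers, highest first."""
    counts = {cited: len(citing) for cited, citing in index.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def citation_contexts(papers, cited_id, width=40):
    """Return the text leading up to each citation of cited_id."""
    contexts = []
    for paper in papers:
        for marker, target in paper.references.items():
            if target != cited_id:
                continue
            pos = paper.text.find(marker)
            if pos >= 0:
                start = max(0, pos - width)
                snippet = paper.text[start:pos + len(marker)]
                contexts.append((paper.paper_id, snippet))
    return contexts


# Made-up sample data: paper "X" is cited but not itself indexed.
papers = [
    Paper("A", "We extend the indexing method of [1].", {"[1]": "X"}),
    Paper("B", "Unlike [1], we use the approach in [2].", {"[1]": "X", "[2]": "A"}),
]
index = build_citation_index(papers)
ranking = citation_counts(index)
contexts = citation_contexts(papers, "A")
```

Here `ranking` places the unindexed work "X" first because two papers cite it, and `contexts` holds the sentence fragment in which paper "B" cites paper "A".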

CiteSeer was granted United States patent # 6289342, titled "Autonomous citation indexing and literature browsing using citation context", on September 11, 2001. The patent was filed on May 20, 1998, and has priority to January 5, 1998. A continuation patent (US Patent # 6738780) was filed on May 16, 2001 and granted on May 18, 2004.

After NEC, in 2004 it was hosted as CiteSeer.IST on the World Wide Web at the College of Information Sciences and Technology, The Pennsylvania State University, and had over 700,000 documents. For enhanced access, performance and research, similar versions of CiteSeer were supported at universities such as the Massachusetts Institute of Technology, University of Zürich and the National University of Singapore. However, these versions of CiteSeer proved difficult to maintain and are no longer available. Because CiteSeer only indexes freely available papers on the web and does not have access to publisher metadata, it returns fewer citation counts than sites, such as Google Scholar, that have publisher metadata.

CiteSeer had not been comprehensively updated since 2005 due to limitations in its architecture design. It had a representative sampling of research documents in computer and information science, but its coverage was limited to papers that were publicly available, usually at an author's homepage, or submitted by an author. To overcome some of these limitations, a modular and open source architecture for CiteSeer was designed: CiteSeerx.

CiteSeerx

CiteSeerx replaced CiteSeer and all queries to CiteSeer were redirected. CiteSeerx[2] is a public search engine, digital library and repository for scientific and academic papers, primarily with a focus on computer and information science.[2] More recently, however, CiteSeerx has been expanding into other scholarly domains such as economics and physics. Released in 2008, it was loosely based on the previous CiteSeer search engine and digital library and is built with a new open source infrastructure, SeerSuite, and new algorithms and their implementations. It was developed by researchers Dr. Isaac Councill and Dr. C. Lee Giles at the College of Information Sciences and Technology, Pennsylvania State University. It continues to support the goals outlined by CiteSeer: to actively crawl and harvest academic and scientific documents on the public web, and to support citation-based queries and the ranking of documents by citation impact. Currently, Lee Giles, Prasenjit Mitra, Susan Gauch, Min-Yen Kan, Pradeep Teregowda, Juan Pablo Fernández Ramírez, Pucktada Treeratpituk, Jian Wu, Douglas Jordan, Steve Carman, Jack Carroll, Jim Jansen, and Shuyi Zheng are or have been actively involved in its development. Recently, a table search feature was introduced.[3] It has been funded by the National Science Foundation, NASA, and Microsoft Research.

CiteSeerx continues to be rated as one of the world's top repositories and was rated number 1 in July 2010.[4] It currently has over 6 million documents with nearly 6 million unique authors and 120 million citations.

CiteSeerx also shares its software, data, databases and metadata with other researchers, currently via Amazon S3 and rsync.[5] Its new modular open source architecture and software (available previously on SourceForge but now on GitHub) is built on Apache Solr and other Apache and open source tools, which allows it to be a testbed for new algorithms in document harvesting, ranking, indexing, and information extraction.

CiteSeerx caches some PDF files that it has scanned. As such, each page includes a DMCA link which can be used to report copyright violations.[6]

Current features

Automated information extraction

CiteSeerx uses automated information extraction tools, usually built on machine learning methods such as ParsCit, to extract scholarly document metadata such as title, authors, abstract, citations, etc. As such, there are sometimes errors in authors and titles. Other academic search engines have similar errors.
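
To give a feel for the extraction task and why it produces errors, here is a deliberately simple Python sketch that splits a reference string into fields. It swaps the machine learning approach for a hand-written regular expression, so it is not how ParsCit or CiteSeerx work. The pattern, the `parse_reference` helper, and the assumed "Authors (Year). "Title". Venue." layout are all hypothetical.

```python
import re

# Assumed layout (hypothetical): Authors (Year). "Title". Venue.
REFERENCE_PATTERN = re.compile(
    r'^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+'
    r'"(?P<title>.+?)"\.\s*(?P<venue>.*)$'
)


def parse_reference(reference):
    """Split a reference string into fields, or return None if it does not fit."""
    match = REFERENCE_PATTERN.match(reference.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Authors are assumed to be separated by semicolons.
    fields["authors"] = [name.strip() for name in fields["authors"].split(";")]
    fields["venue"] = fields["venue"].rstrip(".")
    return fields


sample = (
    'Giles, C. Lee; Bollacker, Kurt D.; Lawrence, Steve (1998). '
    '"CiteSeer: an automatic citation indexing system". '
    'Proceedings of the Third ACM Conference on Digital Libraries.'
)
parsed = parse_reference(sample)

# A made-up string in a different layout is not recognized at all.
unparsed = parse_reference("Doe J, An example title without quotes, 2001")
```

The second call returns `None` because the string does not follow the assumed layout. Real reference lists mix many layouts, which is one reason learned extractors are used and why some author and title errors still get through.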

Focused crawling

CiteSeerx crawls publicly available scholarly documents primarily from author webpages and other open resources, and does not have access to publisher metadata. As a result, citation counts in CiteSeerx are usually lower than those in Google Scholar and Microsoft Academic Search, which have access to publisher metadata.

Usage

CiteSeerx has nearly 1 million users worldwide based on unique IP addresses and has millions of hits daily. Annual downloads of document PDFs were nearly 200 million for 2015.

Data

CiteSeerx data is regularly shared under a Creative Commons BY-NC-SA license with researchers worldwide and has been and is used in many experiments and competitions.

Thanks to its OAI-PMH endpoint,[7] CiteSeerX is an open archive and its content is indexed like an institutional repository in academic search engines, for instance BASE and Unpaywall consumers.
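
An OAI-PMH request is an ordinary HTTP GET whose query string carries a `verb` and its arguments. The Python sketch below only builds such a URL and does not contact any server. The endpoint address and the identifier prefix are assumptions for illustration and are not taken from the article; `GetRecord`, `identifier`, `metadataPrefix`, and `oai_dc` come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint address, not taken from the article.
BASE_URL = "https://citeseerx.ist.psu.edu/oai2"


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL from a base URL, a verb and its arguments."""
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"


# GetRecord needs an identifier and a metadata format; oai_dc (Dublin Core)
# is the format every OAI-PMH repository must support.
# Assumption: the identifier prefix below is illustrative only.
record_url = oai_request_url(
    BASE_URL,
    "GetRecord",
    identifier="oai:CiteSeerX.psu:10.1.1.30.6847",
    metadataPrefix="oai_dc",
)
print(record_url)
```

A harvester such as an academic search engine would typically use the `ListRecords` verb in the same way to page through a repository's metadata.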

Other SeerSuite-based search engines

The CiteSeer model had been extended to cover academic documents in business with SmealSearch and in e-business with eBizSearch. However, these were not maintained by their sponsors. An older version of both could once be found at BizSeer.IST but is no longer in service.

Other Seer-like search and repository systems have been built for chemistry (ChemXSeer) and for archaeology (ArchSeer). Another had been built for robots.txt file search, BotSeer. All of these are built on the open source tool SeerSuite, which uses the open source indexer Lucene.

See also

- Arnetminer
- arXiv
- Collection of Computer Science Bibliographies
- DBLP (Digital Bibliography & Library Project)
- Disciplinary repository
- Google Scholar
- List of academic databases and search engines
- Microsoft Academic
- Research Papers in Economics (RePEc)
- Semantic Scholar

References

1. "CiteSeerX Data Policy" (https://web.archive.org/web/20120105193216/http://csxstatic.ist.psu.edu/about/data). Archived from the original (http://csxstatic.ist.psu.edu/about/data) on 2012-01-05. Retrieved 2015-11-10.
2. "About CiteSeerX" (http://citeseerx.ist.psu.edu/about/site). Retrieved 2010-05-07.
3. "The CiteSeerX Team" (https://web.archive.org/web/20180726034438/http://csxstatic.ist.psu.edu/about/team). Pennsylvania State University. Archived from the original (http://csxstatic.ist.psu.edu:80/about/team) on 2018-07-26. Retrieved 2018-05-01.
4. "Ranking Web of World Repositories: Top 800 Repositories" (https://web.archive.org/web/20100724004342/http://repositories.webometrics.info/top800_rep.asp). Cybermetrics Lab. July 2010. Archived from the original (http://repositories.webometrics.info/top800_rep.asp) on 2010-07-24. Retrieved 2010-07-24.
5. "About CiteSeerX Data" (https://web.archive.org/web/20120105193216/http://csxstatic.ist.psu.edu/about/data). Pennsylvania State University. Archived from the original (http://csxstatic.ist.psu.edu/about/data) on 2012-01-05. Retrieved 2012-01-25.
6. For example, "CiteSeerx – DMCA Notice". CiteSeerX 10.1.1.604.4916 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.604.4916). "The document with the identifier "10.1.1.604.4916" has been removed due to a DMCA takedown notice. If you believe the removal has been in error, please contact us through the feedback page, along with the identifier mentioned in this page."
7. Hirst, Tony (2011-12-08). "Using OAI-PMH as a Single Record Level Query Interface to Citeseer" (https://blog.ouseful.info/2011/12/08/using-oai-pmh-as-a-query-interface-to-citeseer/). Retrieved 2020-04-25.

Further reading

- Giles, C. Lee; Bollacker, Kurt D.; Lawrence, Steve (1998). "CiteSeer: an automatic citation indexing system". Proceedings of the Third ACM Conference on Digital Libraries. pp. 89–98. CiteSeerX 10.1.1.30.6847 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.6847). doi:10.1145/276675.276685 (https://doi.org/10.1145%2F276675.276685). ISBN 978-0-89791-965-4. S2CID 514080 (https://api.semanticscholar.org/CorpusID:514080).

External links

- Official website of CiteSeerx (https://citeseerx.ist.psu.edu/)
- CiteSeerX (https://github.com/SeerLabs/CiteSeerX) on GitHub
- SeerSuite (https://sourceforge.net/projects/citeseerx/) on SourceForge.net (historic)

Retrieved from "https://en.wikipedia.org/w/index.php?title=CiteSeerX&oldid=1001491063"

This page was last edited on 19 January 2021, at 22:49 (UTC).
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.