Case 1:23-cv-00816-MN Document 27 Filed 01/08/24 Page 1 of 22 PageID #: 574
IN THE UNITED STATES DISTRICT COURT
FOR THE DISTRICT OF DELAWARE

FRIENDLIAI INC.,

                    Plaintiff,

        v.

HUGGING FACE, INC.,

                    Defendant.

C.A. No. 23-816-MN

JURY TRIAL DEMANDED

PLAINTIFF’S FIRST AMENDED COMPLAINT
FOR PATENT INFRINGEMENT
Plaintiff FriendliAI Inc. (“FriendliAI”), for its First Amended Complaint against Defendant Hugging Face, Inc. (“Hugging Face” or “Defendant”), hereby alleges as follows:

NATURE OF THE ACTION

1. This is a civil action arising under the Patent Laws of the United States, 35 U.S.C. §§ 271 et seq., for infringement of United States Patent No. 11,442,775 (the “’775 patent”) and United States Patent No. 11,836,520 (the “’520 patent”) (collectively, the “Patents-in-Suit”) relating to artificial intelligence technology, specifically for machine-learning transformer neural network models.
2. Artificial intelligence, or AI, is a field of computer science that focuses on creating intelligent machines capable of performing tasks that typically require human intelligence. Using algorithms, data, and computational power, AI can understand, process, and generate human language; analyze and interpret data; and make decisions or take actions to achieve specific goals. AI has a wide range of applications across various industries, including healthcare, finance, transportation, manufacturing, cybersecurity, gaming, and entertainment. The global AI market size is projected to grow at a Compound Annual Growth Rate (CAGR) of 36.8%, reaching $1,345.2 billion by 2030 from $150.2 billion in 2023. See https://www.marketsandmarkets.com/PressReleases/artificial-intelligence.asp.

Petitioner, EX1007
IPR2024-01234
Hugging Face, Inc., v. FriendliAI Inc.
3. FriendliAI is a pioneering AI company that focuses on large-scale generative AI technology. Following substantial research, FriendliAI developed a serving engine for large-scale generative AI models that optimizes efficiency, throughput and latency. FriendliAI’s novel serving engine can be used for a variety of generative AI tasks, including data generation, translation, sentiment analysis, text summarization, auto-correction and the like. Such efforts by FriendliAI, and its founder and CEO Dr. Byung-gon Chun, have resulted in the issuance of multiple patents, including the Patents-in-Suit.
THE PARTIES

4. Plaintiff FriendliAI is a company organized and existing under the laws of the Republic of Korea, with its principal place of business at 5F, AMC Tower, 222 Bongeunsa-ro, Gangnam-gu, Seoul, 06135, Korea.

5. Defendant Hugging Face, Inc. is a company organized and existing under the laws of the State of Delaware, with its principal place of business at 20 Jay St, Ste 620, Brooklyn, New York, 11201.
JURISDICTION AND VENUE

6. This action arises under the patent laws of the United States, including 35 U.S.C. §§ 271 et seq. The jurisdiction of this Court over the subject matter of this action is proper under 28 U.S.C. §§ 1331 and 1338(a).
7. Venue is proper in this District pursuant to 28 U.S.C. §§ 1391(b), (c), and 1400(b). Defendant is an entity organized under the laws of Delaware and resides in Delaware for purposes of venue under 28 U.S.C. § 1400(b). Defendant conducts business in Delaware, at least by offering for sale, selling, and otherwise making available products and services through its website, which are accessible in Delaware. Defendant has also committed and continues to commit acts of infringement in this District.
8. This Court has personal jurisdiction over Defendant because Defendant conducts business in Delaware by at least offering for sale, selling, and otherwise making available products and services through its website, which are accessible in Delaware, and because infringement has occurred and continues to occur in Delaware.

9. Personal jurisdiction also exists over Defendant because it is an entity incorporated in and organized under the law of Delaware.
BACKGROUND OF THE PATENTED TECHNOLOGY

10. The founder and CEO of FriendliAI is Dr. Byung-gon Chun, a computer scientist and Professor known for his contributions to the fields of computer systems, distributed computing, and artificial intelligence. Dr. Chun received his Ph.D. in Computer Science from the University of California, Berkeley. He is currently a Professor in the Computer Science and Engineering (CSE) Department at Seoul National University (SNU), where he leads the Software Platform Lab (SPL), conducts research on machine learning systems, and teaches courses, including courses on Artificial Intelligence and Big Data Systems. Dr. Chun has published numerous papers in reputable conferences and journals, and has received a number of awards, including the EuroSys 2021 Test of Time Award, the 2020 ACM SIGOPS Hall of Fame Award, the 2020 Google Research Award, the 2019 SNU Education Award, and the 2018 Amazon Machine Learning Research Award. Dr. Chun’s work has contributed to advancements in the design and optimization of the performance, scalability, and reliability of large-scale distributed systems, including serving and training transformer-based generative AI models.
11. The vision of FriendliAI is to enable innovation by lowering barriers to serving generative AI. In furtherance of that vision, FriendliAI developed PeriFlow1 (a version of a distributed serving system called Orca), a patented solution that efficiently serves large-scale AI transformer models. PeriFlow/Orca uses a novel optimization technique referred to as batching with iteration-level scheduling, also known as dynamic batching or continuous batching, which provides for improved throughput and decreased latency as compared to prior art systems.
12. Before FriendliAI’s inventions, transformer models had begun to be widely used for AI applications. Pre-existing transformer models, however, suffered from latency and throughput issues, especially when used for large-scale applications. Dr. Chun and his colleagues at FriendliAI and Seoul National University recognized that such issues resulted from the use of a blunt scheduling mechanism, which was primarily designed to schedule executions at request granularity. In existing models, a form of “batching,” referred to as naive batching or static batching, could be used to process multiple requests at once in order to increase overall throughput. But because requests can vary in complexity and length, particularly in generative AI applications, and because the transformer generation model only returns the execution results to the serving system when it finishes processing all requests in the batch, a request that “finishes” early cannot be sent to the client immediately, but rather must wait until the last request in the batch is “finished,” imposing a substantial amount of extra latency (or in plain language, causing delays in processing the requests in the batch). Similarly, when a new request arrives in the middle of the current batch’s execution, the aforementioned scheduling mechanism makes the newly-arrived request wait until all requests in the current batch have finished. The inflexibility of such scheduling mechanisms resulted in high latency and low overall throughput.

1 PeriFlow was recently rebranded as “Friendli,” with the PeriFlow product now named the “Friendli Engine,” the “PeriFlow Suite” now named the “Friendli Suite,” and the “PeriFlow Cloud” now named “Friendli Dedicated Endpoints.”
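The static-batching bottleneck described in this paragraph can be illustrated with a short, simplified Python sketch. This is hypothetical code for illustration only; the function and variable names are the editor's assumptions and are not drawn from any party's actual system.

```python
# Simplified illustration of static (naive) batching: the serving
# system receives results only when the entire batch finishes, so a
# request that completes after 2 decoding iterations is still held
# for as long as the longest request in its batch.

def run_static_batch(requests):
    """requests: list of dicts with an "id" and the number of decoding
    iterations each needs. Returns {id: iteration at which the result
    is actually returned to the client}."""
    # The batch runs until its longest member finishes.
    batch_length = max(r["iterations_needed"] for r in requests)
    # Under static batching, every result is released at the same
    # time -- after batch_length iterations -- regardless of when the
    # individual request finished generating.
    return {r["id"]: batch_length for r in requests}

latencies = run_static_batch([
    {"id": "short", "iterations_needed": 2},
    {"id": "long", "iterations_needed": 50},
])
# The "short" request is held for 50 iterations instead of 2.
```

The extra 48 iterations of waiting for the "short" request correspond to the "substantial amount of extra latency" the paragraph describes.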
13. Dr. Chun and his colleagues recognized that a method of scheduling the system at a finer granularity could resolve the latency and throughput issues associated with existing systems.
14. After substantial research, Dr. Chun developed novel technology referred to as batching with iteration-level scheduling (also called dynamic batching or continuous batching). Iteration-level scheduling allows for a finished request to be sent to a client, and for new requests to be sent to the execution engine, before all requests in a batch are completed. Iteration-level scheduling provides for a highly efficient and scalable serving of generative AI transformer models, with optimized throughput and latency.
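A minimal Python sketch of the iteration-level scheduling idea described above: finished requests are emitted after each forward pass, and waiting requests are admitted between passes. The names and structure are illustrative assumptions by the editor, not FriendliAI's actual engine.

```python
from collections import deque

def run_iteration_level(initial, arrivals, max_batch=8):
    """Schedule at iteration granularity: after every forward pass,
    emit finished requests and pull waiting requests into the batch.

    initial  -- list of (request_id, iterations_needed)
    arrivals -- dict mapping iteration number -> list of new requests
                (note: this dict is consumed during the run)
    Returns {request_id: iteration at which its output was returned}.
    """
    batch = list(initial)
    waiting = deque()
    completed = {}
    iteration = 0
    while batch or waiting or arrivals:
        iteration += 1
        # New requests arriving mid-execution join the waiting queue.
        waiting.extend(arrivals.pop(iteration, []))
        # One forward pass over the current batch (simulated).
        batch = [(rid, need - 1) for rid, need in batch]
        # Finished requests are returned immediately, not held until
        # the whole batch is done.
        for rid, remaining in batch:
            if remaining == 0:
                completed[rid] = iteration
        batch = [(rid, n) for rid, n in batch if n > 0]
        # Free slots are refilled before the next iteration.
        while waiting and len(batch) < max_batch:
            batch.append(waiting.popleft())
    return completed

done = run_iteration_level(
    initial=[("short", 2), ("long", 5)],
    arrivals={3: [("late", 1)]},
)
# "short" returns at iteration 2; "late" arrives at iteration 3 and
# returns at iteration 4 -- neither waits for "long" (iteration 5).
```

Contrast with static batching, where both "short" and "late" would be held until "long" finished.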
15. Batching with iteration-level scheduling is described in a paper co-authored by Dr. Chun and others at FriendliAI, and presented in July 2022, during the Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation. See Ex. 5 (Yu, G.-I., Jeong, J.S., Kim, G.-W., Kim, S., and Chun, B.-G. Orca: A distributed serving system for Transformer-Based generative models. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), pp. 521–538, 2022).
16. Dr. Chun’s paper is recognized by others in the industry as the “first” to disclose iteration-level scheduling (also known as dynamic batching or continuous batching). See Ex. 6 (https://www.anyscale.com/blog/continuous-batching-llm-inference).
17. The industry has also recognized the benefit of FriendliAI’s novel technology—an increase in throughput and decrease in latency—with one article detailing how “continuous batching” (which is also known in the industry as dynamic batching or batching with iteration-level scheduling) “improves throughput by wasting fewer opportunities to schedule new requests, and improves latency by being capable of immediately injecting new requests into the compute stream.” See Ex. 6 (https://www.anyscale.com/blog/continuous-batching-llm-inference).
18. The Patents-in-Suit resulted from Dr. Chun’s research to develop an innovative serving engine for generative AI transformer models.
THE PATENTS-IN-SUIT

19. The ’775 Patent, entitled “Dynamic Batching for Inference System for Transformer-Based Generation Tasks,” was duly and legally issued by the United States Patent and Trademark Office on September 13, 2022. The inventors of the patent are Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim and Byung-Gon Chun, and the patent is assigned to FriendliAI. A copy of the ’775 Patent is attached hereto as Exhibit 1.

20. FriendliAI is the exclusive owner of all rights, title, and interest in the ’775 Patent, and has all rights to bring this suit to recover damages for any current or past infringement of the ’775 Patent. See Ex. 1.
21. The ’520 Patent, entitled “Dynamic Batching for Inference System for Transformer-Based Generation Tasks,” was duly and legally issued by the United States Patent and Trademark Office on December 5, 2023. The inventors of the patent are Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim and Byung-Gon Chun, and the patent is assigned to FriendliAI. A copy of the ’520 Patent is attached hereto as Exhibit 2.

22. FriendliAI is the exclusive owner of all rights, title, and interest in the ’520 Patent, and has all rights to bring this suit to recover damages for any current or past infringement of the ’520 Patent. See Ex. 2.
23. The Patents-in-Suit are directed to, among other things, novel methods used to process batches of transformer-based requests, which involve scheduling batches at an iteration level.
24. For example, claim 10 of the ’775 patent recites the claim language below, including claim language (bolded and italicized) directed to iteration-level scheduling:

10. A non-transitory computer-readable storage medium storing computer program instructions executable to perform operations for dynamically executing batches of requests on one or more execution engines running a machine-learning transformer model, the operations comprising:

    receiving, by a serving system, one or more requests for execution, the serving system including a scheduler and one or more execution engines each coupled to access a machine-learning transformer model including at least a set of decoders;
    scheduling, by the scheduler, a batch of requests including the one or more requests for execution on an execution engine;
    generating, by the execution engine, a first set of output tokens by applying the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests;
    receiving, by a request processor, a new request from a client device, the new request including a sequence of input tokens;
    scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine, the second batch of requests scheduled responsive to determining that the execution engine has memory available to execute the second batch of requests, wherein in a second set of inputs for the second batch of requests, a length of the sequence of input tokens for the new request is different from a length of an input for at least one request other than the new request; and
    generating, by the execution engine, a second set of output tokens by applying the transformer model to the second set of inputs for the second batch.
25. For example, claim 35 of the ’520 patent recites the claim language below, including claim language (bolded and italicized) directed to iteration-level scheduling:

35. A non-transitory computer-readable storage medium storing computer program instructions executable to perform operations for dynamically executing batches of requests on one or more execution engines running a machine-learning transformer model, the operations comprising:

    receiving, by a serving system, one or more requests for execution, the serving system including a scheduler and one or more execution engines each coupled to access a machine-learning transformer model including at least a set of decoders;
    scheduling, by the scheduler, a batch of requests including the one or more requests for execution on an execution engine;
    generating, by the execution engine, a first set of output tokens by applying the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests;
    receiving, by a request processor, a new request from a client device, the new request including a sequence of input tokens;
    scheduling, by the scheduler, a second batch of requests for execution on the execution engine, the second batch of requests including the new request and at least one request in the batch of requests, wherein a length of an internal state for the new request is different from a length of another internal state for the at least one request; and
    generating, by the execution engine, a second set of output tokens by applying the transformer model to the second set of inputs for the second batch.
26. The Patents-in-Suit are directed to unconventional, non-routine techniques for executing batches of requests on an inference system using a transformer model, including for large-scale generative AI. The Patents-in-Suit explain, inter alia, that “at one or more iterations, the inference system can modify the batch being executed on the execution engine by adding new incoming requests to the batch or removing completed requests from the batch.” ’775 patent at 2:62-64.

27. The claimed inventions overcame problems in the field, including high latency and low throughput, by batching and scheduling requests at an iteration level (i.e., dynamically batching or continuously batching). See, e.g., ’775 Patent at 3:2-6 (patented invention allows for the “response [to] be provided to the user faster” and allows adding new requests to avoid execution engine “being under-utilized”), 5:30-32 (discussing processing requests by batches “to achieve higher processor utilization”), 24:28-31 (“dynamic batching allows the serving system 435 to dynamically adjust batches that are processed on execution engines such that the hardware of the execution engine can be fully utilized.”).
28. To the extent any marking or notice was required by 35 U.S.C. § 287, FriendliAI has complied with the marking requirements of 35 U.S.C. § 287. FriendliAI provides websites, which are accessible to the public without charge and which state that PeriFlow is protected by the Patents-in-Suit. See Ex. 7 (https://friendli.ai/patents); Ex. 8 (https://periflow.ai/patents).

29. FriendliAI permits the public to sign up for and/or use an embodiment of the patented system (“PeriFlow”) via https://friendli.ai/periflow and https://periflow.ai/periflow. See https://friendli.ai/periflow; https://periflow.ai/periflow. On both websites, FriendliAI has marked the PeriFlow product by stating at the top of the page that “Our engine technology is protected by patents in the United States and Korea,” and including a link entitled “Visit patents.” By clicking this link, the user is directed to the websites associating PeriFlow—the patented article—with the Patents-in-Suit. See Ex. 7 (https://friendli.ai/patents); Ex. 8 (https://periflow.ai/patents).
DEFENDANT’S INFRINGING ACTIVITIES

I. THE ACCUSED FUNCTIONALITY

30. Hugging Face offers an inference server for Large Language Models (“LLMs”) called Text Generation Inference (“TGI”).

31. On information and belief, Hugging Face launched Text Generation Inference on or around February 2023.
32. An “important feature” of Text Generation Inference is continuous batching of incoming requests. See Ex. 9 (https://github.com/huggingface/text-generation-inference/tree/main/router). Hugging Face also refers to continuous batching as “dynamic batching.” See Ex. 10 (https://huggingface.co/text-generation-inference); see also https://github.com/huggingface/text-generation-inference/pull/258/commits/c6bb42286e3642261ff9bf7d376687684351c00et (changing dynamic batching’s name to continuous batching).
33. Continuous, or dynamic, batching involves “regularly running queries in the same forward step of the LLM (a ‘batch’) and also removing them when they are finished.” See Ex. 11.
34. According to Hugging Face, continuous batching enables “increased total throughput.” See Ex. 12 (https://github.com/huggingface/text-generation-inference#readme). It further allows for an optimal balance between “exploiting the hardware and perceived latency,” as compared to previously existing inference systems with no continuous batching at all, which suffered from low throughput, and inference systems with static batching, which suffered from high latency. See Ex. 9 (https://github.com/huggingface/text-generation-inference/tree/main/router). Inference servers with continuous batching thus allow for obtaining the “sweet spot” of optimal throughput and latency. See id.
35. For further example, Defendant’s documentation explains how Text Generation Inference, which includes the continuous batching feature, enables high throughput and low latency:

Ex. 13 (https://huggingface.co/blog/inference-endpoints-llm) (emphasis added).
II. DEFENDANT’S KNOWLEDGE OF THE PATENTS-IN-SUIT

36. The Patents-in-Suit claim advancements in the large-scale transformer-based AI industry in which Defendant actively participates. The claimed advancements were described in a paper, titled “Orca: A distributed serving system for Transformer-Based generative models” (the “Orca paper”), published by Dr. Chun and his colleagues in July 2022. See Ex. 5; see also Ex. 14 (https://www.usenix.org/conference/osdi22/presentation/yu). The Orca paper identifies FriendliAI as a company associated with the authors of the Orca paper.

37. FriendliAI’s website identifies its inference engine as PeriFlow, and further identifies PeriFlow as being also known as Orca. See https://www.friendli.ai/; https://medium.com/friendliai/serve-generative-ai-models-like-t5-faster-than-ever-with-orca-32-8x-faster-for-t5-3b-a6ae560cfe25.
38. FriendliAI’s press release further identifies the ’775 patent and states that PeriFlow (aka Orca) practices the ’775 patent. See Ex. 15 (https://www.businesswire.com/news/home/20230720505202/en/FriendliAI-Launches-Public-Beta-of-PeriFlow-Cloud).
39. On information and belief, Defendant has been and is aware of the Orca paper and/or the technology described in the Orca paper. On information and belief, Defendant copied the technology described in the Orca paper in developing TGI, including the continuous batching (or dynamic batching) feature.
40. A third-party article describing continuous batching (also known as dynamic batching or iteration-level scheduling) asserts that the Orca paper appears to be the “first” to describe such technique, and further recognizes that Defendant uses such infringing technique in TGI. See Ex. 6 (https://www.anyscale.com/blog/continuous-batching-llm-inference). On information and belief, Defendant has been and is aware of the aforementioned third-party article.
41. On or around July 21, 2023, FriendliAI’s counsel notified Defendant that Defendant is infringing the ’775 patent and specifically identified how Defendant’s TGI infringes the ’775 patent. FriendliAI’s counsel further notified Defendant that FriendliAI’s PeriFlow solution (also known as Orca) practices the ’775 patent, and provided a copy of the press release explaining that PeriFlow is protected by the ’775 patent. FriendliAI’s counsel demanded that Defendant cease and desist its infringing acts.
42. On or around December 15, 2023, FriendliAI’s counsel notified Defendant that a continuation application relating to the ’775 patent issued as the ’520 patent. FriendliAI’s counsel further notified Defendant that Defendant is infringing the ’520 patent and specifically identified how Defendant’s TGI infringes the ’520 patent.
43. On information and belief, Defendant is and has been aware of the Patents-in-Suit and the fact that Defendant’s TGI practices the claimed invention of the Patents-in-Suit. Despite its knowledge of the Patents-in-Suit and of its infringement of those Patents, and despite its knowledge that FriendliAI’s PeriFlow solution is protected by the Patents-in-Suit, Defendant has continued to willfully infringe the Patents-in-Suit, obtaining the significant benefits of FriendliAI’s innovation without seeking FriendliAI’s permission or paying any compensation to FriendliAI for access to this valuable technology. For example, Defendant has continued to use TGI, including the continuous batching (or dynamic batching) feature, without a license after receiving the notices from FriendliAI’s counsel referenced herein, even though those notices notified Defendant of its infringing acts and demanded that Defendant cease and desist.
COUNT I: INFRINGEMENT OF THE ’775 PATENT

44. FriendliAI incorporates by reference paragraphs 1-43.

45. The ’775 Patent is valid and enforceable.

46. Defendant has infringed, and continues to infringe, one or more claims of the ’775 Patent under 35 U.S.C. § 271, either literally and/or under the doctrine of equivalents, by making, using, selling, and/or offering for sale products and services covered by those claims, including products and services that incorporate Text Generation Inference (the “Accused Functionality” or “TGI”). Defendant’s products and services that incorporate the Accused Functionality (“the Accused Products and Services”) include, but are not limited to, the TGI toolkit distributed or otherwise made available by Hugging Face (including open source and under license), Spaces, Inference Endpoints, Enterprise Hub (formerly known as Private Hub), Hugging Face Hub, HuggingChat, OpenAssistant, and Docker Hub containers, including all past, current, and future versions of these products and services.
47. As one example, Defendant infringes one or more claims of the ’775 Patent by using TGI, which includes the feature of continuous batching, or dynamic batching:

Ex. 12 (https://github.com/huggingface/text-generation-inference#features) (emphasis added).

Defendant infringes one or more claims of the ’775 Patent through continuous batching or dynamic batching at least by receiving requests for execution on a machine-learning transformer model, scheduling a batch of requests for execution, generating output by applying the transformer model, receiving a new request for execution, executing the new request in a second batch where the new request is a different length than at least one other request, and generating a second set of output tokens. For example, Defendant has infringed, and continues to infringe, representative Claim 10 of the ’775 Patent through continuous batching or dynamic batching, as shown in Exhibit 3.
48. Defendant has infringed, and continues to infringe, one or more claims of the ’775 Patent under 35 U.S.C. § 271(a), either literally and/or under the doctrine of equivalents, by making and/or using the Accused Products and Services. For example, Defendant makes and uses TGI to serve “the most popular Large Language Models” (“LLMs”) within their Spaces, Inference Endpoints, and Enterprise Hub products and services. See Ex. 10 (https://huggingface.co/text-generation-inference). Defendant’s documentation explains that TGI is “used in production at HuggingFace to power LLMs api-inference widgets” that are used on each of the Accused Products and Services. Ex. 12 (https://github.com/huggingface/text-generation-inference). For each of the Accused Products and Services, Defendant uses TGI on their own servers to provide services to clients and the public. See Ex. 10 (https://huggingface.co/text-generation-inference). For example, Defendant makes and provides a “Chat UI” to the public that uses TGI and is hosted on Spaces, which runs on Defendant’s servers. Id. For further example, Defendant also uses the Accused Functionality in publicly available products and services like HuggingChat and OpenAssistant. See https://huggingface.co/blog/sagemaker-huggingface-llm; https://huggingface.co/chat; https://open-assistant.io. For further example, Defendant also uses the Accused Functionality to provide services to customers using the Spaces, Inference Endpoints, and Enterprise Hub products and services. See, e.g., Ex. 10 (https://huggingface.co/text-generation-inference); see also https://huggingface.co/spaces; https://huggingface.co/blog/inference-endpoints-llm; https://huggingface.co/docs/inference-endpoints/main/en/others/runtime#inference-endpoints-version; https://huggingface.co/blog/introducing-private-hub. For example, Defendant’s documentation indicates that Text Generation Inference on Inference Endpoints is executed on Defendant’s servers. See https://huggingface.co/blog/inference-endpoints-llm. For further example, Defendant’s documentation indicates that Enterprise Hub allows customers to deploy services like Inference Endpoints, which uses TGI and is, on information and belief, hosted by Defendant on Defendant’s servers. See https://huggingface.co/blog/introducing-private-hub.
49. Defendant has also infringed, and continues to infringe, one or more claims of the ’775 Patent under 35 U.S.C. § 271(a), either literally and/or under the doctrine of equivalents, by selling, offering for sale, and/or otherwise making available the Accused Products incorporating the Accused Functionality. For example, Defendant sells, offers for sale, and otherwise makes available the Accused Products and Services, including but not limited to Spaces, Inference Endpoints, and Enterprise Hub. See https://huggingface.co/pricing; https://huggingface.co/docs/inference-endpoints/main/en/others/runtime#inference-endpoints-version; https://huggingface.co/blog/inference-endpoints-llm; https://huggingface.co/blog/introducing-private-hub; https://huggingface.co/spaces. For further example, Defendant also makes available the Docker Hub container for customers to download and use the Accused Functionality on their own system. See Ex. 12 (https://github.com/huggingface/text-generation-inference#docker).
50. In addition or in the alternative, Defendant has also induced infringement, and continues to induce infringement, of one or more claims of the ’775 Patent under 35 U.S.C. § 271(b). Defendant actively, knowingly, and intentionally induces infringement of the ’775 Patent by selling or otherwise making available the Accused Products and Services, with the knowledge and intent that third-party customers and users will use the Accused Products and Services, including the Accused Functionality, made available by Defendant to infringe the ’775 Patent. Defendant acts with the knowledge and intent to encourage and facilitate third-party infringement through the dissemination of the Accused Products and Services and/or the creation and dissemination of supporting materials, documentation, instructions, code and/or technical information related to the Accused Products and Services, including the Accused Functionality.
51. Defendant specifically intends and is aware that the ordinary and customary use of the Accused Products and Services would infringe the ’775 Patent. For example, Defendant sells and provides the Accused Products and Services, which, when used in their ordinary and customary manner intended and instructed by Defendant, infringe one or more claims of the ’775 Patent, including at least claim 10. On information and belief, Defendant further provides supporting materials, documentation, instructions, code and/or technical information that cause their customers and partners to operate the Accused Products and Services for their ordinary and customary use. See, e.g., Ex. 10 (https://huggingface.co/text-generation-inference/); see also https://huggingface.co/blog/falcon; https://huggingface.co/blog/inference-endpoints-llm; https://github.com/huggingface/text-generation-inference#docker. Defendant accordingly induces third parties to use the Accused Products and Services in their ordinary and customary way to infringe the ’775 Patent, knowing, or at least being willfully blind to the fact, that such use constitutes infringement of the ’775 Patent.
52. In addition or in the alternative, Defendant contributes to the infringement by third parties, such as customers and users, of one or more claims of the ’775 Patent under 35 U.S.C. § 271(c), by making, selling and/or offering for sale in the United States, and/or importing into the United States, Accused Products, which include the Accused Functionality, knowing that those products constitute a material part of the inventions of the ’775 Patent, knowing that those products are especially made or adapted
