IN THE UNITED STATES DISTRICT COURT
FOR THE DISTRICT OF DELAWARE

FRIENDLIAI INC.,

Plaintiff,

v.

HUGGING FACE, INC.,

Defendant.

C.A. No. 1:23-CV-00816-MN

PLAINTIFF FRIENDLIAI INC.’S
FINAL INFRINGEMENT CONTENTIONS
Pursuant to Section 6(e) of the Scheduling Order (D.I. 17), Plaintiff FriendliAI Inc. (“FriendliAI”) hereby submits its final infringement contentions against Hugging Face, Inc. (“Hugging Face”). These infringement contentions include claim charts that relate each known Hugging Face Accused Product to the currently asserted claims of the Asserted Patents.1

The claim charts are attached to these disclosures as Exhibit A (for the ’775 patent) and Exhibit B (for the ’520 patent). Specifically, these disclosures contend that the Accused Functionality in each of the Accused Products infringes (1) at least claims 1-18 of the ’775 patent, and (2) at least claims 1-66 of the ’520 patent.2

1 The Asserted Patents include U.S. Patent No. 11,442,775 (the ’775 patent) and U.S. Patent No. 11,836,520 (the ’520 patent).
2 The Accused Products include Text Generation Inference (TGI), and any product, service, platform, solution, model, dataset, application, or toolkit that Hugging Face has made, sold, offered for sale, advertised, marketed, used, or imported into the United States that incorporates, applies, or otherwise uses Text Generation Inference (TGI), including but not limited to the TGI toolkit distributed or otherwise made available by Hugging Face (including open source and under license), Spaces, Inference Endpoints, Enterprise Hub (formerly known as Private Hub), Hugging Face Hub, HuggingChat, OpenAssistant, and Docker Hub containers, including all past, current, and future versions of these products and services, even if they are not explicitly referred to by
Petitioner, EX1013
IPR2024-01234
Hugging Face, Inc. v. FriendliAI Inc.
Hugging Face (1) directly infringes by running TGI for its customers on its own servers; (2) directly infringes through internal testing of its products; (3) directly infringes by selling software that allegedly “performs and/or dictates and controls performance of (or automatically performs)” the claimed method steps in the hands of Hugging Face or its customers; (4) indirectly infringes by inducing customers to run TGI; and (5) contributorily infringes by selling and/or distributing a component (TGI) especially designed for use in the patented invention.

FriendliAI makes these disclosures based on discovery received to date. Discovery is ongoing, and Hugging Face has continuously delayed providing required discovery in this case. (See D.I. 62.) FriendliAI reserves the right to modify, supplement, amend, or otherwise alter these disclosures after receiving and reviewing the source code and pending discovery. Because FriendliAI’s infringement investigation is ongoing, FriendliAI reserves the right to supplement or amend these disclosures as appropriate under the Scheduling Order, the Federal Rules, the Local Rules, and the District of Delaware’s Default Standard for Discovery, and in light of additional fact and/or expert discovery. In addition to the documents cited herein, FriendliAI may rely on additional documents and testimony. Should Hugging Face take the position that any limitation of any claim identified in Exhibits A and B is not present literally in the Accused Products, FriendliAI reserves the right to assert infringement under the doctrine of equivalents.

FriendliAI also makes these disclosures before the parties have finalized claim construction positions, submitted claim construction briefing to the Court, or participated in a Markman hearing.
these product names, including any that may be released during the pendency of this matter (“Accused Products”). FriendliAI identifies the above Accused Products, but further discovery may be necessary to reveal additional Accused Products. FriendliAI reserves the right to modify, supplement, amend, or otherwise alter these disclosures based on forthcoming discovery.

FriendliAI therefore reserves the right to supplement or amend these disclosures as appropriate in light of ongoing claim construction proceedings and any claim construction ruling by the Court.
Dated: July 1, 2024

OF COUNSEL:

Michael J. Sacksteder
Shreyas A. Kale
Samantha Ong
FENWICK & WEST LLP
555 California Street, 12th Floor
San Francisco, CA 94104
Telephone: 415.875.2300
Facsimile: 415.281.1350
MSacksteder@fenwick.com
SKale@fenwick.com
Song@fenwick.com

Jessica M. Kaempf
FENWICK & WEST LLP
401 Union Street, 5th Floor
Seattle, WA 98101
Tel: (206) 389-4550
JKaempf@fenwick.com

Respectfully submitted,

POTTER ANDERSON & CORROON LLP

By: /s/ Jessica Kaempf
David E. Moore (#3983)
Bindu A. Palapura (#5370)
Andrew L. Brown (#6766)
Hercules Plaza, 6th Floor
1313 N. Market Street
Wilmington, DE 19801
Tel: (302) 984-6000
dmoore@potteranderson.com
bpalapura@potteranderson.com
abrown@potteranderson.com

Attorneys for Plaintiff FriendliAI Inc.
CERTIFICATE OF SERVICE

I hereby certify that on July 1, 2024, a true and correct copy of the foregoing document was served via electronic mail upon the following counsel of record:

Stephen J. Krafschik
Christina B. Vavala
POLSINELLI PC
222 Delaware Avenue, Ste 1101
Wilmington, DE 19801
Telephone: (302) 252-0920
Email: skraftschik@polsinelli.com
Email: cvavala@polsinelli.com

Jason A. Wietjes
POLSINELLI PC
2950 N. Harwood St., Ste 2100
Dallas, TX 75201
Telephone: (214) 661-5519
Email: jwietjes@polsinelli.com

Attorneys for Defendant

/s/ Jessica Kaempf
Jessica Kaempf
EXHIBIT A

FriendliAI Inc. v. Hugging Face, Inc.: Final Infringement Chart for U.S. Patent No. 11,442,775

The claim chart below provides an exemplary mapping of an exemplary accused functionality, Text Generation Inference (“TGI” or “Accused Functionality”), to the claims of U.S. Patent No. 11,442,775 (“the ’775 patent”). Discovery is ongoing and this chart may be amended based on information provided on the Accused Products during discovery. This chart is not intended to provide a comprehensive explanation of infringement or identification of Accused Products and Services, which FriendliAI will provide in accordance with the Court’s rules and case schedule. Plaintiff is seeking Defendant’s internal documentation, including without limitation technical documentation, in discovery, and therefore Plaintiff reserves the right to amend or further supplement this chart. Proposed claim constructions have not been finalized and claim construction briefing is still pending. The accused functionality meets each claim limitation regardless of claim construction, as further indicated below. Plaintiff, however, reserves the right to amend or further supplement this chart based on the Court’s claim construction. Source code for certain accused products has not yet been Bates stamped and provided to Plaintiff, and these contentions may be amended pursuant to that source code.
`
U.S. Patent No. 11,442,775 | Accused Functionality

Claim 1. A method of dynamically executing batches of requests on one or more execution engines running a machine-learning transformer model, comprising:

Defendant has designed and developed, and has used, sold, offered for sale, and/or otherwise made available (and continues to use, sell, offer for sale, and/or otherwise make available), the Accused Products and Services that use Text Generation Inference (“TGI” or the “Accused Functionality”). The Accused Functionality performs operations for dynamically executing batches of requests on one or more execution engines running a machine-learning transformer model. See https://huggingface.co/text-generation-inference.1

The Accused Products include Text Generation Inference (TGI), and any product, service, platform, solution, model, dataset, application, or toolkit that Hugging Face has made, sold, offered for sale, advertised, marketed, used, or imported into the United States that incorporates, applies, or otherwise uses Text Generation Inference (TGI), including but not limited to the TGI toolkit distributed or otherwise made available by Hugging Face (including open source and under license), Spaces, Inference Endpoints, Enterprise Hub (formerly known as Private Hub), Hugging Face Hub, HuggingChat, OpenAssistant, and Docker Hub containers, including all past, current, and future versions of these products and services, even if they are not explicitly referred to by these product names, including any that may be released during the pendency of this matter (“Accused Products”). FriendliAI identifies the above Accused Products, but further discovery may be necessary to reveal additional Accused Products. FriendliAI reserves the right to modify, supplement, amend, or otherwise alter these disclosures based on forthcoming discovery.

Hugging Face indirectly infringes the Asserted Patents through contributory infringement and induced infringement. Hugging Face had knowledge of the Asserted Patents (see Amended Compl., at, e.g., ¶¶ 43, 50, 53, 63, 66). Hugging Face contributorily infringes each of the Asserted Patents by selling and/or distributing a component especially designed for use in the patented invention. Specifically, Hugging Face sells and/or distributes TGI. Further, TGI is (at a minimum) a material component of the claimed invention, and is not suitable for substantial non-infringing uses. The batching efficiencies that Hugging Face touts in its description of TGI are at the core of many of the Hugging Face products at issue in this case. Hugging Face induces infringement of the Asserted Patents by actively and knowingly aiding and abetting Hugging Face users’ and business partners’ direct infringement. Hugging Face had knowledge of the infringement because it continued to use and promote technology infringing the Asserted Patents since at least July 20, 2023, when it was placed on notice that the Accused Products infringed the claims of the asserted patents. FriendliAI further incorporates by reference the allegations in the Amended Complaint, at, e.g., ¶¶ 41-42. Hugging Face provides users and business partners with instructions on how to use TGI in an infringing manner. Hugging Face provides tools and instructions on its website on how to use TGI to perform steps that result in the directly infringing use. Hugging Face also advertises or promotes the use of TGI. Additionally, Hugging Face is responsible for and exercises control over the manufacturing and development of TGI. Hugging Face has actively encouraged its customers to commit direct infringement of the asserted claims with knowledge that the acts encouraged constitute infringement and a specific intent to infringe the asserted claims.

The Accused Products also indirectly infringe at least because Hugging Face provides TGI to customers to execute on their own servers, Hugging Face induces customers to infringe the ’775 patent claims by providing TGI and instructions to execute TGI on third-party systems, or Hugging Face indirectly infringes by providing TGI in any other fashion to be used with other software or hardware. (See https://github.com/huggingface/text-generation-inference).

1 By charting the preamble, Plaintiff does not take any position as to whether the preamble is a limitation of the claim.
Hugging Face also indirectly infringes when it provides enterprise consulting and support to customers to implement TGI on either the customer’s own hardware or in their own particular environment. (See DEF_000298; FRIENDLIAI_00001163). In some cases, this includes Hugging Face providing custom TGI containers optimized to execute on the customer’s hardware or in the customer’s environment. Id. Hugging Face also indirectly infringes when it licenses or otherwise provides TGI to partners and collaborators to execute in their environment, and encourages users to use TGI in their partners’ environment. (See DEF_360181; DEF_360159; FRIENDLIAI_00001163).

(See https://www.youtube.com/watch?v=cTyXDhiltXw; see also https://d1.awsstatic.com/events/Summits/reinvent2023/AIM328_Accelerate-FM-development-with-Amazon-SageMaker-JumpStart.pdf).

Hugging Face’s Spaces product infringes the claims of the ’775 patent by incorporating TGI into its execution environment: DEF_347045; DEF_347048; DEF_347057; DEF_347071; https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/consuming_tgi (“ChatUI can automatically consume the TGI server and even provides an option to switch between different TGI endpoints. You can try it out at Hugging Chat, or use the ChatUI Docker Space to deploy your own Hugging Chat to Spaces.”); https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI. Customers are encouraged to create their own applications on Spaces (e.g., Chatbots) using TGI. (See https://github.com/huggingface/chat-ui; https://huggingface.co/chat/; DEF_000050; https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/consuming_tgi).
Hugging Face’s Inference Endpoints product infringes the claims of the ’775 patent by incorporating TGI into its execution environment: https://huggingface.co/blog/inference-endpoints-llm; https://huggingface.co/text-generation-inference. Hugging Face provides a custom TGI container to users of Inference Endpoints that allows users to easily deploy TGI onto various hardware systems. Hugging Face builds and deploys TGI for users of Inference Endpoints. (DEF_347031; DEF_347040; DEF_347045; DEF_347074; DEF_347080; DEF_347083; DEF_347085; DEF_347112-DEF_347116; DEF_347040).

Hugging Face’s Enterprise Hub product infringes the claims of the ’775 patent by incorporating TGI into its execution environment: https://huggingface.co/enterprise; see https://huggingface.co/text-generation-inference.

(https://huggingface.co/enterprise; see also FRIENDLIAI_00001163.) Enterprise Hub implements TGI for its customers, as evident from its source code. (DEF_30711-DEF_30722; DEF_30726; DEF_30728; DEF_30730-DEF_30732; DEF_30787; DEF_30797; DEF_30800). The Hugging Face Hub infringes for at least the same reasons as the Enterprise Hub, as the source code overlaps for the two products.

Hugging Face’s HuggingChat product infringes the claims of the ’775 patent by incorporating TGI into its execution environment: https://huggingface.co/text-generation-inference; https://huggingface.co/chat/; https://huggingface.co/docs/text-generation-inference/index (“Text Generation Inference is used in production by multiple projects, such as: Hugging Chat, an open-source interface for open-access models, such as Open Assistant and Llama.”).
Hugging Face deploys the OpenAssistant product that infringes the claims of the ’775 patent by incorporating TGI into its execution environment: https://huggingface.co/text-generation-inference (“Text Generation Inference is already used by customers such as IBM, Grammarly, and the Open-Assistant initiative implements optimization for all supported model architectures”); https://huggingface.co/chat/; https://huggingface.co/docs/text-generation-inference/index (“Text Generation Inference is used in production by multiple projects, such as: Hugging Chat, an open-source interface for open-access models, such as Open Assistant and Llama. OpenAssistant, an open-source community effort to train LLMs in the open”).

Hugging Face’s Docker Hub product infringes the claims of the ’775 patent by incorporating TGI into its execution environment and by distributing TGI as a Docker Container: https://huggingface.co/docs/text-generation-inference/quicktour (“Quick Tour - The easiest way of getting started is using the official Docker container.”); https://huggingface.co/docs/text-generation-inference/installation (“This section explains how to install the CLI tool as well as installing TGI from source. The strongly recommended approach is to use Docker, as it does not require much setup. Check the Quick Tour to learn how to run TGI with Docker.”); https://github.com/huggingface/text-generation-inference.

1[A] receiving, by a serving system, one or more requests for execution, the serving system including a scheduler and one or more execution engines each coupled to access a machine-learning transformer model including at least a set of decoders;

The Accused Functionality performs the step of receiving, by a serving system, one or more requests for execution, the serving system including a scheduler and one or more execution engines each coupled to access a machine-learning transformer model including at least a set of decoders.

The Accused Functionality performs the step of receiving, by a serving system, one or more requests for execution. The Accused Functionality receives a request for execution:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/server.rs#L33-L78).
The Accused Functionality includes a scheduler:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L239-L348; see also https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/server.rs#L580-L589)

The Accused Functionality scheduler puts the received requests into a queue:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L115-L127; see also https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L239-L348)
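The cited queueing behavior is implemented in TGI’s Rust router; purely as an illustration of the behavior described above (requests entering a queue from which the scheduler later drains a batch), it can be sketched in Python. All class and method names here are hypothetical, not TGI’s actual API:

```python
from collections import deque

class RequestQueue:
    """Hypothetical sketch of a scheduler queue: incoming requests are
    appended and later drained into a batch (illustrative only)."""

    def __init__(self):
        self._entries = deque()

    def append(self, request):
        # New requests wait here until the scheduler forms the next batch.
        self._entries.append(request)

    def next_batch(self, max_size):
        # Drain up to max_size waiting requests into one batch, oldest first.
        batch = []
        while self._entries and len(batch) < max_size:
            batch.append(self._entries.popleft())
        return batch

queue = RequestQueue()
for prompt in ["hello", "world", "foo"]:
    queue.append(prompt)
batch = queue.next_batch(max_size=2)  # first two requests form a batch
```

In this toy version, requests left in the queue simply wait for the next batching pass, mirroring the queue-then-schedule flow described above at a high level.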
The Accused Functionality includes one or more execution engines, comprising one or more shards:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/launcher/src/main.rs#L987-L996; see also https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/server.py#L97-L155; see also https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/cli.py#L154-L155).

Each of the execution engines in the Accused Functionality is coupled to access a machine-learning transformer model including at least a set of decoders. The Accused Functionality gets a model:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/server.py#L123-L124).

The machine-learning transformer models used by the Accused Functionality include at least a set of decoders:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/__init__.py#L98-L276).
1[B] scheduling, by the scheduler, a batch of requests including the one or more requests for execution on an execution engine;

The Accused Functionality performs the step of scheduling, by the scheduler, a batch of requests including the one or more requests for execution on an execution engine.

The Accused Functionality performs the scheduling of a batch of requests, including the one or more requests for execution on an execution engine. The Accused Functionality removes a batch of requests, comprising the one or more requests, from the scheduling queue, then schedules execution of the first iteration on the execution engine:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L260-L261).

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L263-L265).
The Accused Functionality also schedules execution of further iterations on the execution engine:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L339-L341).
1[C] generating, by the execution engine, a first set of output tokens by applying the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests;

The Accused Functionality performs the step of generating, by the execution engine, a first set of output tokens by applying the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests.

The Accused Functionality first applies the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L361-L362; see also https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L147; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L219; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L239; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/custom_modeling/flash_llama_modeling.py#L347; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/custom_modeling/flash_llama_modeling.py#L277-L278).
The Accused Functionality then generates tokens from applying the transformer model to the set of inputs in the batch:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/client/src/sharded_client.rs#L98-L103).

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/client/src/client.rs#L107-L108).

The generated tokens are passed back to the Accused Functionality server, and another batch can be scheduled to execute:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/server.py#L61-L62).

In order to perform these steps, TGI creates the first batch and adds the input tokens to the batch. (See https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/server.py#L57-L59; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L93-L99; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L147; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L219; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L239.)

The Accused Functionality performs at least one batch operation on the input tensors. (See https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/custom_modeling/flash_llama_modeling.py#L347; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/custom_modeling/flash_llama_modeling.py#L277-L278; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/custom_modeling/flash_llama_modeling.py#L140).
1[D] receiving, by a request processor, a new request from a client device, the new request including a sequence of input tokens;

The Accused Functionality performs the step of receiving, by a request processor, a new request from a client device, the new request including a sequence of input tokens.

Defendant has proposed a construction of “a new request” that means “a request different from each of the one or more requests of the [first] batch of requests.” Plaintiff disagrees with this construction, or the need to construe this term. Regardless, the accused functionality infringes under the plain meaning or under Defendant’s proposed construction.

The Accused Functionality receives, by a request processor, a new request from a client device, the new request including a sequence of input tokens:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/server.rs#L33-L78).

The Accused Functionality then tokenizes the input text into a sequence of input tokens:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L97-L99).
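For illustration only, the tokenization step described above (turning input text into a sequence of input tokens) can be sketched with a toy Python tokenizer. This stand-in maps whitespace-separated words to integer ids and is not the tokenizer TGI actually uses:

```python
class ToyTokenizer:
    """Illustrative stand-in for a real tokenizer: assigns each distinct
    word an integer id on first sight (not TGI's actual tokenizer)."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text):
        # Split on whitespace and map each word to a stable integer id.
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab[word] = len(self.vocab)
            ids.append(self.vocab[word])
        return ids

tok = ToyTokenizer()
input_ids = tok.encode("what is deep learning")  # a sequence of input tokens
```

A production tokenizer uses subword units rather than whole words, but the interface is the same: text in, a sequence of integer token ids out.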
1[E] scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine, the second batch of requests scheduled responsive to determining that the execution engine has memory available to execute the second batch of requests, wherein in a second set of inputs for the second batch of requests, a length of the sequence of input tokens for the new request is different from a length of an input for at least one request other than the new request; and

The Accused Functionality performs the step of scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine, the second batch of requests scheduled responsive to determining that the execution engine has memory available to execute the second batch of requests, wherein in a second set of inputs for the second batch of requests, a length of the sequence of input tokens for the new request is different from a length of an input for at least one request other than the new request.

Defendant has proposed a construction of “second batch of requests” that means “a batch that includes at least one request from the [first] batch of requests.” Plaintiff disagrees with this construction, or the need to construe this term. Regardless, the accused functionality infringes under the plain meaning or under Defendant’s proposed construction.

Defendant has proposed a construction of “a length of the sequence of input tokens for the new request is different from a length of an input for [the] at least one request [other than the new request]” in which “length” refers to a “number or quantity of discrete <units>” for a given context, e.g., the number of “tokens” in a sequence, such as “a [number or quantity of discrete tokens] of the sequence of input tokens for the new request is different from a [number or quantity of discrete tokens] of an input for at least one request.” Plaintiff disagrees with this construction, or the need to construe this term. Regardless, the accused functionality infringes under the plain meaning or under Defendant’s proposed construction.

The Accused Functionality performs the step of scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine. The Accused Functionality receives a new set of requests including the new request:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L289-L291).
The Accused Functionality merges the new set of requests into the previous, first batch of requests:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L318-L322).

The Accused Functionality then schedules a second batch of requests that also includes the new request for execution:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L339-L341).

The Accused Functionality schedules the second batch of requests responsive to determining that the execution engine has memory available to execute the second batch of requests:

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L287-L291).
(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/queue.rs#L187-L192).

(https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/launcher/src/main.rs#L156-L162).
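For illustration only, the kind of capacity check cited above (admitting waiting requests into a batch only while the engine has room for their inputs) can be sketched in Python as a token-budget test. The function and its parameters are hypothetical, not TGI’s actual code:

```python
def admit_requests(waiting, token_budget):
    """Hypothetical sketch of a token-budget admission check: pull
    requests (represented by their input lengths) from the waiting
    list only while their combined length fits the remaining budget
    (illustrative only, not TGI's actual logic)."""
    admitted, used = [], 0
    for input_len in waiting:
        if used + input_len > token_budget:
            break  # no room for this request; stop admitting
        used += input_len
        admitted.append(input_len)
    return admitted

# With a budget of 100 tokens, only the first two requests fit.
batch = admit_requests([40, 50, 30], token_budget=100)
```

The point of such a check is that batch formation is gated on available capacity, not on the requests having any particular length.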
The Accused Functionality performs no check to validate that the inputs for the second batch of requests are the same length, and implies that requests in a batch can have an arbitrary length. (See https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/queue.rs; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs).

TGI also confirms this behavior by explaining that it performs continuous batching, which allows new requests, of a different length than the first request, to be processed in the same batch:

(https://huggingface.co/blog/falcon; see also https://github.com/huggingface/text-generation-inference/tree/main/router; https://www.anyscale.com/blog/continuous-batching-llm-inference.)
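For illustration only, the continuous-batching behavior described above (finished requests leaving the batch while new requests of a different length join it between iterations) can be sketched in Python. All names and the toy completion rule are hypothetical:

```python
def merge_batches(running, new_requests):
    """Hypothetical sketch of continuous batching: requests still
    generating stay in the batch, and newly admitted requests of any
    input length are merged alongside them (illustrative only)."""
    return list(running) + list(new_requests)

def step(batch):
    """One decode iteration: each request grows by one generated token,
    and requests that reach their stop length leave the batch
    (toy completion rule, not TGI's actual stopping criteria)."""
    survivors = []
    for length, stop_length in batch:
        length += 1
        if length < stop_length:
            survivors.append((length, stop_length))
    return survivors

# Requests are (current_length, stop_length) pairs of different sizes.
running = step([(3, 5), (7, 8)])          # the second request finishes
batch = merge_batches(running, [(2, 9)])  # a new, shorter request joins
```

Note that the merged batch mixes requests of different lengths, which is the behavior the chart attributes to continuous batching.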
1[F] generating, by the execution engine, a second set of output tokens by applying the transformer model to the second set of inputs for the second batch.

The Accused Functionality performs the step of generating, by the execution engine, a second set of output tokens by applying the transformer model to the second set of inputs for the second batch.

The Accused Functionality performs this step in the same fashion as described above, with respect to Claim 1[C]. (See https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L361-L362; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/client/src/sharded_client.rs#L98-L103; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/client/src/client.rs#L107-L108; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/server.py#L61-L62; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L634-L646; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L602; https://github.com/huggingface/text-generation-inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flash_causal_lm.py#L648-L657; https://github.com/huggingface/text-generati
