`FOR THE DISTRICT OF DELAWARE
`
`FRIENDLIAI INC.,
`
`Plaintiff,
`
`C.A. No. 1:23-CV-00816-MN
`
`v.
`
`HUGGING FACE, INC.,
`
`Defendant.
`
PLAINTIFF FRIENDLIAI INC.’S
FINAL INFRINGEMENT CONTENTIONS
`
`Pursuant to Section 6(e) of the Scheduling Order (D.I. 17), Plaintiff FriendliAI Inc.
`
`(“FriendliAI”) hereby submits its final infringement contentions against Hugging Face, Inc.
`
`(“Hugging Face”). These infringement contentions include claim charts that relate each known
`
`Hugging Face Accused Product to the currently asserted claims of the Asserted Patents.1
`
`The claim charts are attached to these disclosures as Exhibit A (for the ’775 patent) and
`
`Exhibit B (for the ’520 patent). Specifically, these disclosures contend that the Accused
`
Functionality in each of the Accused Products infringes (1) at least claims 1-18 of the ’775 patent,
`
`and (2) at least claims 1-66 of the ’520 patent.2
`
`1 The Asserted Patents include U.S. Patent No. 11,442,775 (the ’775 patent) and U.S. Patent No.
`11,836,520 (the ’520 patent).
`2 The Accused Products include Text Generation Inference (TGI), and any product, service,
`platform, solutions, model, dataset, application, or toolkit that Hugging Face has made, sold,
`offered for sale, advertised, marketed, used, or imported into the United States that incorporates,
`applies, or otherwise uses Text Generation Inference (TGI), including but not limited to the TGI
`toolkit distributed or otherwise made available by Hugging Face (including open source and under
`license), Spaces, Inference Endpoints, Enterprise Hub (formerly known as Private Hub), Hugging
`Face Hub, HuggingChat, OpenAssistant, and Docker Hub containers, including all past, current,
`and future versions of these products and services, even if they are not explicitly referred to by
`
`1
`
`Petitioner, EX1013
`IPR2024-01234
`Hugging Face, Inc., v. FriendliAI Inc.
`
`
`
`
`
Hugging Face (1) directly infringes by running TGI for its customers on its own servers; (2) directly infringes through internal testing of its products; (3) directly infringes by selling software that allegedly “performs and/or dictates and controls performance of (or automatically performs)” the claimed method steps in the hands of Hugging Face or its customers; (4) indirectly infringes by inducing customers to run TGI; and (5) contributorily infringes by selling and/or distributing a component (TGI) especially designed for use in the patented invention.
`
`FriendliAI makes these disclosures based on discovery received to date. Discovery is
`
`ongoing and Hugging Face has continuously delayed providing required discovery in this case.
`
`(See D.I. 62.) FriendliAI reserves the right to modify, supplement, amend, or otherwise alter these
`
`disclosures after receiving and reviewing the source code and pending discovery. Because
`
`FriendliAI’s infringement investigation is ongoing, FriendliAI reserves the right to supplement or
`
amend these disclosures as appropriate under the Scheduling Order, the Federal Rules, the Local Rules, and the District of Delaware’s Default Standard for Discovery, and
`
`in light of additional fact and/or expert discovery. In addition to the documents cited herein,
`
`FriendliAI may rely on additional documents and testimony. Should Hugging Face take the
`
`position that any limitation of any claim identified in Exhibits A and B is not present literally in
`
`the Accused Products, FriendliAI reserves the right to assert infringement under the doctrine of
`
`equivalents.
`
`FriendliAI also makes these disclosures before the parties have finalized claim construction
`
positions, submitted claim construction briefing to the Court, or participated in a Markman hearing.
`
`
`these product names, including any that may be released during the pendency of this matter
`(“Accused Products”). FriendliAI identifies the above Accused Products, but further discovery
`may be necessary to reveal additional Accused Products. FriendliAI reserves the right to modify,
`supplement, amend, or otherwise alter these disclosures based on forthcoming discovery.
`
`2
`
`
`
`
`
`FriendliAI therefore reserves the right to supplement or amend these disclosures as appropriate in
`
`light of ongoing claim construction proceedings and any claim construction ruling by the Court.
`
`
`
`Dated: July 1, 2024
`
`OF COUNSEL:
`
`Michael J. Sacksteder
`Shreyas A. Kale
`Samantha Ong
`FENWICK & WEST LLP
`555 California Street, 12th Floor
`San Francisco, CA 94104
Telephone: 415.875.2300
Facsimile: 415.281.1350
`MSacksteder@fenwick.com
`SKale@fenwick.com
`Song@fenwick.com
`
`Jessica M. Kaempf
`FENWICK & WEST LLP
`401 Union Street, 5th Floor
`Seattle, WA 98101
`Tel: (206) 389-4550
`JKaempf@fenwick.com
`
`
`
`
`
`
`
`
`Respectfully submitted,
`
`POTTER ANDERSON & CORROON LLP
`
`
`
`By: /s/ Jessica Kaempf
`David E. Moore (#3983)
`Bindu A. Palapura (#5370)
`Andrew L. Brown (#6766)
`Hercules Plaza, 6th Floor
`1313 N. Market Street
`Wilmington, DE 19801
`Tel: (302) 984-6000
`dmoore@potteranderson.com
`bpalapura@potteranderson.com
`abrown@potteranderson.com
`
` Attorneys for Plaintiff FriendliAI Inc.
`
`
`3
`
`
`
`
`
`CERTIFICATE OF SERVICE
`
`I hereby certify that on July 1, 2024, a true and correct copy of the foregoing document
`
`was served via electronic mail upon the following counsel of record:
`
`
Stephen J. Kraftschik
`Christina B. Vavala
`POLSINELLI PC
`222 Delaware Avenue, Ste 1101
`Wilmington, DE 19801
`Telephone: (302) 252-0920
`Email: skraftschik@polsinelli.com
`Email: cvavala@polsinelli.com
`
`Jason A. Wietjes
`POLSINELLI PC
`2950 N. Harwood St., Ste 2100
`Dallas, TX 75201
`Telephone: (214) 661-5519
`Email: jwietjes@polsinelli.com
`Attorneys for Defendant
`
`
`
`
`
`
`
`
`
`
`/s/ Jessica Kaempf
`Jessica Kaempf
`
`
`
`
`
`
`4
`
`
`
`
`
`EXHIBIT A
`
`FriendliAI Inc. v. Hugging Face, Inc.: Final Infringement Chart for U.S. Patent No. 11,442,775
`
The claim chart below provides an exemplary mapping of the accused functionality, Text Generation Inference (“TGI”
`
`or “Accused Functionality”), to the claims of U.S. Patent No. 11,442,775 (“the ’775 patent”). Discovery is ongoing and this chart may
`
`be amended based on information provided on the Accused Products during discovery. This chart is not intended to provide a
`
`comprehensive explanation of infringement or identification of Accused Products and Services, which FriendliAI will provide in
`
accordance with the Court’s rules and case schedule. Plaintiff is seeking Defendant’s internal documentation, including without
`
`limitation technical documentation, in discovery and therefore Plaintiff reserves the right to amend or further supplement this chart.
`
`Proposed claim constructions have not been finalized and claim construction briefing is still pending. The accused functionality meets
`
`each claim limitation regardless of claim construction, as further indicated below. Plaintiff, however, reserves the right to amend or
`
further supplement this chart based on the Court’s claim construction. Source code for certain accused products has not yet been Bates stamped and provided to Plaintiff, and these contentions may be amended pursuant to that source code.
`
`
`
`
`
`
`
`1
`
`
`
`
`
`
`
U.S. Patent No. 11,442,775

Claim 1. A method of dynamically executing batches of requests on one or more execution engines running a machine-learning transformer model, comprising:

Accused Functionality
`
`Defendant has designed and developed, and has used, sold, offered for sale, and/or otherwise made
`available (and continues to use, sell, offer for sale, and/or otherwise make available), the Accused
`Products and Services that use Text Generation Inference (“TGI” or the “Accused Functionality”). The
`Accused Functionality performs operations for dynamically executing batches of requests on one or more
`execution engines running a machine-learning transformer model. See https://huggingface.co/text-
`generation-inference.1
`
`The Accused Products include Text Generation Inference (TGI), and any product, service, platform,
`solutions, model, dataset, application, or toolkit that Hugging Face has made, sold, offered for sale,
`advertised, marketed, used, or imported into the United States that incorporates, applies, or otherwise uses
`Text Generation Inference (TGI), including but not limited to the TGI toolkit distributed or otherwise
`made available by Hugging Face (including open source and under license), Spaces, Inference Endpoints,
`Enterprise Hub (formerly known as Private Hub), Hugging Face Hub, HuggingChat, OpenAssistant, and
`Docker Hub containers, including all past, current, and future versions of these products and services,
`even if they are not explicitly referred to by these product names, including any that may be released
`during the pendency of this matter (“Accused Products”). FriendliAI identifies the above Accused
`Products, but further discovery may be necessary to reveal additional Accused Products. FriendliAI
`reserves the right to modify, supplement, amend, or otherwise alter these disclosures based on
`forthcoming discovery.
`
`Hugging Face indirectly infringes the Asserted Patents through contributory infringement and induced
`infringement. Hugging Face had knowledge of the Asserted Patents (see Amended Compl., at, e.g., ¶¶ 43,
`50, 53, 63, 66). Hugging Face contributorily infringes each of the Asserted Patents by selling and/or
`distributing a component especially designed for use in the patented invention. Specifically, Hugging
`Face sells and/or distributes TGI. Further, TGI is (at a minimum) a material component of the claimed
`invention, and is not suitable for substantial non-infringing uses. The batching efficiencies that Hugging
`Face touts in its description of TGI are at the core of many of the Hugging Face products at issue in this
`case. Hugging Face induces infringement of the Asserted Patents by actively and knowingly aiding and
`abetting Hugging Face users’ and business partners’ direct infringement. Hugging Face had knowledge of
`
`1 By charting the preamble, Plaintiff does not take any position as to whether the preamble is a limitation of the claim.
`
`
`
`2
`
`
`
`
`
`
`
`
`
`the infringement because it continued to use and promote technology infringing the Asserted Patents since
`at least July 20, 2023, when it was placed on notice that the Accused Products infringed the claims of the
`asserted patents. FriendliAI further incorporates by reference the allegations in the Amended Complaint,
`at, e.g., ¶¶ 41-42. Hugging Face provides users and business partners with instructions on how to use TGI
`in an infringing manner. Hugging Face provides tools and instructions on its website on how to use TGI
`to perform steps that result in the directly infringing use. Hugging Face also advertises or promotes the
`use of TGI. Additionally, Hugging Face is responsible for and exercises control over the manufacturing
`and development of TGI. Hugging Face has actively encouraged its customers to commit direct
`infringement of the asserted claims with knowledge that the acts encouraged constitute infringement and a
`specific intent to infringe the asserted claims.
`
`The Accused Products also indirectly infringe at least because Hugging Face provides TGI to customers
`to execute on their own servers, Hugging Face induces customers to infringe the ’775 patent claims by
`providing TGI and instructions to execute TGI on third-party systems, or Hugging Face indirectly
`infringes by providing TGI in any other fashion to be used with other software or hardware. (See
`https://github.com/huggingface/text-generation-inference).
`
`Hugging Face also indirectly infringes when it provides enterprise consulting and support to customers to
`implement TGI on either the customer’s own hardware or in their own particular environment. (See
`DEF_000298; FRIENDLIAI_00001163). In some cases, this includes Hugging Face providing custom
`TGI containers optimized to execute on the customer’s hardware or in the customer’s environment. Id.
`Hugging Face also indirectly infringes when it licenses or otherwise provides TGI to partners and
`collaborators to execute in their environment, and encourages users to use TGI in their partners’
`environment. (See DEF_360181; DEF_360159; FRIENDLIAI_00001163).
`
`3
`
`
`
`
`
`
`
`
`
`
`
`(See https://www.youtube.com/watch?v=cTyXDhiltXw; see also
`https://d1.awsstatic.com/events/Summits/reinvent2023/AIM328_Accelerate-FM-development-with-
`Amazon-SageMaker-JumpStart.pdf).
`
`Hugging Face’s Spaces product infringes the claims of the ’775 patent by incorporating TGI into its
`execution environment: DEF_347045; DEF_347048; DEF_347057; DEF_347071;
`https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/consuming_tgi (“ChatUI can
`automatically consume the TGI server and even provides an option to switch between different TGI
`endpoints. You can try it out at Hugging Chat, or use the ChatUI Docker Space to deploy your own
`Hugging Chat to Spaces.”); https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI.
`Customers are encouraged to create their own applications on Spaces (e.g., Chatbots) using TGI. (See
`
`4
`
`
`
`
`
`
`
`
`
`https://github.com/huggingface/chat-ui; https://huggingface.co/chat/; DEF_000050;
`https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/consuming_tgi).
`
`Hugging Face’s Inference Endpoints product infringes the claims of the ’775 patent by incorporating TGI
`into its execution environment: https://huggingface.co/blog/inference-endpoints-llm;
`https://huggingface.co/text-generation-inference. Hugging Face provides a custom TGI container to users
`of Inference Endpoints that allows users to easily deploy TGI onto various hardware systems. Hugging
`Face builds and deploys TGI for users of Inference Endpoints. (DEF_347031; DEF_347040;
`DEF_347045; DEF_347074; DEF_347080; DEF_347083; DEF_347085; DEF_347112-DEF_347116;
`DEF_347040).
`
`Hugging Face’s Enterprise Hub product infringes the claims of the ’775 patent by incorporating TGI into
`its execution environment:
`
`https://huggingface.co/enterprise; see https://huggingface.co/text-generation-inference.
`
`
`
`5
`
`
`
`
`
`
`
`
`
`
(https://huggingface.co/enterprise; see also FRIENDLIAI_00001163.) Enterprise Hub implements TGI
`for its customers, as evident from its source code. (DEF_30711-DEF_30722; DEF_30726; DEF_30728;
`DEF_30730- DEF_30732; DEF_30787; DEF_30797; DEF_30800). The Hugging Face Hub infringes for
`at least the same reasons as the Enterprise Hub, as the source code overlaps for the two products.
`
`Hugging Face’s HuggingChat product infringes the claims of the ’775 patent by incorporating TGI
`into its execution environment: https://huggingface.co/text-generation-inference;
`https://huggingface.co/chat/; https://huggingface.co/docs/text-generation-inference/index (“Text
`Generation Inference is used in production by multiple projects, such as: Hugging Chat, an open-source
`interface for open-access models, such as Open Assistant and Llama.”).
`
`6
`
`
`
`
`
`
`
`
`
Hugging Face deploys the OpenAssistant product that infringes the claims of the ’775 patent by incorporating TGI into its execution environment: https://huggingface.co/text-generation-inference (“Text Generation Inference is already used by customers such as IBM, Grammarly, and the Open-Assistant initiative implements optimization for all supported model architectures”); https://huggingface.co/chat/; https://huggingface.co/docs/text-generation-inference/index (“Text Generation Inference is used in production by multiple projects, such as: Hugging Chat, an open-source interface for open-access models, such as Open Assistant and Llama. OpenAssistant, an open-source community effort to train LLMs in the open”).

Hugging Face’s Docker Hub product infringes the claims of the ’775 patent by incorporating TGI into its execution environment and by distributing TGI as a Docker Container: https://huggingface.co/docs/text-generation-inference/quicktour (“Quick Tour - The easiest way of getting started is using the official Docker container.”); https://huggingface.co/docs/text-generation-inference/installation (“This section explains how to install the CLI tool as well as installing TGI from source. The strongly recommended approach is to use Docker, as it does not require much setup. Check the Quick Tour to learn how to run TGI with Docker.”); https://github.com/huggingface/text-generation-inference.

1[A] receiving, by a serving system, one or more requests for execution, the serving system including a scheduler and one or more execution engines each coupled to access a machine-learning transformer model including at least a set of decoders;

The Accused Functionality performs the step of receiving, by a serving system, one or more requests for execution, the serving system including a scheduler and one or more execution engines each coupled to access a machine-learning transformer model including at least a set of decoders.
`
`The Accused Functionality performs the step of receiving, by a serving system, one or more requests for
`execution. The Accused Functionality receives a request for execution:
`
`7
`
`
`
`
`
`
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/server.rs#L33-L78).
`
`The Accused Functionality includes a scheduler:
`
`
`
`8
`
`
`
`
`
`
`
`
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L239-L348; see also
`https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/server.rs#L580-L589)
`
`The Accused Functionality scheduler puts the received requests into a queue:
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L115-L127; see also
`https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L239-L348)
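For illustration only, the receive-and-enqueue behavior described above can be sketched in simplified Python. This is an explanatory sketch with hypothetical names, not Hugging Face’s source code (which is cited above):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    """One generation request as received by the serving system."""
    request_id: int
    prompt: str
    max_new_tokens: int

class ServingSystem:
    """Toy serving system: receives requests and places them on the
    scheduler's queue, from which batches are later formed."""
    def __init__(self):
        self.queue = deque()   # scheduler's pending-request queue
        self._next_id = 0

    def receive(self, prompt: str, max_new_tokens: int = 16) -> Request:
        req = Request(self._next_id, prompt, max_new_tokens)
        self._next_id += 1
        self.queue.append(req)  # enqueue for batching by the scheduler
        return req

server = ServingSystem()
req = server.receive("Hello, world")
```

In this sketch, each received request is assigned an identifier and appended to a queue that a scheduler would later drain to form batches.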
`
`
`
`9
`
`
`
`
`
`
`
`
`
The Accused Functionality includes one or more execution engines, comprising one or more shards:
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/launcher/src/main.rs#L987-L996; see also
`https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/server.py#L
`97-L155; see also https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/cli.py#L154
`-L155).
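For illustration only, an execution engine composed of shards can be sketched as follows. This sketch uses hypothetical names and is not Hugging Face’s source code (which is cited above):

```python
from dataclasses import dataclass

@dataclass
class Shard:
    """One shard of an execution engine (e.g., one GPU holding a slice
    of the model under tensor parallelism)."""
    rank: int
    world_size: int

class ExecutionEngine:
    """Toy execution engine composed of one or more shards."""
    def __init__(self, num_shards: int):
        self.shards = [Shard(rank=i, world_size=num_shards)
                       for i in range(num_shards)]

engine = ExecutionEngine(num_shards=2)
```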
`
`Each of the execution engines in the Accused Functionality is coupled to access a machine-learning
`transformer model including at least a set of decoders. The Accused Functionality gets a model:
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/server.py#L
`123-L124).
`
`The machine-learning transformer models used by the Accused Functionality include at least a set of
`decoders:
`
`10
`
`
`
`
`
`
`
`
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/__i
`nit__.py#L98-L276).
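For illustration only, a transformer model that includes a set of decoders can be sketched as follows. This is a simplified explanatory sketch with hypothetical names, not Hugging Face’s source code (which is cited above):

```python
class DecoderLayer:
    """Stand-in for one transformer decoder layer; a real layer would
    apply self-attention and a feed-forward network."""
    def forward(self, hidden):
        return hidden  # identity pass-through for illustration

class TransformerModel:
    """Toy decoder-only transformer: a model that includes a set of
    decoder layers applied in sequence."""
    def __init__(self, num_layers: int):
        self.decoders = [DecoderLayer() for _ in range(num_layers)]

    def forward(self, hidden):
        for layer in self.decoders:
            hidden = layer.forward(hidden)
        return hidden

model = TransformerModel(num_layers=4)
```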
`
1[B] scheduling, by the scheduler, a batch of requests including the one or more requests for execution on an execution engine;

The Accused Functionality performs the step of scheduling, by the scheduler, a batch of requests including the one or more requests for execution on an execution engine. The Accused Functionality removes a batch of requests, comprising the one or more requests, from the scheduling queue, then schedules execution of the first iteration on the execution engine:
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L260-L261).
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L263-L265).
`
`
`
`
`
`
`
`The Accused Functionality also schedules execution of further iterations on the execution engine:
`
`12
`
`
`
`
`
`
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L339-L341).
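For illustration only, the pattern of scheduling a first iteration and then further iterations for a batch can be sketched in simplified Python. All names here are hypothetical; this is not Hugging Face’s source code (which is cited above):

```python
from collections import deque

def run_batch(queue, run_iteration, max_iters=100):
    """Toy scheduler loop: take the queued requests as one batch, run a
    first iteration on the engine, then keep scheduling further
    iterations until every request in the batch has finished."""
    batch = list(queue)
    queue.clear()
    iterations = 0
    while batch and iterations < max_iters:
        batch = run_iteration(batch)  # engine returns only unfinished requests
        iterations += 1
    return iterations

# Each request is modeled as a count of tokens still to generate;
# one iteration generates one token per request.
queue = deque([3, 1, 2])
iterations = run_batch(queue, lambda b: [r - 1 for r in b if r > 1])
```

In this sketch, the longest request (three remaining tokens) drives three scheduled iterations; shorter requests drop out of the batch as they finish.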
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
1[C] generating, by the execution engine, a first set of output tokens by applying the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests;

The Accused Functionality performs the step of generating, by the execution engine, a first set of output tokens by applying the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests.

The Accused Functionality first applies the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests:
`
`
`● (https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L361-L362; see
`also https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/mod
`els/flash_causal_lm.py#L147; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/mod
`els/flash_causal_lm.py#L219; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/mod
`els/flash_causal_lm.py#L239; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/mod
`els/custom_modeling/flash_llama_modeling.py#L347; https://github.com/huggingface/text-
`generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/mod
`els/custom_modeling/flash_llama_modeling.py#L277-L278).
`
`
`
`
`
`
`
`
`
`14
`
`
`
`
`
`
`
`
`
`The Accused Functionality then generates tokens from applying the transformer model to the set of inputs
`in the batch:
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/client/src/sharded_client.rs#L98-
`L103).
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/client/src/client.rs#L107-L108).
`
`
`
`15
`
`
`
`
`
`
`
`
`
`The generated tokens are passed back to the Accused Functionality server, and another batch can be
`scheduled to execute:
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/server.py#L
`61-L62).
`
`In order to perform these steps, TGI creates the first batch and adds the input tokens to the batch. (See
`https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/server.py#L
`57-L59; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flas
`h_causal_lm.py#L93-L99; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flas
`h_causal_lm.py#L147; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flas
`h_causal_lm.py#L219; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flas
h_causal_lm.py#L239.)
`
`The Accused Functionality performs at least one batch operation on the input tensors. (See
`https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/cust
`om_modeling/flash_llama_modeling.py#L347; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/cust
`om_modeling/flash_llama_modeling.py#L277-L278; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/cust
`om_modeling/flash_llama_modeling.py#L140).
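For illustration only, a “batch operation” applied to input tensors for multiple requests at once can be sketched in simplified Python (using nested lists in place of real tensors; this is an explanatory sketch, not Hugging Face’s source code, which is cited above):

```python
def batched_matvec(batch_inputs, weights):
    """Apply one weight matrix to every input vector in the batch in a
    single pass (one batch operation over the stacked input tensor),
    rather than running each request's input separately."""
    return [
        [sum(w * x for w, x in zip(row, vec)) for row in weights]
        for vec in batch_inputs
    ]

weights = [[1, 0], [0, 2]]           # toy 2x2 projection matrix
batch = [[1.0, 1.0], [3.0, 4.0]]     # two requests' inputs stacked together
out = batched_matvec(batch, weights)
```

The single call processes both requests’ input vectors together, which is the efficiency that batching the requests provides.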
`
`
`16
`
`
`
`
`
`
`
`
`
`
`
1[D] receiving, by a request processor, a new request from a client device, the new request including a sequence of input tokens;

The Accused Functionality performs the step of receiving, by a request processor, a new request from a client device, the new request including a sequence of input tokens. Defendant has proposed a construction of “a new request” that means “a request different from each of the one or more requests of the [first] batch of requests.” Plaintiff disagrees with this construction, or the need to construe this term. Regardless, the accused functionality infringes under the plain meaning or under Defendant’s proposed construction.

The Accused Functionality receives, by a request processor, a new request from a client device, the new request including a sequence of input tokens:
`
`17
`
`
`
`
`
`
`
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/server.rs#L33-L78).
`
`The Accused Functionality then tokenizes the input text into a sequence of input tokens:
`
`
`
`18
`
`
`
`
`
`
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flas
`h_causal_lm.py#L97-L99).
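For illustration only, tokenizing input text into a sequence of input tokens can be sketched as follows. This toy whitespace tokenizer is an explanatory sketch (real systems such as TGI use subword tokenizers); it is not Hugging Face’s source code, which is cited above:

```python
def tokenize(text, vocab):
    """Toy tokenizer: split on whitespace and map each word to a token
    id, growing the vocabulary as new words appear."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)  # assign the next unused id
        ids.append(vocab[word])
    return ids

vocab = {}
tokens = tokenize("the quick brown fox the fox", vocab)
```

The request’s text thus becomes a sequence of integer token ids that the model can consume.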
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
1[E] scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine, the second batch of requests scheduled responsive to determining that the execution engine has memory available to execute the second batch of requests, wherein in a second set of inputs for the second batch of requests, a length of the sequence of input tokens for the new request is different from a length of an input for at least one request other than the new request; and

The Accused Functionality performs the step of scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine, the second batch of requests scheduled responsive to determining that the execution engine has memory available to execute the second batch of requests, wherein in a second set of inputs for the second batch of requests, a length of the sequence of input tokens for the new request is different from a length of an input for at least one request other than the new request.

Defendant has proposed a construction of “second batch of requests” that means “a batch that includes at least one request from the [first] batch of requests.” Plaintiff disagrees with this construction, or the need to construe this term. Regardless, the accused functionality infringes under the plain meaning or under Defendant’s proposed construction.

Defendant has proposed a construction of “a length of the sequence of input tokens for the new request is different from a length of an input for [the] at least one request [other than the new request]” where “‘length’ refers to a ‘number or quantity of discrete <units>’” for a given context, e.g., the number of “tokens” in a sequence, such as “a [number or quantity of discrete tokens] of the sequence of input tokens for the new request is different from a [number or quantity of discrete tokens] of an input for at least one request.” Plaintiff disagrees with this construction, or the need to construe this term. Regardless, the accused functionality infringes under the plain meaning or under Defendant’s proposed construction.

The Accused Functionality performs the step of scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine. The Accused Functionality receives a new set of requests including the new request:
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L289-L291).
`
`The Accused Functionality merges the new set of requests into the previous, first batch of requests:
`
`
`
`
`
`20
`
`
`
`
`
`
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L318-L322).
`
`The Accused Functionality then schedules a second batch of requests that also includes the new request
`for execution:
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L339-L341).
`
`
`
`
`
`The Accused Functionality schedules the second batch of requests responsive to determining that the
`execution engine has memory available to execute the second batch of requests:
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L287-L291).
`
`
`
`21
`
`
`
`
`
`
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/queue.rs#L187-L192).
`
`
`
`
`
`(https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/launcher/src/main.rs#L156-L162).
`
The Accused Functionality performs no check to validate that the inputs for the second batch of requests are the same length, which implies that requests in a batch can have an arbitrary length. (See
`https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/queue.rs;
`https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs).
`
TGI also confirms this behavior by explaining that it performs continuous batching, which allows new requests of a different length than the first requests to be processed in the same batch:
`
`22
`
`
`
`
`
`
`
`
`
`
`
`(https://huggingface.co/blog/falcon; see also https://github.com/huggingface/text-generation-
`inference/tree/main/router; https://www.anyscale.com/blog/continuous-batching-llm-inference.)
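For illustration only, continuous batching of variable-length requests subject to a memory budget can be sketched in simplified Python. The names and the token-budget heuristic here are hypothetical; this is an explanatory sketch, not Hugging Face’s source code, which is cited above:

```python
def try_extend_batch(running, waiting, token_budget):
    """Toy continuous batching: admit waiting requests of any prompt
    length into the running batch, but only while the total token count
    fits within the engine's memory budget."""
    used = sum(len(tokens) for tokens in running)
    admitted = []
    while waiting and used + len(waiting[0]) <= token_budget:
        req = waiting.pop(0)       # take the next waiting request
        used += len(req)
        running.append(req)        # it joins the in-flight batch
        admitted.append(req)
    return admitted

running = [[1, 2, 3, 4]]                 # one in-flight request, 4 tokens
waiting = [[5, 6], [7, 8, 9, 10, 11]]    # new requests of different lengths
admitted = try_extend_batch(running, waiting, token_budget=7)
```

Under a budget of 7 tokens, the 2-token request is admitted alongside the 4-token request already running, while the 5-token request waits; requests of different lengths share one batch, and admission is gated by available capacity.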
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`23
`
`
`
`
`
`
`
`
`
1[F] generating, by the execution engine, a second set of output tokens by applying the transformer model to the second set of inputs for the second batch.

The Accused Functionality performs the step of generating, by the execution engine, a second set of output tokens by applying the transformer model to the second set of inputs for the second batch.
`
`The Accused Functionality performs this step in the same fashion as described above, with respect to
`Claim 1[C]. (See https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/src/infer.rs#L361-L362;
`https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/client/src/sharded_client.rs#L98-
`L103; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/router/client/src/client.rs#L107-L108;
`https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/server.py#L
`61-L62; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flas
`h_causal_lm.py#L634-L646; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flas
`h_causal_lm.py#L602; https://github.com/huggingface/text-generation-
`inference/blob/bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad/server/text_generation_server/models/flas
`h_causal_lm.py#L648-L657; https://github.com/huggingface/text-generati