’775 Patent
Claim Appendix

Petitioner, EX1022
IPR2024-01234
Hugging Face, Inc., v. FriendliAI Inc.

The tables below provide side-by-side comparisons between corresponding claims of the ’775 patent.

Differences for the compared claim language on the right side of each chart are indicated with cross-through and underline.

I. ’775 PATENT | INDEPENDENT CLAIMS 1 AND 10

775 PAT | CLAIM 1

775_1[PRE]
1. A method of dynamically executing batches of requests on one or more execution engines running a machine-learning transformer model, comprising:

775_1[A]
receiving, by a serving system, one or more requests for execution, the serving system including a scheduler and one or more execution engines each coupled to access a machine-learning transformer model including at least a set of decoders;

775_1[B]
scheduling, by the scheduler, a batch of requests including the one or more requests for execution on an execution engine;

775_1[C]
generating, by the execution engine, a first set of output tokens by applying the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests;

775_1[D]
receiving, by a request processor, a new request from a client device, the new request including a sequence of input tokens;

775_1[E]
scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine, the second batch of requests scheduled responsive to determining that the execution engine has memory available to execute the second batch of requests, wherein in a second set of inputs for the second batch of requests, a length of the sequence of input tokens for the new request is different from a length of an input for at least one request other than the new request; and

775_1[F]
generating, by the execution engine, a second set of output tokens by applying the transformer model to the second set of inputs for the second batch.

775 PAT | CLAIM 10

775_10[PRE]
A method of non-transitory computer-readable storage medium storing computer program instructions executable to perform operations for dynamically executing batches of requests on one or more execution engines running a machine-learning transformer model, the operations comprising:

775_10[A]
receiving, by a serving system, one or more requests for execution, the serving system including a scheduler and one or more execution engines each coupled to access a machine-learning transformer model including at least a set of decoders;

775_10[B]
scheduling, by the scheduler, a batch of requests including the one or more requests for execution on an execution engine;

775_10[C]
generating, by the execution engine, a first set of output tokens by applying the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests;

775_10[D]
receiving, by a request processor, a new request from a client device, the new request including a sequence of input tokens;

775_10[E]
scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine, the second batch of requests scheduled responsive to determining that the execution engine has memory available to execute the second batch of requests, wherein in a second set of inputs for the second batch of requests, a length of the sequence of input tokens for the new request is different from a length of an input for at least one request other than the new request; and

775_10[F]
generating, by the execution engine, a second set of output tokens by applying the transformer model to the second set of inputs for the second batch.

II. ’775 PATENT | DEPENDENT CLAIMS 2 AND 11

775 PAT | CLAIM 2

775_2[A]
2. The method of claim 1, further comprising: responsive to determining that a request in the first batch of requests has been completed, providing output tokens generated for the completed request to a client device as a response to the request, and

775_2[B]
wherein the second batch of requests includes at least one of the remaining requests from the one or more requests and the new request.

775 PAT | CLAIM 11

775_11[A]
11. The method of claim 1, non-transitory computer-readable storage medium of claim 10, the operations further comprising: responsive to determining that a request in the first batch of requests has been completed, providing output tokens generated for the completed request to a client device as a response to the request, and

775_11[B]
wherein the second batch of requests includes at least one of the remaining requests from the one or more requests and the new request.

III. ’775 PATENT | DEPENDENT CLAIMS 3 AND 12

775 PAT | CLAIM 3

775_3
3. The method of claim 2, wherein the request is associated with a cache memory in the execution engine dedicated for storing an internal state for the request, and responsive to determining that the request has been completed, freeing the dedicated cache memory for the request in the execution engine.

775 PAT | CLAIM 12

775_12
12. The method of claim 1, non-transitory computer-readable storage medium of claim 10, wherein the request is associated with a cache memory in the execution engine dedicated for storing an internal state for the request, and responsive to determining that the request has been completed, freeing the dedicated cache memory for the request in the execution engine.

IV. ’775 PATENT | DEPENDENT CLAIMS 4 AND 13

775 PAT | CLAIM 4

775_4
4. The method of claim 1, wherein the input for the at least one request is an output token from the first set of output tokens for the at least one request, and wherein a length of the sequence of input tokens for the new request is different from a length of the output token for the at least one request.

775 PAT | CLAIM 13

775_13
13. The method non-transitory computer-readable storage medium of claim 10, wherein the input for the at least one request is an at least one output token from the first set of output tokens for the at least one request, and wherein a length of the sequence of input tokens for the new request is different from a length of the at least one output token for the at least one request.

V. ’775 PATENT | DEPENDENT CLAIMS 5 AND 14

775 PAT | CLAIM 5

775_5[A]
5. The method of claim 4, wherein the execution engine includes a cache memory for maintaining a key cache tensor for storing keys and a value cache tensor for storing values for the at least one request, and

775_5[B]
wherein after scheduling the second batch of requests, allocating, by the execution engine, a new cache memory dedicated to maintaining a key cache tensor and a value cache tensor for the new request.

775 PAT | CLAIM 14

775_14[A]
14. The method of claim 4, non-transitory computer-readable storage medium of claim 13, wherein the execution engine includes a cache memory for maintaining a key cache tensor for storing keys and a value cache tensor for storing values for the at least one request, and

775_14[B]
wherein after scheduling the second batch of requests, allocating, by the execution engine, a new cache memory dedicated to maintaining a key cache tensor and a value cache tensor for the new request.

VI. ’775 PATENT | DEPENDENT CLAIMS 6 AND 15

775 PAT | CLAIM 6

775_6
6. The method of claim 5, wherein after generating the second set of output tokens, a length of the key cache tensor for the at least one request is different from a length of a key cache tensor for the new request, and a length of the value cache tensor for the at least one request is different from a length of a value cache tensor for the new request.

775 PAT | CLAIM 15

775_15
15. The method of claim 4, non-transitory computer-readable storage medium of claim 14, wherein after generating the second set of output tokens, a length of the key cache tensor for the at least one request is different from a length of a key cache tensor for the new request, and a length of the value cache tensor for the at least one request is different from a length of a value cache tensor for the new request.

VII. ’775 PATENT | DEPENDENT CLAIMS 7 AND 16

775 PAT | CLAIM 7

775_7
7. The method of claim 1, after receiving the new request from the client device, determining, by the scheduler, that there is insufficient memory available to execute the second batch of requests on a second execution engine different from the execution engine, and responsive to the determination for the second execution engine, determining whether the execution engine has the memory available to execute the second batch of requests.

775 PAT | CLAIM 16

775_16
16. The method of claim 1, non-transitory computer-readable storage medium of claim 10, after receiving the new request from the client device, determining, by the scheduler, that there is insufficient memory available to execute the second batch of requests on a second execution engine different from the execution engine, and responsive to the determination for the second execution engine, determining whether the execution engine has the memory available to execute the second batch of requests.

VIII. ’775 PATENT | DEPENDENT CLAIMS 8 AND 17

775 PAT | CLAIM 8

775_8
8. The method of claim 1, wherein the execution engine is configured as a graphics processing unit (GPU) or a tensor processing unit (TPU).

775 PAT | CLAIM 17

775_17
17. The method of claim 1, non-transitory computer-readable storage medium of claim 10, wherein the execution engine is configured as a graphics processing unit (GPU) or a tensor processing unit (TPU).

IX. ’775 PATENT | DEPENDENT CLAIMS 9 AND 18

775 PAT | CLAIM 9

775_9
9. The method of claim 1, wherein each token in the sequence of input tokens represents a text unit.

775 PAT | CLAIM 18

775_18
18. The method of claim 1, non-transitory computer-readable storage medium of claim 10, wherein each token in the sequence of input tokens represents a text unit.