`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`
`__________________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`__________________________________________________________________
`
`SONY CORPORATION; SONY MOBILE COMMUNICATIONS AB; SONY
`MOBILE COMMUNICATIONS (USA) INC; AND SONY ELECTRONICS INC.
`Petitioner
`
`
`Patent No. 7,296,121
`Issue Date: Nov. 13, 2007
`Title: REDUCING PROBE TRAFFIC IN MULTIPROCESSOR SYSTEMS
`__________________________________________________________________
`
`EXPERT DECLARATION OF PROFESSOR DANIEL J. SORIN
`
`No. IPR2015-158
`
`__________________________________________________________________
`
`
`
`
`
`
`
`Petition for Inter Partes Review of 
`U.S. Pat. No. 7,296,121
`IPR2015‐00158
`EXHIBIT
`Sony‐
`
`

`

`No. IPR2015-158
`Expert Declaration of Professor Daniel J. Sorin
`
`I.
`
`INTRODUCTION
`
`1. I, Professor Daniel J. Sorin, have been retained by counsel for Sony
`
`Corporation, Sony Mobile Communications AB, Sony Mobile
`
`Communications (USA) Inc., and Sony Electronics Inc. (collectively,
`
`“Sony”).
`
`2. I submit this declaration in support of Sony’s Petition for Inter Partes Review
`
`of U.S. Pat. No. 7,296,121, No. IPR2015-158.
`
`II. QUALIFICATIONS
`
`3. I hold a Ph.D. in Electrical and Computer Engineering from the University
`
`of Wisconsin—Madison (awarded in 2002). My doctoral dissertation
`
`focused on checkpointing/recovery of multiprocessors with cache-coherent
`
`shared memory.
`
`4. I am an Associate Professor in the Department of Electrical and Computer
`
`Engineering at Duke University. Prior to being an Associate Professor, I
`
`was an Assistant Professor in the Department of Electrical and Computer
`
`Engineering at Duke University (2002-2009), a Research Assistant in the
`
`Computer Sciences Department at the University of Wisconsin—Madison
`
`(1996-2002), and a Teaching Assistant in the Department of Electrical and
`
`Computer Engineering at the University of Wisconsin—Madison (1996-
`
`1997).
`
`5. I am the author or co-author of two books: “A Primer on Memory
`
`Consistency and Cache Coherence” Synthesis Lectures on Computer Architecture,
`
`Morgan & Claypool Publishers, 2011; and “Fault Tolerant Computer
`
`Architecture” Synthesis Lectures on Computer Architecture, Morgan & Claypool
`
`Publishers, 2009.
`
`6. I have published over 70 technical articles, including over 20 related to cache
`
`coherence technology.
`
`7. I have over 16 years of experience in the design and implementation of
`
`cache coherency systems and protocols in multi-processor computer
`
`systems.
`
`8. My curriculum vitae more fully describes my education, professional
`
`experience, and relevant publications. See Sony-1014.
`
`III. MATERIALS CONSIDERED
`
`9. I have reviewed U.S. Pat. No. 7,296,121 (the “’121 Patent”) including its
`
`claims.
`
`10. I have reviewed U.S. Patent No. 7,698,509 to Koster (“Koster”). I
`
`understand Koster is prior art to the ’121 Patent.
`
`11. I have reviewed Jeffrey Kuskin, et al., The Stanford FLASH Multiprocessor,
`
`PROCEEDINGS OF THE 21ST ANNUAL INTERNATIONAL SYMPOSIUM ON
`
`COMPUTER ARCHITECTURE, IEEE (1994) (“Kuskin”). I understand Kuskin
`
`is prior art to the ’121 Patent.
`
`12. I have reviewed S. Park et al., Verification of Cache Coherence Protocols by
`
`Aggregation of Distributed Transactions, THEORY OF COMPUTING SYSTEMS 31
`
`(1998) (“Park”). I understand Park is prior art to the ’121 Patent.
`
`13. I have reviewed U.S. Patent No. 6,088,769 to Luick (“Luick”). I
`
`understand Luick is prior art to the ’121 Patent.
`
`14. I have reviewed U.S. Pat. Pub. No. 2002/0073261 (“Kosaraju”). I
`
`understand Kosaraju is prior art to the ’121 Patent.
`
`IV. PERSON OF ORDINARY SKILL IN THE ART
`
`15. Generally, the ’121 Patent is in the field of cache coherency in multi-processor
`
`computer systems.
`
`16. I understand that the focus of this Inter Partes Review is the subject matter
`
`of claims 1-3, 8, 11-12, and 14-25 of the ’121 Patent. Generally, these
`
`claims describe a “probe filtering unit” (“PFU”) which increases the
`
`efficiency of memory transactions in a point-to-point multi-processor
`
`computer system having multiple cache memories. The claims further
`
`describe how the PFU receives probes from processing nodes
`
`corresponding to memory lines, and how the PFU transmits the probes
`
`only to selected ones of the processing nodes with references to probe
`
`filtering information representative of states associated with selected ones of
`
`the cache memories.
`
`17. In the 2002–2004 timeframe, a person with ordinary skill in the art with
`
`respect to the technology disclosed by the ’121 Patent would have a PhD
`
`degree in Electrical Engineering, Computer Engineering, or Computer
`
`Science or an MS degree and two to three years of industry experience in the
`
`area of cache coherency in multi-processor computer systems.
`
`18. Based upon my education and experience, I consider myself to be a person
`
`of at least ordinary skill in the field of technology disclosed by the ’121
`
`Patent.
`
`V. KOSTER
`
`19. Koster describes a point-to-point multi-processor architecture with a
`
`“snoop filter” having a “shadow tag memory” that stores the tags of data
`
`stored in the local cache memories of several microprocessors. By having
`
`the shadow tag memory, the snoop filter forwards a received broadcast for
`
`requested data (by one microprocessor) to a specific other microprocessor
`
`only if the shadow tag memory indicates that the other microprocessor has
`
`a copy of the requested data.
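
For illustration only, the filtering behavior described in paragraph 19 can be sketched in Python. The names used here (SnoopFilter, record_fill, record_evict, targets) are hypothetical and do not come from Koster. The sketch assumes one set of shadow tags per microprocessor and ignores cache states, replacement policy, and timing.

```python
class SnoopFilter:
    """Hypothetical model: the filter keeps a shadow copy of each processor's cache tags."""

    def __init__(self, processor_ids):
        # One shadow tag set per processor (assumed organization, not taken from Koster).
        self.shadow_tags = {pid: set() for pid in processor_ids}

    def record_fill(self, pid, tag):
        """Note that processor `pid` now caches the line identified by `tag`."""
        self.shadow_tags[pid].add(tag)

    def record_evict(self, pid, tag):
        """Note that processor `pid` no longer caches the line identified by `tag`."""
        self.shadow_tags[pid].discard(tag)

    def targets(self, requester, tag):
        """Return the other processors whose shadow tags show a copy of the line."""
        return sorted(
            pid
            for pid, tags in self.shadow_tags.items()
            if pid != requester and tag in tags
        )


snoop_filter = SnoopFilter(["P0", "P1", "P2", "P3"])
snoop_filter.record_fill("P2", 0x40)
hit_targets = snoop_filter.targets("P0", 0x40)   # only P2 holds the line
miss_targets = snoop_filter.targets("P0", 0x80)  # no processor holds the line
print(hit_targets, miss_targets)
```

In this sketch a request for a line held only by P2 is forwarded to P2 alone, and a request for a line held by no processor is forwarded to none, rather than being broadcast to all four processors.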
`
`20. At the time Koster was filed (July 13, 2004), it would have been obvious to
`
`one of ordinary skill in the art to implement the “snoop filter” disclosed in
`
`Koster on an integrated circuit, and more specifically, on an application-
`
`specific integrated circuit. This is so because such an implementation was
`
`the most common, most performant, and least burdensome of the known
`
`methods at the time.
`
`21. Furthermore, at the time Koster was filed (July 13, 2004), integrated circuits
`
`were necessarily created with a set of semiconductor processing masks.
`
`Accordingly, it would have been obvious to one of ordinary skill in the
`
`art at this time to use a set of semiconductor processing masks
`
`representative of at least a portion of the “snoop filter” disclosed in Koster
`
`to create an integrated circuit implementing the “snoop filter” of Koster.
`
`VI. KOSTER AND KUSKIN
`
`22. At the time Koster was filed (July 13, 2004), it would have been obvious to
`
`one of ordinary skill in the art to combine the teachings of Koster and
`
`Kuskin. Both Koster and Kuskin disclose solutions to cache
`
`coherency problems. Koster discloses a cache coherent computer system
`
`with a plurality of microprocessors that are integrated circuits. See Koster at
`
`2:26-3:5. Kuskin, similarly, discloses a cache coherency architecture, but
`
`with a “programmable protocol processor for flexibility.” Kuskin at 303.
`
`Furthermore, prior to July 13, 2004, it was known to those of ordinary skill
`
`in the art that implementing cache coherence using a programmable
`
`microprocessor afforded more flexibility than, for example, an ASIC.
`
`Accordingly, it would have been obvious to combine Koster with Kuskin in
`
`order to obtain the flexibility taught by Kuskin while retaining the increased
`
`performance taught by Koster.
`
`23. As used in claim 21 of the ’121 Patent, the term “netlist” would be
`
`understood by one of ordinary skill in the art to mean “a computer
`
`representation of a collection of logic units and how they are to be
`
`connected.”
`
`24. Verilog is a hardware description language that was well known prior to
`
`2000.
`
`25. Kuskin teaches building and simulating a cache coherency controller using a
`
`hardware description language for a tangible chip. See Kuskin at 302
`
`(describing “our Verilog code”), at 311 (“We currently have a detailed
`
`system-level simulator up and running. . . . On the hardware design front
`
`we are busily coding the Verilog description of the MAGIC chip.”). Use of
`
`a Verilog description to create a tangible chip necessarily requires the
`
`creation of a simulatable representation comprising a netlist.
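
For illustration only, the definition of “netlist” in paragraph 23 (a computer representation of a collection of logic units and how they are to be connected) can be sketched in Python. The half-adder circuit, the dictionary layout, and the names HALF_ADDER, GATES, and simulate are hypothetical and are not taken from Kuskin or from the ’121 Patent. The sketch assumes purely combinational logic with no feedback loops.

```python
# Each entry maps an output net to (type of logic unit, nets connected to its inputs).
HALF_ADDER = {
    "sum": ("XOR", ("a", "b")),
    "carry": ("AND", ("a", "b")),
}

# Behavior of each logic unit type on single-bit values.
GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def simulate(netlist, inputs):
    """Evaluate every net in an acyclic netlist for the given primary input values."""
    values = dict(inputs)

    def value_of(net):
        if net not in values:
            kind, input_nets = netlist[net]
            values[net] = GATES[kind](*(value_of(n) for n in input_nets))
        return values[net]

    return {net: value_of(net) for net in netlist}


result = simulate(HALF_ADDER, {"a": 1, "b": 1})
print(result)
```

The same data structure both records the connections between logic units and can be evaluated, which is the sense in which a netlist is a simulatable representation.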
`
`26. Prior to July 13, 2004, it would have been obvious to a person of ordinary
`
`skill in the art to implement the “snoop filter” of Koster using a hardware
`
`description language (as shown in Kuskin) because that was the only
`
`commonly used method in the industry for designing hardware.
`
`VII. KOSTER, KUSKIN, AND PARK
`
`27. At the time Koster was filed (July 13, 2004), it would have been obvious to
`
`combine the teachings of Kuskin with Park. Kuskin discloses the Stanford
`
`FLASH multiprocessor, which implements its own cache-coherence
`
`protocol. Kuskin at 304 (“This section presents a base cache-coherence
`
`protocol and a base block-transfer protocol we have designed for
`
`FLASH.”). Park includes a more detailed description of this same Stanford
`
`FLASH cache coherency protocol. Park at § 4. Therefore, it would have
`
`been obvious to one of ordinary skill in the art to combine the teachings of
`
`Kuskin and Park because anyone studying the FLASH cache coherency
`
`protocol disclosed in Kuskin would look to Park for a more detailed
`
`description of the same system. As previously described, it would have
`
`been obvious to combine the teachings of Koster and Kuskin. Accordingly,
`
`at the time Koster was filed (July 13, 2004), it would have been obvious to
`
`combine the teachings of Koster, Kuskin, and Park.
`
`28. Koster discloses a solution to cache coherency problems. Cache
`
`coherency problems do not exist in a “read-only” system where the data
`
`cannot be changed, i.e., it is not possible for the cached data to become
`
`incoherent. Accordingly, although Koster primarily concerns the scenario
`
`of a node requesting “read” access, anyone of ordinary skill in the art would
`
`understand the system disclosed in Koster to allow a node to request
`
`“read/write” access as well.
`
`29. In the Stanford FLASH cache coherency protocol, when a node issues a
`
`request for “read/write” access to a memory line, the protocol may use a
`
`“delayed” mode where the home node (i.e., the node with the coherence
`
`directory) accumulates invalidation acknowledgements from other nodes
`
`(having shared copies of the memory line) before sending an “exclusive
`
`copy” of the memory line to the requesting node. Park at 362. Thus, in this
`
`“delayed mode,” the home node accumulates a plurality of responses, not
`
`just one, before sending a response back to the requesting node.
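
For illustration only, the “delayed” mode described in paragraph 29 can be sketched in Python. The names HomeNode, request_exclusive, and invalidation_ack are hypothetical, and the sketch is a simplification of the behavior described above, not the FLASH implementation. It assumes the home node already knows which nodes hold shared copies.

```python
class HomeNode:
    """Hypothetical home node that waits for all invalidation acknowledgements."""

    def __init__(self):
        self.pending = {}  # memory line -> (requester, sharers not yet acknowledged)
        self.sent = []     # (requester, memory line) exclusive copies sent so far

    def request_exclusive(self, line, requester, sharers):
        """Handle a read/write request; defer the reply while sharers remain."""
        waiting = set(sharers) - {requester}
        if not waiting:
            self.sent.append((requester, line))
            return
        self.pending[line] = (requester, waiting)

    def invalidation_ack(self, line, sharer):
        """Record one acknowledgement; reply only after the last one arrives."""
        requester, waiting = self.pending[line]
        waiting.discard(sharer)
        if not waiting:
            del self.pending[line]
            self.sent.append((requester, line))


home = HomeNode()
home.request_exclusive("line A", "N0", ["N1", "N2"])
after_request = list(home.sent)    # nothing sent yet
home.invalidation_ack("line A", "N1")
after_first_ack = list(home.sent)  # still waiting on N2
home.invalidation_ack("line A", "N2")
after_last_ack = list(home.sent)   # exclusive copy sent to N0
print(after_request, after_first_ack, after_last_ack)
```

The exclusive copy is sent only after both acknowledgements have been accumulated, which corresponds to the plurality of responses described in paragraph 29.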
`
`VIII. LUICK AND KOSARAJU
`
`30. Prior to November 4, 2001 many architectures were well known to those
`
`with ordinary skill in the art for interconnecting multiple processing nodes
`
`in a computer system. Two of these architectures are shown in Figures 1
`
`and 5 of Kosaraju:
`
`Kosaraju, Fig. 1
`
`
`
`Kosaraju, Fig. 5
`
`
`
`31. A “Bus” architecture is where all communications of the processing nodes
`
`flow over a common bus. The Bus architecture does not use point-to-point
`
`links between any pair of processing nodes. The advantages to using a Bus
`
`architecture are its ability to provide totally ordered broadcasts/multicasts,
`
`low latency, and a simple design. However, the disadvantage to using a Bus
`
`architecture is the lack of scalability to a large number of processing nodes.
`
`A designer typically uses a bus architecture when the computer system
`
`requires a broadcast that must reach all of the processing nodes in the
`
`system, e.g., in a computer system using a snooping coherence protocol.
`
`32. A “Fully Connected” architecture (which Kosaraju shows in Figure 5) is
`
`where each processing node is adapted to be connected and communicate
`
`directly with each other processing node. The Fully Connected architecture
`
`uses point-to-point links between a processing node and every other
`
`processing node. The advantages to using a Fully Connected architecture
`
`are its low-latency paths between processing nodes and its ability to carry
`
`multiple messages concurrently. However, the disadvantages to using a
`
`Fully Connected architecture are lack of scalability to a large number of
`
`processing nodes and implementation cost.
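
For illustration only, the scalability limit noted in paragraph 32 follows from simple counting: directly linking every pair among n endpoints requires n(n-1)/2 point-to-point links. The function name fully_connected_links is hypothetical, and the endpoint counts are chosen only as examples.

```python
def fully_connected_links(endpoints: int) -> int:
    """Number of point-to-point links needed to directly link every pair of endpoints."""
    return endpoints * (endpoints - 1) // 2


link_counts = {n: fully_connected_links(n) for n in (3, 4, 16, 64)}
print(link_counts)  # {3: 3, 4: 6, 16: 120, 64: 2016}
```

Under this count, three endpoints need 3 links and four endpoints need 6, while 64 endpoints would need 2016, which is why the cost is modest for small systems and grows quickly for large ones.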
`
`33. At the time Kosaraju was filed (December 13, 2000), it would have been
`
`obvious to interchange the bus architecture shown in Figure 1 of Luick with
`
`the fully connected point-to-point architecture shown in Figure 5 of
`
`Kosaraju. Figure 1 of Luick discloses a multi-processor computer system
`
`having three processing nodes and a “global coherence unit”:
`
`Luick, Fig. 1
`
`34. The three processing nodes (Node 1, Node 2, and Node 3) in Figure 1 of
`
`Luick are interconnected using a Bus architecture. However, at the time
`
`Kosaraju was filed (December 13, 2000), it would have been obvious to one
`
`of ordinary skill in the art to alternatively employ a Fully Connected
`
`architecture to interconnect the three processing nodes (and the GCU).
`
`This is so because, as described above, the Fully Connected architecture has
`
`advantages over the Bus architecture. In a system having only three
`
`processing nodes, use of the Fully Connected architecture would be the
`
`most efficient because it offers low latency for small-scale systems with
`
`relatively few processing nodes. Indeed, use of a Bus architecture in Luick
`
`results in needless broadcasting of messages that need only be sent from
`
`one node to another.
`
`35. The ’121 Patent defines the term “probe” as “[a] mechanism for eliciting a
`
`response from a node to maintain cache coherency in a system.” ’121
`
`Patent at 5:45-48.
`
`36. Luick discloses a “request” for data that is received by the GCU. Luick at
`
`7:16. The GCU then determines which node has the most current copy of
`
`the data to be read and determines which node is to respond to the request.
`
`Luick at 7:17-21. The GCU then sends the request to the node which the
`
`GCU has determined should respond. Luick at 7:23-25. This functionality
`
`of the system disclosed in Luick maintains cache coherency because it
`
`prevents a node with a stale copy of the requested data from responding to
`
`the request. Accordingly, the “request” disclosed in Luick (either received
`
`by the GCU or sent from the GCU) is a “mechanism for eliciting a
`
`response from a node to maintain cache coherency in a system.” Therefore,
`
`the “request” disclosed in Luick meets the definition of “probe” as used in
`
`the ’121 Patent.
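
For illustration only, the request handling described in paragraph 36 can be sketched in Python. The names GlobalCoherenceUnit, note_current_copy, and direct_request are hypothetical and do not come from Luick. How the unit learns which node holds the most current copy is an assumption of the sketch, as is returning None when no node is recorded.

```python
class GlobalCoherenceUnit:
    """Hypothetical model: track which node holds the most current copy of each address."""

    def __init__(self):
        self.most_current = {}  # address -> node holding the most current copy

    def note_current_copy(self, address, node):
        """Record that `node` now holds the most current copy (assumed bookkeeping)."""
        self.most_current[address] = node

    def direct_request(self, address):
        """Return the one node that should respond, or None if none is recorded."""
        return self.most_current.get(address)


gcu = GlobalCoherenceUnit()
gcu.note_current_copy(0x100, "Node 1")
gcu.note_current_copy(0x100, "Node 2")  # Node 2 later obtains the most current copy
responder = gcu.direct_request(0x100)
print(responder)
```

In this sketch the request is directed to Node 2 only, so Node 1, which holds a stale copy, is never asked to respond.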
`
`37. At the time Luick was issued (July 11, 2000), it would have been obvious to
`
`one of ordinary skill in the art to implement the GCU disclosed in Luick on
`
`an integrated circuit, and more specifically, on an application-specific
`
`integrated circuit. This is so because such an implementation was the most
`
`common, most performant, and least burdensome of the known methods at
`
`the time.
`
`38. Furthermore, at the time Luick was issued (July 11, 2000), integrated
`
`circuits were necessarily created with a set of semiconductor processing
`
`masks. Accordingly, it would have been obvious to one of ordinary
`
`skill in the art at this time to use a set of semiconductor processing masks
`
`representative of at least a portion of the GCU disclosed in Luick to create
`
`an integrated circuit implementing the GCU of Luick.
`
`IX. LUICK, KOSARAJU, AND KUSKIN
`
`39. At the time Luick was issued (July 11, 2000), it would have been obvious to
`
`one of ordinary skill in the art to combine the teachings of Luick and
`
`Kuskin. Both Luick and Kuskin disclose solutions to cache
`
`coherency problems. Luick discloses a cache coherent computer system
`
`with a plurality of microprocessors. Luick at Fig. 1, 3:65-4:4, 5:14-35.
`
`Kuskin, similarly, discloses a cache coherency architecture, but with a
`
`“programmable protocol processor for flexibility.” Kuskin at 303.
`
`Furthermore, prior to July 11, 2000, it was known to those of ordinary skill
`
`in the art that implementing cache coherence using a programmable
`
`microprocessor afforded more flexibility than, for example, an ASIC.
`
`Accordingly, it would have been obvious to combine Luick with Kuskin in
`
`order to obtain the flexibility taught by Kuskin while retaining the increased
`
`performance taught by Luick. As previously described, it would have been
`
`obvious to combine the teachings of Luick and Kosaraju. Accordingly, at
`
`the time Luick was issued (July 11, 2000), it would have been obvious to
`
`combine the teachings of Luick, Kosaraju, and Kuskin.
`
`40. Prior to July 11, 2000, it would have been obvious to a person of ordinary
`
`skill in the art to implement the GCU of Luick in a hardware description
`
`language (as shown in Kuskin) because that was the only commonly used
`
`method in the industry for programming hardware.
`
`X. LUICK, KOSARAJU, KUSKIN, AND PARK
`
`41. At the time Luick was issued (July 11, 2000), it would have been obvious to
`
`combine the teachings of Kuskin with Park. Kuskin discloses the Stanford
`
`FLASH multiprocessor, which implements its own cache-coherence
`
`protocol. Kuskin at 304 (“This section presents a base cache-coherence
`
`protocol and a base block-transfer protocol we have designed for
`
`FLASH.”). Park includes a more detailed description of this same Stanford
`
`FLASH cache coherency protocol. Park at § 4. Therefore, it would have
`
`been obvious to one of ordinary skill in the art to combine the teachings of
`
`Kuskin and Park because anyone studying the FLASH cache coherency
`
`protocol disclosed in Kuskin would look to Park for a more detailed
`
`description of the same system. As previously described, it would have
`
`been obvious to combine the teachings of Luick, Kosaraju, and Kuskin.
`
`Accordingly, at the time Luick was issued (July 11, 2000), it would have
`
`been obvious to combine the teachings of Luick, Kosaraju, Kuskin, and
`
`Park.
`
`42. Luick discloses a solution to cache coherency problems. Cache
`
`coherency problems do not exist in a "read-only" system where the data
`
`cannot be changed, i.e., it is not possible for the cached data to become
`
`incoherent. Indeed, Luick discloses embodiments concerning the writing of
`
`data. Luick at 7:65-9:51. Accordingly, anyone of ordinary skill in the art
`
`would understand the system disclosed in Luick to allow a node to request
`
`"read/write" access as well as "read" access.
`
`I declare under penalty of perjury that the foregoing is true and correct.
`
`Dated November __ , 2014
`
`Daniel J. Sorin
`