`
`
`
`
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`_____________________________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`_____________________________
`
`
`APPLE INC.,
`Petitioner,
`
`v.
`
`CALIFORNIA INSTITUTE OF TECHNOLOGY,
`Patent Owner.
`
`_____________________________
`
`IPR2017-00700, -00701, and -00728
`Patent No. 7,421,032
`_____________________________
`
`
`DECLARATION OF DR. MICHAEL MITZENMACHER
`
`
`
`
`
`
`
`
`
`CALTECH - EXHIBIT 2004
`Apple Inc. v. California Institute of Technology
`IPR2017-00728
`
`
`
TABLE OF CONTENTS

I. ENGAGEMENT .......................................................................................... 1
II. QUALIFICATIONS .................................................................................... 1
III. COMPENSATION AND PRIOR TESTIMONY .......................................... 6
IV. LEGAL PRINCIPLES .................................................................................. 7
V. INTRODUCTION TO CHANNEL CODING AND TERMINOLOGY ....... 10
VI. OVERVIEW OF THE ART AND CITED REFERENCES ........................ 20
    A. Turbo Codes ..................................................................................... 22
    B. Gallager/LDPC codes ....................................................................... 24
    C. Cited References ............................................................................... 27
        1. MacKay .................................................................................. 27
        2. Ping ........................................................................................ 28
        3. Divsalar .................................................................................. 30
VII. PERSON OF ORDINARY SKILL IN THE ART ....................................... 31
VIII. PROCEDURAL BACKGROUND ............................................................. 32
IX. CLAIM CONSTRUCTION ........................................................................ 33
    A. “irregular” ......................................................................................... 33
    B. “Tanner Graph” ................................................................................ 34
X. REBUTTAL TO -00700 CASE GROUND 1: CLAIMS 11, 12, AND
   14-16 ARE NOT OBVIOUS OVER PING, MACKAY AND
   DIVSALAR ................................................................................................ 35
    A. The Petition fails to identify parity bits that are determined “as
       shown by the configuration of nodes and edges of the Tanner
       graph” ............................................................................................... 35
    B. The Petition fails to identify irregular repetition in either Ping
       or MacKay ........................................................................................ 38
    C. MacKay does not teach nonuniform row weights.............................. 44
    D. A Person of Ordinary Skill in the Art would not be motivated to
       combine Ping with MacKay .............................................................. 45
        1. Ping is already irregular as defined by MacKay ...................... 45
        2. The proposed modification would eliminate Ping’s stated
           improvement ........................................................................... 54
        3. The Petition frequently mischaracterizes both Ping and
           MacKay .................................................................................. 56
        4. The similarity in terms between MacKay and Ping do not
           establish a motivation to combine ........................................... 61
        5. Dr. Davis’s testimony is inconsistent with the Petition’s
           motivation to combine ............................................................ 63
        6. The Petition fails to explain how their proposed
           modification would be accomplished ...................................... 65
        7. Ping combined with MacKay would not have any
           reasonable expectation of success ........................................... 70
    E. A Person of Ordinary Skill in the Art would not have been
       motivated to combine Ping with Divsalar ......................................... 80
XI. REBUTTAL TO -00700 GROUND 2: CLAIM 13 IS NOT
    OBVIOUS OVER PING, MACKAY, DIVSALAR, AND LUBY97 .......... 85
XII. REBUTTAL TO -00701 GROUND 1: CLAIMS 1, 4-10 ARE NOT
    OBVIOUS OVER PING, MACKAY, DIVSALAR, AND LUBY97 .......... 86
XIII. REBUTTAL TO -00728 GROUND 1: CLAIMS 18-23 ARE NOT
    OBVIOUS OVER PING, MACKAY, DIVSALAR, AND LUBY97 .......... 90
    A. The Petition does not provide any explanation for how
       Divsalar’s, MacKay’s, or Luby97’s decoding algorithms are to
       be used with the Petition’s proposed combination ............................. 91
XIV. SECONDARY CONSIDERATIONS OF NON-OBVIOUSNESS .............. 92
    A. Nexus between the objective evidence and the claims....................... 94
    B. Long-felt need and failure of others ................................................ 101
    C. Industry Praise ................................................................................ 104
    D. Unexpected Results......................................................................... 107
    E. Commercial Success ....................................................................... 108
XV. CONCLUDING STATEMENTS .............................................................. 110
`
`
`
`I, Michael Mitzenmacher, declare as follows:
`
I. ENGAGEMENT

1. I have been retained by counsel for the California Institute of
`
`Technology as an expert witness in the above-captioned proceeding. I have been
`
`asked to provide my opinion about the state of the art of the technology described
`
`in U.S. Patent No. 7,421,032 (the “’032 patent”) and on the patentability of the
`
`claims of this patent, specifically with regard to the grounds of institution in the
`
`cases IPR2017-00700, IPR2017-00701, and IPR2017-00728. The following is my
`
`written testimony on these topics.
`
`II. QUALIFICATIONS
`
`2.
`
`I am currently employed as a Professor of Computer Science at
`
`Harvard University. Specifically, I am the Thomas J. Watson, Sr. Professor of
`
`Computer Science in the School of Engineering and Applied Sciences. I joined the
`
`faculty of Harvard as an Assistant Professor in January 1999. I was promoted to
`
Associate Professor in 2002 and to Professor in 2005. In 2010, I began a three-year term as Area Dean of Computer Science, which is essentially equivalent to what other schools call Department Chair, and held that position through June 2013.
`
`My work address is 33 Oxford Street, Cambridge, MA 02138. My primary
`
`research interests include design and analysis of algorithms, networks and data
`
`transmission, and information theory.
`
`
`
`1
`
`
`
`3.
`
`I received my undergraduate degree in Mathematics and Computer
`
`Science from Harvard College in 1991. I received a Certificate of Advanced Study
`
`in Mathematics from Cambridge University in 1992. I received a Ph.D. in
`
`Computer Science from the University of California at Berkeley in 1996. From
`
`August 1996 to January 1999, I was employed as a Research Scientist at Digital
`
`Systems Research Center, where my work included projects on algorithms for the
`
`Internet and error-correcting codes.
`
`4.
`
`I am listed as an inventor or co-inventor on 19 issued patents, and am
`
`the co-author of a textbook entitled “Probability and Computing” published by
`
`Cambridge University Press. I am a Fellow of the Association for Computing
`
`Machinery, and currently serve as the Chair of the ACM Special Interest Group on
`
`Algorithms and Computation Theory (SIGACT).
`
`5.
`
`The fields of endeavor at issue in this case are error-correction coding
`
`methods, including repeat-accumulate codes, Turbo codes, and low-density parity-
`
`check codes. I have published over 200 research papers in computer science and
`
`engineering conferences and journals, many of which have explored algorithms
`
`and data structures for error-correction codes, including both mathematical
`
`analysis and applications.
`
`
`
`2
`
`
`
`6.
`
`I have authored or co-authored a number of papers specifically in the
`
`area of low-density parity-check codes, including papers that have been presented
`
`as potential prior art in these proceedings. For example, the paper “Improved
`
Low-Density Parity Check Codes Using Irregular Graphs” received the 2002
`
`IEEE Information Theory Society Best Paper Award. Another paper, “A Digital
`
`Fountain Approach to Reliable Distribution of Bulk Data,” received the 2009
`
`ACM SIGCOMM Test of Time Paper Award.
`
`7.
`
`I have received multiple research grants related to codes and coding
`
`theory, including grant NSF CCR-0118701: Low Density Parity Check Codes for
`
`Channels with Memory, and NSF CCF-0634923: Towards a Basic Understanding
`
`of Channels with Synchronization Errors. My current research on data
`
`synchronization is also related to codes and coding theory.
`
`8.
`
`I am a co-inventor on several patents related to error-correcting codes,
`
`including several patents specifically related to low-density parity-check codes.
`
`9.
`
`I regularly serve on program committees for conferences in
`
`networking, algorithms, and communication, including conferences that consider
`
`work on error-correcting codes. For example, I have served on the program
`
`committee multiple times for the SIGCOMM conference, which is described on
`
`the conference homepage as follows: “SIGCOMM is the flagship annual
`
`
`
`3
`
`
`
`conference of the ACM Special Interest Group on Data Communication
`
`(SIGCOMM) on the applications, technologies, architectures, and protocols for
`
`computer communication.” I have also served on the program committee of the
`
`International Symposium on Information Theory.
`
`10. My graduate course entitled “Algorithms at the end of the wire”
`
`covers topics in error-correcting codes and their applications. My graduate course
`
`on randomized algorithms and probabilistic analysis has a short unit on the theory
`
`of error-correction, which is included in the textbook I have co-authored.
`
`11.
`
`I have done substantial work in the area of low-density parity-check
`
`codes, which began in mid-1990s. I started a collaboration with Michael Luby,
`
`Amin Shokrollahi, Daniel Spielman, and Volker Stemann in 1996, while I was
`
`finishing my PhD thesis. We were interested in making data transmission over the
`
`Internet more efficient, by finding ways to deal with lost information. On the
`
`Internet, data is packaged into an ordered sequence of numbered packets, but
`
`packets for various reasons may not reach the destination. When packets do not
`
`arrive, the receiver must ask again for the numbered packets that are missing,
`
`causing delay from the extra round-trips required to ask for additional information.
`
12. Working as a group to solve this problem, we independently developed
`
`variations of what are referred to as Gallager codes, based on random graphs, and
`
`
`
`4
`
`
`
`methods to analyze them. See Ex. 1008. Our work provided codes that could
`
`correct erasures. An erasure channel provides a good model for data transmission
`
`schemes like the Internet, where data is packaged into an ordered sequence of
`
`numbered packets, but packets may not reach the destination. One can send extra
`
`encoded data beyond the original data to the destination. On an erasure channel,
`
`some of the data will be missing, but one knows what data is missing based on
`
`what numbered packets are missing. Using the extra encoded data, if enough
`
`packets arrive at the destination, the original data can all be recovered. In this
`
`setting, the code can only deal with erasures, not errors, in the transmitted data. In
`
`this work, we found that using a layered sequence of irregular random graphs
`
`could provide better results for the codes that we developed than using a layered
`
`sequence of regular graphs. Each layer in the sequence of layered graphs could be
`
`viewed as a low-density parity-check code. This result was a surprise to us – we
`
`had initially focused on regular random graphs – but arose out of our mathematical
`
`analysis.
`
`13. After this work, we were inspired to consider codes for error channels,
`
`where each bit is flipped with a fixed probability. See Ex. 1009. We were
`
`surprised again to find that codes using irregular graphs, which we called irregular
`
`codes, performed better than their regular counterparts. This work achieved
`
`
`
`5
`
`
`
`significant prominence; a later, significantly more complete version of the work,
`
`published in the IEEE Transactions on Information Theory, received the 2002
`
IEEE Information Theory Society Best Paper Award.
`
`14. With another group of researchers (initially Michael Luby, John
`
`Byers, and Ashutosh Rege), we researched using our erasure code technology for
`
`practical data transmission in distributed, networked systems. Our initial paper on
`
`this theme appeared in the 1998 SIGCOMM conference, which was then and
`
`remains a premier venue for networking research. The paper received the 2009
`
`ACM SIGCOMM Test of Time Paper Award, an award given looking backward
`
`for a publication “that is deemed to be an outstanding paper whose contents are
`
`still a vibrant and useful contribution today.”
`
`15. As part of this work we filed several patents which eventually issued,
`
all of which are listed on my CV. Sometime around the 1999-2000 timeframe,
`
`Michael Luby started a company seeking to commercialize this type of coding
`
`technology (including further advances), and I consulted for that company for a
`
`number of years.
`
`16. My Curriculum Vitae is submitted herewith as Ex. 2005.
`
`III. COMPENSATION AND PRIOR TESTIMONY
`
`17.
`
`I am being compensated at a rate of $875 per hour for my study and
`
`other work in this matter. I am also being reimbursed for reasonable and
`
`
`
`6
`
`
`
`customary expenses associated with my work in this investigation. My
`
`compensation is not contingent on the outcome of this matter or the specifics of my
`
`testimony.
`
`IV. LEGAL PRINCIPLES
`
`18.
`
`I have been advised that a claimed invention is not patentable under
`
`35 U.S.C. § 103 if it is obvious. A patent claim is unpatentable if the claimed
`
`invention would have been obvious to a person of ordinary skill in the field at the
`
time the claimed invention was made. This means that even if all of the requirements of the claim cannot be found in a single prior art reference that would anticipate the claim, the claim is still unpatentable if a person of ordinary skill in the relevant field who knew about all of this prior art would have come up with the claimed invention.
`
`19.
`
`I have further been advised that the ultimate conclusion of whether a
`
`claim is obvious should be based upon several factual determinations. That is, a
`
`determination of obviousness requires inquiries into: (1) the level of ordinary skill
`
`in the field; (2) the scope and content of the prior art; (3) what difference, if any,
`
`existed between the claimed invention and the prior art; and (4) any secondary
`
`evidence bearing on obviousness.
`
`20.
`
`I have been advised that, in determining the level of ordinary skill in
`
`the field that someone would have had at the time the claimed invention was made,
`
`I should consider: (1) the levels of education and experience of persons working in
`
`
`
`7
`
`
`
`the field; (2) the types of problems encountered in the field; and (3) the
`
`sophistication of the technology.
`
`21.
`
`I have also been advised that, in determining the scope and content of
`
`the prior art, in order to be considered as prior art, a reference must be reasonably
`
`related to the claimed invention of the patent. A reference is reasonably related if it
`
`is in the same field as the claimed invention or is from another field to which a
`
`person of ordinary skill in the field would look to solve a known problem.
`
`22.
`
`I have been advised that a patent claim composed of several elements
`
`is not proved obvious merely by demonstrating that each of its elements was
`
`independently known in the prior art. In evaluating whether such a claim would
`
`have been obvious, I may consider whether there is a reason that would have
`
`prompted a person of ordinary skill in the field to combine the elements or
`
`concepts from the prior art in the same way as in the claimed invention.
`
`23.
`
`I have been further advised that there is no single way to define the
`
`line between true inventiveness on the one hand (which is patentable) and the
`
`application of common sense and ordinary skill to solve a problem on the other
`
`hand (which is not patentable). For example, market forces or other design
`
`incentives may be what produced a change, rather than true inventiveness. I may
`
`consider whether the change was merely the predictable result of using prior art
`
`
`
`8
`
`
`
`elements according to their known functions, or whether it was the result of true
`
`inventiveness. I may also consider whether there is some teaching or suggestion in
`
`the prior art to make the modification or combination of elements claimed in the
`
`patent. I may consider whether the innovation applies a known technique that had
`
`been used to improve a similar device or method in a similar way. I may also
`
`consider whether the claimed invention would have been obvious to try, meaning
`
`that the claimed innovation was one of a relatively small number of possible
`
`approaches to the problem with a reasonable expectation of success by those
`
`skilled in the art.
`
`24.
`
`I have also been advised, however, that I must be careful not to
`
`determine obviousness using the benefit of hindsight; many true inventions might
`
`seem obvious after the fact. I should put myself in the position of a person of
`
`ordinary skill in the field at the time the claimed invention was made and I should
`
`not consider what is known today or what is learned from the teaching of the
`
`patent.
`
`25. Finally, I have been advised that any obviousness rationale for
`
`modifying or combining prior art must include a showing that a person of ordinary
`
`skill would have had a reasonable expectation of success.
`
`
`
`9
`
`
`
`26. With regard to secondary considerations of nonobviousness, I have
`
`been advised that any objective evidence may be considered as an indication that
`
`the claimed invention would not have been obvious at the time the claimed
`
`invention was made. I understand that the purpose of secondary considerations is
`
`to prevent a hindsight analysis of the obviousness of the claims.
`
`27.
`
`I have been advised that there are several factors that may be
`
`considered as a secondary consideration. These factors include the commercial
`
`success of the invention, industry praise for the invention, skepticism of the
`
`invention, licensing of the invention, copying of the invention, any long-felt need
`
`that the invention solved, failure of others, and unexpected results of the invention.
`
`28.
`
`I have been further advised that in order for evidence of secondary
`
`considerations to be significant, there must be a sufficient nexus between the
`
`claimed invention and the evidence of secondary considerations. I understand that
`
`this nexus serves to provide a link between the merits of the claimed invention and
`
`the evidence of secondary considerations provided.
`
V. INTRODUCTION TO CHANNEL CODING AND TERMINOLOGY
`
`29. The ’032 patent and the prior art relate to the field of channel coding
`
`through the use of error-correction codes. Channel coding is incredibly important
`
`for any situation where information data (typically a series of bits) is transmitted
`
`over a potentially noisy channel, because in many use cases, the data must be
`
`
`
`10
`
`
`
`received without any corruption. Since the presence of noise in the transmission
`
`channel can cause the value of any bits to be flipped, channel coding is the process
`
`of adding redundancy to the data so that such errors can be identified and
`
`corrected. This is done by using an encoder on the information data to generate
`
`parity bits. The output of the encoder is known as a codeword. A systematic
`
`encoder includes both the information bits and parity bits within its output
`
`codeword, whereas a non-systematic encoder includes only the parity bits. An
`
`encoder has a rate that is the ratio of the number of input bits to the number of bits
`
output by the encoder. Generally, higher-rate codes are preferable to lower-rate codes because fewer bits need to be transmitted and decoded.
`
`30. One way to achieve a higher rate is a technique referred to as
`
`puncturing. Puncturing generally refers to removing some of the parity bits from
`
`the encoding, so that there are fewer bits in the output. Since the rate is the ratio of
`
`the number of input bits to the number of bits output by the encoder, removing
`
some of the parity bits increases the rate. Puncturing can be utilized if a pre-defined pattern of bits is punctured at the encoder and that pattern is known at the decoder.
`
`However, puncturing is not an arbitrary practice and can raise challenging
`
`questions, such as which bits to remove to obtain suitable performance.
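The rate arithmetic described above can be made concrete with a minimal sketch (the function names and the choice of which bit to puncture are mine, purely for illustration):

```python
# Illustrative sketch (not from any cited reference): how removing
# parity bits via puncturing raises a code's rate.

def rate(k_input_bits, n_output_bits):
    """Code rate: ratio of input bits to transmitted output bits."""
    return k_input_bits / n_output_bits

def puncture(codeword, punctured_positions):
    """Remove the bits at an agreed set of positions; the decoder must
    know the same pattern in order to account for the missing bits."""
    return [bit for i, bit in enumerate(codeword)
            if i not in punctured_positions]

codeword = [1, 0, 0, 1, 1, 0, 1]   # 7 bits carrying a 4-bit message
print(rate(4, len(codeword)))      # 4/7, about 0.571
sent = puncture(codeword, {0})     # drop the first parity bit
print(rate(4, len(sent)))          # 4/6, about 0.667: fewer bits, higher rate
```

Which positions to puncture is exactly the kind of performance question noted above; this sketch only shows the bookkeeping.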
`
`
`
`11
`
`
`
`31. The encoded output is transmitted over a channel, which represents
`
`the path from the sender to the receiver. In a typical example, the sender and
`
`receiver might be cell phones, and the channel represents the full effects of the
`
`transmission of the signal, including through cell towers and any intermediate
`
`transmission mechanism. In a more unusual case, if I use a program to store a file
`
`on a computer in encoded form, and use the program to read it back later, the
`
`program is both the sender and the receiver, and the writing to and reading from
`
`memory constitutes the channel.
`
`32. On the receiving end, the received information and parity bits, which
`
`may have been altered as they go through the channel, are used by a decoder to
`
`identify and correct any errors that may have occurred during transmission.
`
Sometimes, errors cannot be corrected; the ratio of uncorrected bits to the total number of information bits is known as the bit error rate or BER. A
`
`standard measure of performance is the average BER for a given channel at a given
`
`signal-to-noise ratio, where the average may be found mathematically or measured
`
`by experiment. The variance in the BER over multiple experiments can also be
`
`important, with a high variance generally being worse, as it can mean there is a
`
`substantial risk of poor performance.
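Measuring average BER by experiment, as described above, can be sketched with a toy simulation of a binary symmetric channel (no error correction is applied here, and all names are my own, so the measured BER simply approaches the channel's flip probability):

```python
# Illustrative sketch (my own toy setup, not from the declaration):
# estimating average bit error rate (BER) by repeated experiment on a
# channel that flips each bit independently with probability p.

import random

def transmit(bits, p, rng):
    """Flip each bit with probability p, modeling channel noise."""
    return [b ^ (rng.random() < p) for b in bits]

def measure_ber(n_bits, p, trials, seed=0):
    """Average fraction of bits received in error over repeated trials."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        sent = [0] * n_bits           # send the all-zero word each trial
        received = transmit(sent, p, rng)
        errors += sum(received)       # every 1 received is an error
    return errors / (n_bits * trials)

print(measure_ber(n_bits=1000, p=0.01, trials=100))  # close to 0.01
```

Repeating the measurement with different seeds would likewise give an empirical sense of the variance in BER mentioned above.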
`
`
`
`12
`
`
`
`33. Those of ordinary skill at the time would have understood the
`
`difference between a generator matrix and a parity-check matrix. The class of
`
`linear codes (which includes the codes at issue in this proceeding) can be defined
`
`by a generator matrix. An (n,k) code maps inputs that are k-bit messages into n-bit
`
codewords. An (n,k) linear block code can be represented by a generator matrix that
`
`defines the codewords. If we think of the input x as a column vector, the output
`
`codeword c as a column vector, and G as an n by k generator matrix, then their
`
`relationship is c = Gx. That is, one generates the codeword by multiplying the
`
`generator matrix by the input vector of bits. The generator matrix not only
`
`represents a code, it also gives us a way to compute the output codeword given an
`
`input.
`
`34. As an example, the generator matrix for what is known as the (7,4)
`
`Hamming code (one of the very first codes developed) is given by the following
`
`matrix:
`
`
`
`
`
`13
`
`
`
`35.
`
`If the above Hamming code takes a four bit input vector, such as [1 1
`
`0 1], it will produce a seven bit output by multiplying the generator matrix by a
`
`column vector containing the input, modulo 2:
`
`
`
`
`
`36. The output codeword is [1 0 0 1 1 0 1]. The Hamming code is a
`
`systematic code, with the last four bits of the output repeating the input bits. The
`
`generator matrix reflects this particular feature in the diagonal of 1’s in the bottom
`
`four rows.
`
`37. Linear codes can also be represented by a parity-check matrix.
`
`Whereas the generator matrix provides a way to generate an output, the parity-
`
`check matrix provides a way to check if the codeword is valid. For an output
`
`codeword c and a parity-check matrix H, we would expect Hc = 0 (modulo 2).
`
`Only valid codewords would satisfy this equation. For the above (7,4) Hamming
`
`code, the corresponding parity-check matrix is:
`
`
`
`14
`
`
`
`
`
`38. Parity-check matrices can be viewed as a series of parity-check
`
`equations that require the sum of certain bits within the codeword to add up to zero
`
modulo 2.1 In the above (7,4) Hamming code, the first row of the parity-check matrix corresponds to the equation c1 + c4 + c5 + c7 = 0 modulo 2.
`39. Using the above codeword [1 0 0 1 1 0 1], we can use the parity-check
`
`matrix to determine if it is a valid codeword. The result is a 0 vector, which
`
`indicates it is a valid codeword.
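The worked example in the preceding paragraphs can be reproduced in a few lines. Because the matrix figures are not reproduced in this text, the G and H below are a standard systematic form of the (7,4) Hamming code chosen to be consistent with the text's input, output, and first parity-check row; they are illustrative assumptions, not taken from an exhibit:

```python
# Illustrative sketch: a standard systematic (7,4) Hamming code,
# with parity bits first and the input repeated in the last four bits.
# Assumed form, consistent with the worked example in the text.

G = [  # generator matrix G (7 x 4); c = Gx (mod 2)
    [1, 1, 0, 1],  # c1 = x1 + x2 + x4  (parity)
    [1, 0, 1, 1],  # c2 = x1 + x3 + x4  (parity)
    [0, 1, 1, 1],  # c3 = x2 + x3 + x4  (parity)
    [1, 0, 0, 0],  # c4 = x1  (systematic part:
    [0, 1, 0, 0],  # c5 = x2   the last four bits
    [0, 0, 1, 0],  # c6 = x3   repeat the input)
    [0, 0, 0, 1],  # c7 = x4
]

H = [  # parity-check matrix H (3 x 7); valid codewords satisfy Hc = 0 (mod 2)
    [1, 0, 0, 1, 1, 0, 1],  # c1 + c4 + c5 + c7 = 0
    [0, 1, 0, 1, 0, 1, 1],  # c2 + c4 + c6 + c7 = 0
    [0, 0, 1, 0, 1, 1, 1],  # c3 + c5 + c6 + c7 = 0
]

def mat_vec_mod2(M, v):
    """Multiply matrix M by column vector v, reducing modulo 2."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

x = [1, 1, 0, 1]          # four-bit input
c = mat_vec_mod2(G, x)    # encode: c = Gx (mod 2)
print(c)                  # [1, 0, 0, 1, 1, 0, 1], as in the text

s = mat_vec_mod2(H, c)    # check: Hc = 0 (mod 2) for a valid codeword
print(s)                  # [0, 0, 0]
```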
`
`
`
`40. A parity-check matrix can also be represented as a Tanner graph,
`
`which is a bipartite graph containing two sets of nodes: message nodes and check
`
`
`
`
`1 Addition modulo 2 refers to taking the remainder after dividing the sum by 2.
`
`The even numbers 0, 2, 4, 6, … are equal to 0 modulo 2, since each of them gives a
`
`remainder of 0 when divided by 2. Similarly, 1, 3, 5,… are equal to 1 modulo 2.
`
`Another way of thinking about this summation is that 1+1 = 0.
`
`
`
`15
`
`
`
`nodes. Edges between the nodes can only be between a message node and a check
`
`node. Each message node corresponds to a bit in the codeword, and each check
`
`node corresponds to a parity-check equation. An edge indicates that the particular
`
`message bit is used in the particular parity-check equation. In terms of the parity-
`
`check matrix, message nodes represent each of the matrix’s columns and check
`
`nodes represent each of the matrix’s rows, and if a particular row and column
`
`contains a 1, then there is also an edge between the corresponding check node and
`
`message node in the Tanner graph. For example, in the above (7,4) Hamming code,
`
`the first, fourth, fifth and seventh message nodes have edges to the first check
`
`node. The complete Tanner graph for the above (7,4) Hamming code is:
`
`
`
`16
`
`
`
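The correspondence between the parity-check matrix and the Tanner graph's edges can also be sketched programmatically. The H below is an assumed standard form of the (7,4) Hamming parity-check matrix whose first row matches the nodes described in the text; it is illustrative, not taken from an exhibit:

```python
# Illustrative sketch: a Tanner graph is bipartite, so it can be
# represented simply as the set of edges between check nodes (rows of
# H) and message nodes (columns of H). Assumed (7,4) Hamming H.

H = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]

def tanner_edges(H):
    """An edge (check i, message j) exists wherever H[i][j] == 1."""
    return [(i + 1, j + 1)  # 1-based, to match the text's numbering
            for i, row in enumerate(H)
            for j, entry in enumerate(row) if entry == 1]

edges = tanner_edges(H)
# Message nodes attached to the first check node:
print([j for i, j in edges if i == 1])   # [1, 4, 5, 7], as stated above
```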
`
`
`41.
`
`It is important to understand the distinction between a generator
`
`matrix and a parity-check matrix. While they both represent a code, one represents
`
`how the code is generated, whereas the other represents the constraints of the code.
`
`It would be atypical and unexpected for a code’s generator matrix to be identical to
`
`its parity-check matrix.
`
`42. One thing to keep in mind as one peruses the art is to be cautious as to
`
`which representation the art is discussing. At the time of the patent, low-density
`
`parity-check (LDPC) codes were concerned with the parity-check matrix and the
`
distribution of 1s in that matrix (or, in the context of a Tanner graph, the distribution
`
`
`
`17
`
`
`
`of edges between message nodes and check nodes). On the other hand, turbo
`
codes and turbo-like codes tend to focus on how the code is generated, and not
`
`the resulting parity-check constraints.
`
`43. Those of ordinary skill at the time would have been familiar with the
`
phenomenon of an “error floor” encountered in error-correcting codes. Ex. 2030.
`
`Typically, error correction codes achieve better performance as conditions
`
`improve—that is, when there is less noise. Some codes, however, reach a limit
`
`where performance does not improve as quickly as one would expect or desire.
`
`When plotting the noise (SNR) against the bit error rate (BER), the curve flattens
`
`out horizontally. The flattening effect or flatter region of the curve is referred to as
`
`the error floor. Error floors are viewed as undesirable and are typically caused by
`
low-weight codewords. Those in the art at the time generally viewed a BER error floor of about 10⁻⁵ as a threshold target for acceptable performance of an error-correcting code. See, e.g., Ex. 1006, p. 1068. For example, in the below graph
`
`from MacKay (Ex. 1002), one of the dotted-line curves shows an error floor
`
between 10⁻² and 10⁻³ (circled in blue).
`
`
`
`18
`
`
`
`
`
`44. Low-weight codewords can cause errors in decoding. In a binary
`
`linear code, typically one of the codewords is the “all-zero” codeword, where all
`
`bits are 0. (If the codeword c is “all-zero,” then whatever the parity-check matrix
`
`H is, Hc = 0; hence, the all-zero codeword is always a valid codeword.) The
`
`weight of a codeword is the number of 1s in the codeword. If a codeword has low,
`
`nonzero weight, then it only has a very small number of 1s, and therefore its
`
distance from the all-zero codeword is small. In a code that has many low-weight codewords, there is a significant chance that the decoder will
`
`erroneously decode an all-zero codeword into a low-weight codeword (or vice
`
`versa), given that the channel could introduce errors in the codeword during
`
`transmission.
`
`
`
`19
`
`
`
`45. For example, if the encoder sends the all-zero codeword, and the code
`
`has a valid codeword of the form 1111100000000000, then if there is a burst of
`
`errors affecting the beginning of the sent codeword, the decoder could receive
`
`1110000000000000, in which case it would incorrectly conclude that the sent
`
`codeword was 1111100000000000 instead of the all-zero codeword (because it
`
`corresponds to only two errors, instead of three, in transmission).
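The weight and distance computations underlying this example can be sketched as follows (the helper names are mine, for illustration only):

```python
# Illustrative sketch: codeword weight, Hamming distance, and the
# decoding confusion described in the text's burst-error example.

def weight(codeword):
    """Weight: the number of 1s in the codeword."""
    return sum(codeword)

def distance(a, b):
    """Hamming distance: number of positions where two words differ."""
    return sum(x != y for x, y in zip(a, b))

all_zero = [0] * 16
low_wt   = [1, 1, 1, 1, 1] + [0] * 11   # the low-weight valid codeword
received = [1, 1, 1] + [0] * 13         # all-zero word after a burst of errors

print(weight(low_wt))                   # 5
print(distance(received, all_zero))     # 3 errors if all-zero was sent
print(distance(received, low_wt))       # 2 errors if low_wt was sent
# A nearest-codeword decoder therefore picks low_wt, decoding incorrectly.
```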
`
`VI. OVERVIEW OF THE ART AND CITED REFERENCES
`
`46. The field of channel coding and error-correction codes has historically
`
`been highly unpredictable, including at the time of the ’032 patent invention.
`
`Much of Petitioner’s cited prior art corroborates this, as is discussed in further
`
`detail below. Since it was often extremely challenging (or seemingly impossible)
`
`to mathematically prove how well a code would perform a priori, discovering
`
effective codes required laborious experimentation and testing to determine whether
`
`the new code was an improvement. The math and statistics behind functioning
`
`codes, and particular aspects of codes, are sometimes not fully understood.
`
`Experimentation in the field showed that performance of a code was typically very
`
`sensitive to the specific constraints used to define the code. Thus, those of
`
`ordinary skill in the art did not view error-correction codes as an assembly of
`
`interchangeable parts. Rather, those of skill viewed the prospect of incorporating
`
`aspects from one code into a different code as one of experimentation, the results
`
`
`
`20
`
`
`
`of which would be unpredictable—sometimes leading to non-functional codes,
`
`other times leading to surprising improvements in performance. Even when a new
`
`code was discovered to exhibit improvements in performance, it was difficult to
`
`determine what specific feature of the new code resulted in the improvement, or to
`
`mathematically understand the performance and the reason for the improvement.
`
`47. Channel coding began in 1948, when Dr. Claude Shannon formulated
`
`the theoretical maximum rate that a channel may have in the presence of noise. In
`
`the decades since, code theorists have endeavored to develop codes that could
`
`approach this “Shannon limit,” also known as the “Shannon capacity” or “channel
`
`capacity.” For several decades, research focused on a class of codes known as
`
algebraic block codes, which are quite different from the convolutional codes
`
`found in the cited prior art. Although algebraic block codes could be guaranteed to
`
`be correct, decoding complexity increased significantly as codes became larger.
`
`Also, in almost all cases, they were unable to reach the Shannon limit.
`
`48. Algebraic block codes were based on taking advantage of the structure
`
`of mathematical objects known as finite fields—codes used this mathematical
`
`structure to design relatively efficient encoding and decoding mechanisms. The
`
`emphasis of this approach was to design structured codes with a large guaranteed
`
`minimum distance. That is, if two different inputs are guaranteed to produce
`
`
`
`21
`
`
`
`outputs where the bits differ in at least d locations, then the minimum distance
`
`between the outputs is said to be at least d, and as long as there are fewer than d/2
`
`errors in transmission, the encoder output that is closest