`
`
`
`
`
`
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`_____________________________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`_____________________________
`
`
`APPLE INC.,
`Petitioner,
`
`v.
`
`CALIFORNIA INSTITUTE OF TECHNOLOGY,
`Patent Owner.
`
`_____________________________
`
`IPR2017-00210
`Patent No. 7,116,710
`_____________________________
`
`
`DECLARATION OF DR. MICHAEL MITZENMACHER
`
`
`
`CALTECH - EXHIBIT 2004
`Apple Inc. v. California Institute of Technology
`IPR2017-00210
`
`
`
TABLE OF CONTENTS

I. ENGAGEMENT
II. QUALIFICATIONS
III. COMPENSATION AND PRIOR TESTIMONY
IV. LEGAL PRINCIPLES
V. INTRODUCTION TO CHANNEL CODING AND TERMINOLOGY
VI. OVERVIEW OF THE ART AND CITED REFERENCES
    A. Turbo Codes
        1. Berrou’s turbo codes
    B. Gallager/LDPC codes
        1. Gallager Codes are rediscovered
    C. Cited References
        1. Frey
        2. Luby et al.
        3. Divsalar
VII. PERSON OF ORDINARY SKILL IN THE ART
VIII. CLAIM CONSTRUCTION
    A. “rate”
IX. REBUTTAL TO GROUND 1: CLAIMS ARE NOT ANTICIPATED BY FREY
    A. The Petition and Dr. Davis both fail to identify “obtaining a block of data in the signal to be encoded” as recited in Claim 1
    B. Frey does not disclose partitioning into sub-blocks as recited in Claim 1
    C. Frey does not disclose a second encoder that has a “rate close to one” or a “rate substantially close to one”
X. REBUTTAL TO GROUND 2: CLAIMS ARE NOT OBVIOUS OVER DIVSALAR IN VIEW OF FREY
    A. Divsalar does not cure Frey’s lack of anticipation of the “partitioning” step
    B. A Person of Ordinary Skill in the Art would not be motivated to combine Divsalar with Frey
        1. Frey’s irregular codes performed poorly
        2. Frey and Divsalar are not “similar codes”
        3. The Petition’s proposed modification to Divsalar is not supported by the teachings of Frey
        4. A person of ordinary skill would not have expected the proposed modification to Divsalar to be successful
XI. REBUTTAL TO GROUND 3: CLAIMS ARE NOT OBVIOUS OVER DIVSALAR, FREY AND LUBY97
XII. SECONDARY CONSIDERATIONS OF NON-OBVIOUSNESS
    A. Nexus between the objective evidence and the claims
    B. Long-felt need and failure of others
    C. Industry Praise
    D. Unexpected Results
    E. Commercial Success
XIII. CONCLUDING STATEMENTS
`
`
`
`I, Michael Mitzenmacher, declare as follows:
`
I. ENGAGEMENT

1. I have been retained by counsel for the California Institute of Technology as an expert witness in the above-captioned proceeding. I have been asked to provide my opinion about the state of the art of the technology described in U.S. Patent No. 7,116,710 (the “’710 patent”) and on the patentability of the claims of this patent. The following is my written testimony on these topics.
`
`II. QUALIFICATIONS
`
2. I am currently employed as a Professor of Computer Science at Harvard University. Specifically, I am the Thomas J. Watson, Sr. Professor of Computer Science in the School of Engineering and Applied Sciences. I joined the faculty of Harvard as an Assistant Professor in January 1999. I was promoted to Associate Professor in 2002 and to Professor in 2005. In 2010, I began a three-year term as Area Dean of Computer Science, a position essentially equivalent to what other schools call Department Chair, and held that position through June 2013. My work address is 33 Oxford Street, Cambridge, MA 02138. My primary research interests include design and analysis of algorithms, networks and data transmission, and information theory.
`
3. I received my undergraduate degree in Mathematics and Computer Science from Harvard College in 1991. I received a Certificate of Advanced Study in Mathematics from Cambridge University in 1992. I received a Ph.D. in Computer Science from the University of California at Berkeley in 1996. From August 1996 to January 1999, I was employed as a Research Scientist at Digital Systems Research Center, where my work included projects on algorithms for the Internet and error-correcting codes.
`
4. I am listed as an inventor or co-inventor on 19 issued patents, and am the co-author of a textbook entitled “Probability and Computing” published by Cambridge University Press. I am a Fellow of the Association for Computing Machinery, and currently serve as the Chair of the ACM Special Interest Group on Algorithms and Computation Theory (SIGACT).
`
5. The fields of endeavor at issue in this case are error-correction coding methods, including repeat-accumulate codes, Turbo codes, and low-density parity-check codes. I have published over 200 research papers in computer science and engineering conferences and journals, many of which have explored algorithms and data structures for error-correction codes, including both mathematical analysis and applications.
`
6. I have authored or co-authored a number of papers specifically in the area of low-density parity-check codes, including papers that have been presented as potential prior art in these proceedings. For example, the paper “Improved Low-Density Parity Check Codes Using Irregular Graphs” received the 2002 IEEE Information Theory Society Best Paper Award. Another paper, “A Digital Fountain Approach to Reliable Distribution of Bulk Data,” received the 2009 ACM SIGCOMM Test of Time Paper Award.
`
7. I have received multiple research grants related to codes and coding theory, including grant NSF CCR-0118701: Low Density Parity Check Codes for Channels with Memory, and NSF CCF-0634923: Towards a Basic Understanding of Channels with Synchronization Errors. My current research on data synchronization is also related to codes and coding theory.
`
8. I am a co-inventor on several patents related to error-correcting codes, including several patents specifically related to low-density parity-check codes.
`
9. I regularly serve on program committees for conferences in networking, algorithms, and communication, including conferences that consider work on error-correcting codes. For example, I have served on the program committee multiple times for the SIGCOMM conference, which is described on the conference homepage as follows: “SIGCOMM is the flagship annual conference of the ACM Special Interest Group on Data Communication (SIGCOMM) on the applications, technologies, architectures, and protocols for computer communication.” I have also served on the program committee of the International Symposium on Information Theory.
`
10. My graduate course entitled “Algorithms at the end of the wire” covers topics in error-correcting codes and their applications. My graduate course on randomized algorithms and probabilistic analysis has a short unit on the theory of error-correction, which is included in the textbook I have co-authored.
`
11. I have done substantial work in the area of low-density parity-check codes, which began in the mid-1990s. I started a collaboration with Michael Luby, Amin Shokrollahi, Daniel Spielman, and Volker Stemann in 1996, while I was finishing my PhD thesis. We were interested in making data transmission over the Internet more efficient by finding ways to deal with lost information. On the Internet, data is packaged into an ordered sequence of numbered packets, but packets for various reasons may not reach the destination. When packets do not arrive, the receiver must ask again for the numbered packets that are missing, causing delay from the extra round-trips required to ask for additional information.
`
12. Working as a group to solve this problem, we independently developed variations of what are referred to as Gallager codes, based on random graphs, and methods to analyze them. See Ex. 1011. Our work provided codes that could correct erasures. An erasure channel provides a good model for data transmission schemes like the Internet, where data is packaged into an ordered sequence of numbered packets, but packets may not reach the destination. One can send extra encoded data beyond the original data to the destination. On an erasure channel, some of the data will be missing, but one knows what data is missing based on what numbered packets are missing. Using the extra encoded data, if enough packets arrive at the destination, the original data can all be recovered. In this setting, the code can only deal with erasures, not errors, in the transmitted data. In this work, we found that using a layered sequence of irregular random graphs could provide better results for the codes that we developed than using a layered sequence of regular graphs. Each layer in the sequence of layered graphs could be viewed as a low-density parity-check code. This result was a surprise to us – we had initially focused on regular random graphs – but arose out of our mathematical analysis.
`
13. After this work, we were inspired to consider codes for error channels, where each bit is flipped with a fixed probability. See Ex. 1004 (“Luby”). We were surprised again to find that codes using irregular graphs, which we called irregular codes, performed better than their regular counterparts. This work achieved significant prominence; a later, significantly more complete version of the work, published in the IEEE Transactions on Information Theory, received the 2002 IEEE Information Theory Society Award.
`
14. With another group of researchers (initially Michael Luby, John Byers, and Ashutosh Rege), we researched using our erasure code technology for practical data transmission in distributed, networked systems. Our initial paper on this theme appeared in the 1998 SIGCOMM conference, which was then and remains a premier venue for networking research. The paper received the 2009 ACM SIGCOMM Test of Time Paper Award, an award given looking backward for a publication “that is deemed to be an outstanding paper whose contents are still a vibrant and useful contribution today.”
`
15. As part of this work we filed several patent applications, which eventually issued; the resulting patents are all listed on my CV. Sometime around the 1999-2000 timeframe, Michael Luby started a company seeking to commercialize this type of coding technology (including further advances), and I consulted for that company for a number of years.
`
`16. My Curriculum Vitae is submitted herewith as Ex. 2005.
`
`III. COMPENSATION AND PRIOR TESTIMONY
`
17. I am being compensated at a rate of $875 per hour for my study and other work in this matter. I am also being reimbursed for reasonable and customary expenses associated with my work in this investigation. My compensation is not contingent on the outcome of this matter or the specifics of my testimony.
`
`IV. LEGAL PRINCIPLES
`
18. I have been advised that a claimed invention is not patentable under 35 U.S.C. § 103 if it is obvious. A patent claim is unpatentable if the claimed invention would have been obvious to a person of ordinary skill in the field at the time the claimed invention was made. This means that a claim may be unpatentable even if all of the requirements of the claim cannot be found in a single prior art reference that would anticipate the claim, so long as a person of ordinary skill in the relevant field who knew about all this prior art would have come up with the claimed invention.
`
19. I have further been advised that the ultimate conclusion of whether a claim is obvious should be based upon several factual determinations. That is, a determination of obviousness requires inquiries into: (1) the level of ordinary skill in the field; (2) the scope and content of the prior art; (3) what difference, if any, existed between the claimed invention and the prior art; and (4) any secondary evidence bearing on obviousness.
`
20. I have been advised that, in determining the level of ordinary skill in the field that someone would have had at the time the claimed invention was made, I should consider: (1) the levels of education and experience of persons working in the field; (2) the types of problems encountered in the field; and (3) the sophistication of the technology.
`
21. I have also been advised that, in determining the scope and content of the prior art, a reference must be reasonably related to the claimed invention of the patent in order to be considered as prior art. A reference is reasonably related if it is in the same field as the claimed invention or is from another field to which a person of ordinary skill in the field would look to solve a known problem.
`
22. I have been advised that a patent claim composed of several elements is not proved obvious merely by demonstrating that each of its elements was independently known in the prior art. In evaluating whether such a claim would have been obvious, I may consider whether there is a reason that would have prompted a person of ordinary skill in the field to combine the elements or concepts from the prior art in the same way as in the claimed invention.
`
23. I have been further advised that there is no single way to define the line between true inventiveness on the one hand (which is patentable) and the application of common sense and ordinary skill to solve a problem on the other hand (which is not patentable). For example, market forces or other design incentives may be what produced a change, rather than true inventiveness. I may consider whether the change was merely the predictable result of using prior art elements according to their known functions, or whether it was the result of true inventiveness. I may also consider whether there is some teaching or suggestion in the prior art to make the modification or combination of elements claimed in the patent. I may consider whether the innovation applies a known technique that had been used to improve a similar device or method in a similar way. I may also consider whether the claimed invention would have been obvious to try, meaning that the claimed innovation was one of a relatively small number of possible approaches to the problem with a reasonable expectation of success by those skilled in the art.
`
24. I have also been advised, however, that I must be careful not to determine obviousness using the benefit of hindsight; many true inventions might seem obvious after the fact. I should put myself in the position of a person of ordinary skill in the field at the time the claimed invention was made, and I should not consider what is known today or what is learned from the teaching of the patent.
`
25. Finally, I have been advised that any obviousness rationale for modifying or combining prior art must include a showing that a person of ordinary skill would have had a reasonable expectation of success.
`
`
26. With regard to secondary considerations of nonobviousness, I have been advised that any objective evidence may be considered as an indication that the claimed invention would not have been obvious at the time the claimed invention was made. I understand that the purpose of secondary considerations is to prevent a hindsight analysis of the obviousness of the claims.
`
27. I have been advised that there are several factors that may be considered as secondary considerations. These factors include the commercial success of the invention, industry praise for the invention, skepticism of the invention, licensing of the invention, copying of the invention, any long-felt need that the invention solved, failure of others, and unexpected results of the invention.
`
28. I have been further advised that in order for evidence of secondary considerations to be significant, there must be a sufficient nexus between the claimed invention and the evidence of secondary considerations. I understand that this nexus serves to provide a link between the merits of the claimed invention and the evidence of secondary considerations provided.
`
V. INTRODUCTION TO CHANNEL CODING AND TERMINOLOGY
`
29. The ’710 patent and the prior art relate to the field of channel coding through the use of error-correction codes. Channel coding is critically important in any situation where information data (typically a series of bits) is transmitted over a potentially noisy channel, because in many use cases the data must be received without any corruption. Since the presence of noise in the transmission channel can cause the value of any bit to be flipped, channel coding is the process of adding redundancy to the data so that such errors can be identified and corrected. This is done by using an encoder on the information data to generate parity bits. The output of the encoder is known as a codeword. A systematic encoder includes both the information bits and parity bits within its output codeword, whereas a non-systematic encoder includes only the parity bits. An encoder has a rate, which is the ratio of the number of input bits to the number of bits output by the encoder. Generally, higher-rate codes are preferable to lower-rate codes because less data needs to be transmitted and decoded.
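
By way of illustration only, the following short Python sketch (my own toy construction, not drawn from the ’710 patent or any cited reference) shows a systematic encoder that appends one even-parity bit to each four-bit block of information bits, giving a rate of 4/5:

    # Toy systematic encoder (illustrative only): the information bits pass
    # through unchanged, and one even-parity bit is appended per 4-bit block.
    def systematic_encode(info_bits):
        assert len(info_bits) % 4 == 0
        out = []
        for i in range(0, len(info_bits), 4):
            block = info_bits[i:i + 4]
            out.extend(block)            # systematic: include the information bits
            out.append(sum(block) % 2)   # one parity bit per block
        return out

    codeword = systematic_encode([1, 0, 1, 1, 0, 0, 1, 0])
    print(len(codeword))  # 10 bits out for 8 bits in: rate = 8/10 = 4/5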
`
30. One way to achieve a higher rate is a technique referred to as puncturing. Puncturing generally refers to removing some of the parity bits from the encoding, so that there are fewer bits in the output. Since the rate is the ratio of the number of input bits to the number of bits output by the encoder, removing some of the parity bits increases the rate. Puncturing can be utilized if a pre-defined pattern of bits is punctured at the encoder and this pattern is known at the decoder. However, puncturing is not an arbitrary practice and can raise challenging questions, such as which bits to remove to obtain suitable performance.
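
Below is a minimal sketch of puncturing (again my own illustration, not taken from any cited reference): a pattern agreed upon in advance by the encoder and decoder drops one parity bit, raising the rate from 3/6 to 3/5.

    # Puncturing sketch (illustrative): transmit only the codeword positions
    # marked True in a pattern known to both the encoder and the decoder.
    def puncture(codeword, keep_pattern):
        return [bit for bit, keep in zip(codeword, keep_pattern) if keep]

    codeword = [1, 0, 1, 1, 0, 1]  # 3 information bits + 3 parity bits: rate 3/6
    pattern = [True, True, True, True, False, True]  # drop the fifth position
    print(puncture(codeword, pattern))  # 5 bits remain: rate rises to 3/5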
`
`
31. The encoded output is transmitted over a channel, which represents the path from the sender to the receiver. In a typical example, the sender and receiver might be cell phones, and the channel represents the full effects of the transmission of the signal, including through cell towers and any intermediate transmission mechanism. In a more unusual case, if I use a program to store a file on a computer in encoded form, and use the program to read it back later, the program is both the sender and the receiver, and the writing to and reading from memory constitutes the channel.
`
32. On the receiving end, the received information and parity bits, which may have been altered as they went through the channel, are used by a decoder to identify and correct any errors that may have occurred during transmission. Sometimes, errors cannot be corrected; the ratio of uncorrected bits to the total number of information bits is known as the bit error rate or BER. A standard measure of performance is the average BER for a given channel at a given signal-to-noise ratio, where the average may be found mathematically or measured by experiment. The variance in the BER over multiple experiments can also be important, with a high variance generally being worse, as it can mean there is a substantial risk of poor performance.
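
The BER computation itself is straightforward. The following sketch (an assumed setup of my own, with a stand-in for the channel and decoder rather than any real code) measures BER by comparing the decoded bits to the information bits that were sent:

    import random

    # Illustrative BER measurement over one simulated trial.
    def bit_error_rate(sent, decoded):
        errors = sum(s != d for s, d in zip(sent, decoded))
        return errors / len(sent)

    random.seed(1)
    sent = [random.randint(0, 1) for _ in range(100000)]
    # Stand-in for "channel plus decoder": each bit is assumed to be left
    # uncorrected with probability 0.001.
    decoded = [b ^ (random.random() < 0.001) for b in sent]
    print(bit_error_rate(sent, decoded))  # close to 0.001 on average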
`
`
33. Those of ordinary skill at the time would have been familiar with the phenomenon of an “error floor” encountered in error-correcting codes. Ex. 2030. Typically, error-correction codes achieve better performance as conditions improve—that is, when there is less noise. Some codes, however, reach a limit where performance does not improve as quickly as one would expect or desire. When plotting the noise (SNR) against the bit error rate (BER), the curve flattens out horizontally. The flattening effect or flatter region of the curve is referred to as the error floor. Error floors are viewed as undesirable and are typically caused by low-weight codewords. Those in the art at the time generally viewed a BER error floor of about 10⁻⁵ as a threshold target for acceptable performance of an error-correcting code. See, e.g., Ex. 1008, p. 1068. For example, in the below annotated graph from Ex. 1013, the middle curve (Frey’s code) shows an error floor of about 10⁻⁴ (circled in blue).

[Annotated graph from Ex. 1013: BER versus SNR curves, with the error floor of the middle curve (Frey’s code) circled in blue.]

34. Low-weight codewords can cause errors in decoding. In a binary linear code, typically one of the codewords is the “all-zero” codeword, where all bits are 0. (If the codeword c is “all-zero,” then whatever the parity-check matrix H is, Hc = 0; hence, the all-zero codeword is always a valid codeword.) The weight of a codeword is the number of 1s in the codeword. If a codeword has low, nonzero weight, then it only has a very small number of 1s, and therefore its distance from the all-zero codeword is small. In a code that results in a lot of low-weight codewords, there is a significant non-zero chance that the decoder will erroneously decode an all-zero codeword into a low-weight codeword (or vice versa), given that the channel could introduce errors in the codeword during transmission.
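
These two notions can be made concrete with a short sketch (using a toy parity-check matrix of my own, not taken from any reference): a vector c is a valid codeword exactly when Hc = 0 (mod 2), and its weight is simply its count of 1s.

    # Toy parity-check matrix H (illustrative only, not from any reference).
    H = [[1, 1, 0, 1, 0, 0],
         [0, 1, 1, 0, 1, 0],
         [1, 0, 1, 0, 0, 1]]

    def is_codeword(H, c):
        # c is a codeword if every parity check (row of H) sums to 0 mod 2
        return all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)

    def weight(c):
        return sum(c)  # the number of 1s in the codeword

    print(is_codeword(H, [0, 0, 0, 0, 0, 0]))  # True: all-zero is always valid
    c = [1, 0, 0, 1, 0, 1]
    print(is_codeword(H, c), weight(c))        # a valid codeword of weight 3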
`
`
35. For example, if the encoder sends the all-zero codeword, and the code has a valid codeword of the form 1111100000000000, then if there is a burst of errors affecting the beginning of the sent codeword, the decoder could receive 1110000000000000, in which case it would incorrectly conclude that the sent codeword was 1111100000000000 instead of the all-zero codeword (because that conclusion corresponds to only two errors, instead of three, in transmission).
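
The arithmetic in this example can be checked directly. The sketch below mirrors the example and computes the Hamming distances that a nearest-codeword decoder would compare:

    # Distances for the example above: the decoder picks the codeword
    # closest to what it received, and here that choice is wrong.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    all_zero = "0000000000000000"
    low_weight = "1111100000000000"  # the valid low-weight codeword
    received = "1110000000000000"    # all-zero codeword after a 3-bit burst

    print(hamming(received, all_zero))    # 3 errors from what was actually sent
    print(hamming(received, low_weight))  # only 2, so the decoder wrongly picks it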
`
36. One thing to keep in mind as one peruses the art is to be cautious as to which matrix the art is discussing. Most discussions of low-density parity-check (LDPC) codes are concerned with the parity-check matrix, and with making it low-density, and do not necessarily discuss how the code is generated. On the other hand, discussions of turbo codes tend to focus on how the code is generated, and not on the resulting parity-check constraints.
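
To make the distinction concrete, the following toy example (my own, not drawn from any cited reference) shows both matrices for one small systematic code: the generator matrix G describes how codewords are produced from information bits, while the parity-check matrix H describes the constraints every codeword must satisfy, and every codeword produced by G passes every check in H.

    # Toy systematic code (illustrative): G generates codewords, H checks them.
    G = [[1, 0, 1, 1],   # codeword = information vector times G (mod 2)
         [0, 1, 0, 1]]
    H = [[1, 0, 1, 0],   # constraints: H times codeword = 0 (mod 2)
         [1, 1, 0, 1]]

    def encode(info):
        return [sum(i * g for i, g in zip(info, col)) % 2 for col in zip(*G)]

    for info in ([0, 0], [0, 1], [1, 0], [1, 1]):
        c = encode(info)
        checks = [sum(h * x for h, x in zip(row, c)) % 2 for row in H]
        print(info, c, checks)  # the checks are all zero for every codeword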
`
`VI. OVERVIEW OF THE ART AND CITED REFERENCES
`
37. The field of channel coding and error-correction codes has historically been highly unpredictable, including at the time of the ’710 patent invention. Much of Petitioner’s cited prior art corroborates this, as is discussed in further detail below. Since it was often extremely challenging (or seemingly impossible) to mathematically prove how well a code would perform a priori, discovering effective codes required laborious experimenting and testing to determine whether the new code was an improvement. The math and statistics behind functioning codes, and particular aspects of codes, are sometimes not fully understood. Experimentation in the field showed that performance of a code was typically very sensitive to the specific constraints used to define the code. Thus, those of ordinary skill in the art did not view error-correction codes as an assembly of interchangeable parts. Rather, those of skill viewed the prospect of incorporating aspects from one code into a different code as one of experimentation, the results of which would be unpredictable—sometimes leading to non-functional codes, other times leading to surprising improvements in performance. Even when a new code was discovered to exhibit improvements in performance, it was difficult to determine what specific feature of the new code resulted in the improvement, or to mathematically understand the performance and the reason for the improvement.
`
38. Channel coding began in 1948, when Dr. Claude Shannon formulated the theoretical maximum rate that a channel may have in the presence of noise. In the decades since, code theorists have endeavored to develop codes that could approach this “Shannon limit,” also known as the “Shannon capacity” or “channel capacity.” For several decades, research focused on a class of codes known as algebraic block codes, which are quite different from the convolutional codes found in the cited prior art. Although algebraic block codes could be guaranteed to be correct, decoding complexity increased significantly as codes became larger. Also, in almost all cases, they were unable to reach the Shannon limit.
`
39. Algebraic block codes were based on taking advantage of the structure of mathematical objects known as finite fields—codes used this mathematical structure to design relatively efficient encoding and decoding mechanisms. The emphasis of this approach was to design structured codes with a large guaranteed minimum distance. That is, if two different inputs are guaranteed to produce outputs where the bits differ in at least d locations, then the minimum distance between the outputs is said to be at least d, and as long as there are fewer than d/2 errors in transmission, the encoder output that is closest in distance to what is received through the channel is guaranteed to be correct.
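
As a small worked example of my own (not a code from the cited art), the sketch below computes the minimum distance of a toy four-codeword code; the minimum distance is d = 3, so any pattern of fewer than d/2 = 1.5 errors (that is, any single error) is guaranteed to be corrected by choosing the closest codeword:

    from itertools import combinations

    # Toy code (illustrative): compute minimum distance over codeword pairs.
    code = ["00000", "11100", "00111", "11011"]

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    d = min(hamming(a, b) for a, b in combinations(code, 2))
    print(d)  # 3: after any single error, the sent codeword is still closest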
`
40. More recently developed types of codes, such as the turbo codes and low-density parity-check codes found in the cited prior art, were developed by utilizing ideas from probability instead of finite field algebra. In particular, they took the counterintuitive step of eschewing the regimented structure of algebraic codes, and instead embraced randomness in what were at the time unusual and surprising ways.
`
`
`A. Turbo Codes
`
1. Berrou’s turbo codes
`
41. In 1993, at the IEEE Geneva Conference, a team led by Claude Berrou introduced a completely new class of codes, which they called “turbo codes.” In their presentation, they claimed a code that could achieve near-Shannon-limit performance with much faster decoding than the previous algebraic codes. The audience’s initial reaction to their claims was one of deep skepticism—at least some comments were to the effect of “It can’t be true; they must have made a 3 dB error.”1 The skepticism was driven by the fact that Berrou and his team were engineers rather than mathematical theorists, and could not provide a formal explanation for why their code worked so well, having discovered it through lengthy experimentation.2 It was not until Berrou’s results were confirmed by later tests that research on and use of turbo codes became widespread. For example, turbo codes were used in the UMTS 3G/4G Mobile standards and the IEEE 802.16 WiMax standard. For this work, Berrou was able to obtain a patent on his code.

_____________
1 Costello, et al., “Channel Coding: The Road to Channel Capacity,” June 2007 (https://pdfs.semanticscholar.org/d937/4a5a8bc486907ecd36362ffbd7dc5dedfa13.pdf)

2 Hardesty, “Explained: Gallager Codes,” January 21, 2010 (http://news.mit.edu/2010/gallager-codes-0121)
`
42. Berrou’s turbo code involved generating two sets of parity bits using two identical convolutional sub-coders, where one of the sub-coders received as input information bits that were randomly permuted. Turbo codes are able to decode very quickly because of the use of “iterative decoding,” where successive guesses are fed back into the decoder to create more refined guesses. An analogy is a crossword puzzle, where one iterates between row clues and column clues. That is, one starts with the row clues and uses them to guess possible letters for each square; then one turns to the column clues and uses them to improve the guesses for each square, using the information from the guesses from the row clues. One then repeatedly returns to the row clues, and then the column clues, at each step improving the guesses until finding a consistent solution for the whole puzzle. Berrou’s turbo code works the same way, repeatedly iterating between decoding the two convolutional sub-codes, using better information from the last iteration on each iteration, until it reaches a consistent answer for the original input.
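
The encoder structure described above can be sketched in a few lines of Python. This is purely my own illustrative skeleton: the sub-encoder below is a memory-one accumulator used as a stand-in, whereas Berrou’s actual sub-coders were longer recursive convolutional encoders.

    import random

    def sub_encoder_parity(bits):
        # Stand-in recursive sub-encoder: a running XOR (memory-one accumulator).
        state, parity = 0, []
        for b in bits:
            state ^= b
            parity.append(state)
        return parity

    def toy_turbo_encode(info, perm):
        p1 = sub_encoder_parity(info)                     # parity, original order
        p2 = sub_encoder_parity([info[i] for i in perm])  # parity, permuted copy
        return info + p1 + p2                             # systematic, rate 1/3

    random.seed(0)
    info = [random.randint(0, 1) for _ in range(8)]
    perm = list(range(8))
    random.shuffle(perm)
    print(toy_turbo_encode(info, perm))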
`
`
`B. Gallager/LDPC codes
`
`1. Gallager Codes are rediscovered
`
43. It was not until after Berrou shared his revolutionary turbo codes that there was renewed interest in a 1960 paper by Dr. Robert Gallager, which described a code that also had fast decoding and could reach a similar level of performance as turbo codes. These codes involved using randomly-generated low-density parity-check (“LDPC”) matrices—in other words, the parity-check matrix for the code had relatively few 1’s in each column. Such codes are now known as either Gallager codes or LDPC codes.
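
As an illustration of “relatively few 1’s in each column,” the sketch below (my own simplification; Gallager’s actual construction also constrained the row weights) builds a random binary parity-check matrix with exactly three 1s in each column:

    import random

    def random_low_density_H(n, m, column_weight=3, seed=0):
        # Illustrative m-by-n parity-check matrix with `column_weight` ones per
        # column; for large n such a matrix is low-density (mostly zeros).
        rng = random.Random(seed)
        H = [[0] * n for _ in range(m)]
        for col in range(n):
            for row in rng.sample(range(m), column_weight):
                H[row][col] = 1
        return H

    for row in random_low_density_H(n=12, m=6):
        print(row)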
`
44. Despite being first described around 1960, Gallager’s codes remained unnoticed for nearly 40 years because not even Dr. Gallager himself knew whether or not these codes were good codes until the computer hardware capable of testing such codes was available in the 1990s. As Dr. Gallager stated, “I had a little bit of an inkling [they could be good], but I also had a suspicion that they well might not be. And I spent a long time trying to resolve whether they were or not.”3 It took the empirical testing by subsequent researchers to show the potential for near-Shannon-limit performance of Gallager codes. E.g., Ex. 1004.

_____________
3 Hardesty, “Explained: Gallager Codes,” January 21, 2010 (http://news.mit.edu/2010/gallager-codes-0121).
`
45. Gallager/LDPC codes4 are quite different from turbo codes. At a structural level, turbo codes are formed from the combination of two (or more) constituent convolutional codes, while low-density parity-check codes are constructed from random bipartite graphs or, in Gallager’s original work, a fixed graph construction. Because of the convolutional code structure, the decoding algorithms for low-density parity-check codes also differ from those for turbo codes. Turbo codes utilize the BCJR algorithm that is generally used for convolutional codes, and low-density parity-

_____________
4 At the time of the patent (late 1990s), “LDPC codes” were understood to mean Gallager codes, although Gallager codes were only a subset of all possible codes tha