`
`Bricolage: Data Compression
`
`© Copyright 1998 The Perl Journal. Reprinted with permission.
`
`Bricolage: Data Compression
`Morse Code
`Ambiguous codes
`Huffman Coding
`The Code
`The Rub
`Another Rub
`Other Methods
`Other Directions
`Bibliography
`Notes
`
`Bricolage: Data Compression
You are probably familiar with the Unix compress, gzip, and bzip2 utilities, or the DOS
`pkzip utility. These programs all make files smaller; we say that such files are
`compressed. Compressed files take less disk space and less network bandwidth. The
downside of compressed files is that they are full of unreadable gibberish; you usually
`have to run another program to uncompress them before you can use them again. In this article we'll see how file
`compression works, and I'll show a simple module that includes functions for compressing and uncompressing
`files.
`
`Morse Code
`The idea behind data compression is very simple. In a typical file, say a text file, every character takes up the
same amount of space: 8 bits. The letter e is represented by the 8 bits 01100101; the letter Z is represented by the
8 bits 01011010. But in a text file, e occurs much more frequently than Z---maybe about 75 times as frequently.
`If you could give the common symbols short codes and the uncommon symbols long codes, you'd have a net
`gain.
`This isn't a new idea. It was exploited by Samuel Morse in the Morse Code, a very early digital data transmission
`protocol. Morse Code was designed to send text files over telegraph wires. A telegraph is very simple; it has a
`switch at one end, and when you close the switch, an electric current travels through a wire to the other end,
`where there is a relay that makes a click. By tapping the switch at one end, you make the relay at the other end
`click. Letters and digits are encoded as sequences of short and long clicks. A short click is called a dot, and a
`long click is called a dash.
`The two most common letters in English text are E and T; in Morse code these are represented by a single dot and
`a single dash, respectively. The codes for I, A, N, and M, all common letters, are **, *-, -*, and --. In contrast, the
`codes for the uncommon letters Q and Z are --*- and --**.
`In computer file compression, we do a similar thing. We analyze the contents of the data, and figure out which
`symbols are frequent and which are infrequent. Then we assign short codes to the frequent symbols and long
`codes to the infrequent symbols. We write out the coded version of the file, and that usually makes it smaller.
`
`
`Ambiguous codes
`There's a problem with Morse Code: You need a third symbol, typically a long pause, to separate the dots and
dashes that make up one letter from the dots and dashes that make up the next. Otherwise, if you get *-, you
`don't know whether it's the single letter A or the two letters ET---or it might be the first bit of the letter R or L. In a
`long message, all the dots and dashes run together and you get a big mess that can't be turned back into text. In
`Morse code, it can be hard to tell `Eugenia' from `Sofia': Without the interletter pauses, they're both:
` ***---**-****-
`Those interletter spaces take up a lot of transmission time, and it would be nice if you didn't need them. It turns
`out that if you arrange the code properly, you don't. The ambiguity problem with Morse Code occurs because
`some codes are prefixes of others: There are some letters where the code for the first part of one letter is just the
`same as the code for the other letter, but with something extra tacked on. When you see the shorter code, you
`don't know if it's complete or if it should be combined with the following symbols. It turns out that if no code is a
`prefix of any other, then the code is unambiguous.
`Suppose for simplicity that we only needed to send the letters A, C, E, and S over the telegraph. Instead of Morse
`code, we could use the following code table:
` A -
` C **
` E *-*
` S *--
`Suppose we receive the message -*****-**--*--*-**--. What was the message? Well, the first symbol is -, so
`the first letter in the message must be A, because that's the only letter with a code that starts with a -. Then the
`next two symbols are **, so the second letter must be a C, because all the other codes that start with * have a -
`after the * instead of another *. Similar reasoning shows that the third letter is also C. After that, the code is *-*;
`it must be an E. We continue through the message, reading off one letter at a time, and eventually we get the
`whole thing this way.
It's so simple that a computer can decode it, if the computer is equipped with a decision tree like this one:

[Figure: the decision tree for this code. From the Start node, the - branch leads to A; the * branch leads to a second node whose * branch leads to C and whose - branch leads to a third node, with E on its * branch and S on its - branch.]

`Start at Start, and look at the symbols in the message one by one. At each stage, follow the appropriate labelled
`branch to the next node. If there's a letter at that node, output the letter and go back to the start node. If there's no
`letter at the node, look at the next symbol in the input and continue down the tree towards the leaves.
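To make this concrete, here is a short Perl sketch of the same idea. (The sketch is mine, not part of the module described later; it leans on the prefix property directly instead of walking a tree.)

 my %decode = ('-' => 'A', '**' => 'C', '*-*' => 'E', '*--' => 'S');
 my $message = '-*****-**--*--*-**--';
 my $output = '';
 while (length $message) {
     # Because no code is a prefix of any other, exactly one code
     # can match at the front of what remains.
     for my $code (keys %decode) {
         if (index($message, $code) == 0) {
             $output .= $decode{$code};
             substr($message, 0, length $code) = '';
             last;
         }
     }
 }
 print "$output\n";    # prints "ACCESSES"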
`
`
`Huffman Coding
Obviously, it's important to choose the right code. If Morse had made * the code for Z and **-* the code for E, he
`wouldn't be famous.
`Choosing the right code can be tricky. Consider the example of the previous section, where we only had to code
`messages that contain A, C, E, and S. The code I showed is good when we expect our messages to contain more
`A's than E's or S's. If S were very common, we clearly could have done better; less clearly, if all four letters were
about equally common, then we could still have done better, by assigning each letter a code of the same length:

[Figure: a decision tree assigning each of A, C, E, and S a two-symbol code.]

`Suppose, for example, our message happened to contain 200 of each of the four letters. Then the first code would
`use 1800 symbols, and the second code would use only 1600.
In 1952, David Huffman discovered a method for producing the optimal unambiguous code. For a given set of
`symbols, if you know the probability with which each symbol appears in the input, you can use Huffman's
`method to construct an unambiguous code that encodes the typical message with fewer *'s and -'s than any other
`code.
`The method is very simple and ingenious. For concreteness, let's suppose that the (rather silly) message is
` THE_THIRSTIEST_SISTERS_TEETH_RESIST_THIS_STRESS
`(I used _ instead of space so that it'll be easier to see.)
`Start with the table of relative probabilities; you can get this by counting the number of occurrences of every
`symbol in the message. This is called histogramming. (A histogram is a bar chart; histos is Greek for a beam or a
`mast.) Here's the histogram for the symbols in our sample message:
`S 11
`T 10
E 7
`_ 6
`I 5
`H 4
`R 4
`Now take the two least common entries in the table, that's H and R. They'll get the longest codes, because they're
`least common. We'll simplify this by pretending that H and R are the same, and lumping them together into one
`category, which we'll call HR. Then we'll assign codes to all the other letters and to HR. When we're done, we still
`have to distinguish between H and R. Now, HR has some code. We don't know what it is yet, so let's symbolize it
with <HR>. We don't really need to use <HR> in our message, because there is no such thing as the letter HR, so we'll
`split it in two, and let the code for H be <HR>* and the code for R be <HR>-. As a result of this, the codes for H and
R will be longer than the codes for the other letters, but if that has to happen, it's better for it to happen for H and R,
`because they are the least common letters in the message.
`So we will lump H and R together and pretend temporarily that they are only one letter. Our table then looks like
`this:
` S 11 R = <HR>-
` T 10 H = <HR>*
` HR 8
` E 7
` _ 6
` I 5
`Now we repeat the process. The two least common symbols are I and _. We'll lump them together into a new
`symbol' called I_, and we'll finish assigning the codes to S, T, HR, E, and I_. When we're done, I will get the
`code <I_>* and _ will get the code <I_>-.
` S 11 R = <HR>-
` I_ 11 H = <HR>*
` T 10 _ = <I_>-
` HR 8 I = <I_>*
` E 7
`Then we lump together HR and E:
` HRE 15 R = <HR>-
` S 11 H = <HR>*
` I_ 11 _ = <I_>-
` T 10 I = <I_>*
` HR = <HRE>-
` E = <HRE>*
`Then we lump together T and I_:
` I_T 21 R = <HR>-
` HRE 15 H = <HR>*
` S 11 _ = <I_>-
` I = <I_>*
` HR = <HRE>-
` E = <HRE>*
` I_ = <I_T>-
` T = <I_T>*
`Then we lump together S and HRE:
` SHRE 25 R = <HR>-
` I_T 21 H = <HR>*
 _ = <I_>-
 I = <I_>*
` HR = <HRE>-
` E = <HRE>*
` I_ = <I_T>-
` T = <I_T>*
` S = <SHRE>-
` HRE = <SHRE>*
`Now we only have two `symbols' left. There's only one way to assign a code to two symbols; one of them gets *
`and the other gets -. It doesn't matter which gets which, so let's say that SHRE gets * and I_T gets -.
`Now the codes fall out of the table we've built up in the right-hand column:
`
 SHRE = *
     S   = *-
     HRE = **
         HR = **-
             R = **--
             H = **-*
         E  = ***
 I_T = -
     I_ = --
         _ = ---
         I = --*
     T  = -*
`We throw away the codes for the fictitious compound symbols, and we're left with the real code:
` S = *-
` T = -*
` E = ***
` _ = ---
` I = --*
` H = **-*
` R = **--
`As promised, the code is unambiguous, because no code is a prefix of any other code. Our original message
`encodes like this:
` -***-****----***-*--***--*--*--*****--*---
` *---**--******--*-----*******-***-*---
` **--****---**--*----***-*--**----
` *--***--****-*-
That's a total of 128 *'s and -'s, an average of 2.72 symbols per character, and a 9.2% improvement over the 141
symbols we would have had to use if we had given every letter a three-symbol code.
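Here is a Perl sketch of the lumping procedure we just carried out by hand. It is not the module's actual code, and where the frequency table has ties it may produce a different, but equally short, set of codes:

 sub huffman_codes {
     my %count = @_;                                # symbol => occurrences
     my %code = map { $_ => '' } keys %count;
     my %members = map { $_ => [$_] } keys %count;  # real symbols inside each lump
     while (keys(%count) > 1) {
         # take the two least common entries...
         my ($x, $y) = (sort { $count{$a} <=> $count{$b} } keys %count)[0, 1];
         # ...decide how the lump will eventually split: members of $x
         # get * tacked on in front, and members of $y get -
         $code{$_} = '*' . $code{$_} for @{ $members{$x} };
         $code{$_} = '-' . $code{$_} for @{ $members{$y} };
         # ...and lump the two entries together into one
         $count{"$x$y"} = delete($count{$x}) + delete($count{$y});
         $members{"$x$y"} = [ @{ delete $members{$x} }, @{ delete $members{$y} } ];
     }
     return %code;
 }

 my %code = huffman_codes(S => 11, T => 10, E => 7, '_' => 6,
                          I => 5, H => 4, R => 4);
 print "$_ = $code{$_}\n" for sort keys %code;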
`
`The Code
For this article, I implemented a demonstration module that compresses files. You can retrieve it from The Perl
Journal web site at http://www.tpj.com/ or from my Perl Paraphernalia web site at
`http://www.plover.com/~mjd/perl/Huffman/. The program htest.pl compresses an input and saves it to the file
`/tmp/htest.out; then it opens this file, reads in the compressed data, decompresses it, and prints the result to
`standard output.
`Most of the real work is in the Huffman module that htest uses. Here are the important functions that htest
`calls:
` my $hist = Huffman::histogram(\@symbols);
`This generates the histogram of the input text. The histogram is the tally of the number of occurrences of each
`symbol. The argument to histogram is an array of symbols, passed by reference for efficiency, because it is
`likely to be very large. It might have been simpler to pass the input as a single string, but this way we can assign
`codes per-word instead of per-character if we want to, just by splitting the input into an array in a different way.
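The module's own histogram function may be fancier, but the essential operation is a one-line tally:

 sub histogram {
     my ($symbols) = @_;    # reference to the (possibly huge) symbol array
     my %tally;
     $tally{$_}++ for @$symbols;
     return \%tally;
 }

For per-word coding, you might split the input with something like split /\b/ instead of split //.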
` my $codes = Huffman::tabulate($hist);
The tabulate function generates the code table from the histogram using Huffman's method. (This has the side
`effect of destroying the histogram.) The code table is just a hash that maps symbols to codes; the codes
`themselves are strings like 0010011.
` Huffman::save_code_table(*FILE, $codes);
`This writes the code table to the file.
` Huffman::encode(*FILE, $codes, \@symbols);
`This encodes the input text and writes the result to the file.
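Conceptually, encoding is nothing more than looking up each symbol and concatenating the code strings; here is a one-line sketch of the idea (the real function must also pack the 0 and 1 characters into actual bits before writing them out):

 my $bits = join '', map { $codes->{$_} } @symbols;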
` my $codes = Huffman::load_code_table(*FILE);
` my $coded_data = <FILE>;
` my $text = Huffman::decode($codes, $coded_data);
This is the decompression process. The return value from load_code_table is a code table in a rather interesting
`form. It's a decision tree, just like the ones we saw earlier in the article. Each node of the tree is either a single
`string (which is the symbol that the decoder should output) or it's an array with two elements; the 0th element is
`the part of the tree on the branch labeled 0, and the 1st element is the part of the tree on the branch labeled 1. For
`example, taking * as equivalent to 0 and - as equivalent to 1, the decision tree from earlier in the article would be
`represented like this:
 [[C,
   [E,
    S
   ]
  ],
  A
 ]
`
`The Rub
`We can compress the file, but unless we include the code table in the compressed file, we won't be able to
`uncompress it again. Files that can't be uncompressed are not very useful. (Sometimes you don't need to be able
`to decompress the file; in these cases the Perl unlink function implements a much simpler and more efficient
`form of compression.) But sometimes the code table is bigger than the savings that we got from compressing the
`file, especially if the original file is small. On a sample file of comp.lang.perl.misc articles, the compressor
`reduced the file size by 32%, from 42733 bytes to 29114. But when I ran it on its own source code, the file size
`increased from 987 bytes to 1321. The compressed data was only 631 bytes long, but the code table and other
`overhead took up 690 bytes.
`
`Another Rub
`
`Huffman coding is optimal in the sense of compressing a given set of symbols into the smallest space possible.
But if you readjust your idea of a `symbol', you can get much better compression.
`Suppose you're compressing English text. If you histogram the characters, and use Huffman's method, you'll get
`the optimal way to encode the characters. Because each character has its own code, your code would also serve
`to code arbitrary random gibberish. It should be clear that this expressiveness has a price. For English text, we
`don't need so much expressiveness; it's clearly more efficient to assign a code to each word instead of to each
character. In doing so, we lose the ability to encode random gibberish, but our compressed data becomes much
`smaller. Suppose there are about 2**17 words in English; if we assign each one a different 17-bit code, we can
`then encode our original seven-word message:
` THE THIRSTIEST SISTERS TEETH RESIST THIS STRESS
into 7*17 = 119 bits, already an improvement on the 128 we used before. And if we use Huffman's method to
`assign long codes to rare words and short codes to common words, we can do even better.
`Using this method, I compressed the comp.lang.perl.misc articles by 63%. Unfortunately, the code table blew
`up as a result, and took up 43739 bytes, which was larger than the original uncompressed data.
`
`Other Methods
`Most modern data compression doesn't use Huffman coding directly. A better method was proposed in 1977 and
`1978 by Jakob Ziv and Abraham Lempel. The Lempel-Ziv methods scan the input data, and when they find a
substring that occurs twice, they replace the second occurrence with a reference back to the first. The references
`can then be Huffman-compressed.
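To give the flavor of the idea, here is a toy back-reference compressor in Perl. It is nothing like the tuned algorithms inside real tools such as gzip; it just shows the replace-the-second-occurrence trick:

 # Emit literal characters, or [distance, length] references back
 # to text that has already been seen.
 sub lz_compress {
     my ($in) = @_;
     my (@out, $pos);
     for ($pos = 0; $pos < length $in; ) {
         my ($best_len, $best_dist) = (0, 0);
         for my $dist (1 .. $pos) {
             my $len = 0;
             $len++ while $len < $dist       # keep matches non-overlapping
                 and $pos + $len < length $in
                 and substr($in, $pos - $dist + $len, 1)
                     eq substr($in, $pos + $len, 1);
             ($best_len, $best_dist) = ($len, $dist) if $len > $best_len;
         }
         if ($best_len >= 3) {    # a reference only pays off if it's long enough
             push @out, [ $best_dist, $best_len ];
             $pos += $best_len;
         } else {
             push @out, substr($in, $pos, 1);
             $pos++;
         }
     }
     return \@out;
 }

 # Copy literals, and resolve references against the output so far.
 sub lz_decompress {
     my ($tokens) = @_;
     my $out = '';
     for my $t (@$tokens) {
         $out .= ref $t ? substr($out, length($out) - $t->[0], $t->[1]) : $t;
     }
     return $out;
 }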
`These methods have several advantages over the basic scheme I implemented. One important benefit of Lempel-
`Ziv compression methods is that they don't have to read and analyze the entire input in advance; they can
`construct a code table as they go along, basing it on only the portion of the input seen so far. This is important for
`practical applications. The data compression in your modem wouldn't be very useful if the modem had to gather
`and analyze a week's worth of traffic before it could send or receive anything.
Another benefit of this method is that because the code table is implicit in the input seen so far, the code table
`doesn't need to be recorded explicitly. The decoding process can infer the code table from the coded data itself.
This avoids the embarrassing inflation that we saw, where the code table took up more space than was
`actually saved by the compression process.
`One extra payoff of building the code table on the fly is that if the algorithm notices that the code table it's using
`isn't performing well, it can throw it away and start over in the middle. If the data is in several pieces that have
very different character, say a graphic image embedded in a text file, this method will be a win over straight
`Huffman coding because it'll be able to apply an appropriate code table to each part of the data. LZW, the
`compression algorithm used by the DOS lha program, doesn't do this, but the variation of it employed by the
`Unix compress program does. The algorithm used by the GNU project's gzip and zlib is different but also
`periodically throws away the code tables and starts over.
`The best thing about Lempel-Ziv and related methods is that they don't need to decide in advance how to break
`up the input into symbols. LZW, for example, puts any string that it hasn't seen before into the code table, under
`the assumption that if it appeared once, it'll probably appear again. As the input comes in, longer and longer
`substrings of the input go into the code table; long substrings go in only when their shorter prefixes have
`appeared multiple times. This means that if the file is best broken up into words, the algorithm will eventually
`detect that; if it is best broken up into single characters, the algorithm will detect that too.
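Here is a bare-bones sketch of that table-building idea; real LZW implementations also pack the codes into bits and limit the table size:

 sub lzw_compress {
     my ($in) = @_;
     my %dict = map { chr($_) => $_ } 0 .. 255;   # start with single characters
     my $next_code = 256;
     my ($w, @out) = ('');
     for my $c (split //, $in) {
         if (exists $dict{$w . $c}) {
             $w .= $c;                        # seen this string before; extend it
         } else {
             push @out, $dict{$w};            # emit the longest known prefix
             $dict{$w . $c} = $next_code++;   # remember the new, longer string
             $w = $c;
         }
     }
     push @out, $dict{$w} if length $w;
     return \@out;
 }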
`
`
`Other Directions
`If you want an easy but possibly amusing project, try altering the code table functions in the module so that the
`code tables are smaller. At present, the module does not assume that the coding is done per-character, so you
`shouldn't break that.
`Actually, though, assigning the codes per-word seems to blow up the code table more than it saves. I haven't
`found a case yet where it's worthwhile. I'd be interested in seeing more analysis of this.
`Finally, the decision-tree data structure in the decoder is probably over-clever. It would be simpler to represent
`the same information with a state table, this way:
` 0 1
` --------------------
` 0 1 A
` 1 C 2
` 2 E S
`Each internal node in the decision tree is assigned a state, which is a number; the root node gets the number 0. To
`decode, the decoder keeps track of the state number, which says what internal node it's currently at. Each time it
reads a bit, it looks in the table to decide where to go next. If it sees a symbol, that means it reached a leaf, and
should output that symbol and start over at state 0; if it sees a number, that means it's now at a different internal
node and should read more bits. I'm really not sure whether this would be more efficient than the way I did it,
`but it would probably be easier to understand.
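Here is a sketch of that decoder (mine, not the module's; it assumes no symbol looks like a state number):

 my @table = (
     [ 1,   'A' ],    # state 0: on 0 go to state 1, on 1 output A
     [ 'C', 2   ],    # state 1: on 0 output C, on 1 go to state 2
     [ 'E', 'S' ],    # state 2: on 0 output E, on 1 output S
 );

 sub decode_with_states {
     my ($bits) = @_;
     my ($state, $output) = (0, '');
     for my $bit (split //, $bits) {
         my $next = $table[$state][$bit];
         if ($next =~ /^\d+$/) {     # a state number: an internal node
             $state = $next;
         } else {                    # a symbol: output it and start over
             $output .= $next;
             $state = 0;
         }
     }
     return $output;
 }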
`
`Bibliography
`Symbols, Signals, and Noise, J.R. Pierce, pp. 94--97. Harper, New York, 1961.
The comp.compression FAQ is an excellent starting source of information, especially part 2. It is available from
`ftp://rtfm.mit.edu/pub/usenet/news.answers/compression-faq/.
`
`Notes
`My column finally has a name! Bricolage is a hot word these days in the computer learning business; it's French
`for tinkering or pottering. Thanks to Deven Ullman, who suggested it, and to the other people who suggested
`other names.
`
`mjd-tpj-huffman@plover.com
`