throbber
Featuring fast, efficient data
`compression techniques in C
`
`Mark Nelson
`
`M_§£Fm
`El;
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 001
`
`

`
`M&T Books
`115 West 18th Street
`New York, NY 10011
`
`Slflflfllllllll
`
`© 1992 by M&T Publishing, Inc.
`
`Printed in the United States of America
`
`All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means,
`electronic or mechanical, including photocopying, recording, or by any information storage and
`retrieval sytem, without prior written pemission from the Publisher. Contact the Publisher for
`information on foreign rights.
`
`Limits of Liability and Disclaimer of Warranty
`The Author and Publisher of this book have used their best efforts in preparing the book and the
`programs contained in it. These efforts include the development, research, and testing of the theories
`and programs to determine their effectiveness.
`
`The Author and Publisher make no warranty of any kind, expressed or implied, with regard to these
`programs or the documentation contained in this book. The Author and Publisher shall not be liable in
`any event for incidental or consequential damages in connection with, or arising out of, the furnishing,
`performance, or use of these programs.
`
`Library of Congress Cataloging-in-Publication Data
`
`Nelson, Mark, 1958 —
`The Data Compression Book/by Mark Nelson
`p. cm.
`Includes index.
`ISBN 1-55851-2144: $29.95
`1. Data Compression (Computer Science)
`QA76.9.D33N46
`1991
`005.74’6--dc20
`
`ISBN l-55851-216-0 (book/disk set): $39.95
`I. Title
`
`9l-23080
`CIP
`
`Project Editor: Tova Fliegel
`Copy Editor: Shayne Bell
`Technical Reviewer: Robert Jung
`
`95
`
`94
`
`93
`
`432
`
`Cover Design: Lauren Smith Design
`Layout: Marni Tapscott
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 002
`
`

`
`Contents
`
`WHY THIS BOOK IS FOR YOU .................................................................. ..1
`
`INTRODUCTION TO DATA COMPRESSION ...................................... ..3
`
`The Audience ........................................................................................................ ..4
`
`Why C? ................................................................................................................. ..4
`Which C? .............................................................................................................. ..5
`
`Keeping Score ....................................................................................................... ..9
`The Structure ...................................................................................................... ..10
`
`THE DATA-COMPRESSION LEXICON, WITH A H|STORY....13
`
`The Two Kingdoms ............................................................................................. .. 13
`
`Data Compression = Modeling + Coding ............................................................ .. 14
`
`The Dawn Age .................................................................................................... .. 15
`
`Coding ................................................................................................................ .. 17
`
`An Improvement .............................................................................................. .. 18
`
`Modeling ............................................................................................................. ..20
`
`Statistical Modeling ......................................................................................... .. 20
`
`Dictionary Schemes ......................................................................................... ..22
`
`Ziv and Lempel ................................................................................................... .. 23
`LZ77 . .............................................................................................................. ..23
`
`LZ78. ............................................................................................................... .. 24
`
`Lossy Compression ............................................................................................. ..24
`Programs to Know .............................................................................................. .. 25
`
`THE DAWN AGE: MINIMUM REDUNDANCY CODING ............. ..29
`
`The Shannon—Fano Algorithm ............................................................................. ..31
`
`The Huffman Algorithm ...................................................................................... ..34
`HuffmaninC ...................................................................................................... ..38
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 003
`
`

`
`BITIO.C.
`A Reminder about Prototypes
`MA1N—C.C and MAIN-E.C
`
`.......
`....
`
`....
`
`.....
`
`.......
`
`........
`
`......
`
`...........
`
`ERRHAND.C...
`Into the Huffman Code.
`Counting the Symbo1s........
`Saving the Counts................. ........ ..
`Building the Tree..........
`.......
`......
`Using the Tree ....
`The CompressionCode .63
`Putting It All Together..
`Performance.....................
`
`..........59
`
`.......
`
`......
`
`A SIGNIFICANT IMPROVEMENT: ADAPTIVE HUFFMAN
`
`Adaptive Coding
`Updating the Huffman Tree.............
`What SwappingDoes
`The Algorithm.
`........
`
`........
`
`.88
`..89
`
`The Escape Code .
`The Overflow Problem
`........ ..
`A Rescaling Bonus.
`TheCode
`97
`Initialization of theArray
`The Compress MainProgram .98
`The Expand Main Program.
`.....
`Encoding the Symbol.
`108
`Decoding the Symbol.
`TheCode ..........109
`
`..
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 004
`
`

`
`HUFFMAN ONE BETTER:
`ARITHMETIC CODING ..................
`
`Difficulties ........................................................................................................ .. 123
`Arithmetic Coding: A Step Forward .................................................................. .. 124
`Practical Matters ............................................................................................ .. 128
`A Complication .............................................................................................. .. 130
`Decoding ....................................................................................................... .. 132
`Where’s the Beef? .......................................................................................... .. 133
`
`The Code .............................................................................................. .., ......... .. 134
`The Compression Program ............................................................................. .. 134
`The Expansion Program ................................................................................. .. 136
`Initializing the Model ..................................................................................... .. 137
`Reading the Model ......................................................................................... .. 141
`Initializirig the Encoder .................................................................................. .. 141
`The Encoding Process .................................................................................... .. 142
`Flushing the Encoder ..................................................................................... .. 145
`The Decoding Process ................................................................................... .. 145
`Summary ........................................................................................................... .. 148
`Code .................................................................................................................. .. 148
`
`STATISTICAL MODELING ......................................................................... ..167
`
`167
`Higher—Order Modeling ....................................................................................
`Finite Context Modeling ................................................................................... .. 168
`Adaptive Modeling ........................................................................................... .. 169
`A Simple Example ......................................................................................... .. 170
`Improvements ................................................................................................ .. 176
`Highest—Order Modeling ................................................................................... .. 176
`Updating the Model ....................................................................................... .. 178
`Escape Probabilities ....................................................................................... .. 179
`Scoreboarding ................................................................................................ .. 180
`Data Structures .............................................................................................. .. 181
`The Finishing Touches: Tables 1 and 2 .......................................................... .. 185
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 005
`
`

`
`Model Flushing .............................................................................................. .. 186
`Implementation .............................................................................................. .. 186
`Conclusions ........................
`....
`................................................................... .. 187
`
`Enhancement ................................................................................................. .. 187
`
`ARITH—N Listing ................................................................................
`
`....... .. 188
`
`DICTIONARY-BASED COMPRESSION ...............................................219
`
`An Example ..................................................................................................... ...219
`Static VS. Adaptive ............................................................................................. .. 220
`Adaptive Methods .......................................................................................... .. 221
`A Representative Example ............................................................................. .. 223
`Israeli Roots .............................................................................
`.................... .. 225
`
`History ........................................................................................................... .. 226
`ARC: The Father of MS—DOS Dictionary Compression ................................... ..227
`
`Dictionary Compression ................................................................................. ..228
`Danger Ahead—Patents ..................
`............................................................... ..230
`Conclusion . ......... . .. ........................................................................................... .. 231
`
`S L I D I N G W I N D O W C O M P R E S S I O N
`
`3 3
`
`........................................................ ..233
`........
`.........
`The Algorithm ................
`Problems with LZ77 ...................................................................................... ..238
`
`............. .....240
`.........
`An Encoding Problem .....................................................
`LZSS Compression .....................................
`......
`.................................. ..240
`Data Structures .............................................................................................. ..242
`
`A Balancing Act ............................................................................................ ..244
`Greedy vs. Best Possible ......................................
`....................................... ..246
`The Code ...................
`.........................
`...........
`......................................... ..247
`Constants and Macros .................................................................................... ..247
`
`Global Variables ...............................................................
`
`........................... .. 251
`
`............................................ ..252
`...........
`The Compression Code ....................
`Initialization. ................... ... ................
`......................................................... . . 25 3
`
`The Main Loop ......................................................................................
`
`..... ..254
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 006
`
`

`
`The Exit Code ................................................................................................ .. 256
`AddSt1ing() .................................................................................................... .. 257
`DeleteString() ................................................................................................. ..26O
`Binary Tree Support Routines ........................................................................ .. 262
`The Expansion Routine ..................................................................................... .. 263
`Improvements ................................................................................................ .. 265
`The Code .......................................................................................................... ..266
`
`L278 COMPRESSION ................................................................................... ..277
`Can LZ77 Improve? .......................................................................................... .. 278
`Enter LZ78 ........................................................................................................ ..279
`LZ78 Details .................................................................................................. ..279
`LZ78 Implementation. ................................................................................... ..282
`An Effective Variant .......................................................................................... .. 285
`Decompression .................................................................................................. .. 287
`The Catch ...................................................................................................... .. 288
`LZW Implementation .................................................................................... ..290
`Tree Maintenance and Navigation .................................................................. ..290
`Compression ..................................................................................................... .. 293
`Decompression .................................................................................................. ..294
`The Code .......................................................................................................... .. 296
`Improvements ................................................................................................... .. 302
`Patents ............................................................................................................... .. 3 ll
`
`SPEECH COMPRESSION .....................................; ..................................... ..313
`Digital Audio Concepts ..................................................................................... .. 313
`Fundamentals ................................................................................................. . . 3 14
`Sampling Variables ........................................................................................ .. 319
`PC—Based Sound ............................................................................................ .. 323
`Lossless Compression of Sound ........................................................................ ..324
`Problems and Results ..................................................................................... .. 325
`Lossy Compression ........................................................................................ .. 328
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 007
`
`

`
`Silence Compression ...................................................................................... .. 329
`Other Techniques .............................................................................................. .. 345
`
`LOSSY GRAPHICS COMPRESSION ................................................... ..347
`Enter Compression ........................................ ..‘ .................................................. ..348
`Statistical and Dictionary Compression Methods ........................................... .. 348
`Lossy Compression ........................................................................................ ..349
`Differential Modulation .................................................................................. .. 350
`Adaptive Coding ............................................................................................ .. 351
`A Standard That Works: JPEG .......................................................................... .. 353
`JPEG Compression ........................................................................................ ..354
`The Discrete Cosine Transform ...................................................................... .. 354
`DCT Specifics ............................................................................................... .. 357
`Why Bother? ..................................................................................................... ..358
`Implementing the DCT ..................................................................................... .. 359
`Matrix Multiplication ..................................................................................... .. 360
`Continued Improvements .................................................................................. .. 362
`Output of the DCT ......................................................................................... .. 363
`Quantization ................................................................................................... .. 364
`Selecting a Quantization Matrix ................................................................... .. 365
`Coding ................................................................ .; ............................................ .. 369
`The Zig-Zag Sequence ................................................................................... .. 369
`Entropy Encoding .......................................................................................... .. 372
`What About Color? ........................................................................................ .. 373
`The Sample Program ......................................................................................... .. 373
`Input Format .................................................................................................. .. 374
`The Code ....................................................................................................... ..375
`Initialization ................................................................................................... .. 376
`The Forward DCT Routine ............................................................................ .. 378
`WriteDCTData() ............................................................................................ .. 379
`File Expansion ............................................................................................... .. 382
`ReadDCTData() ................................................................................ . L ........... . . 383
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 008
`
`

`
`Input DCT Code ............................................................................................ .. 384
`The Inverse DCT ........................................................................................... .. 385
`The Complete Code Listing .............................................................................. .. 387
`Support Programs .............................................................................................. ..400
`Some Compression Results ............................................................................... ..406
`
`AN ARCHIVING PACKAGE .........................................................................409
`
`CAR and CARMAN ......................................................................................... ..409
`The CARMAN Command Set ....................................................................... ..41O
`
`The CAR File ................................................................................................ ..412
`
`The Header .................................................................................................... .. 413
`Storing the Header ......................................................................................... ..413
`The Header CRC ........................................................................................... ..417
`
`Command-Line Processing ............................................................................ .. 418
`Generating the File List ..................................................................................... .. 421
`Opening the Archive‘Files ............................................................................. ..426
`The Main Processing Loop ............................................................................... ..428
`Skipping/Copying Input File .......................................................................... ..434
`File Insertion .................................................................................................. ..435
`
`File Extraction ............................................................................................... .. 436
`
`Cleanup ............................................................................................................438
`The Code .......................................................................................................... ..438
`
`APPENDIX A:STAT|STlCS FOR
`COMPRESSION PROGRAMS .....................................................................489
`
`APPENDIX B:TEST PROGRAMS .......................................................... ..495
`
`GLOSSARY .............................................................................................................505
`
`BIBLIOGRAPHY ..................................................................................................517
`
`Other Resources ................................................................................................ ..518
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 009
`
`

`
`AFTERWORDmmWmmwmmwmmwm”
`
`Wmm.mWmWmwmmm
`
`INDEX.
`
`Wmwmmmwmmmmmmmm
`
`.mmw
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 010
`
`

`
`CHAPTER 1
`
`Introduction to Data
`Compression
`
`The primary purpose of this book is to explain various data—compression tech-
`niques using the C programming language. Data compression seeks to reduce the
`number of bits used to store or transmit information. It encompasses a wide variety of
`software and hardware compression techniques which can be so unlike one another
`that they have little in common except that they compress data. The LZW algorithm
`used in the Compuserve GIF specification, for example. has virtually nothing in
`common with the CCITT G.72l specification used to compress digitized voice over
`phone lines.
`This book will not take a comprehensive look at every variety of data compres-
`sion. The field has grown in the last 25 years to a point where this is simply not
`possible. What this book will cover are the various types of data compression com-
`monly used on personal and midsized computers, including compression of binary .
`programs, data, sound, and graphics.
`Furthermore, this book will either ignore or only lightly cover data—compression
`techniques that rely on hardware for practical use or that require hardware applica-
`tions. Many of today’s Voice—compression schemes were designed for the worldwide
`f1xed—bandwidth digital telecommunications networks. These compression schemes
`are intellectually interesting, but they require a specific type of hardware tuned to the
`fixed bandwidth of the communications channel. Different algorithms that don’t have
`to meet this requirement are used to compress digitized voice on a PC, and these
`algorithms generally offer better performance.
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 011
`
`

`
`THE DATA COMPRESSION BOOK
`
`Some of the most interesting areas in data compression today, however, do concern
`compression techniques just becoming possible with new and more powerful hardware.
`Lossy image compression, like that used in multimedia systems, for example, can now be
`implemented on standard desktop platforms. This book will cover practical ways to both
`experiment with and implement some of the algorithms used in these techniques.
`
`The Audience
`You will need basic programming skills to adequately discuss data»compression code.
`"the ability to follow block-structured code, such as C or Pascal, is a requirement. In
`addition, understanding computer architecture well enough to follow bit—oriented opera-
`tions, such as shifting, logical ORing and ANDing, and so on, will be essential.
`This does not mean that you need to be a C guru for this book to be worthwhile. You
`don’t even have to be a programmer. But the ability to follow code will be essential,
`because the concepts discussed here will be illustrated with portable C programs. The
`C code in this book has been written with an eye toward simplicity in the hopes that
`C novices will still be able to follow the programs. We will avoid the more esoteric
`constructs of C, but the code will be C, not pseudocode or English.
`
`Why C?
`The use of C to illustrate data—compression algorithms may raise some hackles. A more
`traditional way to write this book would have been to use pseudocode to sketch out the
`algorithms. But the lack of rigor in a pseudocode “program” often leads to hazy or
`incomplete definitions full of lines like “PROCESS FILE UNTIL OUT OF DATA.” The
`result is that pseudocode is easy to read, but not so easy to translate into a working program.
`If pseudocode is unsatisfactory, the next best choice is to use a conventional
`programming language. Though hundreds of choices are available, C seems the best
`choice for this type of book for several good reasons. First, in many respects C has
`become the lingua franca of programmers. That C compilers support computers
`ranging from a lowly 8051 microcontroller to supercomputers capable of 100 million
`instructions per second (MIPS) has had much to do with this. It doesn’t mean that C
`is the language of choice for all programmers. What it does mean is that most
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 012
`
`

`
`|NTRODUCTl0N
`
`programmers should have a C compiler available for their machines, and most are
`probably regularly exposed to C code. Because of this, many prograrmners who use
`other languages can still manage to code in C, and even more can at least read C.
`A second reason for using C is that it is a language without too many surprises. The few
`constructs it uses as basic language elements are easily translated to other languages. So a
`datacompression program that is illustrated using C can be converted to a working Pascal
`program through a relatively straightforward translation procedure. Even assembly—lan—
`guage programmers should find the process relatively painless.
`Perhaps the most importantreason for using C is simply one of efficiency. C is often
`thought of as ahigh—level assembly language, since it allows programmers to get close
`to the hardware. Despite the increasing optimization found in recent C compilers, it is
`not likely that C will ever exceed the speed or size possible ir1 hand-coded assembly
`language. That flaw is offset, however, by the ability to easily port C code to other
`machines. So for a book of this type, C is probably the most efficient choice.
`
`Which C?
`Despite being advertised as a “portable” language, a C program that compiles and
`executes on a given machine is not guaranteed to run on any other. It may not even
`compile using a different compiler on the same machine. The important thing to
`remember is not that C is portable, but that it can be portable. The code for this book
`has been written to be portable, and it compiles and runs cleanly using several
`compilers and environments.’Tl1e compilers/environments used here include:
`
`- Microsoft C 6.0, MS—DOS 3.3/5.0
`- Borland C++ 2.0, MS—DOS 3.3/5.0
`= Zortech C++, MS—DOS 3.3/5.0, with 286 and 386 DOS Extenders
`- Interactive Unix System 3.2 with the portable C compiler
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 013
`
`

`
`THE DATA COMPRESSION BOOK
`
`One important portability issue is library function calls. Though the C program-
`ming language was fairly well defined by the original K&R book (Brian W. Kemighan
`and Dennis M. Ritchie, The C Programming Language [Englewood Cliffs, N.J.:
`Prentice-Hall, 1978]), the run-time library implementation was left totally up to the
`whims of the implementor. Fortunately, the American National Standards Institute
`was able to complete the C language specification in l990, and the result was
`published as ANSI standard XJl1.34. This standard not only expanded and pinned
`down the original K&R language specification, but it also took on the definition of a
`standard C run—tirne library. This makes it much easier to write code that works the
`same way from machine to machine. The code in this book will be written with the
`intention of using only ANSI C library calls. Compiler-dependent extensions to either
`the language or the library will be avoided wherever possible.
`Given the standardization of the libraries, the remaining portability issues center
`
`around two things: sizes of the basic data types and dealing with noncompliant
`compilers. The majority of data-type conflicts arise when switching between 16- and
`32-bit machines.
`
`is fairly easy to manage the change between 16- and 32-bit
`it
`Fortunately,
`machines. Though the basic integer data type switches between 16- and 32-bits, both
`machines have a 16-bit “short int” data type. Once again, a “long int” "is generally 32
`bits on both machines. So in cases where the size of an integer clearly matters, it can '
`
`be pirmed down to either 16- or 32—bits with the appropriate declaration.
`On the vast majority of machines used in the world today, the C compiler
`implementation of the “char” data type is 8 bits wide. In this book, we will gloss over
`the possibility that any other size exists and stick with 8-bit characters. In general,
`porting a program shown here to a machine with an unusual char size is not too
`difficult, but spending too much time on it will obscure the important point of the
`
`programs here, which is data compression.
`The final issue to deal with when writing portable code is the problem of
`
`noncompliant compilers. In the MS-DOS world, most C compilers undergo major
`releases and upgrades every two years or so. This means that most compiler vendors
`have been able to release new versions of their compilers that now conform closely to
`
`Veritas Techs. LLC
`Exhibit 1012
`Page 014
`
`

`
`INTRODUCHON
`
`the ANSI C standard. But this is not the case for users of many other operating
`systems. In particular, UNIX users will frequently be using a C compiler which came
`with their system and which confonns to the older K&R language definition. While
`the ANSI C committee went to great lengths to make ANSI C upwardly compatible
`from K&R C, we need to watch out for a few problems.
`The first problem lies in the use of function prototypes. Under K&R C, function
`prototypes were generally used only when necessary. The compiler assumed that any
`unseen function returned an integer, and it accepted this without complaint. If a
`function retumed something unusual—a pointer or along, for instance——the prograrn—
`mer would write a function prototype to inform the compiler:
`
`long locate_string();
`
`the prototype told the compiler to generate code that assumes that the
`Here,
`function returned a long instead of an int. Function prototypes didn’t have much more
`use than that. Because of this, many C programmers working under a K&R regime
`made little or no use of function prototypes, and their appearance in a program was
`
`something of an oddity.
`While the ANSI C committee tried not to alter the basic nature of C, they were
`unable to pass up the potential improvements to the language that were possible
`through the expansion of the prototyping facility. Under ANSI C, a function prototype
`defines not only the return type of a function, but also the type of all the arguments as
`well. The function shown earlier, for example, might have the following prototype
`
`with an ANSI C compiler:
`
`long locate_string< FILE *input_file, char *string );
`
`This lets the compiler generate the correct code for the return type and check for the
`correct type and number of arguments as well. Si

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket