throbber
tit.‘-;t_I'*t-.1Irtngfast, efficient data
`.M..prtession techniques in C
`
`
`
`Mark Nelson
`
`Mgzr 3
`
`SAP AIVIERICA, INC. ET AL.
`
`EXHIBIT 1003
`
`Page 1 of 534 '
`
`Page 1 of 534
`
`

`
`The Data Compression Book
`This authorative guide details various data compression techniques used on personal and
`mid-sized computers.
`it explores different data compression methods, explaining the theory behind
`each and showing E programmers how to apply them to significantly increase the storage capacity
`of their system. Each technique is fully illustrated with complete, working programs written in
`portable C. These programs not only demonstrate how data compression works but can also be
`used to build your own data compression programs.
`Packed with expert tips, tricks, and techniques, The Data Compression Book certain to be
`a book you'll refer to again and again.
`
`Topics include:
`
`- The Shannon-Fano and Huffman coding techniques.
`o Differences between modeling and coding.
`- Expanding and improving Huffman coding with Adaptive Huffman techniques.
`0 Using Arithmetic coding.
`- Implementing powerful statistical models.
`- Evaluating dictionary compression methods including LZW and LZ77.
`- Applying lassy compression techniques to computer graphics and digitized sound data.
`0 The JPEG compression algorithm.
`- Developing a complete archive program.
`
`The code in this book has been written to be portable and to compile and run
`using several compilers and environments. These include Microsoft C 6.0, Borland C++ 2.0,
`Zortech C++ 3.0, and Interactive UNIX System v3.2. All of the source code is available on
`2 low-density disks in ll/lS/PC-DOS format.
`
`Hark Nelson hos more than T0 years experience as a professional programmer. He is tfice President of
`Development for Greenleaf Software, inc. Mark is a regular contributor to various technical publications including
`Dr. Dobb’s Journal, Computer language, Te
`llllllllilllllfli
`lllllllllllllllllllllllllllllllllllllllll
`Why this book is for you—Si DATA COMPRE
`OK w/ DISK
`NELSON 1o 20sgeeseeeeeeeeee N
`llll
`l
`Ill
`ll
`4ll Bollzrls/lvenue ' 9I-1,8155: S;25156E]’_E]’E_D
`
`SSION B0
`
`llliil
`
`MKTB k
`
`San Mateo, CA 94402
`
`HArrawAnE_
`
`33”
`>$3"l-“T5
`
`‘
`
`
`
`Page 2 of 534
`
`Page 2 of 534
`
`

`
`TheData
`
`Compresnon
`Book
`
`Page 3 of 534
`
`

`
`TheData
`
`Compresnon
`Book
`
`Featuring fast, efficient data
`compression techniques in C
`
`Mark Nelson
`
`Page 4 of 534
`
`M&T§
`:
`><:1:
`
`2I
`
`Page 4 of 534
`
`

`
`.
`.
`M&T Books
`A Division of M&T Publishing, Inc.
`411 Bore] Avenue
`San Mateo, CA 94402
`
`© 1992 by M&T Publishing, Inc.
`
`Printed in the United States of America
`
`All rights reserved. No pain of this book may be reproduced or transmitted in any form or by any means,
`electronic or mechanical, including photocopying, recording. or by any information storage and
`retrieval xylem, without prior written pemissiott from the Publisher. Contact the Publisher for
`infomtation on foreign rights.
`
`Limits ol‘ Liability and Disclaimer of Warranty
`The Author and Publisher of this book have used their best efforts in preparing the book and the
`programs contained in it. Tltesc efforts irtclttde Ihe development, research, and testing of the theories
`and programs to detemiine their eftbctiveness.
`
`The Author and Publisher make no warranty of any kind, expressed or implied, with regard to these
`programs or the documentation contained in this book. The Author and Publisher shall not be liable in
`any event for incidental or consequential damages in connection with, or arising out of, the furnishing,
`performance, or use of these programs.
`
`Library of Congress Cataloging-in-Publication Data
`
`Nelson, Mark, 1958 -
`The Data Compression Book/by Mark Nelson
`p. cm.
`Includes index.
`
`ISBN l-55851-214-4: $29.95
`1. Data Compression (Computer Science)
`QA76.9.D33N46
`1991
`005.74’6--dc20
`
`ISBN 1-55851-216-O (book/disk set): $39.95
`I. Title
`
`91-23080
`CIP
`
`Project Editor: Tova Fliegel
`Copy Editor: Shayne Bell
`Technical Reviewer: Robert Jung
`
`95
`
`94 9392
`
`4321
`
`Cover Design: Lauren Smith Design
`Layout: Marni Tapscott
`
`Page 5 of 534
`
`Page 5 of 534
`
`

`
`Contents
`
`WHY THIS BOOK IS FOR YOU .................................................................. ..1
`
`INTRODUCTION TO DATA COMPRESSION ...................................... ..3
`The Audience ........................................................................................................ ..4
`Why C? ................................................................................................................. ..4
`WhichC? .............................................................................................................. ..5
`Keeping Score ....................................................................................................... ..9
`The Structure ...................................................................................................... .. 10
`
`THE DATA-COMPRESSION LEXICON, WITH A H|STORY....13
`The Two Kingdoms ............................................................................................. .. 13
`Data Compression = Modeling + Coding .............................................................. 14
`The Dawn Age .................................................................................................... .. 15
`Coding ................................................................................................................ .. 17
`An Improvement .............................................................................................. .. 18
`Modeling ............................................................................................................. .. 20
`Statistical Modeling ......................................................................................... .. 20
`Dictionary Schemes ......................................................................................... ..22
`Ziv and Lempel ................................................................................................... ..23
`LZ77 . ................................................................................................................23
`
`LZ78. ............................................................................................................... .. 24
`Lossy Compression ..................... .; ...................................................................... .. 24
`Programs to Know .............................................................................................. ..25
`
`THE DAWN AGE: MINIMUM FIEDUNDANCY CODING ............. ..29
`The Shannon—Fano Algorithm ............................................................................. .. 31
`The Huffman Algorithm .....................................................................................
`34
`Huffman in C ...................................................................................................... .. 38
`
`Page 6 of 534
`
`Page 6 of 534
`
`

`
`BITIO.C . ........................................................................................................ .. 38
`
`A Reminder about Prototypes ............................................................................. ..47
`
`MAIN—C.C and MAIN—E.C ................................................................................ ..49
`
`MAIN—C.C ....................................................................................................... .. 55
`
`ERRHAND.C .................................................................................................. .. 56
`
`Into the Huffman Code ....................................................................................... .. 58
`
`Counting the Symbols ...................................................................................... .. 59
`
`Saving the Counts ............................................................................................ ..59
`
`Building the Tree ............................................................................................. .. 61
`
`Using the Tree ................................................................................................. ..62
`
`The Compression Code ....................................................................................... .. 63
`
`Putting It All Together ......................................................................................... .. 79
`
`Performance ..................................................................................................... .. 79
`
`A SIGNIFICANT IMPROVEMENT: ADAPTIVE HUFFMAN
`
`c o D I N G ............................................................................. ..= ..................................... ..a 1
`
`Adaptive Coding ................................................................................................. .. 82
`
`Updating the Huffman Tree ................................................................................. .. 83
`
`What Swapping Does ...................................................................................... .. 87
`
`The Algorithm. ................................................................................................ .. 88
`
`An Enhancement. ............................................................................................. .. 89
`
`The Escape Code . ........................................................................................... ..90
`
`The Overflow Problem .................................................................................... .. 91
`
`A Rescaling Bonus. ......................................................................................... ..95
`
`The Code ............................................................................................................ .. 95
`
`Initialization of the Array ................................................................................. .. 97
`
`The Compress Main Program .......................................................................... ..98
`
`The Expand Main Program .............................................................................. .. 99
`
`Encoding the Symbol. .................................................................................... .. 100
`
`Decoding the Symbol. .................................................................................... .. 108
`
`The Code .......................................................................................................... .. 109
`
`Page 7 of 534
`
`Page 7 of 534
`
`

`
`
`
`HUFFMAN ONE BETTER:
`
`ARITHMETIC CODING ................................................................................. ..123
`
`Difficulties ........................................................................................................ .. 123
`
`Arithmetic Coding: A Step Forward .................................................................. .. 124
`Practical Matters ............................................................................................ .. 128
`
`A Complication .............................................................................................. .. 130
`Decoding ....................................................................................................... .. 132
`Where’s the Beef? .......................................................................................... .. 133
`
`The Code .......................................................................................................... .. 134
`
`The Compression Program .... ..; ........................................................................ 134
`The Expansion Program ................................................................................. .. 136
`Initializing the Model ..................................................................................... .. 137
`Reading the Model ......................................................................................... .. 141
`Initializing the Encoder .................................................................................. ..141
`The Encoding Process .................................................................................... .. 142
`Flushing the Encoder ..................................................................................... .. 145
`The Decoding Process ................................................................................... ..145
`Summary ........................................................................................................... .. 148
`Code .................................................................................................................. .. 148
`
`STATISTICAL MODELING ......................................................................... ..167
`
`Higher—Order Modeling ..................................................................................... .. 167
`Finite Context Modeling ................................................................................... .. 168
`Adaptive Modeling ........................................................................................... .. 169
`A Simple Example ......................................................................................... .. 170
`Improvements ................................................................................................ .. 176
`Highest—Order Modeling ................................................................................... .. 176
`Updating the Model ....................................................................................... .. 178
`Escape Probabilities ....................................................................................... .. 179
`Scoreboarding ................................................................................................ .. 180
`Data Structures .............................................................................................. .. 181
`
`The Finishing Touches: Tables 1 and 2 ............................................................ 185
`
`Page 8 of 534
`
`Page 8 of 534
`
`

`
`Model Flushing .............................................................................................. .. 186
`
`Implementation .............................................................................................. .. 186
`
`Conclusions ....................................................................................................... .. 187
`
`Enhancement ................................................................................................. .. 187
`
`ARITH—N Listing .............................................................................................. .. 188
`
`DICTIONARY-BASED COMPRESSION ............................................. ..219
`
`An Example ...................................................................................................... .. 219
`
`Static vs. Adaptive ............................................................................................. .. 220
`
`Adaptive Methods .......................................................................................... .. 221
`
`A Representative Example ............................................................................. .. 223
`
`Israeli Roots ...................................................................................................... .. 225
`
`History ........................................................................................................... .. 226
`
`ARC: The Father of MS—DOS Dictionary Compression ................................... ..227
`Dictionary Compression ................................................................................. ..228
`
`Danger Ahead—Patents .................................................................................... .. 230
`
`Conclusion ........................................................................................................ .. 23 1
`
`SLIDING WINDOW COMPRESSION ................................................... ..233
`
`The Algorithm ................................................................................................... .. 233
`
`Problems with LZ77 ...................................................................................... .. 238
`
`An Encoding Problem .................................................................................... .. 240
`
`LZSS Compression ........................................................................................... ..240
`
`Data Structures .............................................................................................. ..242
`
`A Balancing Act ............................................................................................ ..244
`
`Greedy vs. Best Possible ................................................................................ ..246
`
`The Code .......................................................................................................... ..247
`
`Constants and Macros .................................................................................... .. 247
`
`Global Variables ............................................................................................. .. 25 I
`
`The Compression Code ..................................................................................... ..252
`
`Initialization. .................................................................................................. .. 253
`
`The Main Loop .............................................................................................. .. 254
`
`Page 9 of 534
`
`Page 9 of 534
`
`

`
`The Exit Code ................................................................................................ .. 256
`
`AddString() .................................................................................................... .. 257
`De1eteString() ................................................................................................. .. 260
`Binary Tree Support Routines ........................................................................ .
`. 262
`The Expansion Routine ....................................................................................... 263
`Improvements
`........................................................................................... ..265
`The Code .......................................................................................................... .. 266
`
`LZ78 COMPRESSION ................................................................................... ..277
`
`Can LZ77 Improve? ............................................................................................278
`Enter LZ78 ........................................................................................................ .. 279
`
`LZ78 Details .................................................................................................. ..279
`
`LZ78 Implementation. ................................................................................... ..282
`An Effective Variant .......................................................................................... ..285
`
`Decompression .................................................................................................. .. 287
`The Catch ...................................................................................................... ..288
`
`LZW Implementation ...................................................................................... 290
`Tree Maintenance and Navigation ................................................ .; ................ ..290
`Compression ..................................................................................................... .. 293
`Decompression .................................................................................................. .. 294
`The Code .......................................................................................................... .. 296
`
`Improvements ................................................................................................... .. 302
`Patents ............................................................................................................... .. 3 1 1
`
`SPEECH COMPRESSION ........................................................................... ..313
`
`Digital Audio Concepts ..................................................................................... .. 313
`Fundamentals ................................................................................................. .. 314
`
`Sampling Variables ........................................................................................ .. 319
`PC—Based Sound ............................................................................................ ..323
`
`Lossless Compression of Sound ........................................................................ .. 324
`Problems and Results ..................................................................................... .. 325
`
`Lossy Compression ........................................................................................ .. 328
`
`Page 10 of 534
`
`Page 10 of 534
`
`

`
`Silence Compression ...................................................................................... .. 329
`
`Other Techniques .............................................................................................. .. 345
`
`LOSSY GRAPHICS COMPRESSION ................................................... ..347
`
`Enter Compression ............................................................................................ .. 348
`
`Statistical and Dictionary Compression Methods ........................................... .. 348
`
`Lossy Compression ........................................................................................ .. 349
`
`Differential Modulation .................................................................................. .. 350
`
`Adaptive Coding ............................................................................................ .. 351
`
`A Standard That Works: JPEG .......................................................................... ..353
`
`JPEG Compression ........................................................................................ .. 354
`
`The Discrete Cosine Transform ...................................................................... .. 354
`
`DCT Specifics ............................................................................................... .. 357
`
`Why Bother? ..................................................................................................... .. 358
`
`Implementing the DCT ..................................................................................... .. 359
`
`Matrix Multiplication ..................................................................................... .. 360
`Continued Improvements .................................................................................. .. 362
`Output of the DCT ......................................................................................... .. 363
`Quantization ................................................................................................... .. 364
`
`Selecting a Quantization Matrix ..................................................................... .. 365
`Coding .............................................................................................................. .. 369
`The Zig—Zag Sequence ................................................................................... .. 369
`
`Entropy Encoding .......................................................................................... .. 372
`
`What About Color? ........................................................................................ .. 373
`
`The Sample Program ......................................................................................... .. 373
`
`Input Fonnat .................................................................................................. .. 374
`
`The Code ....................................................................................................... ..375
`
`Initialization ................................................................................................... .. 376
`
`The Forward DCT Routine ............................................................................ .. 378
`
`WriteDCTData() ............................................................................................ .. 379
`
`File Expansion ............................................................................................... ..382
`
`ReadDCTData() ............................................................................................. .. 383
`
`Page 11 of 534
`
`Page 11 of 534
`
`

`
`Input DCT Code ............................................................................................ .. 384
`The Inverse DCT ........................................................................................... .. 385
`The Complete Code Listing .............................................................................. .. 387
`Support Programs ................................................................................................400
`Some Compression Results ............................................................................... ..406
`
`AN ARCHIVING PACKAGE ....................................................................... ..409
`CAR and CARIVIAN ......................................................................................... ..409
`
`The CARMAN Command Set ....................................................................... ..4l0
`
`The CAR File ................................................................................................ ..412
`
`The Header .................................................................................................... ..4l3
`Storing the Header ......................................................................................... ..4l3
`The Header CRC ........................................................................................... ..4l7
`Command-Line Processing ............................................................................ ..4l8
`Generating the File List .......................................................................................421
`Opening the Archive Files ............................................................................. .. 426
`The Main Processing Loop ............................................................................... ..428
`Skipping/Copying Input File .......................................................................... ..434
`File Insertion .................................................................................................. ..435
`
`File Extraction ............................................................................................... ..436
`Cleanup .......................................................................................................... ..438
`The Code .......................................................................................................... ..438
`
`APPENDIX A:STAT|STICS FOR
`COMPRESSION PROGRAMS ............................................................. ..' .... ..489
`
`APPENDIX B:TEST PROGRAMS .......................................................... ..495
`
`GLOSSARY ........................................................................................................... ..505
`
`BIBLIOGRAPHY ................................................................................................ ..517
`
`Other Resources ................................................................................................ ..518
`
`Page 12 of 534
`
`Page 12 of 534
`
`

`
`AFTERWORD ....................................................................................................... ..521
`
`INDEX ....................................................................................................................... ..523
`
`Page 13 of 534
`
`Page 13 of 534
`
`

`
`Acknowledgments
`
`I would like to take this opportunity to thank all the people who made this book
`possible: Denise, Joseph, and Kaitlin, for their patience and support; Tova and Robert,
`for keeping me honest and on time (sort of); and Robert X. Cringely, for supplying the
`secret formula to writing a book. Most importantly, I want to thank Jon Erickson, who
`
`made it all possible.
`
`Mark Nelson
`
`Page 14 of 534
`
`Page 14 of 534
`
`

`
`I concluded, we kinotropists must be numbered among Britain’s most
`adept programmers of Enginery of any sort, and virtually all advances
`in the compression of data have originated as kinotropic applications.
`
`At this point, he interrupted again, asking if I had indeed said “the
`compression of data,” and was I familiar with the term “algorithmic
`compression”? I assured him I was.
`
`The Diflerence Engine
`
`William Gibson and Bruce Sterling
`
`Page 15 of 534
`
`Page 15 of 534
`
`

`
`Why This Book Is for You
`
`If you want to learn how programs like PKZIP and LHarc work, this book is for
`
`you. The compression techniques used in these programs are described in detail,
`
`accompanied by working code. After reading this book, even the novice C programmer
`
`will be able to write a complete compression/archiving program that can be ported
`
`to virtually any operating system or hardware platform.
`
`If you want to include data compression in other programs you write, this book
`
`will become an invaluable tool. It contains dozens of working programs with C code
`
`that can easily be added to your applications. In—depth discussions of various
`
`compression methods will help you make intelligent decisions when creating
`
`programs that use data compression.
`
`If you want to learn why lossy compression of graphics is the key factor in
`
`enabling the multimedia revolution, you need this book. DCT-based compression
`
`like that used by the JPEG algorithm is described in detail. Working programs let you
`
`experiment with this fascinating new technology.
`
`The Data Compression Book provides you with a comprehensive reference to
`
`this important field. No other book available has the detailed description of
`
`compression algorithms or working C implementations for those algorithms. If you
`
`are planning to work in this field, The Data Compression Book is indispensable.
`
`Data Compression Book Source Code Disk
`Why bother manually typing in the book’s source code when all the information
`
`you need is ready to use on disk? Code examples are used throughout the text, and this
`
`book’s optional disk (PC/MS—DOS format) contains the source code for all the
`
`programming examples listed. Only $20 postage—paid!
`
`To order with your credit card, call TOLL FREE 1-800-533-4372
`
`(in CA 1-800-356-2002). Or mail your order with payment to
`
`M&T Books, 411 Borel Avenue, Suite 100, San Mateo, CA 94402
`
`California residents please add applicable sales tax.
`
`Page 16 of 534
`
`Page 16 of 534
`
`

`
`CHAPTER 1
`
`Introduction to Data
`
`Compression
`
`The primary purpose of this book is to explain various data-compression tech-
`niques using the C programming language. Data compression seeks to reduce the
`number of bits used to store or transmit information. It encompasses a wide variety of
`
`software and hardware compression techniques which can be so unlike one another
`that they have little in common except that they compress data. The LZW algorithm
`used in the Compuserve GIF specification, for example, has virtually nothing in
`common with the CCITT G.72l specification used to compress digitized voice over
`
`phone lines.
`This book will not take a comprehensive look at every variety of data compres-
`
`sion. The field has grown in the last 25 years to a point where this is simply not
`possible. What this book will cover are the various types of data compression com-
`monly used on personal and midsized computers, including compression of binary
`
`programs, data, sound, and graphics.
`Furthermore, this book will either ignore or only lightly cover data-compression
`
`techniques that rely on hardware for practical use or that require hardware applica-
`tions. Many of today’s voice—compression schemes were designed for the worldwide
`fixed—bandwidth digital telecommunications networks. These compression schemes
`are intellectually interesting, but they require a specific type of hardware tuned to the
`fixed bandwidth of the communications charmel. Different algorithms that don’t have
`
`to meet this requirement are used to compress digitized voice on a PC, and these
`
`algorithms generally offer better performance.
`
`Page 17 of 534
`
`Page 17 of 534
`
`

`
`THE DATA COMPRESSION BOOK
`
`Some of the most interesting areas in data compression today, however, do concem
`
`compression techniques just becoming possible with new and more powerful hardware.
`Lossy image compression, like that used in multimedia systems, for example, can now be
`implemented on standard desktop platforms. This book will cover practical ways to both
`experiment with and implement some of the algorithms used in these techniques.
`
`The Audience
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket