`.M..prtession techniques in C
`
`
`
`Mark Nelson
`
`Mgzr 3
`
`SAP AIVIERICA, INC. ET AL.
`
`EXHIBIT 1003
`
`Page 1 of 534 '
`
`Page 1 of 534
`
`
`
`The Data Compression Book
`This authorative guide details various data compression techniques used on personal and
`mid-sized computers.
`it explores different data compression methods, explaining the theory behind
`each and showing E programmers how to apply them to significantly increase the storage capacity
`of their system. Each technique is fully illustrated with complete, working programs written in
`portable C. These programs not only demonstrate how data compression works but can also be
`used to build your own data compression programs.
`Packed with expert tips, tricks, and techniques, The Data Compression Book certain to be
`a book you'll refer to again and again.
`
`Topics include:
`
`- The Shannon-Fano and Huffman coding techniques.
`o Differences between modeling and coding.
`- Expanding and improving Huffman coding with Adaptive Huffman techniques.
`0 Using Arithmetic coding.
`- Implementing powerful statistical models.
`- Evaluating dictionary compression methods including LZW and LZ77.
`- Applying lassy compression techniques to computer graphics and digitized sound data.
`0 The JPEG compression algorithm.
`- Developing a complete archive program.
`
`The code in this book has been written to be portable and to compile and run
`using several compilers and environments. These include Microsoft C 6.0, Borland C++ 2.0,
`Zortech C++ 3.0, and Interactive UNIX System v3.2. All of the source code is available on
`2 low-density disks in ll/lS/PC-DOS format.
`
`Hark Nelson hos more than T0 years experience as a professional programmer. He is tfice President of
`Development for Greenleaf Software, inc. Mark is a regular contributor to various technical publications including
`Dr. Dobb’s Journal, Computer language, Te
`llllllllilllllfli
`lllllllllllllllllllllllllllllllllllllllll
`Why this book is for you—Si DATA COMPRE
`OK w/ DISK
`NELSON 1o 20sgeeseeeeeeeeee N
`llll
`l
`Ill
`ll
`4ll Bollzrls/lvenue ' 9I-1,8155: S;25156E]’_E]’E_D
`
`SSION B0
`
`llliil
`
`MKTB k
`
`San Mateo, CA 94402
`
`HArrawAnE_
`
`33”
`>$3"l-“T5
`
`‘
`
`
`
`Page 2 of 534
`
`Page 2 of 534
`
`
`
`TheData
`
`Compresnon
`Book
`
`Page 3 of 534
`
`
`
`TheData
`
`Compresnon
`Book
`
`Featuring fast, efficient data
`compression techniques in C
`
`Mark Nelson
`
`Page 4 of 534
`
`M&T§
`:
`><:1:
`
`2I
`
`Page 4 of 534
`
`
`
`.
`.
`M&T Books
`A Division of M&T Publishing, Inc.
`411 Bore] Avenue
`San Mateo, CA 94402
`
`© 1992 by M&T Publishing, Inc.
`
`Printed in the United States of America
`
`All rights reserved. No pain of this book may be reproduced or transmitted in any form or by any means,
`electronic or mechanical, including photocopying, recording. or by any information storage and
`retrieval xylem, without prior written pemissiott from the Publisher. Contact the Publisher for
`infomtation on foreign rights.
`
`Limits ol‘ Liability and Disclaimer of Warranty
`The Author and Publisher of this book have used their best efforts in preparing the book and the
`programs contained in it. Tltesc efforts irtclttde Ihe development, research, and testing of the theories
`and programs to detemiine their eftbctiveness.
`
`The Author and Publisher make no warranty of any kind, expressed or implied, with regard to these
`programs or the documentation contained in this book. The Author and Publisher shall not be liable in
`any event for incidental or consequential damages in connection with, or arising out of, the furnishing,
`performance, or use of these programs.
`
`Library of Congress Cataloging-in-Publication Data
`
`Nelson, Mark, 1958 -
`The Data Compression Book/by Mark Nelson
`p. cm.
`Includes index.
`
`ISBN l-55851-214-4: $29.95
`1. Data Compression (Computer Science)
`QA76.9.D33N46
`1991
`005.74’6--dc20
`
`ISBN 1-55851-216-O (book/disk set): $39.95
`I. Title
`
`91-23080
`CIP
`
`Project Editor: Tova Fliegel
`Copy Editor: Shayne Bell
`Technical Reviewer: Robert Jung
`
`95
`
`94 9392
`
`4321
`
`Cover Design: Lauren Smith Design
`Layout: Marni Tapscott
`
`Page 5 of 534
`
`Page 5 of 534
`
`
`
`Contents
`
`WHY THIS BOOK IS FOR YOU .................................................................. ..1
`
`INTRODUCTION TO DATA COMPRESSION ...................................... ..3
`The Audience ........................................................................................................ ..4
`Why C? ................................................................................................................. ..4
`WhichC? .............................................................................................................. ..5
`Keeping Score ....................................................................................................... ..9
`The Structure ...................................................................................................... .. 10
`
`THE DATA-COMPRESSION LEXICON, WITH A H|STORY....13
`The Two Kingdoms ............................................................................................. .. 13
`Data Compression = Modeling + Coding .............................................................. 14
`The Dawn Age .................................................................................................... .. 15
`Coding ................................................................................................................ .. 17
`An Improvement .............................................................................................. .. 18
`Modeling ............................................................................................................. .. 20
`Statistical Modeling ......................................................................................... .. 20
`Dictionary Schemes ......................................................................................... ..22
`Ziv and Lempel ................................................................................................... ..23
`LZ77 . ................................................................................................................23
`
`LZ78. ............................................................................................................... .. 24
`Lossy Compression ..................... .; ...................................................................... .. 24
`Programs to Know .............................................................................................. ..25
`
`THE DAWN AGE: MINIMUM FIEDUNDANCY CODING ............. ..29
`The Shannon—Fano Algorithm ............................................................................. .. 31
`The Huffman Algorithm .....................................................................................
`34
`Huffman in C ...................................................................................................... .. 38
`
`Page 6 of 534
`
`Page 6 of 534
`
`
`
`BITIO.C . ........................................................................................................ .. 38
`
`A Reminder about Prototypes ............................................................................. ..47
`
`MAIN—C.C and MAIN—E.C ................................................................................ ..49
`
`MAIN—C.C ....................................................................................................... .. 55
`
`ERRHAND.C .................................................................................................. .. 56
`
`Into the Huffman Code ....................................................................................... .. 58
`
`Counting the Symbols ...................................................................................... .. 59
`
`Saving the Counts ............................................................................................ ..59
`
`Building the Tree ............................................................................................. .. 61
`
`Using the Tree ................................................................................................. ..62
`
`The Compression Code ....................................................................................... .. 63
`
`Putting It All Together ......................................................................................... .. 79
`
`Performance ..................................................................................................... .. 79
`
`A SIGNIFICANT IMPROVEMENT: ADAPTIVE HUFFMAN
`
`c o D I N G ............................................................................. ..= ..................................... ..a 1
`
`Adaptive Coding ................................................................................................. .. 82
`
`Updating the Huffman Tree ................................................................................. .. 83
`
`What Swapping Does ...................................................................................... .. 87
`
`The Algorithm. ................................................................................................ .. 88
`
`An Enhancement. ............................................................................................. .. 89
`
`The Escape Code . ........................................................................................... ..90
`
`The Overflow Problem .................................................................................... .. 91
`
`A Rescaling Bonus. ......................................................................................... ..95
`
`The Code ............................................................................................................ .. 95
`
`Initialization of the Array ................................................................................. .. 97
`
`The Compress Main Program .......................................................................... ..98
`
`The Expand Main Program .............................................................................. .. 99
`
`Encoding the Symbol. .................................................................................... .. 100
`
`Decoding the Symbol. .................................................................................... .. 108
`
`The Code .......................................................................................................... .. 109
`
`Page 7 of 534
`
`Page 7 of 534
`
`
`
`
`
`HUFFMAN ONE BETTER:
`
`ARITHMETIC CODING ................................................................................. ..123
`
`Difficulties ........................................................................................................ .. 123
`
`Arithmetic Coding: A Step Forward .................................................................. .. 124
`Practical Matters ............................................................................................ .. 128
`
`A Complication .............................................................................................. .. 130
`Decoding ....................................................................................................... .. 132
`Where’s the Beef? .......................................................................................... .. 133
`
`The Code .......................................................................................................... .. 134
`
`The Compression Program .... ..; ........................................................................ 134
`The Expansion Program ................................................................................. .. 136
`Initializing the Model ..................................................................................... .. 137
`Reading the Model ......................................................................................... .. 141
`Initializing the Encoder .................................................................................. ..141
`The Encoding Process .................................................................................... .. 142
`Flushing the Encoder ..................................................................................... .. 145
`The Decoding Process ................................................................................... ..145
`Summary ........................................................................................................... .. 148
`Code .................................................................................................................. .. 148
`
`STATISTICAL MODELING ......................................................................... ..167
`
`Higher—Order Modeling ..................................................................................... .. 167
`Finite Context Modeling ................................................................................... .. 168
`Adaptive Modeling ........................................................................................... .. 169
`A Simple Example ......................................................................................... .. 170
`Improvements ................................................................................................ .. 176
`Highest—Order Modeling ................................................................................... .. 176
`Updating the Model ....................................................................................... .. 178
`Escape Probabilities ....................................................................................... .. 179
`Scoreboarding ................................................................................................ .. 180
`Data Structures .............................................................................................. .. 181
`
`The Finishing Touches: Tables 1 and 2 ............................................................ 185
`
`Page 8 of 534
`
`Page 8 of 534
`
`
`
`Model Flushing .............................................................................................. .. 186
`
`Implementation .............................................................................................. .. 186
`
`Conclusions ....................................................................................................... .. 187
`
`Enhancement ................................................................................................. .. 187
`
`ARITH—N Listing .............................................................................................. .. 188
`
`DICTIONARY-BASED COMPRESSION ............................................. ..219
`
`An Example ...................................................................................................... .. 219
`
`Static vs. Adaptive ............................................................................................. .. 220
`
`Adaptive Methods .......................................................................................... .. 221
`
`A Representative Example ............................................................................. .. 223
`
`Israeli Roots ...................................................................................................... .. 225
`
`History ........................................................................................................... .. 226
`
`ARC: The Father of MS—DOS Dictionary Compression ................................... ..227
`Dictionary Compression ................................................................................. ..228
`
`Danger Ahead—Patents .................................................................................... .. 230
`
`Conclusion ........................................................................................................ .. 23 1
`
`SLIDING WINDOW COMPRESSION ................................................... ..233
`
`The Algorithm ................................................................................................... .. 233
`
`Problems with LZ77 ...................................................................................... .. 238
`
`An Encoding Problem .................................................................................... .. 240
`
`LZSS Compression ........................................................................................... ..240
`
`Data Structures .............................................................................................. ..242
`
`A Balancing Act ............................................................................................ ..244
`
`Greedy vs. Best Possible ................................................................................ ..246
`
`The Code .......................................................................................................... ..247
`
`Constants and Macros .................................................................................... .. 247
`
`Global Variables ............................................................................................. .. 25 I
`
`The Compression Code ..................................................................................... ..252
`
`Initialization. .................................................................................................. .. 253
`
`The Main Loop .............................................................................................. .. 254
`
`Page 9 of 534
`
`Page 9 of 534
`
`
`
`The Exit Code ................................................................................................ .. 256
`
`AddString() .................................................................................................... .. 257
`De1eteString() ................................................................................................. .. 260
`Binary Tree Support Routines ........................................................................ .
`. 262
`The Expansion Routine ....................................................................................... 263
`Improvements
`........................................................................................... ..265
`The Code .......................................................................................................... .. 266
`
`LZ78 COMPRESSION ................................................................................... ..277
`
`Can LZ77 Improve? ............................................................................................278
`Enter LZ78 ........................................................................................................ .. 279
`
`LZ78 Details .................................................................................................. ..279
`
`LZ78 Implementation. ................................................................................... ..282
`An Effective Variant .......................................................................................... ..285
`
`Decompression .................................................................................................. .. 287
`The Catch ...................................................................................................... ..288
`
`LZW Implementation ...................................................................................... 290
`Tree Maintenance and Navigation ................................................ .; ................ ..290
`Compression ..................................................................................................... .. 293
`Decompression .................................................................................................. .. 294
`The Code .......................................................................................................... .. 296
`
`Improvements ................................................................................................... .. 302
`Patents ............................................................................................................... .. 3 1 1
`
`SPEECH COMPRESSION ........................................................................... ..313
`
`Digital Audio Concepts ..................................................................................... .. 313
`Fundamentals ................................................................................................. .. 314
`
`Sampling Variables ........................................................................................ .. 319
`PC—Based Sound ............................................................................................ ..323
`
`Lossless Compression of Sound ........................................................................ .. 324
`Problems and Results ..................................................................................... .. 325
`
`Lossy Compression ........................................................................................ .. 328
`
`Page 10 of 534
`
`Page 10 of 534
`
`
`
`Silence Compression ...................................................................................... .. 329
`
`Other Techniques .............................................................................................. .. 345
`
`LOSSY GRAPHICS COMPRESSION ................................................... ..347
`
`Enter Compression ............................................................................................ .. 348
`
`Statistical and Dictionary Compression Methods ........................................... .. 348
`
`Lossy Compression ........................................................................................ .. 349
`
`Differential Modulation .................................................................................. .. 350
`
`Adaptive Coding ............................................................................................ .. 351
`
`A Standard That Works: JPEG .......................................................................... ..353
`
`JPEG Compression ........................................................................................ .. 354
`
`The Discrete Cosine Transform ...................................................................... .. 354
`
`DCT Specifics ............................................................................................... .. 357
`
`Why Bother? ..................................................................................................... .. 358
`
`Implementing the DCT ..................................................................................... .. 359
`
`Matrix Multiplication ..................................................................................... .. 360
`Continued Improvements .................................................................................. .. 362
`Output of the DCT ......................................................................................... .. 363
`Quantization ................................................................................................... .. 364
`
`Selecting a Quantization Matrix ..................................................................... .. 365
`Coding .............................................................................................................. .. 369
`The Zig—Zag Sequence ................................................................................... .. 369
`
`Entropy Encoding .......................................................................................... .. 372
`
`What About Color? ........................................................................................ .. 373
`
`The Sample Program ......................................................................................... .. 373
`
`Input Fonnat .................................................................................................. .. 374
`
`The Code ....................................................................................................... ..375
`
`Initialization ................................................................................................... .. 376
`
`The Forward DCT Routine ............................................................................ .. 378
`
`WriteDCTData() ............................................................................................ .. 379
`
`File Expansion ............................................................................................... ..382
`
`ReadDCTData() ............................................................................................. .. 383
`
`Page 11 of 534
`
`Page 11 of 534
`
`
`
`Input DCT Code ............................................................................................ .. 384
`The Inverse DCT ........................................................................................... .. 385
`The Complete Code Listing .............................................................................. .. 387
`Support Programs ................................................................................................400
`Some Compression Results ............................................................................... ..406
`
`AN ARCHIVING PACKAGE ....................................................................... ..409
`CAR and CARIVIAN ......................................................................................... ..409
`
`The CARMAN Command Set ....................................................................... ..4l0
`
`The CAR File ................................................................................................ ..412
`
`The Header .................................................................................................... ..4l3
`Storing the Header ......................................................................................... ..4l3
`The Header CRC ........................................................................................... ..4l7
`Command-Line Processing ............................................................................ ..4l8
`Generating the File List .......................................................................................421
`Opening the Archive Files ............................................................................. .. 426
`The Main Processing Loop ............................................................................... ..428
`Skipping/Copying Input File .......................................................................... ..434
`File Insertion .................................................................................................. ..435
`
`File Extraction ............................................................................................... ..436
`Cleanup .......................................................................................................... ..438
`The Code .......................................................................................................... ..438
`
`APPENDIX A:STAT|STICS FOR
`COMPRESSION PROGRAMS ............................................................. ..' .... ..489
`
`APPENDIX B:TEST PROGRAMS .......................................................... ..495
`
`GLOSSARY ........................................................................................................... ..505
`
`BIBLIOGRAPHY ................................................................................................ ..517
`
`Other Resources ................................................................................................ ..518
`
`Page 12 of 534
`
`Page 12 of 534
`
`
`
`AFTERWORD ....................................................................................................... ..521
`
`INDEX ....................................................................................................................... ..523
`
`Page 13 of 534
`
`Page 13 of 534
`
`
`
`Acknowledgments
`
`I would like to take this opportunity to thank all the people who made this book
`possible: Denise, Joseph, and Kaitlin, for their patience and support; Tova and Robert,
`for keeping me honest and on time (sort of); and Robert X. Cringely, for supplying the
`secret formula to writing a book. Most importantly, I want to thank Jon Erickson, who
`
`made it all possible.
`
`Mark Nelson
`
`Page 14 of 534
`
`Page 14 of 534
`
`
`
`I concluded, we kinotropists must be numbered among Britain’s most
`adept programmers of Enginery of any sort, and virtually all advances
`in the compression of data have originated as kinotropic applications.
`
`At this point, he interrupted again, asking if I had indeed said “the
`compression of data,” and was I familiar with the term “algorithmic
`compression”? I assured him I was.
`
`The Diflerence Engine
`
`William Gibson and Bruce Sterling
`
`Page 15 of 534
`
`Page 15 of 534
`
`
`
`Why This Book Is for You
`
`If you want to learn how programs like PKZIP and LHarc work, this book is for
`
`you. The compression techniques used in these programs are described in detail,
`
`accompanied by working code. After reading this book, even the novice C programmer
`
`will be able to write a complete compression/archiving program that can be ported
`
`to virtually any operating system or hardware platform.
`
`If you want to include data compression in other programs you write, this book
`
`will become an invaluable tool. It contains dozens of working programs with C code
`
`that can easily be added to your applications. In—depth discussions of various
`
`compression methods will help you make intelligent decisions when creating
`
`programs that use data compression.
`
`If you want to learn why lossy compression of graphics is the key factor in
`
`enabling the multimedia revolution, you need this book. DCT-based compression
`
`like that used by the JPEG algorithm is described in detail. Working programs let you
`
`experiment with this fascinating new technology.
`
`The Data Compression Book provides you with a comprehensive reference to
`
`this important field. No other book available has the detailed description of
`
`compression algorithms or working C implementations for those algorithms. If you
`
`are planning to work in this field, The Data Compression Book is indispensable.
`
`Data Compression Book Source Code Disk
`Why bother manually typing in the book’s source code when all the information
`
`you need is ready to use on disk? Code examples are used throughout the text, and this
`
`book’s optional disk (PC/MS—DOS format) contains the source code for all the
`
`programming examples listed. Only $20 postage—paid!
`
`To order with your credit card, call TOLL FREE 1-800-533-4372
`
`(in CA 1-800-356-2002). Or mail your order with payment to
`
`M&T Books, 411 Borel Avenue, Suite 100, San Mateo, CA 94402
`
`California residents please add applicable sales tax.
`
`Page 16 of 534
`
`Page 16 of 534
`
`
`
`CHAPTER 1
`
`Introduction to Data
`
`Compression
`
`The primary purpose of this book is to explain various data-compression tech-
`niques using the C programming language. Data compression seeks to reduce the
`number of bits used to store or transmit information. It encompasses a wide variety of
`
`software and hardware compression techniques which can be so unlike one another
`that they have little in common except that they compress data. The LZW algorithm
`used in the Compuserve GIF specification, for example, has virtually nothing in
`common with the CCITT G.72l specification used to compress digitized voice over
`
`phone lines.
`This book will not take a comprehensive look at every variety of data compres-
`
`sion. The field has grown in the last 25 years to a point where this is simply not
`possible. What this book will cover are the various types of data compression com-
`monly used on personal and midsized computers, including compression of binary
`
`programs, data, sound, and graphics.
`Furthermore, this book will either ignore or only lightly cover data-compression
`
`techniques that rely on hardware for practical use or that require hardware applica-
`tions. Many of today’s voice—compression schemes were designed for the worldwide
`fixed—bandwidth digital telecommunications networks. These compression schemes
`are intellectually interesting, but they require a specific type of hardware tuned to the
`fixed bandwidth of the communications charmel. Different algorithms that don’t have
`
`to meet this requirement are used to compress digitized voice on a PC, and these
`
`algorithms generally offer better performance.
`
`Page 17 of 534
`
`Page 17 of 534
`
`
`
`THE DATA COMPRESSION BOOK
`
`Some of the most interesting areas in data compression today, however, do concem
`
`compression techniques just becoming possible with new and more powerful hardware.
`Lossy image compression, like that used in multimedia systems, for example, can now be
`implemented on standard desktop platforms. This book will cover practical ways to both
`experiment with and implement some of the algorithms used in these techniques.
`
`The Audience
`