Mastering Algorithms with C
Useful Techniques from Sorting to Encryption

Kyle Loudon

O'REILLY®
Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo

Mastering Algorithms with C
by Kyle Loudon

Copyright © 1999 O'Reilly Media, Inc. All rights reserved.
Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

Editor: Andy Oram
Production Editor: Jeffrey Liggett

Printing History:

August 1999: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered
trademarks of O'Reilly Media, Inc. Mastering Algorithms with C, the image of sea horses, and
related trade dress are trademarks of O'Reilly Media, Inc. Many of the designations used by
manufacturers and sellers to distinguish their products are claimed as trademarks. Where
those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark
claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher assumes
no responsibility for errors or omissions, or for damages resulting from the use of the
information contained herein.

ISBN: 978-1-565-92453-6

Data Compression

Data compression is the process of reducing the number of bits used to represent
data. It is one of the most significant results of information theory, an area of
mathematics that addresses various ways to manage and manipulate information.
Data compression entails two processes: in one process the data is compressed, or
encoded, to reduce its size; in a second process it is uncompressed, or decoded, to
return it to its original state.

To understand why data compression is possible, we must first understand that all
data can be characterized by some informational content, called its entropy (a term
borrowed from thermodynamics). Compression is possible because most data is
represented with more bits than its entropy suggests is optimal. To gauge the
effectiveness of compression, we divide the size of the compressed data by its
original size and subtract the result from 1. This value is known as the data's
compression ratio.
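
The calculation itself is trivial; the following fragment is a minimal illustration of ours (not one of the book's examples) of how the ratio is computed:

#include <stdio.h>

/* Compression ratio: 1 minus the compressed size divided by the original
   size, with both sizes in the same units. */
static double compression_ratio(double original, double compressed) {
   return 1.0 - compressed / original;
}

int main(void) {
   /* A 576-bit buffer reduced to 163 bits (the Huffman example later in
      this chapter) yields a ratio of about 0.717, that is, 71.7%. */
   printf("%.3f\n", compression_ratio(576.0, 163.0));
   return 0;
}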

In the broadest sense, data compression methods are divided into two classes:
lossy and lossless. In lossy compression we accept a certain loss of accuracy in
exchange for greater compression ratios. This is acceptable in some applications,
such as graphics and sound processing, provided the degradation is managed
carefully. However, frequently we use lossless compression, which ensures that an
exact copy of the original data is reproduced when uncompressed.

This chapter focuses on lossless compression, for which there are two general
approaches: minimum redundancy coding and dictionary-based methods. Minimum
redundancy coding achieves compression by encoding symbols that occur with
great frequency using fewer bits than for those that occur less often.
Dictionary-based methods encode data in terms of tokens that take the place of
redundant phrases. Example 14-1 is a header for the compression methods
presented in this chapter.

This chapter covers:

Bit operations
An important part of data compression because most methods require operating
on data one bit at a time to some degree. C provides a number of bitwise
operators that can be used to implement an extended class of bit operations.

Huffman coding
One of the oldest and most elegant forms of compression based on minimum
redundancy coding. Fundamental to Huffman coding is the construction of a
Huffman tree, which is used both to encode and decode the data. Huffman
coding is not the most effective form of compression, but it runs fast both
when compressing and uncompressing data.

LZ77 (Lempel-Ziv-1977)
One of the fundamental methods of dictionary-based compression. LZ77 uses
a sliding window and a look-ahead buffer to encode symbols in terms of
phrases encountered earlier in the data. LZ77 generally results in better
compression ratios than Huffman coding, but with longer compression times.
However, uncompressing data is generally very fast.

Some applications of lossless data compression are:

Software distribution
The process of delivering software on various media. When distributing software
on physical media, such as compact discs or magnetic tapes and diskettes,
reducing the amount of storage required can produce considerable cost
savings in mass distributions.

Archiving
Collecting groups of files into organized libraries. Typically, archives contain
large amounts of data. Thus, after creating archives, frequently we compress
them.

Mobile computing
An area of computing in which devices typically have limited amounts of
memory and secondary storage. Mobile computing generally refers to computing
with small, portable devices such as advanced programmable calculators,
electronic organizers, and other personal computing devices.

Optimized networking (illustrated in this chapter)
Compression is used especially when sending large amounts of data across
wide-area networks. Bandwidth at certain points along wide-area networks is
often limited. Although compressing and uncompressing data does require
time, in many network applications the cost is well justified.

Embedded applications
An area of computing similar to mobile computing in that devices typically have
somewhat limited amounts of memory and secondary storage. Examples of
embedded applications are lab instruments, avionics (aircraft electronics), VCRs,
home stereos, and other pieces of equipment built around microcontrollers.

Database systems
Typically, large systems that can be optimized by reducing their size to some
extent. Databases may be compressed at the record or file level.

Online manuals
Manuals that are accessed directly on a computer. Online manuals are typically
of considerable size, but many sections are not accessed on a regular
basis. Therefore, it is common to store them in a compressed form and
uncompress sections only as they are needed.

Example 14-1. Header for Data Compression

/*****************************************************************************
*
*  -------------------------------- compress.h ------------------------------
*
*****************************************************************************/

#ifndef COMPRESS_H
#define COMPRESS_H

#include "bitree.h"

/*****************************************************************************
*
*  Define a structure for nodes of Huffman trees.
*
*****************************************************************************/

typedef struct HuffNode_ {

unsigned char      symbol;
int                freq;

} HuffNode;

/*****************************************************************************
*
*  Define a structure for entries in Huffman code tables.
*
*****************************************************************************/

typedef struct HuffCode_ {

unsigned char      used;
unsigned short     code;
unsigned char      size;

} HuffCode;

/*****************************************************************************
*
*  Define the number of bits required for LZ77 token members.
*
*****************************************************************************/

#define            LZ77_TYPE_BITS        1
#define            LZ77_WINOFF_BITS      12
#define            LZ77_BUFLEN_BITS      5
#define            LZ77_NEXT_BITS        8

/*****************************************************************************
*
*  Define the size of the sliding window and the look-ahead buffer for
*  LZ77. Each must be less than or equal to 2 raised to LZ77_WINOFF_BITS
*  and LZ77_BUFLEN_BITS respectively.
*
*****************************************************************************/

#define            LZ77_WINDOW_SIZE      4096
#define            LZ77_BUFFER_SIZE      32

/*****************************************************************************
*
*  Define the number of bits for LZ77 phrase tokens.
*
*****************************************************************************/

#define            LZ77_PHRASE_BITS      (LZ77_TYPE_BITS+LZ77_WINOFF_BITS\
                                          +LZ77_NEXT_BITS+LZ77_BUFLEN_BITS)

/*****************************************************************************
*
*  Define the number of bits for LZ77 symbol tokens.
*
*****************************************************************************/

#define            LZ77_SYMBOL_BITS      (LZ77_TYPE_BITS+LZ77_NEXT_BITS)

/*****************************************************************************
*
*  ----------------------------- Public Interface ---------------------------
*
*****************************************************************************/

int huffman_compress(const unsigned char *original, unsigned char
   **compressed, int size);

int huffman_uncompress(const unsigned char *compressed, unsigned char
   **original);

int lz77_compress(const unsigned char *original, unsigned char **compressed,
   int size);

int lz77_uncompress(const unsigned char *compressed, unsigned char
   **original);

#endif
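
As a brief illustration of how this interface is meant to be used, the following fragment is a sketch of ours, not an example from the book; it assumes the implementations described later in the chapter are linked in, and the sample string is arbitrary:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "compress.h"

int main(void) {

unsigned char      *compressed, *restored;
const char         *text = "abracadabra abracadabra abracadabra";
int                csize, rsize;

/* Compress the buffer; huffman_compress allocates *compressed itself. */
csize = huffman_compress((const unsigned char *)text, &compressed,
   (int)strlen(text) + 1);

if (csize < 0)
   return 1;

/* Restore the original data from the compressed buffer. */
rsize = huffman_uncompress(compressed, &restored);

if (rsize < 0)
   return 1;

printf("original=%d bytes, compressed=%d bytes, ratio=%.1f%%\n", rsize,
   csize, 100.0 * (1.0 - (double)csize / rsize));

/* The caller must free both dynamically allocated buffers. */
free(compressed);
free(restored);

return 0;
}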

Description of Bit Operations

When compressing and uncompressing data, often we need to perform operations
on less than a single byte. Therefore, before discussing various methods of
data compression, it is important to become familiar with some basic operations
for working with data one bit at a time. These operations are necessary because
bit operators in C work only with intrinsic integral operands, which are small. The
operations presented in this section work with buffers containing any number of
bits. Note that the set of operations presented here is rather incomplete. Specifically,
only those that are used in this chapter and in Chapter 15, Data Encryption,
are defined.

Interface for Bit Operations

bit_get

int bit_get(const unsigned char *bits, int pos);

Return Value    State of the desired bit: 1 or 0.

Description     Gets the state of the bit at position pos in the buffer bits. The
leftmost position in the buffer is 0. The state returned is either 1 or 0.

Complexity      O(1)

bit_set

void bit_set(unsigned char *bits, int pos, int state);

Return Value    None.

Description     Sets the state of the bit at position pos in the buffer bits to the
value specified by state. The leftmost position in the buffer is 0. The state must
be 1 or 0.

Complexity      O(1)

bit_xor

void bit_xor(const unsigned char *bits1, const unsigned char *bits2,
   unsigned char *bitsx, int size);

Return Value    None.

Description     Computes the bitwise XOR (exclusive OR) of the two buffers
bits1 and bits2, each containing size bits, and returns the result in bitsx. The
bitwise XOR of two binary operands yields 0 in each position i of the result where
in position i of the operands the bits are the same, and 1 in each position where
the bits are different. For example, 11010 ⊕ 01011 = 10001 (⊕ denotes XOR). It is
the responsibility of the caller to manage the storage required by bitsx.

Complexity      O(β), where β is the number of bits in each buffer.

bit_rot_left

void bit_rot_left(unsigned char *bits, int size, int count);

Return Value    None.

Description     Rotates the buffer bits, containing size bits, to the left count
bits. After the operation, the leftmost count bits become the count rightmost bits
in the buffer, and all other bits are shifted accordingly.

Complexity      O(nβ), where n is the number of bits rotated to the left and β is
the number of bits in the buffer.

Implementation and Analysis of Bit Operations

Each bit operation works with a buffer of data defined as a pointer to an unsigned
character. This pointer points to as many bytes as are required to represent the
number of bits in the buffer. If the number of bits in the buffer is not a multiple of
8, some bits in the final byte are not used.

bit_get

The bit_get operation gets the state of a bit in a buffer (see Example 14-2). To do
this, we determine in which byte the desired bit resides and then use a mask to
get the specific bit from that byte. The bit set to 1 in mask determines which bit
will be read from the byte. We use a loop to shift this bit into the proper position.
We fetch the desired bit by indexing to the appropriate byte in bits and applying
the mask.

The runtime complexity of bit_get is O(1). This is because all of the steps in
getting the state of a bit in a buffer run in a constant amount of time.
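
A minimal implementation consistent with this description (an illustrative sketch rather than the book's exact listing, whose formatting may differ) is:

int bit_get(const unsigned char *bits, int pos) {

unsigned char      mask;
int                i;

/* Set a mask with a 1 in the leftmost bit, then shift it right to the
   bit's position within its byte. */
mask = 0x80;

for (i = 0; i < (pos % 8); i++)
   mask = mask >> 1;

/* Index to the appropriate byte and apply the mask. */
return ((bits[pos / 8] & mask) == mask) ? 1 : 0;

}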

bit_set

The bit_set operation sets the state of a bit in a buffer (see Example 14-2). This
operation works similarly to bit_get, except that it uses the mask to set the state of
the specified bit rather than to get it.

The runtime complexity of bit_set is O(1). This is because all of the steps in
setting the state of a bit in a buffer run in a constant amount of time.

bit_xor

The bit_xor operation computes the bitwise XOR (exclusive OR) of two buffers,
bits1 and bits2, and places the result in another buffer, bitsx (see
Example 14-2). To do this, we compare the bit in position i of bits1 with the bit
in position i of bits2. If the bits are the same, we set the bit in position i of
bitsx to 0; otherwise, we set the bit in position i of bitsx to 1. This process
continues for as many bits as are in each buffer, as specified by size.

The runtime complexity of bit_xor is O(β), where β is the number of bits in each
buffer. This is because the loop in the operation iterates once for each bit.

bit_rot_left

The bit_rot_left operation rotates a buffer a specified number of bits to the left (see
Example 14-2). We begin by saving the leftmost bit of the leftmost byte and then
shifting each byte one bit to the left. As we shift each byte, we set the rightmost
bit of the preceding byte to the bit shifted off the left of the current byte. Once we
have shifted the last byte, we set its rightmost bit to the bit shifted off the first
byte. This process is repeated as many times as the number of bits to be rotated.

The runtime complexity of bit_rot_left is O(nβ), where n is the number of bits
rotated to the left and β is the number of bits in the buffer. This is because for
each rotation, (β/8) + 1 shifts are performed to the left.

Example 14-2. Implementation of Bit Operations

/*****************************************************************************
*
*  ----------------------------------- bit.c --------------------------------
*
*****************************************************************************/

#include <string.h>

#include "bit.h"

/*****************************************************************************
*
*  ---------------------------------- bit_set -------------------------------
*
*****************************************************************************/

void bit_set(unsigned char *bits, int pos, int state) {

unsigned char      mask;
int                i;

/* Set a mask for the bit to set. */

mask = 0x80;

for (i = 0; i < (pos % 8); i++)
   mask = mask >> 1;

/* Set the bit. */

if (state)
   bits[pos / 8] = bits[pos / 8] | mask;
else
   bits[pos / 8] = bits[pos / 8] & (~mask);

return;

}

/*****************************************************************************
*
*  ---------------------------------- bit_xor -------------------------------
*
*****************************************************************************/

void bit_xor(const unsigned char *bits1, const unsigned char *bits2, unsigned
   char *bitsx, int size) {

int                i;

/* Compute the bitwise XOR (exclusive OR) of the two buffers. */

for (i = 0; i < size; i++) {

   if (bit_get(bits1, i) != bit_get(bits2, i))
      bit_set(bitsx, i, 1);
   else
      bit_set(bitsx, i, 0);

}

return;

}

/*****************************************************************************
*
*  ------------------------------- bit_rot_left -----------------------------
*
*****************************************************************************/

void bit_rot_left(unsigned char *bits, int size, int count) {

int                fbit,
                   lbit,
                   i,
                   j;

/* Rotate the buffer to the left the specified number of bits. */

if (size > 0) {

   for (j = 0; j < count; j++) {

      for (i = 0; i <= ((size - 1) / 8); i++) {

         /* Get the bit about to be shifted off the current byte. */

         lbit = bit_get(&bits[i], 0);

         if (i == 0) {

            /* Save the bit shifted off the first byte for later. */

            fbit = lbit;

         }

         else {

            /* Set the rightmost bit of the previous byte to the leftmost
               bit about to be shifted off the current byte. */

            bit_set(&bits[i - 1], 7, lbit);

         }

         /* Shift the current byte to the left. */

         bits[i] = bits[i] << 1;

      }

      /* Set the rightmost bit of the buffer to the bit shifted off the
         first byte. */

      bit_set(bits, size - 1, fbit);

   }

}

return;

}

Description of Huffman Coding

One of the oldest and most elegant forms of data compression is Huffman coding,
an algorithm based on minimum redundancy coding. Minimum redundancy coding
suggests that if we know how often different symbols occur in a set of data,
we can represent the symbols in a way that makes the data require less space. The
idea is to encode symbols that occur more frequently with fewer bits than those
that occur less frequently. It is important to realize that a symbol is not necessarily
a character of text: a symbol can be any amount of data we choose, but it is often
one byte's worth.

Entropy and Minimum Redundancy

To begin, let's revisit the concept of entropy introduced at the beginning of the
chapter. Recall that every set of data has some informational content, which is
called its entropy. The entropy of a set of data is the sum of the entropies of each
of its symbols. The entropy S of a symbol z is defined as:

S_z = -lg P_z

where P_z is the probability of z being found in the data. If it is known exactly how
many times z occurs, P_z is referred to as the frequency of z. As an example, if z
occurs 8 times in 32 symbols, or one-fourth of the time, the entropy of z is:

-lg(1/4) = 2 bits

This means that using any more than two bits to represent z is more than we
need. If we consider that normally we represent a symbol using eight bits (one
byte), we see that compression here has the potential to improve the representation
a great deal.

Table 14-1 presents an example of calculating the entropy of some data containing
72 instances of five different symbols. To do this, we sum the entropies
contributed by each symbol. Using "U" as an example, the total entropy for a symbol
is computed as follows. Since "U" occurs 12 times out of the 72 total, each
instance of "U" has an entropy that is calculated as:

-lg(12/72) = 2.584963 bits

Consequently, because "U" occurs 12 times in the data, its contribution to the
entropy of the data is calculated as:

(2.584963)(12) = 31.01955 bits

In order to calculate the overall entropy of the data, we sum the total entropies
contributed by each symbol. To do this for the data in Table 14-1, we have:

31.01955 + 36.00000 + 23.53799 + 33.94552 + 36.95994 = 161.46300 bits

If using 8 bits to represent each symbol yields a data size of (72)(8) = 576 bits, we
should be able to compress this data, in theory, by up to:

1 - (161.46300/576) = 72.0%
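
These figures are easy to verify mechanically. The short program below, an illustration of ours rather than one of the book's examples, computes the same 161.463 bits from the frequencies in Table 14-1 (compile with -lm for log2):

#include <math.h>
#include <stdio.h>

/* Sum -lg(P_z) over every instance of every symbol: freqs[i] instances of
   symbol i, total symbols overall. Returns the entropy in bits. */
double data_entropy(const int *freqs, int nsymbols, int total) {

double             entropy;
int                i;

entropy = 0.0;

for (i = 0; i < nsymbols; i++) {
   if (freqs[i] > 0)
      entropy += freqs[i] * -log2((double)freqs[i] / total);
}

return entropy;
}

int main(void) {

/* The frequencies of U, V, W, X, and Y from Table 14-1. */
int                freqs[5] = {12, 18, 7, 15, 20};

printf("%f bits\n", data_entropy(freqs, 5, 72));   /* prints 161.463007 */

return 0;
}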

Table 14-1. The Entropy of a Set of Data Containing 72 Instances of 5 Different Symbols

Symbol   Probability   Entropy of Each Instance   Total Entropy
U        12/72         2.584963                   31.01955
V        18/72         2.000000                   36.00000
W         7/72         3.362570                   23.53799
X        15/72         2.263034                   33.94552
Y        20/72         1.847997                   36.95994

Building a Huffman Tree

Huffman coding presents a way to approximate the optimal representation of data
based on its entropy. It works by building a data structure called a Huffman tree,
which is a binary tree (see Chapter 9, Trees) organized to generate Huffman codes.
Huffman codes are the codes assigned to symbols in the data to achieve compression.
However, Huffman codes result in compression that only approximates the
data's entropy because, as you may have noticed in Table 14-1, the entropies of
symbols often come out to be fractions of bits. Since the actual number of bits
used in Huffman codes cannot be fractions in practice, some codes end up with
slightly too many bits to be optimal.

Figure 14-1 illustrates the process of building a Huffman tree from the data in
Table 14-1. Building a Huffman tree proceeds from its leaf nodes upward. To
begin, we place each symbol and its frequency in its own tree (see Figure 14-1,
step 1). Next, we merge the two trees whose root nodes have the smallest frequencies
and store the sum of the frequencies in the new tree's root (see
Figure 14-1, step 2). This process is then repeated until we end up with a single
tree (see Figure 14-1, step 5), which is the final Huffman tree. The root node of
this tree contains the total number of symbols in the data, and its leaf nodes contain
the original symbols and their frequencies. Because Huffman coding continually
seeks out the two trees that appear to be the best to merge at any given time,
it is a good example of a greedy algorithm (see Chapter 1, Introduction).
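
The merging step can be sketched as follows. This is an illustrative, self-contained version of ours using a simple node type and a linear scan; the book's actual implementation stores HuffNode data in the general binary tree of Chapter 9 and uses a priority queue to find the two smallest trees:

#include <stdlib.h>

/* An illustrative Huffman-tree node, not the book's type. */
typedef struct HTree_ {
   int                freq;
   unsigned char      symbol;     /* meaningful only in leaf nodes */
   struct HTree_      *left, *right;
} HTree;

/* Repeatedly merge the two trees in forest[0..n-1] whose roots have the
   smallest frequencies until a single tree remains, and return it. */
static HTree *build_huffman(HTree **forest, int n) {

HTree              *root;
int                i, a, b;

while (n > 1) {

   /* Locate the indexes a and b of the two smallest root frequencies. */
   a = (forest[0]->freq <= forest[1]->freq) ? 0 : 1;
   b = 1 - a;

   for (i = 2; i < n; i++) {
      if (forest[i]->freq < forest[a]->freq) { b = a; a = i; }
      else if (forest[i]->freq < forest[b]->freq) { b = i; }
   }

   /* Merge the two trees under a new root that stores the sum of their
      frequencies. */
   if ((root = malloc(sizeof(HTree))) == NULL)
      return NULL;

   root->freq = forest[a]->freq + forest[b]->freq;
   root->symbol = 0;
   root->left = forest[a];
   root->right = forest[b];

   /* Put the merged tree back and shrink the forest by one. */
   forest[a] = root;
   forest[b] = forest[--n];
}

return forest[0];
}

For the data in Table 14-1, five leaves with frequencies 12, 18, 7, 15, and 20 collapse step by step into a single tree whose root frequency is 72.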

Compressing and Uncompressing Data

Building a Huffman tree is part of both compressing and uncompressing data. To
compress data using a Huffman tree, given a specific symbol, we start at the root
of the tree and trace a path to the symbol's leaf. As we descend along the path,
whenever we move to the left, we append 0 to the current code; whenever we
move to the right, we append 1. Thus, in Figure 14-1, step 6, to determine the
Huffman code for "U" we move to the right (1), then to the left (10), and then to
the right again (101). The Huffman codes for all of the symbols in the figure are:

U = 101, V = 01, W = 100, X = 00, Y = 11

To uncompress data using a Huffman tree, we read the compressed data bit by bit.
Starting at the tree's root, whenever we encounter 0 in the data, we move to the
left in the tree; whenever we encounter 1, we move to the right. Once we reach a
leaf node, we generate the symbol it contains, move back to the root of the tree,
and repeat the process until we exhaust the compressed data. Uncompressing data
in this manner is possible because Huffman codes are prefix free, which means
that no code is a prefix of any other. This ensures that once we encounter a
sequence of bits that matches a code, there is no ambiguity as to the symbol it
represents. For example, notice that 01, the code for "V," is not a prefix of any of
the other codes. Thus, as soon as we encounter 01 in the compressed data, we
know that the code must represent "V."
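
In code, the decoding walk is a simple loop. The sketch below is ours; it reuses bit_get from this chapter and the illustrative HTree node from the previous sketch, whereas the book's own decoder in Example 14-3 differs in its details:

/* Decode nbits of Huffman-coded data from coded into out, walking the tree
   from the root: 0 selects the left branch, 1 the right, and reaching a
   leaf emits that leaf's symbol. Returns the number of symbols produced. */
static int huffman_decode(const HTree *root, const unsigned char *coded,
   int nbits, unsigned char *out) {

const HTree        *node;
int                i, count;

node = root;
count = 0;

for (i = 0; i < nbits; i++) {

   node = bit_get(coded, i) ? node->right : node->left;

   if (node->left == NULL && node->right == NULL) {

      /* Reached a leaf: emit its symbol and restart at the root. */
      out[count++] = node->symbol;
      node = root;

   }

}

return count;
}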

Figure 14-1. Building a Huffman tree from the symbols and frequencies in
Table 14-1 (the figure's panels show the initial forest of single-node trees,
the successive merges of the two trees with the smallest frequencies, and the
generation of the Huffman code for the symbol "U")

Effectiveness of Huffman Coding

To determine the reduced size of data compressed using Huffman coding, we
calculate the product of each symbol's frequency times the number of bits in its
Huffman code, then add them together. Thus, to calculate the compressed size of
the data presented in Table 14-1 and Figure 14-1, we have:

(12)(3) + (18)(2) + (7)(3) + (15)(2) + (20)(2) = 163 bits

Assuming that without compression each of the 72 symbols would be represented
with 8 bits, for a total data size of 576 bits, we end up with the following compression
ratio:

1 - (163/576) = 71.7%

Once again, considering the fact that we cannot take into account fractional bits in
Huffman coding, in many cases this value will not be quite as good as the data's
entropy suggests, although in this case it is very close.

In general, Huffman coding is not the most effective form of compression, but it
runs fast both when compressing and uncompressing data. Generally, the most
time-consuming aspect of compressing data with Huffman coding is the need to
scan the data twice: once to gather frequencies, and a second time actually to
compress the data. Uncompressing the data is particularly efficient because decoding
the sequence of bits for each symbol requires only a brief scan of the Huffman
tree, which is bounded.

Interface for Huffman Coding

huffman_compress

int huffman_compress(const unsigned char *original, unsigned char **compressed,
   int size);

Return Value    Number of bytes in the compressed data if compressing the data
is successful, or -1 otherwise.

Description     Uses Huffman coding to compress a buffer of data specified by
original, which contains size bytes. The compressed data is written to a buffer
returned in compressed. Since the amount of storage required in compressed is
unknown to the caller, huffman_compress dynamically allocates the necessary
storage using malloc. It is the responsibility of the caller to free this storage using
free when it is no longer needed.

Complexity      O(n), where n is the number of symbols in the original data.

huffman_uncompress

int huffman_uncompress(const unsigned char *compressed, unsigned
   char **original);

Return Value    Number of bytes in the restored data if uncompressing the data is
successful, or -1 otherwise.

Description     Uses Huffman coding to uncompress a buffer of data specified by
compressed. It is assumed that the buffer contains data previously compressed
with huffman_compress. The restored data is written to a buffer returned in
original. Since the amount of storage required in original may not be known
to the caller, huffman_uncompress dynamically allocates the necessary storage
using malloc. It is the responsibility of the caller to free this storage using free
when it is no longer needed.

Complexity      O(n), where n is the number of symbols in the original data.

Implementation and Analysis of Huffman Coding

With Huffman coding, we try to compress data by encoding symbols as Huffman
codes generated in a Huffman tree. To uncompress the data, we rebuild the Huffman
tree used in the compression process and convert each code back to the symbol
it represents. In the implementation presented here, a symbol in the original
data is one byte.

huffman_compress

The huffman_compress operation (see Example 14-3) compresses data using Huffman
coding. It begins by scanning the data to determine the frequency of each
symbol. The frequencies are placed in an array, freqs. After scanning the data,
the frequencies are scaled so that each can be represented in a single byte. This is
done by determining the maximum number of times any symbol occurs in the
data and adjusting the other frequencies accordingly. Since symbols that do not
occur in the data should be the only ones with frequencies of 0, we perform a
simple test to ensure that any nonzero frequencies that scale to less than 1 end up
being set to 1 instead of 0.
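
The scaling step might look like the following sketch. It is our reconstruction of the idea rather than the code of Example 14-3, which is not reproduced here:

#include <limits.h>

/* Scale raw symbol counts so that each fits in a single byte. Any symbol
   that occurred at least once keeps a frequency of at least 1. */
static void scale_freqs(int freqs[]) {

int                c, max, scaled;

/* Find the largest count. */
max = 0;
for (c = 0; c <= UCHAR_MAX; c++) {
   if (freqs[c] > max)
      max = freqs[c];
}

if (max == 0)
   return;                       /* no data was scanned */

for (c = 0; c <= UCHAR_MAX; c++) {

   scaled = (int)(((double)freqs[c] / max) * UCHAR_MAX);

   if (scaled == 0 && freqs[c] != 0)
      freqs[c] = 1;              /* keep rare symbols representable */
   else
      freqs[c] = scaled;

}

}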

Once we have determined and scaled the frequencies, we call build_tree to build
the Huffman tree. The build_tree function begins by inserting into a priority queue
one binary tree for each symbol occurring at least once in the data. Nodes in the
trees are HuffNode structures (see Example 14-1). This structure consists of two
members: symbol is a symbol from the data (used only in leaf nodes), and freq is
a frequency. Each tree initially contains only a single node, which stores one symbol
and its scaled frequency as recorded and scaled in the freqs array.

To build the Huffman tree, we use a loop to perform size - 1 merges of the trees
within the priority queue. On each iteration, we call pqueue_extract twice to
extract the two binary trees whose root nodes have the smallest frequencies. We
then sum the frequencies, merge the trees into a new one, store the sum of the
frequencies in the new tree's root, and insert the new tree back into the priority
queue. We continue this process until, after size - 1 iterations, the only tree
remaining in the priority queue is the final Huffman tree.

Using the Huffman tree built in the previous step, we call build_table to build a
table of the Huffman codes assigned to every symbol. Each entry in the table is a
HuffCode structure. This structure consists of three members: used is a flag set to
1 or 0 indicating whether the entry has a code stored in it, code is the Huffman
code stored in the entry, and size is the number of bits the code contains. Each
code is a short integer because it can be proven (although this is not shown here)
that when all frequencies are scaled to fit within one byte, no code will be longer
than 16 bits.

We build the table by traversing the Huffman tree using a preorder traversal (see
Chapter 9). In each activation of build_table, code keeps track of the current Huffman
code being generated, and size maintains the number of bits it contains. As
we traverse the tree, each time we move to the left, we append 0 to the code;
each time we move to the right, we append 1. Once we encounter a leaf node, we
store the Huffman code into the table of codes at the appropriate entry. As we
store each code, we call the network function htons as a convenient way to ensure
that the code is stored in big-endian format. This is the format required when we
actually generate the compressed data in the next step as well as when we
uncompress it.

While generating the compressed data, we use ipos to keep track of the current
byte being processed in the original data, and opos to keep track of the current bit
we are writing to the buffer of compressed data. To begin, we write a header that
will help to rebuild the Huffman tree in huffman_uncompress. The header contains
a four-byte value for the number of symbols about to be encoded followed
by the scaled frequencies of all 256 possible symbols, including those that are 0.
Finally, to encode the data, we read one symbol at a time, look up its Huffman
code in the table, and write each code to the compressed buffer. We allocate
space for each byte in the compressed buffer as we need it.

The runtime complexity of huffman_compress is O(n), where n is the number of
symbols in the original data. Only two parts of the algorithm depend on the size
of the data: the part in which we determine the frequency of each symbol, and the
part in which we read the data so we can compress it. Each of these runs in O(n)
time. The time to build the Huffman tree does not affect the complexity of
huffman_compress because the running time of this process depends only on the
number of different symbols in the data, which in this implementation is a
constant (at most 256, since a symbol is one byte).
This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket