`
`DROPBOX EX. 1018
`
`
`
`
`
`THE
`
`c
`
`PROGRAMMING
`LANGUAGE
`
`Second Edition
`
`Brian W. Kernighan • Dennis M. Ritchie
`
`AT&T Bell Laboratories
`Murray Hill, New Jersey
`
`PTR Prentice Hall, Englewood Cliffs, New Jersey 07632
`
`Dropbox Ex. 1018
`
`
`
`library of Conpess Cataloging-411.Publication Data
`
`Kernighan, Brian W.
`The C programming language.
`Includes index.
`1. C (Computer program language)
`Dennis M.
`II. Title.
`QA76.73.C15K47 1988
`005.13'3
`ISBN 0-13-110370-9
`ISBN 0-13-110362-8 (pbk.)
`
`I. Ritchie,
`
`88-5934
`
`Copyright c 1988, 1978 by Bell Telephone Laboratories, Incorporated.
`All rights reserved. No part of this publication may be reproduced, stored in a retrieval
`system, or transmitted, in any form or by any means, electronic, mechanical, photocopy·
`ing, recording, or otherwise, without the prior written permission of the publisher.
`Printed in the United States of America. Published simultaneously in Canada.
`
`UNIX is a registered trademark of AT&T.
`
`This book was typeset (pie I tbl l eqn l troff -ms) in Times Roman and Courier by
`the authors, using an Autologic APS-5 phototypesetter and a DEC VAX 8550 running
`the 9th Edition of the UNIX• operating system.
`
`Prentice Hall Software Series
`Brian Kernighan, Advisor
`
`Printed in the United States of America
`
`IO
`
`ISBN
`ISBN
`
`0-13-110362-8
`0-3'3-110370-9
`
`{PBK}
`
`Prentice-Hall International (UK) Limited, London
`Prentice-Hall of Australia Pty. Limited, Sydney
`Prentice-Hall Canada Inc., Toronto
`Prentice-Hall Hispanoamericana, S.A:, Mexico
`Prentice-Hall of India Private Limited, New Delhi
`Prentice-Hall of Japan, Inc., Tokyo
`Simon & Schuster Asia Pte. Ltd., Singapore
`Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
`
`Dropbox Ex. 1018
`
`
`
`Contents
`
`ix
`
`xi
`
`l
`
`5
`s
`8
`13
`14
`lS
`22
`24
`27
`28
`31
`
`35
`3S
`36
`37
`40
`41
`41
`42
`46
`48
`so
`Sl
`S2
`
`SS
`SS
`SS
`
`Preface
`
`Preface to the Flrst Edition
`
`Introduction
`
`Chapter l. A Tutorial Introduction
`1.1 Getting Started
`1.2 Variables and Arithmetic Expressions
`1.3 The For Statement
`1.4 Symbolic Constants
`1.5 Character Input and Output
`1.6 Arrays
`1.7 Functions
`1.8 Arguments-Call by Value
`1.9 Character Arrays
`1.10 External Variables and Scope
`
`Chapter l. Types, Operators, and Expressions
`2.1 Variable Names
`2.2 Data Types and Sizes
`2.3 Constants
`2.4 Declarations
`2.5 Arithmetic Operators
`2.6 Relational and Logical Operators
`2.7 Type Conversions
`2.8
`Increment and Decrement Operators
`2.9 Bitwise Operators
`2.10 Assignment Operators and Expressions
`2.11 Conditional Expressions
`2.12 Precedence and Order of Evaluation
`
`Chapter 3. Control Flow
`3.1 Statements and Blocks
`3.2
`If-Else
`
`'
`
`Dropbox Ex. 1018
`
`
`
`vi
`
`THE C PROGRAMMING LANGUAGE
`
`CONTENTS
`
`3.3 Else-If
`3.4 Switch
`3.5 Loops- While and For
`3.6 Loops-Do-while
`3.7 Break and Continue
`3.8 Goto and Labels
`
`Chapter 4. Functions and Program Structure
`4.1 Basics of Functions
`4.2 Functions Returning Non-integers
`4.3 External Variables
`4.4 Scope Rules
`4.5 Header Files
`4.6 Static Variables
`4.7 Register Variables
`4.8 Block Structure
`4.9
`Initialization
`4.10 Recursion
`4.11 The C Preprocessor
`
`Chapter 5. Pointers and Arrays
`5.1
`Pointers and Addresses
`5.2 Pointers and Function Arguments
`5.3 Pointers and Arrays
`5.4 Address Arithmetic
`5.5 Character Pointers and Functions
`5.6 Pointer Arrays; Pointers to Pointers
`5.7 Multi-dimensional Arrays
`5.8
`Initialization of Pointer Arrays
`5.9 Pointers vs. Multi-dimensional Arrays
`5.10 Command-line Arguments
`5.11 Pointers to Functions
`5.12 Complicated Declarations
`
`Chapter 6. Structures
`6.1 Basics of Structures
`6.2 Structures and Functions
`6.3 Arrays of Structures
`6.4 Pointers to Structures
`6.5 Self-referential Structures
`6.6 Table Lookup
`6.7 Typedef
`6.8 Unions
`6.9 Bit-fields
`
`Chapter 7. Input and Output
`7.1
`Standard Input and Output
`7.2 Formatted Output-Printf
`
`57
`58
`60
`63
`64
`65
`
`67
`67
`71
`73
`80
`81
`83
`83
`84
`85
`86
`88
`
`93
`93
`95
`97
`100
`104
`107
`110
`113
`113
`114
`118
`122
`
`127
`127
`129
`132
`136
`139
`143
`146
`147
`149
`
`151
`151
`153
`
`Dropbox Ex. 1018
`
`
`
`CHAPTER e: Structures
`
`A structure is a collection of one or more variables, possibly of different
`types, grouped together under a single name for convenient handling. (Struc(cid:173)
`tures are called "records" in some languages, notably Pascal.) Structures help
`to organize complicated data, particularly in large programs, because they per(cid:173)
`mit a group of related variables to be treated as a unit instead of as separate
`entities.
`One traditional example of a structure is the payroll record: an employee is
`described by a set of attributes such as name, address, social security number,
`salary, etc. Some of these in turn could be structures: a name has several com(cid:173)
`ponents, as does an address and even a salary. Another example, more typical
`for C, comes from graphics: a point is a pair of coordinates, a rectangle is a pair
`of points, and so on.
`The main change made by the ANSI standard is to define structure
`assignment-structures may be copied and assigned to, passed to functions, and
`returned by functions. This has been supported by most compilers for many
`years, but the properties are now precisely defined. Automatic structures and
`arrays may now also be initialized.
`
`6. 1 Basics of Structures
`Let us create a few structures suitable for graphics. The basic object is a
`point, which we will assume has an x coordinate and a y coordinate, both
`integers.
`
`y
`
`(0,0)
`
`• (4,3)
`
`x
`
`127
`
`Dropbox Ex. 1018
`
`
`
`128
`
`STRUCTURES
`
`CHAPTER 6
`
`The two components can be placed in a structure declared like this:
`atruct point {
`int x;
`int y;
`
`} ;
`
`The keyword struct introduces a structure declaration, which is a list of
`declarations enclosed in braces. An optional name called a structure tag may
`follow the word struct (as with point here). The tag names this kind of
`structure, and can be used subsequently as a shorthand for the part of the
`declaration in braces.
`The variables named in a structure are called members. A structure
`member or tag and an ordinary (i.e., non-member) variable can have the same
`name without conflict, since they can always be distinguished by contexe
`Furthermore, the same member names may occur in different structures,
`although as a matter of style one would normally use the same names only for
`closely related objects.
`A struct declaration defines a type. The right brace that terminates the
`list of members may be followed by a list of variables, just as for any basic type.
`That is,
`atruct { .. . } x, y, z ;
`
`is syntactically analogous to
`int x, y, z;
`in the sense that each statement declares x. y and z to be variables of the
`named type and causes space to be set aside for them.
`A structure declaration that is not followed by a list of variables reserves no
`storage; it merely describes a template or the shape of a structure. If the
`declaration is tagged, however, the tag can be used later in definitions of
`instances of the structure. For example, given the declaration of point above,
`atruct point pt;
`defines a variable pt which is a structure of type struct point. A structure
`can be initialized by following its definition with a list of initializers, each a con(cid:173)
`stant expression, for the members:
`atruct point maxpt = { 320, 200 };
`An automatic structure may also be initialized by assignment or by calling a
`function that returns a structure of the right type.
`A member of a particular structure is referred to in an expression by a con(cid:173)
`struction of the form
`structure-name • member
`The structure member operator " • " connects the structure name and the
`member name. To print the coordinates of the point pt, for instance,
`
`Dropbox Ex. 1018
`
`
`
`SECTION 6.2
`
`STRUCTURES AND FUNCTIONS
`
`Jl9
`
`printf("~,Xd", pt.x, pt.y);
`
`or to compute the distance from the origin (O,O) to pt,
`
`double dist, sqrt(double);
`
`dist= sqrt((double)pt.x * pt.x + (double)pt.y * pt.y);
`Structures can be nested. One representation of a rectangle is a pair of
`points that denote the diagonally opposite corners:
`
`y Dpt2
`
`pt1
`
`x
`
`struct rect {
`struct point pt1;
`struct point pt2;
`
`} ;
`
`The rect structure contains two point structures. If we declare screen as
`
`struct rect screen;
`
`then
`
`screen.pt1 .x
`
`refers to the x coordinate of the pt 1 member of screen.
`
`6.2 Structures and Functions
`The only legal operations on a structure are copying it or assigning to it as a
`unit, taking its address with &, and accessing its members. Copy and assign(cid:173)
`ment include passing arguments to functions and returning values from func(cid:173)
`tions as well. Structures may not be compared. A structure may be initialized
`by a list of constant member values; an automatic structure may also be initial(cid:173)
`ized by an assignment.
`Let us investigate structures by writing some functions to manipulate points
`and rectangles. There are at least three possible approaches: pass components
`separately, pass an entire structure, or pass a pointer to it. Each has its good
`points and bad points.
`The first function, makepoint, will take two integers and return a point
`structure:
`
`Dropbox Ex. 1018
`
`
`
`130
`
`STRUCTURES
`
`CHAPTER 6
`
`/* makepoint: make a point from x and y components *'
`struct point makepoint(int x, int y)
`{
`
`struct point temp;
`temp.x = x;
`temp.y • y;
`return temp;
`
`}
`
`Notice that there is no conflict between the argument name and the member
`with the same name~ indeed the re-use of the names stresses the relationship.
`makepoint can now be used to initialize any structure dynamically, or to
`provide structure arguments to a function:
`
`struct rect screen;
`struct point middle;
`struct point makepoint(int, int);
`screen.pt1 = makepoint(O, O);
`screen.pt2 • makepoint(XMAX, YMAX);
`middle= makepoint((screen.pt1.x + screen.pt2.x)/2,
`(screen.pt1.y + screen.pt2.y)/2);
`
`The next step is a set of functions to do arithmetic on points. For instance,
`
`add two points *'
`
`I* addpoint:
`atruct point addpoint(struct point p1, struct point p2)
`{
`
`p1.x += p2.x;
`p1.y += p2.y;
`return p1;
`
`}
`
`Here both the arguments and the return value are structures. We incremented
`the components in p 1 rather than using an explicit temporary variable to
`emphasize that structure parameters are passed by value like any others.
`As another example, the function ptinrect tests whether a point is inside
`a rectangle, where we have adopted the convention that a rectangle includes its
`left and bottom sides but not its top and right sides:
`/* ptinrect:
`return 1 if p in r, 0 if not *'
`int ptinrect(struct point p, struct rect r)
`{
`
`return p.x >= r.pt1.x && p.x < r.pt2.x
`&& p.y >= r.pt1.y && p.y < r.pt2.y;
`
`}
`
`This assumes that the rectangle is represented in a standard form where the
`pt 1 coordinates are less than the· pt2 coordinates. The following function
`returns a rectangle guaranteed to be in canonical form:
`
`Dropbox Ex. 1018
`
`
`
`SECTION 6.2
`
`STRUCTURES AND FUNCTIONS
`
`131
`
`#define min( a, b) ((a) < (b) ? (a)
`#define max( a, b) ((a) > (b) ? (a)
`
`(b))
`(b))
`
`I• canonrect: canonicalize coordinates of rectangle •/
`struct rect canonrect(struct rect r)
`{
`
`struct rect temp;
`
`temp.pt1.x • min(r.pt1.x, r.pt2.x);
`temp.pt1.y = min(r.pt1.y, r.pt2.y);
`temp.pt2.x = max(r.pt1.x, r.pt2.x);
`temp.pt2.y • max(r.pt1.y, r.pt2.y);
`return temp;
`
`}
`
`If a large structure is to be passed to a function, it is generally more efficient
`to pass a pointer than to copy the whole structure. Structure pointers are just
`like pointers to ordinary variables. The declaration
`
`struct point •pp;
`
`says that pp is a pointer to a structure of type struct point. If pp points to
`a point structure, *PP is the structure. and (*PP) . x and ( *PP) . y are the
`members. To use pp. we might write, for example.
`
`struct point origin, •pp;
`pp = &.oriqin;
`printf(worigin is (%d,%d)\nw, (•pp).x, (•pp).y);
`
`The parentheses are necessary in ( *PP) • x because the precedence of the struc(cid:173)
`is higher than *. The expression *PP. x means
`ture member operator
`.
`* (pp. x), which is illegal here because x is not a pointer.
`Pointers to structures are so frequently used that an alternative notation is
`provided as a shorthand. If p is a pointer to a structure, then
`
`p->member-of-structure
`
`refers to the particular member. (The operator - > is a minus sign immediately
`followed by > .) So we could write instead
`
`printf(woriqin is (%d,%d)\nw, pp->x, pp->y);
`
`Both . and -> associate from left to right, so if we have
`struct rect r, •rp = &.r;
`then these four expressions are equivalent:
`
`r. pt 1.x
`rp->pt1 .x
`( r. pt 1). x
`( rp->pt 1 ) . x
`
`Dropbox Ex. 1018
`
`
`
`132
`
`STRUCTURES
`
`CHAPTER 6
`
`The structure operators • and ->, together with ( ) for function calls and [ ]
`for subscripts, are at the top of the precedence hierarchy and thus bind very
`tightly. For example, given the declaration
`
`struct {
`int len;
`char •str;
`} •p;
`
`then
`
`++p->len
`
`increments len, not p, because the implied parenthesization is ++ ( p->len).
`Parentheses can be used to alter the binding: ( ++p )->len increments p before
`(This last set of
`accessing len, and ( p++ )->len increments p afterward.
`parentheses is unnecessary.)
`In the same way, •p->str fetches whatever str points to; •p->str++
`increments str after accessing whatever it points
`to (just like *S++);
`( *P->str) ++ increments whatever str points to; and *P++->str increments
`p after accessing whatever str points to.
`
`6.3 Arrays of Structures
`Consider writing a program to count the occurrences of each C keyword.
`We need an array of character strings to hold the names, and an array of
`integers for the counts. One possibility is to use two parallel arrays, keyword
`and keycount, as in
`
`char •keyword[NKEYS];
`int keycount[NKEYS];
`
`But the very fact that the arrays are parallel suggests a different organization,
`an array of structures. Each keyword entry is a pair:
`
`char •word;
`int count;
`
`and there is an array of pairs. The structure declaration
`
`struct key {
`char •word;
`int count;
`} keytab[NKEYS];
`
`declares a structure type key, defines an array keytab of structures of this
`type, and sets aside storage for them. Each element of the array is a structure.
`This could also be written
`
`Dropbox Ex. 1018
`
`
`
`SECTION 6.3
`
`ARRA VS OF STRUCTURES
`
`133
`
`struct key {
`char •word;
`int count;
`
`} ;
`
`struct key keytab[NICEYS];
`
`Since the structure keytab contains a constant set of names, it is easiest to
`make it an external variable and initialize it once and for all when it is defined.
`The structure initialization is analogous to earlier ones-the definition is fol(cid:173)
`lowed by a list of initializers enclosed in braces:
`
`struct key {
`char *Word;
`int count;
`} keytab[] = {
`"auto", 0,
`"break", O,
`"case", O,
`"char", O,
`"const", O,
`"continue", O,
`"default", O,
`I• ... •I
`"unsigned" , 0,
`"void", O,
`"volatile", O,
`"while", 0
`
`} ;
`
`The initializers are listed in pairs corresponding to the structure members. It
`would be more precise to enclose initializers for each "row" or structure in
`braces, as in
`
`{ "auto", 0 },
`{ "break", O },
`{ "case", O },
`
`but the inner braces arc not necessary when the initializers arc simple variables
`or character strings, and when all are present. As usual, the number of entries
`in the array keytab will be computed if initializers are present and the [ ] is
`left empty.
`The keyword-counting program begins with the definition of keytab. The
`main routine reads the input by repeatedly calling a function qetword that
`fetches one word at a time. Each word is looked up in keytab with a version
`of the binary search function that we wrote in Chapter 3. The list of keywords
`must be sorted in increasing order in the table.
`
`Dropbox Ex. 1018
`
`
`
`134
`
`STRUCTURES
`
`CHAPTER 6
`
`#include <stdio.h>
`#include <ctype.h>
`#include <string.h>
`
`#define MAXWORD 100
`
`int getword(char •, int);
`int binsearch(char •, struct key•, int);
`
`I• count C keywords •I
`main()
`{
`
`int n;
`char word[MAXWORD];
`
`l• EOF)
`
`while (getword(word, MAXWORD)
`if (isalpha(word[O]))
`if ((n • binsearch(word, keytab, NKEYS)) >m 0)
`keytab[n].count++;
`for (n = O; n < NJCEYS; n++)
`if Ckeytab[n].count > 0)
`printf("%4d %s\n",
`keytab[n].count, keytab[n].word);
`
`return O;
`
`}
`
`find word in tab[OJ .•• tab[n-11 •/
`I• binsearch:
`int binsearch(char •word, struct key tab[], int n)
`{
`
`int cond;
`int low, high, mid;
`
`low • O;
`high • n - 1;
`while (low <• high) {
`mid • (low+high) I 2;
`if ((cond • strcmp(word, tab[mid].word)) < 0)
`high• mid - 1;
`else if (cond > 0)
`low = mid + 1;
`else
`return mid;
`
`}
`return -1;
`
`}
`
`We will show the function getword in a moment; for now it suffices to say
`that each call to getword finds a word, which is copied into the array named
`as its first argument.
`The quantity NKEYS is the number of keywords in keytab. Although we
`
`Dropbox Ex. 1018
`
`
`
`SECTION 6.3
`
`ARRA VS OF STRUCTURES
`
`135
`
`could count this by hand, it's a lot easier and safer to do it by machine, espe(cid:173)
`cially if the list is subject to change. One possibility would be to terminate the
`list of initializers with a null pointer, then loop along keytab until the end is
`found.
`But this is more than is needed, since the size of the array is completely
`determined at compile time. The size of the array is the size of one entry times
`the number of entries, so the number of entries is just
`
`size of keytab I size of struct key
`
`C provides a compile-time unary operator called sizeof that can be used to
`compute the size of any object. The expressions
`
`sizeof object
`
`and
`
`sizeof (type name)
`
`yield an integer equal to the size of the specified object or type in bytes.
`(Strictly, sizeof produces an unsigned integer value whose type, size_ t, is
`defined in the header <stddef. h>.) An object can be a variable or array or
`structure. A type name can be the name of a basic type like int or double,
`or a derived type like a structure or a pointer.
`In our case, the number of keywords is the size of the array divided by the
`size of one element. This computation is used in a #define statement to set
`the value of NKEYS:
`
`#define NKEYS
`
`(sizeof keytab I sizeof(struct key))
`
`·Another way to write this is to divide the array size by the size of a specific ele(cid:173)
`ment:
`
`#define NICEYS
`
`(sizeof keytab I sizeof keytab[O])
`
`This has the advantage that it does not need to be changed if the type changes.
`A sizeof can not be used in a #if line, because the preprocessor does not
`parse type names. But the expression in the #define is not evaluated by the
`preprocessor, so the code here is legal.
`Now for the function getword. We have written a more general getword
`than is necessary for this program, but it is not complicated. getword fetches
`the next "word" from the input, where a word is either a string of letters and
`digits beginning with a letter, or a single non-white space character. The func(cid:173)
`tion value is the first character of the word, or EOF for end of file, or the char(cid:173)
`acter itself if it is not alphabetic.
`
`Dropbox Ex. 1018
`
`
`
`136
`
`STRUCTURES
`
`CHAPTER 6
`
`I• qetword: qet next word or character from input •/
`int qetword(char •word, int lim)
`{
`
`int c, qetch(void)i
`void unqetch(int);
`char •w • word;
`
`while (isspace(c • qetch()))
`
`if (c I• EOF)
`*W++ = Ci
`if ( lisalpha(c)) {
`'\O, i
`*W •
`return c;
`
`}
`for ( ; --lim > O; w++)
`if (lisalnum(•w = qetch())) {
`unqetch ( •w) i
`break;
`
`}
`'\0' i
`*W •
`return word[O];
`
`}
`
`getword uses the getch and ungetch that we wrote in Chapter 4. When
`the collection of an alphanumeric token stops, getword has gone one character
`too far. The call to ungetch pushes that character back on the input for the
`next call. getword also uses isspace to skip white space, isalpha to iden(cid:173)
`tify letters, and isalnum to identify letters and digits; all are from the stand(cid:173)
`ard header <ctype. h>.
`
`Exercise 6-1. Our version of getword does not properly handle underscores,
`string constants, comments, or preprocessor control lines. Write a better ver(cid:173)
`sion. D
`
`6.4 Pointers to Structures
`To illustrate some of the considerations involved with pointers to and arrays
`of structures, let us write the keyword-counting program again, this time using
`pointers instead of array indices.
`The external declaration of keytab need not change, but main and
`binsearch do need modification.
`
`Dropbox Ex. 1018
`
`
`
`SECTION 6.4
`
`POINTERS TO STRUCTURES
`
`137
`
`#include <stdio.h>
`#include <ctype.h>
`#include <strinq.h>
`#define MAXWORD 100
`
`int getword(char *• int);
`struct key *binsearch(char *• struct key*• int);
`
`/* count C keywords; pointer version */
`main()
`{
`
`char word[MAXWORD];
`struct key *P;
`
`I= EOF)
`
`while (qetword(word, MAXWORD)
`if (isalpha(word[O]))
`if ((p=binsearch(word, keytab, NKEYS))
`p->count++;
`= keytab; p < keytab + NKEYS;
`(p->count > 0)
`printf("%4d %s\n", p->count,
`O· '
`
`for (p
`if
`
`return
`
`}
`
`I= NULL)
`
`p++)
`
`p->word);
`
`find word in tab[O] ... tab[n-1] */
`I* binsearch:
`struct key *binsearch(char *Word, struct key *tab, int n)
`{
`
`int cond;
`struct key *low= &tab[O];
`struct key *high= &tab[n];
`struct key *mid;
`
`while (low < high) {
`mid = low + (hiqh-low) I 2;
`if ((cond = strcmp(word, mid->word)) < 0)
`hiqh = mid;
`else if (cond > 0)
`low "' mid + 1;
`else
`return mid;
`
`}
`return NULL;
`
`}
`
`There are several things worthy of note here. First, the declaration of
`binsearch must indicate that it returns a pointer to struct key instead of
`an integer; this is declared both in the function prototype and in binsearch.
`If binsearch finds the word, it returns a pointer to it; if it fails, it returns
`NULL.
`Second, the elements of keytab are now accessed by pointers. This
`
`Dropbox Ex. 1018
`
`
`
`138
`
`STRUCTURES
`
`CHAPTER 6
`
`I• WRONG •/
`
`requires significant changes in binsearch.
`The initializers for low and high are now pointers to the beginning and just
`past the end of the table.
`The computation of the middle element can no longer be simply
`mid = (low+high) I 2
`because the addition of two pointers is illegal. Subtraction is legal, however, so
`hiqh-low is the number of elements, and thus
`mid = low + (hiqh-low) I 2
`sets mid to point to the element halfway between low and high.
`The most important change is to adjust the algorithm to make sure that it
`does not generate an illegal pointer or attempt to access an element outside the
`array. The problem is that &.tab[-1] and &.tab[n] are both outside the lim(cid:173)
`its of the array tab. The former is strictly illegal, and it is illegal to derefer(cid:173)
`ence the latter. The language definition does guarantee, however, that pointer
`arithmetic that involves the first element beyond the end of an array (that is,
`&.tab[n]) will work correctly.
`In main we wrote
`for (p = keytab; p < keytab + NKEYS; p++)
`If p is a pointer to a structure, arithmetic on p takes into account the size of the
`structure, so p++ increments p by the correct amount to get the next element of
`the array of structures, and the test stops the loop at the right time.
`Don't assume, however, that the size of a structure is the sum of the sizes of
`its members. Because of alignment requirements for different objects, there
`may be unnamed "holes" in a structure. Thus, for instance, if a char is one
`byte and an int four bytes, the structure
`struct {
`char c;
`int i;
`
`} ;
`
`might well require eight bytes, not five. The sizeof operator returns the
`proper value.
`Finally, an aside on program format: when a function returns a complicated
`type like a structure pointer, as in
`
`struct key •binsearch(char •word, struct key •tab, int n)
`
`the function name can be hard to see, and to find with a text editor. Accord(cid:173)
`ingly an alternate style is sometimes used:
`struct key *
`binsearch(char •word, struct key •tab, int n)
`
`This is a matter of personal taste; pick the form you like and hold to it.
`
`Dropbox Ex. 1018
`
`
`
`SECTION 6.S
`
`SELF-REFERENTIAL STRUCTURES
`
`139
`
`6.5 Self·referentlal Structures
`Suppose we want to handle the more general problem of counting the
`occurrences of all the words in some input. Since the list of words isn't known
`in advance, we can't conveniently sort it and ·use a binary search. Yet we can't
`do a linear search for each word as it arrives, to see if it's already been seen; the
`program would take too long. (More precisely, its running time is likely to grow
`quadratically with the number of input words.) How can we organize the data
`to cope efficiently with a list of arbitrary words?
`One solution is to keep the set of words seen so far sorted at all times, by
`placing each word into its proper position in the order as it arrives. This
`shouldn't be done by shifting words in a linear array, though-that also takes
`too long. Instead we will use a data structure called a binary tree.
`The tree contains one "node" per distinct word; each node contains
`
`a pointer to the text of the word
`a count of the number of occurrences
`a pointer to the left child node
`a pointer to the right child node
`
`No node may have more than two children; it might have only zero or one.
`The nodes are maintained so that at any node the left subtree contains only
`words that are lexicographically less than the word at the node, and the right
`subtree contains only words that are greater. This is the tree for the sentence
`"now is the time for all good men to come to the aid of their party", as built by
`inserting each word as it is encountered:
`
`now
`
`""'
`. /
`ro/ ~n o( "time
`\
`\
`I \
`good
`party their
`to
`
`IS
`
`the
`
`/
`aJJ
`I \
`aid come
`
`To find out whether a new word is already in the tree, start at the root and
`compare the new word to the word stored at that node. If they match, the ques(cid:173)
`tion is answered affirmatively. If the new word is less than the tree word, con(cid:173)
`tinue searching at the left child, otherwise at the right child. If there is no child
`in the required direction, the new word is not in the tree, and in fact the empty
`slot is the proper place to add the new word. This process is recursive, since the
`search from any node uses a search from one of its children. Accordingly,
`recursive routines for insertion and printing will be most natural.
`Going back to the description of a node, it is conveniently represented as a
`structure with four components:
`
`Dropbox Ex. 1018
`
`
`
`140
`
`STRUCTURES
`
`CHAPTER 6
`
`struct tnode {
`char •word;
`int count;
`struct tnode •left;
`struct tnode •right;
`
`} ;
`
`the
`I•
`I•
`I•
`I•
`
`tree node: •I
`points to the text •/
`number of occurrences •I
`left child •/
`right child •/
`
`This recursive declaration of a node might look chancy, but it's correct. It is
`illegal for a structure to contain an instance of itseif, but
`
`struct tnode •left;
`
`declares left to be a pointer to a tnode, not a tnode itself.
`Occasionally, one needs a variation of self-referential structures: two struc(cid:173)
`tures that refer to each other. The way to handle this is:
`
`struct t {
`
`struct s •p;
`
`} ;
`struct s {
`
`I• p points to an s •/
`
`struct t •q;
`
`I• q points to a t •/
`
`} ;
`
`The code for the whole program is surprisingly small, given a handful of sup(cid:173)
`porting routines like getword that we have already written. The main routine
`reads words with getword and installs them in the tree with addtree.
`
`#include <stdio.h>
`#include <ctype.h>
`#include <string.h>
`
`#define MAXWORD 100
`struct tnode •addtree(struct tnode •, char•);
`void treeprint(struct tnode •);
`int getword(char •, int);
`
`I• word frequency count •/
`main()
`{
`
`struct tnode •root;
`char word [ MAXWORD 1 ;
`root = NULL;
`while (getword(word, MAXWORD)
`if (isalpha(word[O]))
`root= addtree(root, word);
`treeprint(root);
`return O;
`
`I= EOF)
`
`}
`
`Dropbox Ex. 1018
`
`
`
`SECTION 6.S
`
`SELF-REFERENTIAL STRUCTURES
`
`141
`
`The function addtree is recursive. A word is presented by main to the top
`level (the root) of the tree. At each stage, that word is compared to the word
`already stored at Lhe node, and is percolated down to either the left or right sub(cid:173)
`tree by a recursive call to addtree. Eventually the word either matches some(cid:173)
`thing already in the tree (in which case the count is incremented), or a null
`pointer is encountered, indicating that a node must be created and added to the
`tree. If a new node is created,- addtree returns a pointer to it, which is
`installed in the parent node.
`
`struct tnode •talloc(void);
`char •atrdup(char •);
`
`I• addtree: add a node with w, at or below p •/
`struct tnode •addtree(struct tnode •p, char •w)
`{
`
`int cond;
`
`/• a new word has arrived •/
`if (p •• NULL) {
`I• make a new node•/
`p • talloc();
`p->word • strdup(w);
`p->count = 1;
`p->left = p->right = NULL;
`} else if ((cond = strcmp(w, p->word)) == 0)
`p->count++;
`I• repeated word •/
`else if (cond < 0)
`I• less than into left sUbtree •/
`p->left = addtree(p->left, w);
`else
`I• greater than into right sUbtree •/
`p->right = addtree(p->right, w);
`return p;
`
`}
`
`Storage for the new node is fetched by a routine talloc, which returns a
`pointer to a free space suitable for holding a tree node, and the new word is
`copied to a hidden place by strdup. (We will discuss these routines in a
`moment.) The count is initialized, and the two children are made null. This
`part of the code is executed only at the leaves of the tree, when a new node is
`being added. We have (unwisely) omitted error checking on the values returned
`by strdup and talloc.
`treeprint prints the tree in sorted order; at each node, it prints the left
`subtree (all the words less than this word), then the word itself, then the right
`subtree (all the words greater). If you feel shaky about how recursion works,
`simulate treeprint as it operates on the tree shown above,
`
`Dropbox Ex. 1018
`
`
`
`142
`
`STRUCTURES
`
`CHAPTER 6
`
`in-order print of tree p */
`/* treeprint:
`void treeprint(struct tnode *P)
`{
`
`if (p I= NULL) {
`treeprint(p->left);
`printf(""4d "•\n", p->count, p->word);
`treeprint(p->right);
`
`}
`
`}
`
`A practical note: if the tree becomes "unbalanced" because the words don't
`arrive in random order, the running time of the program can grow too much.
`As a worst case, if the words are already in order, this P"Ogram does an expen·
`sive simulation of linear search. There are generalizations of the binary tree
`that do not suffer from this worst-case behavior, but we will not describe them
`here.
`Before we leave this example, it is also worth a brief digression on a problem
`related to storage allocators. Clearly it's desirable that there be only one
`storage allocator in a program, even though it allocates different kinds of
`objects. But if one allocator is to process requests for, say, pointers to chars
`and pointers to struct tnodes, two questions arise. First, how does it meet
`the requirement of most real machines that objects of certain types must satisfy
`alignment restrictions (for example, integers often must be located at even
`addresses)? Second, what declarations can cope with the fact that an allocator
`must necessarily return different kinds of pointers?
`Alignment requirements can generally be satisfied easily, at the cost of some
`wasted space, by ensuring that the allocator always returns a pointer that meets
`all alignment restrictions. The alloc of Chapter 5 does not guarantee any
`particular alignment, so we will use the standard library function malloc,
`which does. In Chapter 8 we will show one way to implement malloc.
`The question of the type declaration for a function like malloc is a vexing
`one for any language that takes its type-checking seriously. In C, the proper
`method is to declare that malloc returns a pointer to void, then explicitly
`coerce the pointer into the desired type with a cast. malloc and related rou(cid:173)
`tines are declared in the standard header <stdlib. h>. Thus talloc can be
`written as
`
`#include <stdlib.h>
`
`I* talloc: make a tnode */
`struct tnode *talloc(void)
`{
`
`return (struct tnode *> malloc(sizeof(struct tnode));
`
`}
`
`strdup merely copies the string given by its argument into a safe place,
`obtained by a call on malloc:
`
`Dropbox Ex. 1018
`
`
`
`SECTION 6.6
`
`TABLE LOOKUP
`
`143
`
`char •strdup(char •s)
`{
`
`char •p;
`
`I• make a duplicate of s •I
`
`p • (char•) malloc(strlen(s)+1);
`if (p I• NULL)
`strcpy(p, s);
`return p;
`
`}
`
`/• +1 for '\O' •I
`
`malloc returns NULL if no space is available; strdup passes that value on,
`leaving error-han