`TCP: Transmission Control
`Protocol
`
24.1 Introduction
`
`The Transmission Control Protocol, or TCP, provides a connection-oriented, reliable,
`byte-stream service between the two end points of an application. This is completely
`different from UDP’s connectionless, unreliable, datagram service.
`The implementation of UDP presented in Chapter 23 comprised 9 functions and
`about 800 lines of C code. The TCP implementation we’re about to describe comprises
`28 functions and almost 4,500 lines of C code. Therefore we divide the presentation of
`TCP into multiple chapters.
`These chapters are not an introduction to TCP. We assume the reader is familiar
`with the operation of TCP from Chapters 17-24 of Volume 1.
`
24.2 Code Introduction
`
`The TCP functions appear in six C files and numerous TCP definitions are in seven
`headers, as shown in Figure 24.1.
Figure 24.2 shows the relationship of the various TCP functions to other kernel
functions. The shaded ellipses are the nine main TCP functions that we cover. Eight of
these functions appear in the TCP protosw structure (Figure 24.8) and the ninth is
tcp_output.
`
`
File                    Description
netinet/tcp.h           tcphdr structure definition
netinet/tcp_debug.h     tcp_debug structure definition
netinet/tcp_fsm.h       definitions for TCP’s finite state machine
netinet/tcp_seq.h       macros for comparing sequence numbers
netinet/tcp_timer.h     definitions for TCP timers
netinet/tcp_var.h       tcpcb (control block) and tcpstat (statistics) structure definitions
netinet/tcpip.h         TCP plus IP header definition
netinet/tcp_debug.c     support for SO_DEBUG socket debugging (Section 27.10)
netinet/tcp_input.c     tcp_input and ancillary functions (Chapters 28 and 29)
netinet/tcp_output.c    tcp_output and ancillary functions (Chapter 26)
netinet/tcp_subr.c      miscellaneous TCP subroutines (Chapter 27)
netinet/tcp_timer.c     TCP timer handling (Chapter 25)
netinet/tcp_usrreq.c    PRU_xxx request handling (Chapter 30)

Figure 24.1 Files discussed in the TCP chapters.
`
Figure 24.2 Relationship of TCP functions to rest of the kernel.
`
`
Global Variables
`
`Figure 24.3 shows the global variables we encounter throughout the TCP functions.
`
Variable          Datatype         Description
tcb               struct inpcb     head of the TCP Internet PCB list
tcp_last_inpcb    struct inpcb *   pointer to PCB for last received segment: one-behind cache
tcpstat           struct tcpstat   TCP statistics (Figure 24.4)
tcp_outflags      u_char           array of output flags, indexed by connection state (Figure 24.16)
tcp_recvspace     u_long           default size of socket receive buffer (8192 bytes)
tcp_sendspace     u_long           default size of socket send buffer (8192 bytes)
tcp_iss           tcp_seq          initial send sequence number (ISS)
tcprexmtthresh    int              number of duplicate ACKs to trigger fast retransmit (3)
tcp_mssdflt       int              default MSS (512 bytes)
tcp_rttdflt       int              default RTT if no data (3 seconds)
tcp_do_rfc1323    int              if true (default), request window scale and timestamp options
tcp_now           u_long           500 ms counter for RFC 1323 timestamps
tcp_keepidle      int              keepalive: idle time before first probe (2 hours)
tcp_keepintvl     int              keepalive: interval between probes when no response (75 sec)
                                   (also used as timeout for connect)
tcp_maxidle       int              keepalive: time after probing before giving up (10 min)

Figure 24.3 Global variables introduced in the following chapters.
`
`Statistics
`
Various TCP statistics are maintained in the global structure tcpstat, described in
Figure 24.4. We’ll see where these counters are incremented as we proceed through the
code.
Figure 24.5 shows some sample output of these statistics, from the netstat -s
command. These statistics were collected after the host had been up for 30 days. Since
some counters come in pairs (one counts the number of packets and the other the
number of bytes) we abbreviate these in the figure. For example, the two counters for
the second line of the table are tcps_sndpack and tcps_sndbyte.

The counter for tcps_sndbyte should be 3,722,884,824, not -22,194,928 bytes. This is an
average of about 405 bytes per segment, which makes sense. Similarly, the counter for
tcps_rcvackbyte should be 3,738,811,552, not -21,264,360 bytes (for an average of about 565
bytes per segment). These numbers are incorrectly printed as negative numbers because the
printf calls in the netstat program use %d (signed decimal) instead of %lu (long integer,
unsigned decimal). All the counters are unsigned long integers, and these two counters are
near the maximum value of an unsigned 32-bit long integer (2^32 - 1 = 4,294,967,295).
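The effect of printing an unsigned 32-bit counter with a signed conversion can be sketched in a few lines of C. This is only an illustration: the helper name as_signed32 is hypothetical and not part of netstat, and the sample value 4,000,000,000 is arbitrary rather than taken from Figure 24.5.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from netstat): the value a 32-bit
 * two's-complement signed reading gives for an unsigned counter.
 * Counters at or above 2^31 come out negative. */
int64_t as_signed32(uint32_t counter)
{
    if (counter >= UINT32_C(0x80000000))
        return (int64_t)counter - INT64_C(4294967296);  /* subtract 2^32 */
    return (int64_t)counter;
}

/* Print the same counter both ways, side by side. */
void print_counter(uint32_t counter)
{
    printf("signed reading: %lld, unsigned reading: %lu\n",
           (long long)as_signed32(counter), (unsigned long)counter);
}
```

Calling print_counter(4000000000u) shows the signed reading as -294,967,296 (that is, 4,000,000,000 minus 2^32) next to the intended unsigned value.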
`
`
tcpstat member          Description
tcps_accepts            #SYNs received in LISTEN state
tcps_closed             #connections closed (includes drops)
tcps_connattempt        #connections initiated (calls to connect)
tcps_conndrops          #embryonic connections dropped (before SYN received)
tcps_connects           #connections established actively or passively
tcps_delack             #delayed ACKs sent
tcps_drops              #connections dropped (after SYN received)
tcps_keepdrops          #connections dropped in keepalive (established or awaiting SYN)
tcps_keepprobe          #keepalive probes sent
tcps_keeptimeo          #times keepalive timer or connection-establishment timer expire
tcps_pawsdrop           #segments dropped due to PAWS
tcps_pcbcachemiss       #times PCB cache comparison fails
tcps_persisttimeo       #times persist timer expires
tcps_predack            #times header prediction correct for ACKs
tcps_preddat            #times header prediction correct for data packets
tcps_rcvackbyte         #bytes ACKed by received ACKs
tcps_rcvackpack         #received ACK packets
tcps_rcvacktoomuch      #received ACKs for unsent data
tcps_rcvafterclose      #packets received after connection closed
tcps_rcvbadoff          #packets received with invalid header length
tcps_rcvbadsum          #packets received with checksum errors
tcps_rcvbyte            #bytes received in sequence
tcps_rcvbyteafterwin    #bytes received beyond advertised window
tcps_rcvdupack          #duplicate ACKs received
tcps_rcvdupbyte         #bytes received in completely duplicate packets
tcps_rcvduppack         #packets received with completely duplicate bytes
tcps_rcvoobyte          #out-of-order bytes received
tcps_rcvoopack          #out-of-order packets received
tcps_rcvpack            #packets received in sequence
tcps_rcvpackafterwin    #packets with some data beyond advertised window
tcps_rcvpartdupbyte     #duplicate bytes in part-duplicate packets
tcps_rcvpartduppack     #packets with some duplicate data
tcps_rcvshort           #packets received too short
tcps_rcvtotal           total #packets received
tcps_rcvwinprobe        #window probe packets received
tcps_rcvwinupd          #received window update packets
tcps_rexmttimeo         #retransmit timeouts
tcps_rttupdated         #times RTT estimators updated
tcps_segstimed          #segments for which TCP tried to measure RTT
tcps_sndacks            #ACK-only packets sent (data length = 0)
tcps_sndbyte            #data bytes sent
tcps_sndctrl            #control (SYN, FIN, RST) packets sent (data length = 0)
tcps_sndpack            #data packets sent (data length > 0)
tcps_sndprobe           #window probes sent (1 byte of data forced by persist timer)
tcps_sndrexmitbyte      #data bytes retransmitted
tcps_sndrexmitpack      #data packets retransmitted
tcps_sndtotal           total #packets sent
tcps_sndurg             #packets sent with URG-only (data length = 0)
tcps_sndwinup           #window update-only packets sent (data length = 0)
tcps_timeoutdrop        #connections dropped in retransmission timeout

Figure 24.4 TCP statistics maintained in the tcpstat structure.
`
`
netstat -s output                                             tcpstat members
10,655,999 packets sent                                       tcps_sndtotal
  9,177,823 data packets (-22,194,928 bytes)                  tcps_snd{pack,byte}
  257,295 data packets (81,075,086 bytes) retransmitted       tcps_sndrexmit{pack,byte}
  862,900 ack-only packets (531,285 delayed)                  tcps_sndacks, tcps_delack
  229 URG-only packets                                        tcps_sndurg
  3,453 window probe packets                                  tcps_sndprobe
  74,925 window update packets                                tcps_sndwinup
  279,387 control packets                                     tcps_sndctrl
8,801,953 packets received                                    tcps_rcvtotal
  6,617,079 acks (for -21,264,360 bytes)                      tcps_rcvack{pack,byte}
  235,311 duplicate acks                                      tcps_rcvdupack
  0 acks for unsent data                                      tcps_rcvacktoomuch
  4,670,615 packets (324,965,351 bytes) rcvd in-sequence      tcps_rcv{pack,byte}
  46,953 completely duplicate packets (1,549,785 bytes)       tcps_rcvdup{pack,byte}
  22 old duplicate packets                                    tcps_pawsdrop
  3,442 packets with some dup. data (54,483 bytes duped)      tcps_rcvpartdup{pack,byte}
  77,114 out-of-order packets (13,938,456 bytes)              tcps_rcvoo{pack,byte}
  1,892 packets (1,755 bytes) of data after window            tcps_rcv{pack,byte}afterwin
  1,755 window probes                                         tcps_rcvwinprobe
  175,476 window update packets                               tcps_rcvwinupd
  1,017 packets received after close                          tcps_rcvafterclose
  60,370 discarded for bad checksums                          tcps_rcvbadsum
  279 discarded for bad header offset fields                  tcps_rcvbadoff
  0 discarded because packet too short                        tcps_rcvshort
144,020 connection requests                                   tcps_connattempt
92,595 connection accepts                                     tcps_accepts
126,820 connections established (including accepts)           tcps_connects
237,743 connections closed (including 1,061 drops)            tcps_closed, tcps_drops
110,016 embryonic connections dropped                         tcps_conndrops
6,363,546 segments updated rtt (of 6,444,667 attempts)        tcps_{rttupdated,segstimed}
114,797 retransmit timeouts                                   tcps_rexmttimeo
  86 connection dropped by rexmit timeout                     tcps_timeoutdrop
1,173 persist timeouts                                        tcps_persisttimeo
16,419 keepalive timeouts                                     tcps_keeptimeo
  6,899 keepalive probes sent                                 tcps_keepprobe
  3,219 connections dropped by keepalive                      tcps_keepdrops
733,130 correct ACK header predictions                        tcps_predack
1,266,889 correct data packet header predictions              tcps_preddat
1,851,557 cache misses                                        tcps_pcbcachemiss

Figure 24.5 Sample TCP statistics.
`
`SNMP Variables
`
Figure 24.6 shows the 14 simple SNMP variables in the TCP group and the counters
from the tcpstat structure that implement each variable. The constant values shown
for the first four entries are fixed by the Net/3 implementation. The counter
tcpCurrEstab is computed as the number of Internet PCBs on the TCP PCB list.
Figure 24.7 shows tcpTable, the TCP listener table.
`
`
SNMP variable     tcpstat members or constant   Description
tcpRtoAlgorithm   4                             algorithm used to calculate retransmission timeout value:
                                                1 = none of the following,
                                                2 = a constant RTO,
                                                3 = MIL-STD-1778 Appendix B,
                                                4 = Van Jacobson’s algorithm.
tcpRtoMin         1000                          minimum retransmission timeout value, in milliseconds
tcpRtoMax         64000                         maximum retransmission timeout value, in milliseconds
tcpMaxConn        -1                            maximum #TCP connections (-1 if dynamic)
tcpActiveOpens    tcps_connattempt              #transitions from CLOSED to SYN_SENT states
tcpPassiveOpens   tcps_accepts                  #transitions from LISTEN to SYN_RCVD states
tcpAttemptFails   tcps_conndrops                #transitions from SYN_SENT or SYN_RCVD to CLOSED,
                                                plus #transitions from SYN_RCVD to LISTEN
tcpEstabResets    tcps_drops                    #transitions from ESTABLISHED or CLOSE_WAIT states to
                                                CLOSED
tcpCurrEstab      (see text)                    #connections currently in ESTABLISHED or CLOSE_WAIT
                                                states
tcpInSegs         tcps_rcvtotal                 total #segments received
tcpOutSegs        tcps_sndtotal -               total #segments sent, excluding those containing only
                  tcps_sndrexmitpack            retransmitted bytes
tcpRetransSegs    tcps_sndrexmitpack            total #retransmitted segments
tcpInErrs         tcps_rcvbadsum +              total #segments received with an error
                  tcps_rcvbadoff +
                  tcps_rcvshort
tcpOutRsts        (not implemented)             total #segments sent with RST flag set

Figure 24.6 Simple SNMP variables in tcp group.
`
`
`
`
`
`
index = <tcpConnLocalAddress>.<tcpConnLocalPort>.<tcpConnRemAddress>.<tcpConnRemPort>

SNMP variable         PCB variable   Description
tcpConnState          t_state        state of connection: 1 = CLOSED, 2 = LISTEN,
                                     3 = SYN_SENT, 4 = SYN_RCVD, 5 = ESTABLISHED,
                                     6 = FIN_WAIT_1, 7 = FIN_WAIT_2, 8 = CLOSE_WAIT,
                                     9 = LAST_ACK, 10 = CLOSING, 11 = TIME_WAIT,
                                     12 = delete TCP control block.
tcpConnLocalAddress   inp_laddr      local IP address
tcpConnLocalPort      inp_lport      local port number
tcpConnRemAddress     inp_faddr      foreign IP address
tcpConnRemPort        inp_fport      foreign port number

Figure 24.7 Variables in TCP listener table: tcpTable.
`
The first PCB variable (t_state) is from the TCP control block (Figure 24.13) and
the remaining four are from the Internet PCB (Figure 22.4).
`
`
24.3 TCP protosw Structure
`
Figure 24.8 lists the TCP protosw structure, the protocol switch entry for TCP.

Member         inetsw[2]                     Description
pr_type        SOCK_STREAM                   TCP provides a byte-stream service
pr_domain      &inetdomain                   TCP is part of the Internet domain
pr_protocol    IPPROTO_TCP (6)               appears in the ip_p field of the IP header
pr_flags       PR_CONNREQUIRED|PR_WANTRCVD   socket layer flags, not used by protocol processing
pr_input       tcp_input                     receives messages from IP layer
pr_output      0                             not used by TCP
pr_ctlinput    tcp_ctlinput                  control input function for ICMP errors
pr_ctloutput   tcp_ctloutput                 respond to administrative requests from a process
pr_usrreq      tcp_usrreq                    respond to communication requests from a process
pr_init        tcp_init                      initialization for TCP
pr_fasttimo    tcp_fasttimo                  fast timeout function, called every 200 ms
pr_slowtimo    tcp_slowtimo                  slow timeout function, called every 500 ms
pr_drain       tcp_drain                     called when kernel runs out of mbufs
pr_sysctl      0                             not used by TCP

Figure 24.8 The TCP protosw structure.
`
24.4 TCP Header

The TCP header is defined as a tcphdr structure. Figure 24.9 shows the C structure
and Figure 24.10 shows a picture of the TCP header.
`
tcp.h

    struct tcphdr {
        u_short th_sport;       /* source port */
        u_short th_dport;       /* destination port */
        tcp_seq th_seq;         /* sequence number */
        tcp_seq th_ack;         /* acknowledgement number */
    #if BYTE_ORDER == LITTLE_ENDIAN
        u_char  th_x2:4,        /* (unused) */
                th_off:4;       /* data offset */
    #endif
    #if BYTE_ORDER == BIG_ENDIAN
        u_char  th_off:4,       /* data offset */
                th_x2:4;        /* (unused) */
    #endif
        u_char  th_flags;       /* ACK, FIN, PUSH, RST, SYN, URG */
        u_short th_win;         /* advertised window */
        u_short th_sum;         /* checksum */
        u_short th_urp;         /* urgent offset */
    };

Figure 24.9 tcphdr structure.
`
`
    th_sport   16-bit source port number
    th_dport   16-bit destination port number
    th_seq     32-bit sequence number
    th_ack     32-bit acknowledgment number
    th_off     4-bit header length
    th_x2      reserved (6 bits)
    flags      URG, ACK, PSH, RST, SYN, FIN
    th_win     16-bit window size
    th_sum     16-bit TCP checksum
    th_urp     16-bit urgent offset
    (the fields above occupy 20 bytes)
    options (if any)
    data (if any)

Figure 24.10 TCP header and optional data.
`
Most RFCs, most books (including Volume 1), and the code we’ll examine call th_urp the
urgent pointer. A better term is the urgent offset, since this field is a 16-bit unsigned offset that
must be added to the sequence number field (th_seq) to give the 32-bit sequence number of
the last byte of urgent data. (There is a continuing debate over whether this sequence number
points to the last byte of urgent data or to the byte that follows. This is immaterial for the
present discussion.) We’ll see in Figure 24.13 that TCP correctly calls the 32-bit sequence
number of the last byte of urgent data snd_up, the send urgent pointer. But using the term pointer for
the 16-bit offset in the TCP header is misleading. In Exercise 26.6 we’ll reiterate the distinction
between the urgent pointer and the urgent offset.

The 4-bit header length, the 6 reserved bits that follow, and the 6 flag bits are
defined in C as two 4-bit bit-fields, followed by 8 bits of flags. To handle the difference
in the order of these 4-bit fields within an 8-bit byte, the code contains an #ifdef based
on the byte order of the system.
Also notice that we call the 4-bit th_off the header length, while the C code calls it
the data offset. Both are correct since it is the length of the TCP header, including
options, in 32-bit words, which is the offset of the first byte of data.
The th_flags member contains 6 flag bits, accessed using the names in Figure 24.11.
In Net/3 the TCP header is normally referenced as an IP header immediately
followed by a TCP header. This is how tcp_input processes received IP datagrams and
how tcp_output builds outgoing IP datagrams. This combined IP/TCP header is a
tcpiphdr structure, shown in Figure 24.12.
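The two conversions described above (header length in 32-bit words to bytes, and urgent offset to a 32-bit sequence number) can be sketched as follows. This is a minimal illustration, not Net/3 code: the function names are hypothetical, and the sketch takes the text’s reading that th_seq plus th_urp gives the sequence number of the last byte of urgent data.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of Net/3. */

/* th_off counts 32-bit words, so the header length (and therefore the
 * offset of the first data byte) is th_off * 4 bytes. */
unsigned tcp_header_bytes(unsigned th_off)
{
    return th_off << 2;
}

/* Bytes of TCP options: whatever exceeds the fixed 20-byte header.
 * Returns 0 for an invalid th_off smaller than 5. */
unsigned tcp_option_bytes(unsigned th_off)
{
    unsigned hdr = tcp_header_bytes(th_off);
    return hdr > 20 ? hdr - 20 : 0;
}

/* 32-bit sequence number derived from the 16-bit urgent offset.
 * The addition wraps modulo 2^32, like all sequence arithmetic. */
uint32_t urgent_seq(uint32_t th_seq, uint16_t th_urp)
{
    return (uint32_t)(th_seq + (uint32_t)th_urp);
}
```

For instance, a th_off of 8 corresponds to a 32-byte header carrying 12 bytes of options.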
`
`
th_flags   Description
TH_ACK     the acknowledgment number (th_ack) is valid
TH_FIN     the sender is finished sending data
TH_PUSH    receiver should pass the data to application without delay
TH_RST     reset the connection
TH_SYN     synchronize sequence numbers (establish connection)
TH_URG     the urgent offset (th_urp) is valid

Figure 24.11 th_flags values.
`
tcpip.h

    38 struct tcpiphdr {
    39     struct ipovly ti_i;         /* overlaid ip structure */
    40     struct tcphdr ti_t;         /* tcp header */
    41 };
    42 #define ti_next     ti_i.ih_next
    43 #define ti_prev     ti_i.ih_prev
    44 #define ti_x1       ti_i.ih_x1
    45 #define ti_pr       ti_i.ih_pr
    46 #define ti_len      ti_i.ih_len
    47 #define ti_src      ti_i.ih_src
    48 #define ti_dst      ti_i.ih_dst
    49 #define ti_sport    ti_t.th_sport
    50 #define ti_dport    ti_t.th_dport
    51 #define ti_seq      ti_t.th_seq
    52 #define ti_ack      ti_t.th_ack
    53 #define ti_x2       ti_t.th_x2
    54 #define ti_off      ti_t.th_off
    55 #define ti_flags    ti_t.th_flags
    56 #define ti_win      ti_t.th_win
    57 #define ti_sum      ti_t.th_sum
    58 #define ti_urp      ti_t.th_urp

Figure 24.12 tcpiphdr structure: combined IP/TCP header.
`
38-58  The 20-byte IP header is defined as an ipovly structure, which we showed earlier
in Figure 23.12. As we discussed with Figure 23.19, this structure is not a real IP header,
although the lengths are the same (20 bytes).
`
24.5 TCP Control Block
`
`In Figure 22.1 we showed that TCP maintains its own control block, a tcpcb structure,
`in addition to the standard Internet PCB. In contrast, UDP has everything it needs in
`the Internet PCB--it doesn’t need its own control block.
The TCP control block is a large structure, occupying 140 bytes. As shown in
Figure 22.1 there is a one-to-one relationship between the Internet PCB and the TCP control
block, and each points to the other. Figure 24.13 shows the definition of the TCP control
block.
`
tcp_var.h

    struct tcpcb {
        struct tcpiphdr *seg_next;  /* reassembly queue of received segments */
        struct tcpiphdr *seg_prev;  /* reassembly queue of received segments */
        short   t_state;            /* connection state (Figure 24.16) */
        short   t_timer[TCPT_NTIMERS];  /* tcp timers (Chapter 25) */
        short   t_rxtshift;         /* log(2) of rexmt exp. backoff */
        short   t_rxtcur;           /* current retransmission timeout (#ticks) */
        short   t_dupacks;          /* #consecutive duplicate ACKs received */
        u_short t_maxseg;           /* maximum segment size to send */
        char    t_force;            /* 1 if forcing out a byte (persist/OOB) */
        u_short t_flags;            /* (Figure 24.14) */
        struct tcpiphdr *t_template;    /* skeletal packet for transmit */
        struct inpcb *t_inpcb;      /* back pointer to internet PCB */
    /*
     * The following fields are used as in the protocol specification.
     * See RFC783, Dec. 1981, page 21.
     */
    /* send sequence variables */
        tcp_seq snd_una;            /* send unacknowledged */
        tcp_seq snd_nxt;            /* send next */
        tcp_seq snd_up;             /* send urgent pointer */
        tcp_seq snd_wl1;            /* window update seg seq number */
        tcp_seq snd_wl2;            /* window update seg ack number */
        tcp_seq iss;                /* initial send sequence number */
        u_long  snd_wnd;            /* send window */
    /* receive sequence variables */
        u_long  rcv_wnd;            /* receive window */
        tcp_seq rcv_nxt;            /* receive next */
        tcp_seq rcv_up;             /* receive urgent pointer */
        tcp_seq irs;                /* initial receive sequence number */
    /*
     * Additional variables for this implementation.
     */
    /* receive variables */
        tcp_seq rcv_adv;            /* advertised window by other end */
    /* retransmit variables */
        tcp_seq snd_max;            /* highest sequence number sent;
                                     * used to recognize retransmits */
    /* congestion control (slow start, source quench, retransmit after loss) */
        u_long  snd_cwnd;           /* congestion-controlled window */
        u_long  snd_ssthresh;       /* snd_cwnd size threshhold for slow start
                                     * exponential to linear switch */
    /*
     * transmit timing stuff. See below for scale of srtt and rttvar.
     * "Variance" is actually smoothed difference.
     */
        short   t_idle;             /* inactivity time */
        short   t_rtt;              /* round-trip time */
        tcp_seq t_rtseq;            /* sequence number being timed */
        short   t_srtt;             /* smoothed round-trip time */
        short   t_rttvar;           /* variance in round-trip time */
        u_short t_rttmin;           /* minimum rtt allowed */
        u_long  max_sndwnd;         /* largest window peer has offered */
    /* out-of-band data */
        char    t_oobflags;         /* TCPOOB_HAVEDATA, TCPOOB_HADDATA */
        char    t_iobc;             /* input character, if not SO_OOBINLINE */
        short   t_softerror;        /* possible error not yet reported */
    /* RFC 1323 variables */
        u_char  snd_scale;          /* scaling for send window (0-14) */
        u_char  rcv_scale;          /* scaling for receive window (0-14) */
        u_char  request_r_scale;    /* our pending window scale */
        u_char  requested_s_scale;  /* peer's pending window scale */
        u_long  ts_recent;          /* timestamp echo data */
        u_long  ts_recent_age;      /* when last updated */
        tcp_seq last_ack_sent;      /* sequence number of last ack field */
    };
    #define intotcpcb(ip)   ((struct tcpcb *)(ip)->inp_ppcb)
    #define sototcpcb(so)   (intotcpcb(sotoinpcb(so)))

Figure 24.13 tcpcb structure: TCP control block.
`
`We’ll save the discussion of these variables until we encounter them in the code.
`Figure 24.14 shows the values for the t_flags member.
`
t_flags         Description
TF_ACKNOW       send ACK immediately
TF_DELACK       send ACK, but try to delay it
TF_NODELAY      don’t delay packets to coalesce (disable Nagle algorithm)
TF_NOOPT        don’t use TCP options (never set)
TF_SENTFIN      have sent FIN
TF_RCVD_SCALE   set when other side sends window scale option in SYN
TF_RCVD_TSTMP   set when other side sends timestamp option in SYN
TF_REQ_SCALE    have/will request window scale option in SYN
TF_REQ_TSTMP    have/will request timestamp option in SYN

Figure 24.14 t_flags values.
`
24.6 TCP State Transition Diagram
`
Many of TCP’s actions, in response to different types of segments arriving on a
connection, can be summarized in a state transition diagram, shown in Figure 24.15. We also
duplicate this diagram on one of the front end papers, for easy reference while reading
the TCP chapters.
These state transitions define the TCP finite state machine. Although the transition
from LISTEN to SYN_SENT is allowed by TCP, there is no way to do this using the
sockets API (i.e., a connect is not allowed after a listen).
The t_state member of the control block holds the current state of a connection,
with the values shown in Figure 24.16.
This figure also shows the tcp_outflags array, which contains the outgoing flags
for tcp_output to use when the connection is in that state.
`
Figure 24.15 TCP state transition diagram.
`
t_state             value   Description                                   tcp_outflags[]
TCPS_CLOSED           0     closed                                        TH_RST | TH_ACK
TCPS_LISTEN           1     listening for connection (passive open)       0
TCPS_SYN_SENT         2     have sent SYN (active open)                   TH_SYN
TCPS_SYN_RECEIVED     3     have sent and received SYN; awaiting ACK      TH_SYN | TH_ACK
TCPS_ESTABLISHED      4     established (data transfer)                   TH_ACK
TCPS_CLOSE_WAIT       5     received FIN, waiting for application close   TH_ACK
TCPS_FIN_WAIT_1       6     have closed, sent FIN; awaiting ACK and FIN   TH_FIN | TH_ACK
TCPS_CLOSING          7     simultaneous close; awaiting ACK              TH_FIN | TH_ACK
TCPS_LAST_ACK         8     received FIN have closed; awaiting ACK        TH_FIN | TH_ACK
TCPS_FIN_WAIT_2       9     have closed; awaiting FIN                     TH_ACK
TCPS_TIME_WAIT       10     2MSL wait state after active close            TH_ACK

Figure 24.16 t_state values.
`
Figure 24.16 also shows the numerical values of these constants since the code uses
their numerical relationships. For example, the following two macros are defined:

    #define TCPS_HAVERCVDSYN(s)  ((s) >= TCPS_SYN_RECEIVED)
    #define TCPS_HAVERCVDFIN(s)  ((s) >= TCPS_TIME_WAIT)

Similarly, we’ll see that tcp_notify handles ICMP errors differently when the
connection is not yet established, that is, when t_state is less than TCPS_ESTABLISHED.

The name TCPS_HAVERCVDSYN is correct, but the name TCPS_HAVERCVDFIN is misleading.
A FIN has also been received in the CLOSE_WAIT, CLOSING, and LAST_ACK states. We
encounter this macro in Chapter 29.
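The point about TCPS_HAVERCVDFIN can be checked with a small sketch. The state constants and the two macros are copied from Figure 24.16 and the text above; the function fin_actually_received is a hypothetical helper added only for comparison and does not appear in Net/3.

```c
/* State values from Figure 24.16. */
#define TCPS_CLOSED        0
#define TCPS_LISTEN        1
#define TCPS_SYN_SENT      2
#define TCPS_SYN_RECEIVED  3
#define TCPS_ESTABLISHED   4
#define TCPS_CLOSE_WAIT    5
#define TCPS_FIN_WAIT_1    6
#define TCPS_CLOSING       7
#define TCPS_LAST_ACK      8
#define TCPS_FIN_WAIT_2    9
#define TCPS_TIME_WAIT    10

/* The two macros quoted in the text. */
#define TCPS_HAVERCVDSYN(s) ((s) >= TCPS_SYN_RECEIVED)
#define TCPS_HAVERCVDFIN(s) ((s) >= TCPS_TIME_WAIT)

/* Hypothetical helper: every state in which the peer's FIN has been
 * received, per the note in the text. */
int fin_actually_received(int s)
{
    return s == TCPS_CLOSE_WAIT || s == TCPS_CLOSING ||
           s == TCPS_LAST_ACK   || s == TCPS_TIME_WAIT;
}
```

In CLOSE_WAIT, for example, the helper reports that a FIN has arrived while TCPS_HAVERCVDFIN evaluates to false, which is the mismatch the text describes.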
`
`Half-Close
`
When a process calls shutdown with a second argument of 1, it is called a half-close.
TCP sends a FIN but allows the process to continue receiving on the socket.
(Section 18.5 of Volume 1 contains examples of TCP’s half-close.)
For example, even though we label the ESTABLISHED state "data transfer," if the
process does a half-close, moving the connection to the FIN_WAIT_1 and then the
FIN_WAIT_2 states, data can continue to be received by the process in these two states.
`
`24.7 TCP Sequence Numbers
`
Every byte of data exchanged across a TCP connection, along with the SYN and FIN
flags, is assigned a 32-bit sequence number. The sequence number field in the TCP
header (Figure 24.10) contains the sequence number of the first byte of data in the
segment. The acknowledgment number field in the TCP header contains the next sequence
number that the sender of the ACK expects to receive, which acknowledges all data
bytes through the acknowledgment number minus 1. In other words, the
acknowledgment number is the next sequence number expected by the sender of the ACK. The
acknowledgment number is valid only if the ACK flag is set in the header. We’ll see
that TCP always sets the ACK flag except for the first SYN sent by an active open (the
SYN_SENT state; see tcp_outflags[2] in Figure 24.16) and in some RST segments.
Since a TCP connection is full-duplex, each end must maintain a set of sequence
numbers for both directions of data flow. In the TCP control block (Figure 24.13) there
are 13 sequence numbers: eight for the send direction (the send sequence space)
and five for the receive direction (the receive sequence space).
Figure 24.17 shows the relationship of four of the variables in the send sequence
space: snd_wnd, snd_una, snd_nxt, and snd_max. In this example we number the
bytes 1 through 11.
`
    snd_wnd = 6: offered window (advertised by receiver), bytes 4 through 9
        bytes 1-3     sent and acknowledged
        bytes 4-6     sent, not ACKed
        bytes 7-9     can send ASAP (usable window)
        bytes 10-11   can’t send until window moves
    snd_una = 4: oldest unacknowledged sequence number
    snd_nxt = 7: next send sequence number
    snd_max = 7: maximum send sequence number

Figure 24.17 Example of send sequence space.
`
An acceptable ACK is one for which the following inequality holds:

    snd_una < acknowledgment field <= snd_max

In Figure 24.17 an acceptable ACK has an acknowledgment field of 5, 6, or 7. An
acknowledgment field less than or equal to snd_una is a duplicate ACK: it
acknowledges data that has already been ACKed, or else snd_una would not have incremented
past those bytes.
We encounter the following test a few times in tcp_output, which is true if a
segment is being retransmitted:

    snd_nxt < snd_max

Figure 24.18 shows the other end of the connection in Figure 24.17: the receive
sequence space, assuming the segment containing sequence numbers 4, 5, and 6 has not
been received yet. We show the three variables rcv_nxt, rcv_wnd, and rcv_adv.
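The ACK tests above can be written as small predicates. This is a sketch under assumptions, not the Net/3 code: the function names are hypothetical, fixed-width types stand in for the kernel’s u_long and int, and the comparisons use the signed-difference form that the chapter introduces with Figure 24.21 so that they remain correct across sequence number wraparound.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;   /* assumption: 32-bit stand-in for u_long */

/* Wraparound-safe comparisons in the style of Figure 24.21. */
#define SEQ_LT(a,b)   ((int32_t)((a)-(b)) < 0)
#define SEQ_LEQ(a,b)  ((int32_t)((a)-(b)) <= 0)
#define SEQ_GT(a,b)   ((int32_t)((a)-(b)) > 0)

/* snd_una < ack <= snd_max */
int ack_is_acceptable(tcp_seq ack, tcp_seq snd_una, tcp_seq snd_max)
{
    return SEQ_GT(ack, snd_una) && SEQ_LEQ(ack, snd_max);
}

/* ack <= snd_una: acknowledges nothing new */
int ack_is_duplicate(tcp_seq ack, tcp_seq snd_una)
{
    return SEQ_LEQ(ack, snd_una);
}

/* snd_nxt < snd_max: a segment is being retransmitted */
int is_retransmitting(tcp_seq snd_nxt, tcp_seq snd_max)
{
    return SEQ_LT(snd_nxt, snd_max);
}
```

With the values of Figure 24.17 (snd_una = 4, snd_max = 7), acknowledgment fields 5, 6, and 7 are acceptable, 4 is a duplicate, and 8 is neither.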
`
`
    rcv_wnd = 6: receive window (advertised to sender), bytes 4 through 9
        bytes 1-3          old sequence numbers that TCP has acknowledged
        bytes 10, 11, ...  future sequence numbers not yet allowed
    rcv_nxt = 4: next receive sequence number
    rcv_adv = 10: highest advertised sequence number plus 1

Figure 24.18 Example of receive sequence space.
`
The receiver considers a received segment valid if it contains data within the
window, that is, if either of the following two inequalities is true:

    rcv_nxt <= beginning sequence number of segment < rcv_nxt + rcv_wnd
    rcv_nxt <= ending sequence number of segment < rcv_nxt + rcv_wnd

The beginning sequence number of a segment is just the sequence number field in the
TCP header, ti_seq. The ending sequence number is the sequence number field plus
the number of bytes of TCP data, minus 1.
For example, Figure 24.19 could represent the TCP segment containing the 3 bytes
with sequence numbers 4, 5, and 6 in Figure 24.17.
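The two inequalities can be combined into one predicate, sketched below. The function name segment_in_window is hypothetical, the code assumes a segment with at least one byte of data, and it is only an illustration of the inequalities as stated, not a description of how tcp_input performs its checks.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;   /* assumption: 32-bit stand-in for u_long */

/* Wraparound-safe comparisons in the style of Figure 24.21. */
#define SEQ_LT(a,b)   ((int32_t)((a)-(b)) < 0)
#define SEQ_GEQ(a,b)  ((int32_t)((a)-(b)) >= 0)

/* Hypothetical helper: true if the first or the last byte of the
 * segment lies in [rcv_nxt, rcv_nxt + rcv_wnd).
 * Assumes seg_len >= 1. */
int segment_in_window(tcp_seq seg_start, uint32_t seg_len,
                      tcp_seq rcv_nxt, uint32_t rcv_wnd)
{
    tcp_seq seg_end = seg_start + seg_len - 1;  /* last data byte */
    tcp_seq win_end = rcv_nxt + rcv_wnd;        /* first byte past window */

    int start_ok = SEQ_GEQ(seg_start, rcv_nxt) && SEQ_LT(seg_start, win_end);
    int end_ok   = SEQ_GEQ(seg_end, rcv_nxt)   && SEQ_LT(seg_end, win_end);
    return start_ok || end_ok;
}
```

With rcv_nxt = 4 and rcv_wnd = 6 as in Figure 24.18, the segment carrying bytes 4 through 6 passes, as does one carrying bytes 2 through 4 (its last byte is inside the window), while a segment carrying only bytes 1 through 3 does not.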
`
`[--
`
`63-byte IP datagram
`
`IP header IP TCP header
`options
`8
`20 bytes
`20
`Figure 24.19 TCP segment transmitted as an IP datagram.
`
`TCP
`options
`12
`
`~
`
`1 1 1
`
`We assume that there are 8 bytes of IP options and 12 bytes of TCP options. Fig-
`ure 24.20 shows the values of the relevant variables.
`
Variable  Value  Description
ip_hl     7      length of IP header + options in 32-bit words (= 28 bytes)
ip_len    63     length of IP datagram in bytes (20 + 8 + 20 + 12 + 3)
ti_off    8      length of TCP header + options in 32-bit words (= 32 bytes)
ti_seq    4      sequence number of first byte of data
ti_len    3      #bytes of TCP data: ip_len - (ip_hl x 4) - (ti_off x 4)
          6      sequence number of last byte of data: ti_seq + ti_len - 1

Figure 24.20 Values of variables corresponding to Figure 24.19.
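
The arithmetic in Figure 24.20 can be checked with a few lines of C. This sketch takes plain integer parameters named after the header fields rather than the kernel's overlaid header structures, and the function names tcp_data_len and tcp_last_seq are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes of TCP data in a datagram.
   ip_hl and ti_off count 32-bit words; ip_len counts bytes. */
static uint32_t
tcp_data_len(uint32_t ip_len, uint32_t ip_hl, uint32_t ti_off)
{
    uint32_t hdrs = ip_hl * 4 + ti_off * 4;   /* IP + TCP headers, in bytes */

    assert(hdrs <= ip_len);   /* a well-formed datagram holds both headers */
    return ip_len - hdrs;
}

/* Hypothetical helper: sequence number of the last byte of data.
   Only meaningful when ti_len > 0. */
static uint32_t
tcp_last_seq(uint32_t ti_seq, uint32_t ti_len)
{
    return ti_seq + ti_len - 1;
}
```

For the datagram of Figure 24.19, tcp_data_len(63, 7, 8) is 63 - 28 - 32 = 3, and tcp_last_seq(4, 3) is 6, matching the last two rows of the figure.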
`
`
ti_len is not a field that is transmitted in the TCP header. Instead, it is computed as
shown in Figure 24.20 and stored in the overlaid IP structure (Figure 24.12) once the
received header fields have been checksummed and verified. The last value in this figure
is not stored in the header, but is computed from the other values when needed.
`
`Modular Arithmetic with Sequence Numbers
`
A problem that TCP must deal with is that the sequence numbers are from a finite 32-bit
number space: 0 through 4,294,967,295. If more than 2^32 bytes of data are exchanged
across a TCP connection, the sequence numbers will be reused. Sequence numbers
wrap around from 4,294,967,295 to 0.
Even if less than 2^32 bytes of data are exchanged, wraparound is still a problem
because the sequence numbers for a connection don’t necessarily start at 0. The initial
sequence number for each direction of data flow across a connection can start anywhere
between 0 and 4,294,967,295. This complicates the comparison of sequence numbers.
For example, sequence number 1 is "greater than" 4,294,967,295, as we discuss below.
TCP sequence numbers are defined as unsigned longs in tcp.h:

    typedef u_long tcp_seq;
`
`The four macros shown in Figure 24.21 compare sequence numbers.
`
tcp_seq.h

    #define SEQ_LT(a,b)     ((int)((a)-(b)) < 0)
    #define SEQ_LEQ(a,b)    ((int)((a)-(b)) <= 0)
    #define SEQ_GT(a,b)     ((int)((a)-(b)) > 0)
    #define SEQ_GEQ(a,b)    ((int)((a)-(b)) >= 0)

Figure 24.21 Macros for TCP sequence number comparison.
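
A short sketch shows how these comparisons behave near the wraparound point. It assumes a fixed-width uint32_t for tcp_seq and casts to int32_t, in place of the u_long and int of the macros in Figure 24.21, so that the arithmetic is exactly 32 bits wide on any platform. Converting an out-of-range unsigned value to a signed type is implementation-defined in C, and the sketch assumes the usual two's complement result. The function name seq_demo is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;   /* assumption: fixed-width stand-in for u_long */

#define SEQ_LT(a,b)     ((int32_t)((a)-(b)) < 0)
#define SEQ_LEQ(a,b)    ((int32_t)((a)-(b)) <= 0)
#define SEQ_GT(a,b)     ((int32_t)((a)-(b)) > 0)
#define SEQ_GEQ(a,b)    ((int32_t)((a)-(b)) >= 0)

/* Hypothetical self-checking demo of comparisons across the wrap. */
static void
seq_demo(void)
{
    tcp_seq lo = 1;
    tcp_seq hi = 4294967295u;   /* largest 32-bit sequence number */

    /* 1 - 4294967295 wraps to 2, which is positive as a signed value. */
    assert(SEQ_GT(lo, hi));

    /* 4294967295 - 1 is 4294967294, which is -2 as a signed value. */
    assert(SEQ_LT(hi, lo));

    /* Equal sequence numbers satisfy both non-strict comparisons. */
    assert(SEQ_LEQ(lo, lo) && SEQ_GEQ(lo, lo));

    /* Away from the wrap, the macros agree with ordinary comparison. */
    assert(SEQ_LT((tcp_seq)4, (tcp_seq)10));
}
```

The design choice is that the subtraction is performed modulo 2^32 and only the sign of the 32-bit difference is examined, so the result is meaningful as long as the two sequence numbers are less than 2^31 apart.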
`
`Example--Sequence Number Comparisons
`
`Let’s look at an example to see how TCP’s sequence numbers operate. Assume 3-bit
`sequence numbers, 0 through 7. Figure 24.22 shows these eight sequence numbers,
`their 3-bit binary representation, and their two’s complement representation. (To
`the two’s complement take the binary num
