`
`TCP Output
`
`Chapter 26
`
The next part of tcp_output, shown in Figure 26.26, starts with the code that is
executed when len equals 0: there is no data in the segment TCP is sending.
`
tcp_output.c
317     } else {                        /* len == 0 */
318         if (tp->t_flags & TF_ACKNOW)
319             tcpstat.tcps_sndacks++;
320         else if (flags & (TH_SYN | TH_FIN | TH_RST))
321             tcpstat.tcps_sndctrl++;
322         else if (SEQ_GT(tp->snd_up, tp->snd_una))
323             tcpstat.tcps_sndurg++;
324         else
325             tcpstat.tcps_sndwinup++;

326         MGETHDR(m, M_DONTWAIT, MT_HEADER);
327         if (m == NULL) {
328             error = ENOBUFS;
329             goto out;
330         }
331         m->m_data += max_linkhdr;
332         m->m_len = hdrlen;
333     }
334     m->m_pkthdr.rcvif = (struct ifnet *) 0;
335     ti = mtod(m, struct tcpiphdr *);
336     if (tp->t_template == 0)
337         panic("tcp_output");
338     bcopy((caddr_t) tp->t_template, (caddr_t) ti,
              sizeof(struct tcpiphdr));
tcp_output.c
Figure 26.26  tcp_output function: update statistics and allocate mbuf for IP and TCP headers.
`
Update statistics
318-325  Various statistics are updated: TF_ACKNOW and a length of 0 mean this is an
ACK-only segment. If any one of the flags SYN, FIN, or RST is set, this is a control segment.
If the urgent pointer exceeds snd_una, the segment is being sent to notify the other end
of the urgent pointer. If none of these conditions is true, this segment is a window
update.
Get mbuf for IP and TCP headers
326-335  An mbuf with a packet header is allocated to contain the IP and TCP headers.
Copy IP and TCP header templates into mbuf
336-338  The template of the IP and TCP headers is copied from t_template into the mbuf
by bcopy. This template was created by tcp_template.
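
The order in which the four statistics are tested matters, because a zero-length segment can satisfy more than one condition. The following is a minimal standalone sketch of that ordering, not the Net/3 code itself. The enum seg_kind, the helper seq_gt, and the function name classify_empty_segment are hypothetical names introduced for illustration, and the flag values are assumed to match the usual BSD headers.

```c
#include <stdint.h>

/* Flag values assumed for illustration; check them against the real headers. */
#define TH_FIN    0x01
#define TH_SYN    0x02
#define TH_RST    0x04
#define TF_ACKNOW 0x0001

/* Hypothetical labels for the four counters tcps_sndacks, tcps_sndctrl,
 * tcps_sndurg, and tcps_sndwinup. */
enum seg_kind { SEG_ACK, SEG_CTRL, SEG_URG, SEG_WINUP };

/* Wraparound-safe "a > b" for 32-bit sequence numbers. */
static int seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Classify a segment that carries no data, testing in the order the text gives. */
static enum seg_kind classify_empty_segment(int t_flags, int th_flags,
                                            uint32_t snd_up, uint32_t snd_una)
{
    if (t_flags & TF_ACKNOW)
        return SEG_ACK;                 /* ACK-only segment */
    if (th_flags & (TH_SYN | TH_FIN | TH_RST))
        return SEG_CTRL;                /* control segment */
    if (seq_gt(snd_up, snd_una))
        return SEG_URG;                 /* urgent pointer notification */
    return SEG_WINUP;                   /* window update */
}
```

Because TF_ACKNOW is tested first, a SYN sent with an immediate ACK pending is counted as an ACK in this sketch, which is the behavior the ordering in the figure implies.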
`
Figure 26.27 shows the next part of tcp_output, which fills in some remaining
fields in the TCP header.
Decrement snd_nxt if FIN is being retransmitted
339-346  If TCP has already transmitted the FIN, the send sequence space appears as shown
in Figure 26.28.
`
`DELL EX.1095.901
`
`
`
`Section 26.7
`
`Send a Segment 877
`
tcp_output.c
339     /*
340      * Fill in fields, remembering maximum advertised
341      * window for use in delaying messages about window sizes.
342      * If resending a FIN, be sure not to use a new sequence number.
343      */
344     if (flags & TH_FIN && tp->t_flags & TF_SENTFIN &&
345         tp->snd_nxt == tp->snd_max)
346         tp->snd_nxt--;
347     /*
348      * If we are doing retransmissions, then snd_nxt will
349      * not reflect the first unsent octet.  For ACK only
350      * packets, we do not want the sequence number of the
351      * retransmitted packet, we want the sequence number
352      * of the next unsent octet.  So, if there is no data
353      * (and no SYN or FIN), use snd_max instead of snd_nxt
354      * when filling in ti_seq.  But if we are in persist
355      * state, snd_max might reflect one byte beyond the
356      * right edge of the window, so use snd_nxt in that
357      * case, since we know we aren't doing a retransmission.
358      * (retransmit and persist are mutually exclusive...)
359      */
360     if (len || (flags & (TH_SYN | TH_FIN)) || tp->t_timer[TCPT_PERSIST])
361         ti->ti_seq = htonl(tp->snd_nxt);
362     else
363         ti->ti_seq = htonl(tp->snd_max);

364     ti->ti_ack = htonl(tp->rcv_nxt);

365     if (optlen) {
366         bcopy((caddr_t) opt, (caddr_t) (ti + 1), optlen);
367         ti->ti_off = (sizeof(struct tcphdr) + optlen) >> 2;
368     }
369     ti->ti_flags = flags;
tcp_output.c
Figure 26.27  tcp_output function: set ti_seq, ti_ack, and ti_flags.
`
[Figure: send sequence space. Sequence numbers 3 through 9 have been sent and
acknowledged and are followed by the FIN; snd_una = 10, snd_nxt = 11, snd_max = 11.]

Figure 26.28  Send sequence space after FIN has been transmitted.
`
Therefore, if the FIN flag is set, and if the TF_SENTFIN flag is set, and if snd_nxt
equals snd_max, TCP knows the FIN is being retransmitted. We’ll see shortly
(Figure 26.31) that when a FIN is sent, snd_nxt is incremented by one (since the FIN
occupies a sequence number), so this piece of code decrements snd_nxt by 1.
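
The three-part test can be tried in isolation with the values from Figure 26.28. The sketch below is only illustrative: the function name seq_for_fin is hypothetical, the flag values are assumed to match the usual BSD headers, and unlike the kernel code it returns the adjusted value instead of modifying snd_nxt in place.

```c
#include <stdint.h>

/* Flag values assumed for illustration. */
#define TH_FIN     0x01
#define TF_SENTFIN 0x0010

/* Return the value snd_nxt should have before the segment is built.
 * If the FIN was already sent once and nothing has been sent beyond it,
 * this is a retransmission and the FIN must reuse its old sequence number. */
static uint32_t seq_for_fin(int th_flags, int t_flags,
                            uint32_t snd_nxt, uint32_t snd_max)
{
    if ((th_flags & TH_FIN) && (t_flags & TF_SENTFIN) && snd_nxt == snd_max)
        return snd_nxt - 1;
    return snd_nxt;
}
```

With snd_nxt and snd_max both 11 and TF_SENTFIN set, the function yields 10, the sequence number the FIN occupied when it was first sent.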
`
Set sequence number field of segment
347-363  The sequence number field of the segment is normally set to snd_nxt, but is set to
snd_max if (1) there is no data to send (len equals 0), (2) neither the SYN flag nor the
FIN flag is set, and (3) the persist timer is not set.
Set acknowledgment field of segment
364  The acknowledgment field of the segment is always set to rcv_nxt, the next
expected receive sequence number.
Set header length if options present
365-368  If TCP options are present (optlen is greater than 0), the options are copied into
the TCP header and the 4-bit header length in the TCP header (th_off in Figure 24.10)
is set to the fixed size of the TCP header (20 bytes) plus the length of the options,
divided by 4. This field is the number of 32-bit words in the TCP header, including
options.
369  The flags field in the TCP header is set from the variable flags.
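
The choice between snd_nxt and snd_max, and the header length arithmetic, can be sketched as two small functions. These are illustrations under stated assumptions, not the kernel code: choose_seq and data_offset_words are hypothetical names, the flag values are assumed, and the persist timer is modeled as a plain integer that is nonzero when the timer is set.

```c
#include <stdint.h>

/* Flag values assumed for illustration. */
#define TH_FIN 0x01
#define TH_SYN 0x02

/* Pick the sequence number for the outgoing segment. snd_max is used only
 * when there is no data, no SYN or FIN, and the persist timer is not set. */
static uint32_t choose_seq(long len, int th_flags, int persist_timer,
                           uint32_t snd_nxt, uint32_t snd_max)
{
    if (len || (th_flags & (TH_SYN | TH_FIN)) || persist_timer)
        return snd_nxt;
    return snd_max;
}

/* Header length field: 20-byte fixed header plus options, in 32-bit words. */
static unsigned data_offset_words(unsigned optlen)
{
    return (20u + optlen) >> 2;
}
```

For example, 12 bytes of options (a timestamp option padded with two NOPs) give a header length of 8 words, and no options give 5.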
`
The next part of code, shown in Figure 26.29, fills in more fields in the TCP header
and calculates the TCP checksum.
Don’t advertise less than one full-sized segment
370-375  Avoidance of the silly window syndrome is performed, this time in calculating the
window size that is advertised to the other end (ti_win). Recall that win was set at the
end of Figure 26.3 to the amount of space in the socket’s receive buffer. If win is less
than one-fourth of the receive buffer size (so_rcv.sb_hiwat) and less than one full-sized
segment, the advertised window will be 0. This is subject to the later test that prevents
the window from shrinking. In other words, when the amount of available space
reaches either one-fourth of the receive buffer size or one full-sized segment, the
available space will be advertised.
Observe upper limit for advertised window on this connection
376-377  If win is larger than the maximum value for this connection, reduce it to its
maximum value.
Do not shrink window
378-379  Recall from Figure 26.10 that rcv_adv minus rcv_nxt is the amount of space still
available to the sender that was previously advertised. If win is less than this value,
win is set to this value, because we must not shrink the window. This can happen when
the available space is less than one full-sized segment (hence win was set to 0 at the
beginning of this figure), but there is room in the receive buffer for some data.
Figure 22.3 of Volume 1 shows an example of this scenario.
Set urgent offset
381-383  If the urgent pointer (snd_up) is greater than snd_nxt, TCP is in urgent mode.
The urgent offset in the TCP header is set to the 16-bit offset of the urgent pointer from
the starting sequence number of the segment, and the URG flag bit is set. TCP sends the
urgent offset and the URG flag regardless of whether the referenced byte of urgent data
is contained in this segment or not.
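
The three window tests are applied in a fixed order, and the last one can override the first. The sketch below reproduces that order in a standalone function so the interaction can be tried with small numbers. It is an illustration, not the Net/3 code: advertised_window and window_field are hypothetical names, and TCP_MAXWIN is assumed to be 65535.

```c
#include <stdint.h>

#define TCP_MAXWIN 65535L   /* assumed largest unscaled window */

/* Compute the window to advertise, in bytes, before scaling. */
static long advertised_window(long win, long sb_hiwat, long t_maxseg,
                              int rcv_scale, uint32_t rcv_adv, uint32_t rcv_nxt)
{
    /* Space already promised to the peer and not yet used. */
    long already = (long)(int32_t)(rcv_adv - rcv_nxt);

    if (win < sb_hiwat / 4 && win < t_maxseg)
        win = 0;                            /* silly window avoidance */
    if (win > (TCP_MAXWIN << rcv_scale))
        win = TCP_MAXWIN << rcv_scale;      /* upper limit for this connection */
    if (win < already)
        win = already;                      /* never shrink the window */
    return win;
}

/* Value placed in the 16-bit window field (host byte order). */
static uint16_t window_field(long win, int rcv_scale)
{
    return (uint16_t)(win >> rcv_scale);
}
```

With 500 bytes free in an 8192-byte buffer and a 1460-byte segment size, the first test forces the window to 0, but if 300 bytes were advertised earlier and remain unused, the last test raises it back to 300.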
`
tcp_output.c
370     /*
371      * Calculate receive window.  Don't shrink window,
372      * but avoid silly window syndrome.
373      */
374     if (win < (long) (so->so_rcv.sb_hiwat / 4) && win < (long) tp->t_maxseg)
375         win = 0;
376     if (win > (long) TCP_MAXWIN << tp->rcv_scale)
377         win = (long) TCP_MAXWIN << tp->rcv_scale;
378     if (win < (long) (tp->rcv_adv - tp->rcv_nxt))
379         win = (long) (tp->rcv_adv - tp->rcv_nxt);
380     ti->ti_win = htons((u_short) (win >> tp->rcv_scale));

381     if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
382         ti->ti_urp = htons((u_short) (tp->snd_up - tp->snd_nxt));
383         ti->ti_flags |= TH_URG;
384     } else
385         /*
386          * If no urgent pointer to send, then we pull
387          * the urgent pointer to the left edge of the send window
388          * so that it doesn't drift into the send window on sequence
389          * number wraparound.
390          */
391         tp->snd_up = tp->snd_una;   /* drag it along */

392     /*
393      * Put TCP length in extended header, and then
394      * checksum extended header and data.
395      */
396     if (len + optlen)
397         ti->ti_len = htons((u_short) (sizeof(struct tcphdr) +
398                                       optlen + len));
399     ti->ti_sum = in_cksum(m, (int) (hdrlen + len));
tcp_output.c
Figure 26.29  tcp_output function: fill in more TCP header fields and calculate checksum.
`
Figure 26.30 shows an example of how the urgent offset is calculated, assuming the
process executes
    send(fd, buf, 3, MSG_OOB);
and the send buffer is empty when this call to send takes place. This shows that
Berkeley-derived systems consider the urgent pointer to point to the first byte of data
after the out-of-band byte. Recall our discussion after Figure 24.10 where we distinguished
between the 32-bit urgent pointer in the data stream (snd_up), and the 16-bit urgent offset
in the TCP header (ti_urp).

There is a subtle bug here. The bug occurs when the send buffer is larger than 65535,
regardless of whether the window scale option is in use or not. If the send buffer is greater
than 65535 and is nearly full, and the process sends out-of-band data, the offset of the urgent
pointer from snd_nxt can exceed 65535. But the urgent pointer is a 16-bit unsigned value,
and if the calculated value exceeds 65535, the 16 high-order bits are discarded, delivering a
bogus urgent pointer to the other end. See Exercise 26.6 for a solution.
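
The truncation is easy to reproduce outside the kernel. The sketch below mirrors only the cast in Figure 26.29 and does not attempt the fix that Exercise 26.6 asks for. The function name urgent_offset_as_coded is hypothetical, and the byte-order conversion is omitted to keep the arithmetic visible.

```c
#include <stdint.h>

/* Compute the urgent offset the way the figure does: a 32-bit difference
 * cast to 16 bits, so any value above 65535 loses its high-order bits. */
static uint16_t urgent_offset_as_coded(uint32_t snd_up, uint32_t snd_nxt)
{
    return (uint16_t)(snd_up - snd_nxt);
}
```

With the values of Figure 26.30 (snd_up of 7 and snd_nxt of 4) the result is the expected 3. If the urgent pointer is 70000 bytes beyond snd_nxt, the result is 4464 (70000 minus 65536), which is the bogus offset the note describes.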
`
[Figure: the send queue holds 3 bytes (so_snd.sb_cc = 3) with sequence numbers 4, 5,
and 6. snd_una and snd_nxt both point to byte 4; snd_up = 7 is set by PRU_SENDOOB;
the urgent offset of 3 is set by tcp_output.]

Figure 26.30  Example of urgent pointer and urgent offset calculation.

384-391  If TCP is not in urgent mode, the urgent pointer is moved to the left edge of the
window (snd_una).
392-399  The TCP length is stored in the pseudo-header and the TCP checksum is calculated.
All the fields in the TCP header have been filled in, and when the IP and TCP header
template was copied from t_template (Figure 26.26), the fields in the IP header that
are used as the pseudo-header were initialized (as shown in Figure 23.19 for the UDP
checksum calculation).
`
The next part of tcp_output, shown in Figure 26.31, updates the sequence number
if the SYN or FIN flags are set and initializes the retransmission timer.
Remember starting sequence number
400-405  If TCP is not in the persist state, the starting sequence number is saved in
startseq. This is used later in Figure 26.31 if the segment is timed.
Increment snd_nxt
406-417  Since both the SYN and FIN flags take a sequence number, snd_nxt is incremented
if either is set. TCP also remembers that the FIN has been sent, by setting the flag
TF_SENTFIN. snd_nxt is then incremented by the number of bytes of data (len),
which can be 0.
Update snd_max
418-419  If the new value of snd_nxt is larger than snd_max, this is not a retransmission.
The new value of snd_max is stored.
420-428  If a segment is not currently being timed for this connection (t_rtt equals 0), the
timer is started (t_rtt is set to 1) and the starting sequence number of the segment
being timed is saved in t_rtseq. This sequence number is used by tcp_input to
determine when the segment being timed is acknowledged, to update the RTT estimators.
The sample code we discussed in Section 25.10 looked like
    if (tp->t_rtt && SEQ_GT(ti->ti_ack, tp->t_rtseq))
        tcp_xmit_timer(tp, tp->t_rtt);
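
The bookkeeping described above (advance snd_nxt, raise snd_max only for new data, and start timing only when nothing is being timed) can be sketched on a reduced state structure. This is an illustration under stated assumptions: struct send_state, advance_after_send, and seq_gt are hypothetical names, the flag values are assumed, and the TF_SENTFIN flag is modeled as a separate integer field.

```c
#include <stdint.h>

/* Flag values assumed for illustration. */
#define TH_FIN 0x01
#define TH_SYN 0x02

/* Hypothetical subset of the TCP control block used by this sketch. */
struct send_state {
    uint32_t snd_nxt;   /* next sequence number to send */
    uint32_t snd_max;   /* highest sequence number sent so far */
    int      sent_fin;  /* stands in for the TF_SENTFIN flag */
    int      t_rtt;     /* nonzero while a segment is being timed */
    uint32_t t_rtseq;   /* starting sequence number of the timed segment */
};

/* Wraparound-safe "a > b" for 32-bit sequence numbers. */
static int seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

static void advance_after_send(struct send_state *s, int th_flags, uint32_t len)
{
    uint32_t startseq = s->snd_nxt;     /* remembered in case we time it */

    if (th_flags & TH_SYN)
        s->snd_nxt++;                   /* SYN takes one sequence number */
    if (th_flags & TH_FIN) {
        s->snd_nxt++;                   /* so does FIN */
        s->sent_fin = 1;
    }
    s->snd_nxt += len;

    if (seq_gt(s->snd_nxt, s->snd_max)) {   /* new data, not a retransmission */
        s->snd_max = s->snd_nxt;
        if (s->t_rtt == 0) {                /* nothing being timed yet */
            s->t_rtt = 1;
            s->t_rtseq = startseq;
        }
    }
}
```

A retransmission leaves snd_max and the timing fields untouched, which is why retransmitted segments are never timed in this scheme.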
`
tcp_output.c
400     /*
401      * In transmit state, time the transmission and arrange for
402      * the retransmit.  In persist state, just set snd_max.
403      */
404     if (tp->t_force == 0 || tp->t_timer[TCPT_PERSIST] == 0) {
405         tcp_seq startseq = tp->snd_nxt;

406         /*
407          * Advance snd_nxt over sequence space of this segment.
408          */
409         if (flags & (TH_SYN | TH_FIN)) {
410             if (flags & TH_SYN)
411                 tp->snd_nxt++;
412             if (flags & TH_FIN) {
413                 tp->snd_nxt++;
414                 tp->t_flags |= TF_SENTFIN;
415             }
416         }
417         tp->snd_nxt += len;
418         if (SEQ_GT(tp->snd_nxt, tp->snd_max)) {
419             tp->snd_max = tp->snd_nxt;
420             /*
421              * Time this transmission if not a retransmission and
422              * not currently timing anything.
423              */
424             if (tp->t_rtt == 0) {
425                 tp->t_rtt = 1;
426                 tp->t_rtseq = startseq;
427                 tcpstat.tcps_segstimed++;
428             }
429         }
430         /*
431          * Set retransmit timer if not currently set,
432          * and not doing an ack or a keepalive probe.
433          * Initial value for retransmit timer is smoothed
434          * round-trip time + 2 * round-trip time variance.
435          * Initialize counter which is used for backoff
436          * of retransmit time.
437          */
438         if (tp->t_timer[TCPT_REXMT] == 0 &&
439             tp->snd_nxt != tp->snd_una) {
440             tp->t_timer[TCPT_REXMT] = tp->t_rxtcur;
441             if (tp->t_timer[TCPT_PERSIST]) {
442                 tp->t_timer[TCPT_PERSIST] = 0;
443                 tp->t_rxtshift = 0;
444             }
445         }
446     } else if (SEQ_GT(tp->snd_nxt + len, tp->snd_max))
447         tp->snd_max = tp->snd_nxt + len;
tcp_output.c
Figure 26.31  tcp_output function: fill in remaining fields in TCP header and calculate checksum.
`
Set retransmission timer
430-440  If the retransmission timer is not currently set, and if this segment contains data, the
retransmission timer is set to t_rxtcur. Recall that t_rxtcur is set by
tcp_xmit_timer, when an RTT measurement is made. This is an ACK-only segment
if snd_nxt equals snd_una (since len was added to snd_nxt earlier in this figure),
and the retransmission timer is set only for segments containing data.
441-444  If the persist timer is enabled, it is disabled. Either the retransmission timer or the
persist timer can be enabled at any time for a given connection, but not both.
Persist state
446-447  The connection is in the persist state since t_force is nonzero and the persist timer
is enabled. (This else clause is associated with the if at the beginning of the figure.)
snd_max is updated, if necessary. In the persist state, len will be one.
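
The rule that the retransmission and persist timers are never both running can be sketched on a reduced timer structure. This is an illustration, not the kernel code: struct timers and arm_rexmt are hypothetical names, and each timer is modeled as an integer tick count that is 0 when the timer is off.

```c
#include <stdint.h>

/* Hypothetical subset of the control block: two timers and the backoff index. */
struct timers {
    int rexmt;      /* retransmission timer, 0 if not set */
    int persist;    /* persist timer, 0 if not set */
    int rxtshift;   /* exponential backoff index */
};

/* Arm the retransmission timer for a segment that occupies sequence space.
 * snd_nxt has already been advanced, so snd_nxt == snd_una means the
 * segment was ACK-only and needs no timer. */
static void arm_rexmt(struct timers *t, uint32_t snd_nxt, uint32_t snd_una,
                      int t_rxtcur)
{
    if (t->rexmt == 0 && snd_nxt != snd_una) {
        t->rexmt = t_rxtcur;
        if (t->persist) {       /* the two timers are mutually exclusive */
            t->persist = 0;
            t->rxtshift = 0;
        }
    }
}
```

An already running retransmission timer is left alone, so sending further segments does not postpone the timeout for the oldest unacknowledged data.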
`
The final part of tcp_output, shown in Figure 26.32, completes the formation of
the outgoing segment and calls ip_output to send the datagram.

tcp_output.c
448     /*
449      * Trace.
450      */
451     if (so->so_options & SO_DEBUG)
452         tcp_trace(TA_OUTPUT, tp->t_state, tp, ti, 0);

453     /*
454      * Fill in IP length and desired time to live and
455      * send to IP level.  There should be a better way
456      * to handle ttl and tos; we could keep them in
457      * the template, but need a way to checksum without them.
458      */
459     m->m_pkthdr.len = hdrlen + len;
460     ((struct ip *) ti)->ip_len = m->m_pkthdr.len;
461     ((struct ip *) ti)->ip_ttl = tp->t_inpcb->inp_ip.ip_ttl;    /* XXX */
462     ((struct ip *) ti)->ip_tos = tp->t_inpcb->inp_ip.ip_tos;    /* XXX */
463     error = ip_output(m, tp->t_inpcb->inp_options, &tp->t_inpcb->inp_route,
464                       so->so_options & SO_DONTROUTE, 0);
465     if (error) {
466   out:
467         if (error == ENOBUFS) {
468             tcp_quench(tp->t_inpcb, 0);
469             return (0);
470         }
471         if ((error == EHOSTUNREACH || error == ENETDOWN)
472             && TCPS_HAVERCVDSYN(tp->t_state)) {
473             tp->t_softerror = error;
474             return (0);
475         }
476         return (error);
477     }
478     tcpstat.tcps_sndtotal++;

479     /*
480      * Data sent (as far as we can tell).
481      * If this advertises a larger window than any other segment,
482      * then remember the size of the advertised window.
483      * Any pending ACK has now been sent.
484      */
485     if (win > 0 && SEQ_GT(tp->rcv_nxt + win, tp->rcv_adv))
486         tp->rcv_adv = tp->rcv_nxt + win;
487     tp->last_ack_sent = tp->rcv_nxt;
488     tp->t_flags &= ~(TF_ACKNOW | TF_DELACK);

489     if (sendalot)
490         goto again;
491     return (0);
492 }
tcp_output.c
Figure 26.32  tcp_output function: call ip_output to send segment.

Add trace record for socket debugging
448-452  If the SO_DEBUG socket option is enabled, tcp_trace adds a record to TCP’s
circular trace buffer. We describe this function in Section 27.10.
Set IP length, TTL, and TOS
453-462  The final three fields in the IP header that must be set by the transport layer are
stored: IP length, TTL, and TOS. These three fields are marked with an asterisk at the
bottom of Figure 23.19.

The comments XXX are because the latter two fields normally remain constant for a connection
and should be stored in the header template, instead of being assigned explicitly each time a
segment is sent. But these two fields cannot be stored in the IP header until after the TCP
checksum is calculated.

Pass datagram to IP
463-464  ip_output sends the datagram containing the TCP segment. The socket options
are logically ANDed with SO_DONTROUTE, which means that the only socket option
passed to ip_output is SO_DONTROUTE. The only other socket option examined by
ip_output is SO_BROADCAST, so this logical AND turns off the SO_BROADCAST bit, if
set. This means that a process cannot issue a connect to a broadcast address, even if it
sets the SO_BROADCAST socket option.
467-470  The error ENOBUFS is returned if the interface queue is full or if IP needs to obtain
an mbuf and can’t. The function tcp_quench puts the connection into slow start, by
setting the congestion window to one full-sized segment. Notice that tcp_output still
returns 0 (OK) in this case, instead of the error, even though the datagram was discarded.
This differs from udp_output (Figure 23.20), which returned the error. The
difference is that UDP is unreliable, so the ENOBUFS error return is the only indication
to the process that the datagram was discarded. TCP, however, will time out (if the segment
contains data) and retransmit the datagram, and it is hoped that there will be
space on the interface output queue or more available mbufs. If the TCP segment
doesn’t contain data, the other end will time out when the ACK isn’t received and will
retransmit the data whose ACK was discarded.
471-475  If a route can’t be located for the destination, and if the connection has received a
SYN, the error is recorded as a soft error for the connection.
When tcp_output is called by tcp_usrreq as part of a system call by a process
(Chapter 30, the PRU_CONNECT, PRU_SEND, PRU_SENDOOB, and PRU_SHUTDOWN
requests), the process receives the return value from tcp_output. Other functions that
call tcp_output, such as tcp_input and the fast and slow timeout functions, ignore
the return value (because these functions don’t return an error to a process).
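
The way an error from ip_output is mapped to the return value of tcp_output can be sketched as a small decision function. This is a simplified illustration: struct conn and output_error_result are hypothetical names, the congestion window is modeled as a count of segments instead of bytes, and the TCPS_HAVERCVDSYN test is reduced to a single flag.

```c
#include <errno.h>

/* Hypothetical, reduced view of the connection state used by this sketch. */
struct conn {
    int cwnd_segments;   /* congestion window, in full-sized segments */
    int softerror;       /* stands in for t_softerror */
    int have_rcvd_syn;   /* nonzero once a SYN has been received */
};

/* Decide what the caller of the output routine sees for a given error. */
static int output_error_result(struct conn *c, int error)
{
    if (error == 0)
        return 0;
    if (error == ENOBUFS) {
        c->cwnd_segments = 1;   /* quench: back to one segment (slow start) */
        return 0;               /* hidden from the process; TCP will retransmit */
    }
    if ((error == EHOSTUNREACH || error == ENETDOWN) && c->have_rcvd_syn) {
        c->softerror = error;   /* remembered, but not returned */
        return 0;
    }
    return error;
}
```

Only errors that fall through both tests reach the process, for example a routing error on a connection that has not yet received a SYN.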
Update rcv_adv and last_ack_sent
479-486  If the highest sequence number advertised in this segment (rcv_nxt plus win) is
larger than rcv_adv, the new value is saved. Recall that rcv_adv was used in
Figure 26.9 to determine how much the window had opened since the last segment that
was sent, and in Figure 26.29 to make certain TCP was not shrinking the window.
487  The value of the acknowledgment field in the segment is saved in
last_ack_sent. This variable is used by tcp_input with the timestamp option
(Section 26.6).
488  Any pending ACK has been sent, so the TF_ACKNOW and TF_DELACK flags are
cleared.
More data to send?
489-490  If the sendalot flag is set, a jump is made back to the label again (Figure 26.1).
This occurs if the send buffer contains more than one full-sized segment that can be sent
(Figure 26.3), or if a full-sized segment was being sent and TCP options were included
that reduced the amount of data in the segment (Figure 26.24).
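
The receive-side bookkeeping performed after a successful send can be sketched as follows. The names struct recv_state, note_segment_sent, and seq_gt are hypothetical, and the flag values are assumed to match the usual BSD headers. The sketch covers only the three updates described above.

```c
#include <stdint.h>

/* Flag values assumed for illustration. */
#define TF_ACKNOW 0x0001
#define TF_DELACK 0x0002

/* Hypothetical subset of the control block used by this sketch. */
struct recv_state {
    uint32_t rcv_nxt;        /* next expected receive sequence number */
    uint32_t rcv_adv;        /* highest sequence number advertised so far */
    uint32_t last_ack_sent;  /* acknowledgment field of the last segment sent */
    int      t_flags;
};

/* Wraparound-safe "a > b" for 32-bit sequence numbers. */
static int seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

static void note_segment_sent(struct recv_state *r, long win)
{
    /* Remember the right edge only if this segment moved it forward. */
    if (win > 0 && seq_gt(r->rcv_nxt + (uint32_t)win, r->rcv_adv))
        r->rcv_adv = r->rcv_nxt + (uint32_t)win;
    r->last_ack_sent = r->rcv_nxt;
    r->t_flags &= ~(TF_ACKNOW | TF_DELACK);   /* any pending ACK has gone out */
}
```

Flags other than the two ACK flags are preserved by the mask, and rcv_adv never moves backward.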
`
26.8 tcp_template Function

The function tcp_newtcpcb (from the previous chapter) is called when the socket is
created, to allocate and partially initialize the TCP control block. When the first segment
is sent or received on the socket (an active open is performed, the PRU_CONNECT
request, or a SYN arrives for a listening socket), tcp_template creates a template of
the IP and TCP headers for the connection. This minimizes the amount of work
required by tcp_output when a segment is sent on the connection.
Figure 26.33 shows the tcp_template function.

tcp_subr.c
59 struct tcpiphdr *
60 tcp_template(tp)
61 struct tcpcb *tp;
62 {
63     struct inpcb *inp = tp->t_inpcb;
64     struct mbuf *m;
65     struct tcpiphdr *n;

66     if ((n = tp->t_template) == 0) {
67         m = m_get(M_DONTWAIT, MT_HEADER);
68         if (m == NULL)
69             return (0);
70         m->m_len = sizeof(struct tcpiphdr);
71         n = mtod(m, struct tcpiphdr *);
72     }
73     n->ti_next = n->ti_prev = 0;
74     n->ti_x1 = 0;
75     n->ti_pr = IPPROTO_TCP;
76     n->ti_len = htons(sizeof(struct tcpiphdr) - sizeof(struct ip));
77     n->ti_src = inp->inp_laddr;
78     n->ti_dst = inp->inp_faddr;
79     n->ti_sport = inp->inp_lport;
80     n->ti_dport = inp->inp_fport;
81     n->ti_seq = 0;
82     n->ti_ack = 0;
83     n->ti_x2 = 0;
84     n->ti_off = 5;              /* 5 32-bit words = 20 bytes */
85     n->ti_flags = 0;
86     n->ti_win = 0;
87     n->ti_sum = 0;
88     n->ti_urp = 0;
89     return (n);
90 }
tcp_subr.c
Figure 26.33  tcp_template function: create template of IP and TCP headers.

Allocate mbuf
59-72  The template of the IP and TCP headers is formed in an mbuf, and a pointer to the
mbuf is stored in the t_template member of the TCP control block. Since this function
can be called at the software interrupt level, from tcp_input, the M_DONTWAIT
flag is specified.
Initialize header fields
73-88  All the fields in the IP and TCP headers are set to 0 except as follows: ti_pr is set
to the IP protocol value for TCP (6); ti_len is set to 20, the default length of the TCP
header; and ti_off is set to 5, the number of 32-bit words in the 20-byte TCP header.
Also the source and destination IP addresses and TCP port numbers are copied from the
Internet PCB into the TCP header template.
Pseudo-header for TCP checksum computation
73-88  The initialization of many of the fields in the combined IP and TCP header
simplifies the computation of the TCP checksum, using the same pseudo-header technique as
discussed for UDP in Section 23.6. Examining the udpiphdr structure in Figure 23.19
shows why tcp_template initializes fields such as ti_next and ti_prev to 0.
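
The reason the leading fields must be zero can be seen by checksumming the 40-byte overlay directly. The sketch below builds a byte array laid out like the combined header described above (8 zero bytes where ti_next and ti_prev sit, then ti_x1, ti_pr, ti_len, the two addresses, and a 20-byte TCP header) and runs a plain Internet checksum over all of it. It is an illustration only: inet_cksum, build_overlay, and store_cksum are hypothetical helpers, the byte offsets assume 32-bit pointers as in the original structure, and the checksum routine is a simple byte-wise version, not the kernel's in_cksum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One's-complement Internet checksum over len bytes, treating the data as
 * big-endian 16-bit words. Returns the value in host order. */
static uint16_t inet_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;     /* pad an odd trailing byte with 0 */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill in a 40-byte overlay the way the template does (assumed layout). */
static void build_overlay(uint8_t h[40], const uint8_t src[4],
                          const uint8_t dst[4], uint16_t sport, uint16_t dport)
{
    memset(h, 0, 40);                   /* ti_next, ti_prev, ti_x1: all zero */
    h[9] = 6;                           /* ti_pr: IP protocol value for TCP */
    h[10] = 0;
    h[11] = 20;                         /* ti_len: 20-byte TCP header, no data */
    memcpy(h + 12, src, 4);             /* ti_src */
    memcpy(h + 16, dst, 4);             /* ti_dst */
    h[20] = (uint8_t)(sport >> 8);
    h[21] = (uint8_t)(sport & 0xff);
    h[22] = (uint8_t)(dport >> 8);
    h[23] = (uint8_t)(dport & 0xff);
    h[32] = 5 << 4;                     /* ti_off: 5 words, in the upper 4 bits */
}

/* Compute the checksum over the whole overlay and store it in ti_sum. */
static void store_cksum(uint8_t h[40])
{
    uint16_t c;

    h[36] = h[37] = 0;
    c = inet_cksum(h, 40);
    h[36] = (uint8_t)(c >> 8);
    h[37] = (uint8_t)(c & 0xff);
}
```

Because the first 9 bytes contribute nothing to the sum, checksumming the overlay gives the same result as checksumming a separate 12-byte pseudo-header followed by the TCP header, and a receiver that repeats the sum over the stored result gets 0.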
`
`26.9 tcp_respond Function
`
The function tcp_respond is a special-purpose function that also calls ip_output to
send IP datagrams. tcp_respond is called in two cases:

1. by tcp_input to generate an RST segment, with or without an ACK, and
2. by tcp_timers to send a keepalive probe.

Instead of going through all the logic of tcp_output for these two cases, the
special-purpose function tcp_respond is called. We also note that the function tcp_drop
that we cover in the next chapter also generates RST segments by calling tcp_output.
Not all RST segments are generated by tcp_respond.
Figure 26.34 shows the first half of tcp_respond.
`
tcp_subr.c
104 void
105 tcp_respond(tp, ti, m, ack, seq, flags)
106 struct tcpcb *tp;
107 struct tcpiphdr *ti;
108 struct mbuf *m;
109 tcp_seq ack, seq;
110 int     flags;
111 {
112     int     tlen;
113     int     win = 0;
114     struct route *ro = 0;

115     if (tp) {
116         win = sbspace(&tp->t_inpcb->inp_socket->so_rcv);
117         ro = &tp->t_inpcb->inp_route;
118     }
119     if (m == 0) {                   /* generate keepalive probe */
120         m = m_gethdr(M_DONTWAIT, MT_HEADER);
121         if (m == NULL)
122             return;
123         tlen = 0;                   /* no data is sent */
124         m->m_data += max_linkhdr;
125         *mtod(m, struct tcpiphdr *) = *ti;
126         ti = mtod(m, struct tcpiphdr *);
127         flags = TH_ACK;
128     } else {                        /* generate RST segment */
129         m_freem(m->m_next);
130         m->m_next = 0;
131         m->m_data = (caddr_t) ti;
132         m->m_len = sizeof(struct tcpiphdr);
133         tlen = 0;
134 #define xchg(a,b,type) { type t; t=a; a=b; b=t; }
135         xchg(ti->ti_dst.s_addr, ti->ti_src.s_addr, u_long);
136         xchg(ti->ti_dport, ti->ti_sport, u_short);
137 #undef xchg
138     }
tcp_subr.c
Figure 26.34  tcp_respond function: first half.
`
Figure 26.35 shows the different arguments to tcp_respond for the three cases in
which it is called.

                                                 Arguments
                            tp    ti           m      ack               seq       flags
generate RST without ACK    tp    ti           m      0                 ti_ack    TH_RST
generate RST with ACK       tp    ti           m      ti_seq + ti_len   0         TH_RST | TH_ACK
generate keepalive          tp    t_template   NULL   rcv_nxt           snd_una   0

Figure 26.35  Arguments to tcp_respond.

104-110  tp is a pointer to the TCP control block (possibly a null pointer); ti is a pointer to an
IP/TCP header template; m is a pointer to the mbuf containing the segment causing the
RST to be generated; and the last three arguments are the acknowledgment field,
sequence number field, and flags field of the segment being generated.
113-118  It is possible for tcp_input to generate an RST when a segment is received that
does not have an associated TCP control block. This happens, for example, when a segment
is received that doesn’t reference an existing connection (e.g., a SYN for a port
without an associated listening server). In this case tp is null and the initial values for
win and ro are used. If tp is not null, the amount of space in the receive buffer will be
sent as the advertised window, and the pointer to the cached route is saved in ro for the
call to ip_output.
Send keepalive probe when keepalive timer expires
119-127  The argument m is a pointer to the mbuf chain for the received segment. But a
keepalive probe is sent in response to the keepalive timer expiring, not in response to a
received TCP segment. Therefore m is null and m_gethdr allocates a packet header
mbuf to contain the IP and TCP headers. tlen, the length of the TCP data, is set to 0,
since the keepalive probe doesn’t contain any data.

Some older implementations based on 4.2BSD do not respond to these keepalive probes unless
the segment contains data. Net/3 can be configured to send 1 garbage byte of data in the
probe to elicit the response by defining the name TCP_COMPAT_42 when the kernel is
compiled. This assigns 1, instead of 0, to tlen. The garbage byte causes no harm, because it is not
the expected byte (it is a byte that the receiver has previously received and acknowledged), so
it is thrown away by the receiver.

The assignment of *ti copies the TCP header template structure pointed to by ti
into the data portion of the mbuf. The pointer ti is then set to point to the header
template in the mbuf.
Send RST segment in response to received segment
128-138  An RST segment is being sent by tcp_input in response to a received segment.
The mbuf containing the input segment is reused for the response. All the mbufs on the
chain are released by m_freem except the first mbuf (the packet header), since the segment
generated by tcp_respond consists of only an IP header and a TCP header. The
source and destination IP address and port numbers are swapped in the IP and TCP
header.
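
Reusing the received header for the reply works because only the endpoints need to trade places. The sketch below shows the same exchange on a reduced structure. It is an illustration, not the kernel code: struct endpoints and reverse_endpoints are hypothetical names, and the macro is written in the do/while form so that it behaves as a single statement, which differs slightly from the brace-only xchg macro in Figure 26.34.

```c
#include <stdint.h>

/* Hypothetical reduced header: just the four fields that get swapped. */
struct endpoints {
    uint32_t src_addr, dst_addr;
    uint16_t sport, dport;
};

/* Exchange two lvalues of the given type. */
#define XCHG(a, b, type) do { type t_ = (a); (a) = (b); (b) = t_; } while (0)

/* Turn the header of a received segment into the header of its reply. */
static void reverse_endpoints(struct endpoints *e)
{
    XCHG(e->dst_addr, e->src_addr, uint32_t);
    XCHG(e->dport, e->sport, uint16_t);
}
```

After the call the reply is addressed to whoever sent the offending segment, from the address and port that segment was sent to.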
`
Figure 26.36 shows the final half of tcp_respond.

tcp_subr.c
139     ti->ti_len = htons((u_short) (sizeof(struct tcphdr) + tlen));
140     tlen += sizeof(struct tcpiphdr);
141     m->m_len = tlen;
142     m->m_pkthdr.len = tlen;
143     m->m_pkthdr.rcvif = (struct ifnet *) 0;
144     ti->ti_next = ti->ti_prev = 0;
145     ti->ti_x1 = 0;
146     ti->ti_seq = htonl(seq);
147     ti->ti_ack = htonl(ack);
148     ti->ti_x2 = 0;
149     ti->ti_off = sizeof(struct tcphdr) >> 2;
150     ti->ti_flags = flags;
151     if (tp)
152         ti->ti_win = htons((u_short) (win >> tp->rcv_scale));
153     else
154         ti->ti_win = htons((u_short) win);
155     ti->ti_urp = 0;
156     ti->ti_sum = 0;
157     ti->ti_sum = in_cksum(m, tlen);
158     ((struct ip *) ti)->ip_len = tlen;
159     ((struct ip *) ti)->ip_ttl = ip_defttl;
160     (void) ip_output(m, NULL, ro, 0, NULL);
161 }
tcp_subr.c
Figure 26.36  tcp_respond function: second half.

139-157  The fields in the IP and TCP headers must be initialized for the TCP checksum
computation. These statements are similar to the way tcp_template initializes the
t_template field. The sequence number and acknowledgment fields are passed by
the caller as arguments. Finally ip_output sends the datagram.
`
`26.10 Summary
`
This chapter has looked at the general-purpose function that generates most TCP
segments (tcp_output) and the special-purpose function that generates RST segments
and keepalive probes (tcp_respond).
Many factors determine whether TCP can send a segment or not: the flags in the
segment, the window advertised by the other end, the amount of data ready to send,
whether unacknowledged data already exists for the connection, and so on. Therefore
the logic of tcp_output determines whether a segment can be sent (the first half of the
function), and if so, what values to set all the TCP header fields to (the last half of the
function). If a segment is sent, the TCP control block variables for the send sequence
space must be updated.
One segment at a time is generated by tcp_output, and at the end of the function
a check is made of whether more data can still be sent. If so, the function loops around
and tries to send another segment. This looping continues until there is no more data to
send, or until some other condition (e.g., the receiver’s advertised window) stops the
transmission.
A TCP segment can also contain options. The options supported by Net/3 specify
the maximum segment size, a window scale factor, and a pair of timestamps. The first
two can only appear with SYN segments, while the timestamp option (if supported by
both ends) normally appears in every segment. Since the window scale and timestamp
options are newer and optional, if the first end to send a SYN wants to use the option, it
sends the option with its SYN and uses the option only if the other end’s SYN also
contains the option.
`
`Exercises
`
26.1   Slow start is resumed in Figure 26.1 when there is a pause in the sending of data, yet the
       amount of idle time is calculated as the amount of time since the last segment was received
       on the connection. Why doesn’t TCP calculate the idle time as the amount of time since
       the last segment was sent on the connection?
26.2   With Figure 26.6 we said that len is less than 0 if the FIN has been sent but not
       acknowledged and not retransmitted. What happens if the FIN is retransmitted?
26.3   Net/3 always sends the window scale and timestamp options with an active open. Why
       does the global variable tcp_do_rfc1323 exist?
26.4   In Figure 25.28, which did not use the timestamp option, the RTT estimators are updated
       eight times. If the timestamp option had been used in this example, how many times
       would the RTT estimators have been updated?
26.5   In Figure 26.23 bcopy is called to store the received MSS in the variable mss. Why not cast
       the pointer to opt[2] into a pointer to an unsigned short and perform an assignment?
26.6   After Figure 26.29 we described a bug in the code, which can cause a bogus urgent offset to
       be sent. Propose a solution. (Hint: What is the largest amount of TCP data that can be sent
       in a segment?)
26.7   With Figure 26.32 we mentioned that an error of ENOBUFS is not returned to the process
       because (1) if the discarded segment contained data, the retransmission timer will expire
       and the data will be retransmitted, or (2) if the discarded segment was an ACK-only
       segment, the other end will retransmit its data when it doesn’t receive the ACK. What if the
       discarded segment contains an RST?
26.8   Explain the settings of the PSH flag in Figure 20.3 of Volume 1.
26.9   Why does Figure 26.36 use the value of ip_defttl for the TTL, while Figure 26.32 uses
       the value in the PCB?
26.10  Describe what happens with the mbuf allocated in Figure 26.25 when IP options are
       specified by the process for the TCP connection. Implement a better solution.
26.11  tcp_output is a long function (about 500 lines, including comments), which can appear
       to be inefficient. But lots of the code handles special cases. Assume the function is called
       with a full-sized segment ready to be sent, and no special cases: no IP options and no
       special flags such as SYN, FIN, or URG. About how many lines of C code are actually
       executed? How many functions are called before the segment is passed to ip_output?
26.12  In the example at the end of Section 26.3 in which the application did a write of 100 bytes
       followed by a write of 50 bytes, would anything change if the application called
       once for both buffers, instead of calling write twice? Does anything change with
       if the two buffer lengths are 200 and 300, instead of 100 and 50?
26.13  The timestamp that is sent in the timestamp option is taken from the global tcp_now,
       which is incremented every 500 ms. Modify TCP to use a higher resolution timestamp
       value.
`
`
`
`27
`
`TCP Functions
`
27.1 Introduction
`
`This chapter presents numerous TCP functions that we need to cover before discussing
`TCP input in the next two chapters:
`
• tcp_drain is the protocol’s drain function, called when the kernel is out of
  mbufs. It does nothing.
• tcp_drop aborts a connection by sending an RST.
• tcp_close performs the normal TCP connecti