`
IP Fragmentation and Reassembly

10.1 Introduction
`
`In this chapter we describe the IP fragmentation and reassembly processing that we
`postponed in Chapter 8.
`IP has an important capability of being able to fragment a packet when it is too
`large to be transmitted by the selected hardware interface. The oversized packet is split
`into two or more IP fragments, each of which is small enough to be transmitted on the
`selected network. Fragments may be further split by routers farther along the path to
`the final destination. Thus, at the destination host, an IP datagram can be contained in a
`single IP packet or, if it was fragmented in transit, it can arrive in multiple IP packets.
`Because individual fragments may take different paths to the destination host, only the
`destination host has a chance to see all the fragments. Thus only the destination host
`can reassemble the fragments into a complete datagram to be delivered to the appropri-
`ate transport protocol.
Figure 8.5 shows that 0.3% (72,786/27,881,978) of the packets received were fragments and 0.12% (260,484/(29,447,726 - 796,084)) of the datagrams sent were fragmented. On world.std.com, 9.5% of the packets received were fragments. world.std.com has more NFS activity, which is a common source of IP fragmentation.
Three fields in the IP header implement fragmentation and reassembly: the identification field (ip_id), the flags field (the 3 high-order bits of ip_off), and the offset field (the 13 low-order bits of ip_off). The flags field is composed of three 1-bit flags. Bit 0 is reserved and must be 0, bit 1 is the "don't fragment" (DF) flag, and bit 2 is the "more fragments" (MF) flag. In Net/3, the flag and offset fields are combined and accessed by ip_off, as shown in Figure 10.1.
`
`
`DELL EX.1095.300
`
`
`
`
ip_off:  | 0 | DF | MF |  fragment offset  |
           1    1    1         13 bits

Figure 10.1 ip_off controls fragmentation of an IP packet.
`
Net/3 accesses the DF and MF bits by masking ip_off with IP_DF and IP_MF respectively. An IP implementation must allow an application to request that the DF bit be set in an outgoing datagram.
`
`Net/3 does not provide application-level control over the DF bit when using UDP or TCP.
`
`A process may construct and send its own IP headers with the raw IP interface (Chapter 32).
`
The DF bit may be set by the transport layers directly, such as when TCP performs path MTU discovery.
`
`The remaining 13 bits of ip_off specify the fragment’s position within the original
`datagram, measured in 8-byte units. Accordingly, every fragment except the last must
`contain a multiple of 8 bytes of data so that the following fragment starts on an 8-byte
`boundary. Figure 10.2 illustrates the relationship between the byte offset within the
`original datagram and the fragment offset (low-order 13 bits of ip_off) in the frag-
`ment’s IP header.
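The encoding of ip_off can be sketched in C. This is a minimal illustration, not Net/3 code. make_ip_off, frag_byte_offset, and FRAG_OFFMASK are hypothetical names. IP_DF and IP_MF carry the values Net/3 uses, and ip_off is assumed to be in host byte order.

```c
#include <stdint.h>

#define IP_DF        0x4000  /* don't fragment flag (Net/3 value) */
#define IP_MF        0x2000  /* more fragments flag (Net/3 value) */
#define FRAG_OFFMASK 0x1fff  /* hypothetical name: low-order 13 bits of ip_off */

/* Build an ip_off value from a data byte offset (a multiple of 8)
 * and the two flags. */
static uint16_t make_ip_off(unsigned byte_off, int df, int mf)
{
    uint16_t off = (uint16_t)((byte_off >> 3) & FRAG_OFFMASK);

    if (df)
        off |= IP_DF;
    if (mf)
        off |= IP_MF;
    return off;
}

/* Recover the byte offset of a fragment's data within the original
 * datagram, ignoring the flag bits. */
static unsigned frag_byte_offset(uint16_t ip_off)
{
    return (unsigned)(ip_off & FRAG_OFFMASK) << 3;
}
```

For example, a fragment whose data starts at byte 65512 with MF clear gets make_ip_off(65512, 0, 0), which is an offset field of 8189.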
`
Original datagram: IP header (20 bytes, ip_off = 0, MF = 0) followed by 65515 data bytes (offsets 0 through 65514)

Fragment   IP header (20 bytes)     Data bytes
1          ip_off = 0     MF = 1    0-7 (8 bytes)
2          ip_off = 1     MF = 1    8-15 (8 bytes)
3          ip_off = 2     MF = 1    16-23 (8 bytes)
...
8189       ip_off = 8188  MF = 1    65504-65511 (8 bytes)
8190       ip_off = 8189  MF = 0    65512-65514 (3 bytes)

Figure 10.2 Fragmentation of a 65535-byte datagram.
`
`
`Figure 10.2 shows a maximally sized IP datagram divided into 8190 fragments.
`Each fragment contains 8 bytes except the last, which contains only 3 bytes. We also
`show the MF bit set in all the fragments except the last. This is an unrealistic example,
`but it illustrates several implementation issues.
The numbers above the original datagram are the byte offsets for the data portion of the datagram. The fragment offset (ip_off) is computed from the start of the data portion of the datagram. A fragment cannot include a byte beyond offset 65514, because the reassembled datagram would then be larger than 65535 bytes, the maximum value of the ip_len field. This restricts the maximum value of ip_off to 8189 (8189 × 8 = 65512), which leaves room for 3 bytes in the last fragment. If IP options are present, the offset must be smaller still.
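The limit follows from simple arithmetic. The sketch below assumes only that ip_len is a 16-bit field. MAX_IP_LEN, max_frag_offset, and last_frag_bytes are hypothetical names.

```c
#define MAX_IP_LEN 65535u  /* largest value of the 16-bit ip_len field */

/* Largest fragment offset (in 8-byte units) possible when the header
 * is hlen bytes: the final data byte of a maximal datagram sits at
 * data offset MAX_IP_LEN - hlen - 1. */
static unsigned max_frag_offset(unsigned hlen)
{
    return (MAX_IP_LEN - hlen - 1) / 8;
}

/* Data bytes left for a fragment that starts at that offset. */
static unsigned last_frag_bytes(unsigned hlen)
{
    return (MAX_IP_LEN - hlen) - max_frag_offset(hlen) * 8;
}
```

With a 20-byte header the functions return 8189 and 3 bytes, which matches Figure 10.2. With a 60-byte header (the maximum the 4-bit header length allows), the largest offset drops to 8184.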
Because an IP internet is connectionless, fragments from one datagram may be interleaved with those from another at the destination. ip_id uniquely identifies the fragments of a particular datagram. The source system sets ip_id in each datagram to a unique value for all datagrams using the same source (ip_src), destination (ip_dst), and protocol (ip_p) values for the lifetime of the datagram on the internet.
To summarize, ip_id identifies the fragments of a particular datagram, ip_off positions the fragment within the original datagram, and the MF bit marks every fragment except the last.
`
`10.2 Code Introduction
`
`The reassembly data structures appear in a single header. Reassembly and fragmenta-
`tion processing is found in two C files. The three files are listed in Figure 10.3.
`
File                  Description
netinet/ip_var.h      reassembly data structures
netinet/ip_output.c   fragmentation code
netinet/ip_input.c    reassembly code

Figure 10.3 Files discussed in this chapter.
`
`Global Variables
`
Only one global variable, ipq, is described in this chapter.

Variable   Type         Description
ipq        struct ipq   reassembly list

Figure 10.4 Global variable introduced in this chapter.
`
`
`Statistics
`
The statistics modified by the fragmentation and reassembly code are shown in Figure 10.5. They are a subset of the statistics included in the ipstat structure described by Figure 8.4.

ipstat member    Description
ips_cantfrag     #datagrams not sent because fragmentation was required but was prohibited by the DF bit
ips_odropped     #output packets dropped because of a memory shortage
ips_ofragments   #fragments transmitted
ips_fragmented   #packets fragmented for output

Figure 10.5 Statistics collected in this chapter.
`
10.3 Fragmentation

We now return to ip_output and describe the fragmentation code. Recall from Figure 8.25 that if a packet fits within the MTU of the selected outgoing interface, it is transmitted in a single link-level frame. Otherwise the packet must be fragmented and transmitted in multiple frames. A packet may be a complete datagram or it may itself be a fragment that was created by a previous system. We describe the fragmentation code in three parts:
• determine fragment size (Figure 10.6),
• construct fragment list (Figure 10.7), and
• construct initial fragment and send fragments (Figure 10.8).
`
ip_output.c
253     /*
254      * Too large for interface; fragment if possible.
255      * Must be able to put at least 8 bytes per fragment.
256      */
257     if (ip->ip_off & IP_DF) {
258         error = EMSGSIZE;
259         ipstat.ips_cantfrag++;
260         goto bad;
261     }
262     len = (ifp->if_mtu - hlen) & ~7;
263     if (len < 8) {
264         error = EMSGSIZE;
265         goto bad;
266     }
ip_output.c

Figure 10.6 ip_output function: determine fragment size.

253-261
`
`The fragmentation algorithm is straightforward, but the implementation is compli-
`cated by the manipulation of the mbuf structures and chains. If fragmentation is
`
`
`prohibited by the DF bit, ip_output discards the packet and returns EMSGSIZE. If the
`datagram was generated on this host, a transport protocol passes the error back to the
`process, but if the datagram is being forwarded, ip_forward generates an ICMP desti-
`nation unreachable error with an indication that the packet could not be forwarded
`without fragmentation (Figure 8.21).
`Net/3 does not implement the path MTU discovery algorithms used to probe the
`path to a destination and discover the largest transmission unit supported by all the
`intervening networks. Sections 11.8 and 24.2 of Volume 1 describe path MTU discovery
`for UDP and TCP.
262-266
len, the number of data bytes in each fragment, is computed as the MTU of the interface less the size of the packet's header. It is then rounded down to an 8-byte boundary by clearing the low-order 3 bits (& ~7). If the MTU is so small that each fragment would contain fewer than 8 bytes, ip_output returns EMSGSIZE.
Each new fragment contains an IP header, some of the options from the original packet, and at most len data bytes.
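The size computation in Figure 10.6 can be mirrored in a small userspace function. frag_data_len is a hypothetical name, and the return value -1 stands in for ip_output's EMSGSIZE error.

```c
/* Hypothetical mirror of Figure 10.6: data bytes per fragment, or -1
 * where ip_output would fail with EMSGSIZE. */
static int frag_data_len(int mtu, int hlen, int df)
{
    int len;

    if (df)
        return -1;                 /* DF set: fragmentation prohibited */
    len = (mtu - hlen) & ~7;       /* round down to a multiple of 8 */
    if (len < 8)
        return -1;                 /* fewer than 8 data bytes per fragment */
    return len;
}
```

For a 296-byte SLIP MTU and a 20-byte header, the function rounds 276 down to 272 data bytes per fragment.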
`
The code in Figure 10.7 is the start of a C compound statement. It constructs the list of fragments starting with the second fragment. The original packet is converted into the initial fragment after the list is created (Figure 10.8).

267-269
The extra block allows mhlen, firstlen, and mnext to be declared closer to their use in the function. These variables are in scope until the end of the block and hide any similarly named variables outside the block.

270-276
Since the original mbuf chain becomes the first fragment, the for loop starts with the offset of the second fragment: hlen + len. For each fragment ip_output takes the following actions:

277-284
Allocate a new packet mbuf and adjust its m_data pointer to leave room for a 16-byte link-layer header (max_linkhdr). If ip_output didn't do this, the network interface driver would have to allocate an additional mbuf to hold the link header or move the data. Both are time-consuming tasks that are easily avoided here.

285-290
Copy the IP header and IP options from the original packet into the new packet. The header is copied with a structure assignment. ip_optcopy copies only those options that get copied into each fragment (Section 10.4).

291-297
Set the offset field (ip_off) for the fragment, including the MF bit. If MF is set in the original packet, then MF is set in all the fragments. If MF is not set in the original packet, then MF is set for every fragment except the last.

298
Set the length of this fragment. The length accounts for a shorter header (ip_optcopy may not have copied all the options) and a shorter data area for the last fragment. The length is stored in network byte order.

299-305
Copy the data from the original packet into this fragment. m_copy allocates additional mbufs if necessary. If m_copy fails, ENOBUFS is posted. Any mbufs already allocated are discarded at sendorfree.
`
`
ip_output.c
267     {
268         int     mhlen, firstlen = len;
269         struct mbuf **mnext = &m->m_nextpkt;
270         /*
271          * Loop through length of segment after first fragment,
272          * make new header and copy data of each part and link onto chain.
273          */
274         m0 = m;
275         mhlen = sizeof(struct ip);
276         for (off = hlen + len; off < (u_short) ip->ip_len; off += len) {
277             MGETHDR(m, M_DONTWAIT, MT_HEADER);
278             if (m == 0) {
279                 error = ENOBUFS;
280                 ipstat.ips_odropped++;
281                 goto sendorfree;
282             }
283             m->m_data += max_linkhdr;
284             mhip = mtod(m, struct ip *);
285             *mhip = *ip;
286             if (hlen > sizeof(struct ip)) {
287                 mhlen = ip_optcopy(ip, mhip) + sizeof(struct ip);
288                 mhip->ip_hl = mhlen >> 2;
289             }
290             m->m_len = mhlen;
291             mhip->ip_off = ((off - hlen) >> 3) + (ip->ip_off & ~IP_MF);
292             if (ip->ip_off & IP_MF)
293                 mhip->ip_off |= IP_MF;
294             if (off + len >= (u_short) ip->ip_len)
295                 len = (u_short) ip->ip_len - off;
296             else
297                 mhip->ip_off |= IP_MF;
298             mhip->ip_len = htons((u_short) (len + mhlen));
299             m->m_next = m_copy(m0, off, len);
300             if (m->m_next == 0) {
301                 (void) m_free(m);
302                 error = ENOBUFS;    /* ??? */
303                 ipstat.ips_odropped++;
304                 goto sendorfree;
305             }
306             m->m_pkthdr.len = mhlen + len;
307             m->m_pkthdr.rcvif = (struct ifnet *) 0;
308             mhip->ip_off = htons((u_short) mhip->ip_off);
309             mhip->ip_sum = 0;
310             mhip->ip_sum = in_cksum(m, mhlen);
311             *mnext = m;
312             mnext = &m->m_nextpkt;
313             ipstat.ips_ofragments++;
314         }
ip_output.c

Figure 10.7 ip_output function: construct fragment list.
`
306-314
Adjust the mbuf packet header of the newly created fragment to have the correct total length, clear the new fragment's interface pointer, convert ip_off to network byte order, compute the checksum for the new fragment, and link the fragment to the previous fragment through m_nextpkt.
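The following sketch shows how the offset, MF, and length logic of Figures 10.7 and 10.8 fits together. It computes only the header values each fragment would receive and does no mbuf handling. plan_fragments and struct frag_plan are hypothetical. The sketch assumes every fragment carries an hlen-byte header (no option trimming) and that the packet is larger than hlen + len.

```c
#include <stdint.h>

#define IP_MF 0x2000  /* more fragments flag (Net/3 value) */

/* What ip_output would put into each fragment's header (host order). */
struct frag_plan {
    uint16_t ip_off;    /* offset in 8-byte units, plus IP_MF if set */
    uint16_t data_len;  /* data bytes carried by this fragment */
};

/*
 * Hypothetical simulation of Figures 10.7 and 10.8. ip_len and hlen
 * describe the packet being fragmented; orig_off and orig_mf are its
 * own offset field (8-byte units) and MF bit; len comes from
 * Figure 10.6. Returns the fragment count, or -1 if max is too small.
 */
static int plan_fragments(unsigned ip_len, unsigned hlen, unsigned orig_off,
                          int orig_mf, unsigned len,
                          struct frag_plan *out, int max)
{
    int n = 0;

    if (max < 1)
        return -1;
    /* First fragment: the original header with MF forced on. */
    out[n].ip_off = (uint16_t)(orig_off | IP_MF);
    out[n].data_len = (uint16_t)len;
    n++;

    /* Remaining fragments; off counts from the start of the IP header. */
    for (unsigned off = hlen + len; off < ip_len; off += len) {
        unsigned flen = len;
        uint16_t f = (uint16_t)(((off - hlen) >> 3) + orig_off);

        if (orig_mf)
            f |= IP_MF;              /* packet was not the tail: MF everywhere */
        if (off + len >= ip_len)
            flen = ip_len - off;     /* last fragment may be short */
        else
            f |= IP_MF;
        if (n >= max)
            return -1;
        out[n].ip_off = f;
        out[n].data_len = (uint16_t)flen;
        n++;
    }
    return n;
}
```

Called with ip_len 65535, hlen 20, and len 8, it produces the 8190 fragments of Figure 10.2. A 1052-byte datagram with len 272 yields four fragments carrying 272, 272, 272, and 216 data bytes. If the packet being fragmented already had MF set, every fragment it produces keeps MF.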
`
In Figure 10.8, ip_output constructs the initial fragment and then passes each fragment to the interface layer.
`
ip_output.c
315         /*
316          * Update first fragment by trimming what's been copied out
317          * and updating header, then send each fragment (in order).
318          */
319         m = m0;
320         m_adj(m, hlen + firstlen - (u_short) ip->ip_len);
321         m->m_pkthdr.len = hlen + firstlen;
322         ip->ip_len = htons((u_short) m->m_pkthdr.len);
323         ip->ip_off = htons((u_short) (ip->ip_off | IP_MF));
324         ip->ip_sum = 0;
325         ip->ip_sum = in_cksum(m, hlen);
326       sendorfree:
327         for (m = m0; m; m = m0) {
328             m0 = m->m_nextpkt;
329             m->m_nextpkt = 0;
330             if (error == 0)
331                 error = (*ifp->if_output) (ifp, m,
332                                            (struct sockaddr *) dst, ro->ro_rt);
333             else
334                 m_freem(m);
335         }
336         if (error == 0)
337             ipstat.ips_fragmented++;
338     }
ip_output.c

Figure 10.8 ip_output function: send fragments.
`
315-325
The original packet is converted into the first fragment by trimming the extra data from its end, setting the MF bit, converting ip_len and ip_off to network byte order, and computing the new checksum. All the IP options are retained in this fragment. At the destination host, only the IP options from the first fragment of a datagram are retained when the datagram is reassembled (Figure 10.28). Some options, such as source routing, must be copied into each fragment even though the option is discarded during reassembly.

326-338
At this point, either ip_output has a complete list of fragments, or an error has occurred and the partial list of fragments must be discarded. The for loop traverses the list, either sending or discarding fragments according to error. Any error encountered while sending fragments causes the remaining fragments to be discarded.
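The error handling of the sendorfree loop can be isolated in a short sketch. struct pkt, output_fn, and send_or_free are hypothetical stand-ins for the mbuf list and ifp->if_output. As with if_output, the output routine is assumed to take ownership of each packet it is given.

```c
#include <stdlib.h>

/* Stand-in for an mbuf packet: just the m_nextpkt link and a tag. */
struct pkt {
    struct pkt *nextpkt;
    int id;
};

/* Hypothetical output routine: takes ownership of the packet and
 * returns 0 or an error number. */
typedef int (*output_fn)(struct pkt *p);

/* Unlink each packet, pass it to output while no error has occurred,
 * and free it once one has. Returns the first error; *nsent counts
 * the packets handed to output. */
static int send_or_free(struct pkt *m0, output_fn output, int *nsent)
{
    int error = 0;
    struct pkt *m;

    *nsent = 0;
    for (m = m0; m; m = m0) {
        m0 = m->nextpkt;
        m->nextpkt = NULL;
        if (error == 0) {
            error = output(m);
            (*nsent)++;
        } else {
            free(m);
        }
    }
    return error;
}
```

If the third of five fragments fails, the first three reach the output routine and the last two are freed without being sent. The error is then returned to the caller.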
`
`
`10.4 ip_optcopy Function
`
`During fragmentation, ip_optcopy (Figure 10.9) copies the options from the incoming
`packet (if the packet is being forwarded) or from the original datagram (if the datagram
`is locally generated) into the outgoing fragments.
`
ip_output.c
395 int
396 ip_optcopy(ip, jp)
397 struct ip *ip, *jp;
398 {
399     u_char *cp, *dp;
400     int     opt, optlen, cnt;
401     cp = (u_char *) (ip + 1);
402     dp = (u_char *) (jp + 1);
403     cnt = (ip->ip_hl << 2) - sizeof(struct ip);
404     for (; cnt > 0; cnt -= optlen, cp += optlen) {
405         opt = cp[0];
406         if (opt == IPOPT_EOL)
407             break;
408         if (opt == IPOPT_NOP) {
409             /* Preserve for IP mcast tunnel's LSRR alignment. */
410             *dp++ = IPOPT_NOP;
411             optlen = 1;
412             continue;
413         } else
414             optlen = cp[IPOPT_OLEN];
415         /* bogus lengths should have been caught by ip_dooptions */
416         if (optlen > cnt)
417             optlen = cnt;
418         if (IPOPT_COPIED(opt)) {
419             bcopy((caddr_t) cp, (caddr_t) dp, (unsigned) optlen);
420             dp += optlen;
421         }
422     }
423     for (optlen = dp - (u_char *) (jp + 1); optlen & 0x3; optlen++)
424         *dp++ = IPOPT_EOL;
425     return (optlen);
426 }
ip_output.c

Figure 10.9 ip_optcopy function.
`
The arguments to ip_optcopy are: ip, a pointer to the IP header of the outgoing packet; and jp, a pointer to the IP header of the newly created fragment. ip_optcopy initializes cp and dp to point to the first option byte in each packet and advances cp and dp as it processes each option. The first for loop copies a single option during each iteration. It stops when it encounters an EOL option or when it has examined all the options. NOP options are copied to preserve any alignment constraints in the subsequent options.
`
`The Net/2 release discarded NOP options.
`
`
If IPOPT_COPIED indicates that the copied bit is on, ip_optcopy copies the option to the new fragment. Figure 9.5 shows which options have the copied bit set. If an option length is too large, it is truncated; ip_dooptions should have already discovered this type of error.

423-426
The second for loop pads the option list out to a 4-byte boundary. This is required, since the packet's header length (ip_hl) is measured in 4-byte units. It also ensures that the transport header that follows is aligned on a 4-byte boundary. Many transport protocols are designed so that 32-bit header fields are aligned on 32-bit boundaries if the transport header starts on a 32-bit boundary. This arrangement improves performance on CPUs that have difficulty accessing unaligned 32-bit words.
`Figure 10.10 illustrates the operation of ip_optcopy.
`
Original packet: IP header (20 bytes) | timestamp option (12 bytes) | LSRR option (11 bytes)
New fragment:    IP header (20 bytes) | LSRR option (11 bytes) | end-of-list option

Figure 10.10 Not all options are copied during fragmentation.
`
In Figure 10.10 we see that ip_optcopy does not copy the timestamp option (its copied bit is 0) but does copy the LSRR option (its copied bit is 1). ip_optcopy has also added a single EOL option to pad the new options to a 4-byte boundary.
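The option-copying logic of Figure 10.9 can be exercised outside the kernel on a raw byte array. In this sketch optcopy is a hypothetical name. The IPOPT_* values are the standard ones. The zero-length and truncated-option guards are additions, since Net/3 relies on ip_dooptions to reject such options.

```c
#include <stddef.h>
#include <string.h>

#define IPOPT_EOL       0              /* end of option list */
#define IPOPT_NOP       1              /* no operation */
#define IPOPT_OLEN      1              /* offset of an option's length byte */
#define IPOPT_COPIED(o) ((o) & 0x80)   /* copied bit: high-order bit of type */

/* Copy the options of src (cnt bytes) that have the copied bit set
 * into dst, keep NOPs, and pad with EOL to a multiple of 4 bytes.
 * Returns the padded length written to dst. */
static size_t optcopy(const unsigned char *src, size_t cnt, unsigned char *dst)
{
    unsigned char *dp = dst;
    size_t optlen;

    for (; cnt > 0; cnt -= optlen, src += optlen) {
        unsigned char opt = src[0];

        if (opt == IPOPT_EOL)
            break;
        if (opt == IPOPT_NOP) {
            *dp++ = IPOPT_NOP;        /* keep NOPs to preserve alignment */
            optlen = 1;
            continue;
        }
        optlen = cnt > IPOPT_OLEN ? src[IPOPT_OLEN] : cnt;
        if (optlen == 0 || optlen > cnt)
            optlen = cnt;             /* guard added for this sketch */
        if (IPOPT_COPIED(opt)) {
            memcpy(dp, src, optlen);
            dp += optlen;
        }
    }
    /* Pad with EOL bytes to a 4-byte boundary. */
    for (optlen = (size_t)(dp - dst); optlen & 0x3; optlen++)
        *dp++ = IPOPT_EOL;
    return optlen;
}
```

Feed it the options of Figure 10.10: a 12-byte timestamp, an 11-byte LSRR, and an assumed 1-byte EOL pad. It returns the 11-byte LSRR followed by one EOL, 12 bytes in all.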
`
10.5 Reassembly
`
Now that we have described the fragmentation of a datagram (or of a fragment), we return to ipintr and the reassembly process. In Figure 8.15 we omitted the reassembly code from ipintr and postponed its discussion. ipintr can pass only entire datagrams up to the transport layer for processing. Fragments that are received by ipintr are passed to ip_reass, which attempts to reassemble fragments into complete datagrams. The code from ipintr is shown in Figure 10.11.

271-279
Recall that ip_off contains the DF bit, the MF bit, and the fragment offset. The DF bit is masked out. If either the MF bit or the fragment offset is nonzero, the packet is a fragment that must be reassembled. If both are zero, the packet is a complete datagram. In that case the reassembly code is skipped and the else clause at the end of Figure 10.11 is executed, which excludes the header length from the total datagram length.
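The test that separates fragments from complete datagrams reduces to one masking operation. is_fragment is a hypothetical helper, and ip_off is assumed to be in host byte order, as it is in ipintr.

```c
#include <stdint.h>

#define IP_DF 0x4000  /* don't fragment flag (Net/3 value) */
#define IP_MF 0x2000  /* more fragments flag (Net/3 value) */

/* Mask out DF; any bit left over (MF or a nonzero offset) means the
 * packet is a fragment. */
static int is_fragment(uint16_t ip_off)
{
    return (ip_off & ~IP_DF) != 0;
}
```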
280-286
m_pullup moves data in an external cluster into the data area of the mbuf. Recall that the SLIP interface (Section 5.3) may return an entire IP packet in an external cluster if it does not fit in a single mbuf. Also, m_devget can return the entire packet in a cluster (Section 2.6). Before the mtod macros will work (Section 2.6), m_pullup must move the IP header from the cluster into the data area of an mbuf.
`
`
ip_input.c
271 ours:
272     /*
273      * If offset or IP_MF are set, must reassemble.
274      * Otherwise, nothing need be done.
275      * (We could look in the reassembly queue to see
276      * if the packet was previously fragmented,
277      * but it's not worth the time; just let them time out.)
278      */
279     if (ip->ip_off & ~IP_DF) {
280         if (m->m_flags & M_EXT) {   /* XXX */
281             if ((m = m_pullup(m, sizeof(struct ip))) == 0) {
282                 ipstat.ips_toosmall++;
283                 goto next;
284             }
285             ip = mtod(m, struct ip *);
286         }
287         /*
288          * Look for queue of fragments
289          * of this datagram.
290          */
291         for (fp = ipq.next; fp != &ipq; fp = fp->next)
292             if (ip->ip_id == fp->ipq_id &&
293                 ip->ip_src.s_addr == fp->ipq_src.s_addr &&
294                 ip->ip_dst.s_addr == fp->ipq_dst.s_addr &&
295                 ip->ip_p == fp->ipq_p)
296                 goto found;
297         fp = 0;
298   found:
299         /*
300          * Adjust ip_len to not reflect header,
301          * set ip_mff if more fragments are expected,
302          * convert offset of this to bytes.
303          */
304         ip->ip_len -= hlen;
305         ((struct ipasfrag *) ip)->ipf_mff &= ~1;
306         if (ip->ip_off & IP_MF)
307             ((struct ipasfrag *) ip)->ipf_mff |= 1;
308         ip->ip_off <<= 3;
309         /*
310          * If datagram marked as having more fragments
311          * or if this is not the first fragment,
312          * attempt reassembly; if it succeeds, proceed.
313          */
314         if (((struct ipasfrag *) ip)->ipf_mff & 1 || ip->ip_off) {
315             ipstat.ips_fragments++;
316             ip = ip_reass((struct ipasfrag *) ip, fp);
317             if (ip == 0)
318                 goto next;
319             ipstat.ips_reassembled++;
320             m = dtom(ip);
321         } else if (fp)
322             ip_freef(fp);
323     } else
324         ip->ip_len -= hlen;
ip_input.c

Figure 10.11 ipintr function: fragment processing.
`
287-297
Net/3 keeps incomplete datagrams on the global doubly linked list, ipq. The name is somewhat confusing since the data structure isn't a queue: insertions and deletions can occur anywhere in the list, not just at the ends. We'll use the term list to emphasize this fact.
ipintr performs a linear search of the list to locate the appropriate datagram for the current fragment. Remember that fragments are uniquely identified by the 4-tuple {ip_id, ip_src, ip_dst, ip_p}. Each entry in ipq is a list of fragments, and fp points to the appropriate list if ipintr finds a match.
`
`Net/3 uses linear searches to access many of its data structures. While simple, this method can
`become a bottleneck in hosts supporting large numbers of network connections.
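The ipq list and the search in ipintr can be sketched with a sentinel head. The sentinel makes insertion and removal anywhere in the list constant-time operations. struct rq and the rq_* functions are hypothetical simplifications of struct ipq that keep only the 4-tuple and the next/prev links.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified reassembly header: the 4-tuple plus the list links. */
struct rq {
    struct rq *next, *prev;
    uint16_t id;      /* ip_id */
    uint8_t  proto;   /* ip_p */
    uint32_t src;     /* ip_src */
    uint32_t dst;     /* ip_dst */
};

/* An empty list is a head that points to itself. */
static void rq_init(struct rq *head)
{
    head->next = head->prev = head;
}

/* Insert e right after head. */
static void rq_insert(struct rq *head, struct rq *e)
{
    e->next = head->next;
    e->prev = head;
    head->next->prev = e;
    head->next = e;
}

/* Remove e from wherever it sits in the list. */
static void rq_remove(struct rq *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Linear search on the 4-tuple; NULL means no list exists yet. */
static struct rq *rq_find(struct rq *head, uint16_t id, uint32_t src,
                          uint32_t dst, uint8_t proto)
{
    for (struct rq *fp = head->next; fp != head; fp = fp->next)
        if (fp->id == id && fp->src == src && fp->dst == dst &&
            fp->proto == proto)
            return fp;
    return NULL;
}
```

The loop stops when it returns to the head, which is why the global ipq itself never needs a fragment list of its own.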
`
298-303
At found, the packet is modified by ipintr to facilitate reassembly:

304
• ipintr changes ip_len to exclude the standard IP header and any options. We must keep this in mind to avoid confusion with the standard interpretation of ip_len, which includes the standard header, options, and data. ip_len is also changed if the reassembly code is skipped because this is not a fragment.

305-307
• ipintr copies the MF flag into the low-order bit of ipf_mff, which overlays ip_tos (&= ~1 clears the low-order bit only). Notice that ip must be cast to a pointer to an ipasfrag structure before ipf_mff is a valid member. Section 10.6 and Figure 10.14 describe the ipasfrag structure.

  Although RFC 1122 requires the IP layer to provide a mechanism that enables the transport layer to set ip_tos for every outgoing datagram, it only recommends that the IP layer pass ip_tos values to the transport layer at the destination host. Since the low-order bit of the TOS field must always be 0, it is available to hold the MF bit while ip_off (where the MF bit is normally found) is used by the reassembly algorithm.

  ip_off can now be accessed as a 16-bit offset instead of 3 flag bits and a 13-bit offset.

308
• ip_off is multiplied by 8 to convert from 8-byte to 1-byte units.

309-322
ipf_mff and ip_off determine if ipintr should attempt reassembly. Figure 10.12 describes the different cases and the corresponding actions. Remember that fp points to the list of fragments the system has previously received for the datagram. Most of the work is done by ip_reass.
If ip_reass is able to assemble a complete datagram by combining the current fragment with previously received fragments, it returns a pointer to the reassembled datagram. If reassembly is not possible, ip_reass saves the fragment and ipintr jumps to next to process the next packet (Figure 8.12).

323-324
This else branch is taken when a complete datagram arrives, and ip_len is modified as described earlier. This is the normal flow, since most received datagrams are not fragments.
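The adjustments made at found can be reproduced on a simplified header. struct frag_hdr, prepare_fragment, and needs_reassembly are hypothetical. tos stands for the byte that is ip_tos in struct ip and ipf_mff in struct ipasfrag, and all values are in host byte order.

```c
#include <stdint.h>

#define IP_MF 0x2000  /* more fragments flag (Net/3 value) */

/* Simplified view of the fields changed at found. */
struct frag_hdr {
    uint8_t  tos;   /* ip_tos in struct ip, ipf_mff in struct ipasfrag */
    uint16_t len;   /* ip_len */
    uint16_t off;   /* ip_off */
};

/* Exclude the header from len, save MF in the low-order bit of tos,
 * and convert off from 8-byte units to bytes. */
static void prepare_fragment(struct frag_hdr *h, unsigned hlen)
{
    h->len = (uint16_t)(h->len - hlen);
    h->tos &= (uint8_t)~1u;            /* clear only the low-order bit */
    if (h->off & IP_MF)
        h->tos |= 1;
    h->off = (uint16_t)(h->off << 3);  /* the 3 flag bits shift out */
}

/* Reassembly is attempted if MF was set or the offset is nonzero. */
static int needs_reassembly(const struct frag_hdr *h)
{
    return (h->tos & 1) || h->off != 0;
}
```

Shifting the 16-bit ip_off left by 3 discards the three flag bits, so only the byte offset remains. The largest 13-bit offset, 8191 × 8 = 65528, still fits in 16 bits.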
`
`
ip_off    ipf_mff   fp        Description                            Action
0         false     null      complete datagram                      no assembly required
0         false     nonnull   complete datagram                      discard the previous fragments
any       true      null      fragment of new datagram               initialize new fragment list with this fragment
any       true      nonnull   fragment of incomplete datagram        insert into existing fragment list, attempt reassembly
nonzero   false     null      tail fragment of new datagram          initialize new fragment list
nonzero   false     nonnull   tail fragment of incomplete datagram   insert into existing fragment list, attempt reassembly

Figure 10.12 IP fragment processing in ipintr and ip_reass.
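The decision table in Figure 10.12 collapses to a few comparisons. classify and enum frag_action are hypothetical names. off is the byte offset after ipintr's shift, mff is the saved MF bit, and have_list indicates whether fp is nonnull.

```c
/* Actions from Figure 10.12; the names are our own. */
enum frag_action {
    NO_REASSEMBLY,     /* complete datagram, nothing queued */
    DISCARD_PREVIOUS,  /* complete datagram, stale fragments queued */
    NEW_LIST,          /* start a fragment list with this fragment */
    INSERT_AND_TRY     /* add to the existing list, attempt reassembly */
};

static enum frag_action classify(int mff, unsigned off, int have_list)
{
    if (!mff && off == 0)          /* not a fragment */
        return have_list ? DISCARD_PREVIOUS : NO_REASSEMBLY;
    return have_list ? INSERT_AND_TRY : NEW_LIST;
}
```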
`
If a complete datagram is available after reassembly processing, it is passed up to the appropriate transport protocol by ipintr (Figure 8.15):
(*inetsw[ip_protox[ip->ip_p]].pr_input) (m, hlen);
`
10.6 ip_reass Function

ipintr passes ip_reass a fragment to be processed and a pointer to the matching reassembly header from ipq. ip_reass either assembles and returns a complete datagram, or links the fragment into the datagram's reassembly list for reassembly when the remaining fragments arrive. The head of each reassembly list is an ipq structure, shown in Figure 10.13.
`
ip_var.h
52 struct ipq {
53     struct ipq *next, *prev;            /* to other reass headers */
54     u_char  ipq_ttl;                    /* time for reass q to live */
55     u_char  ipq_p;                      /* protocol of this fragment */
56     u_short ipq_id;                     /* sequence id for reassembly */
57     struct ipasfrag *ipq_next, *ipq_prev;
58                                         /* to ip headers of fragments */
59     struct in_addr ipq_src, ipq_dst;
60 };
ip_var.h

Figure 10.13 ipq structure.
`
52-60
The four fields required to identify a datagram's fragments, ip_id, ip_p, ip_src, and ip_dst, are kept in the ipq structure at the head of each reassembly list. Net/3 constructs the list of datagrams with next and prev and the list of fragments with ipq_next and ipq_prev.
The IP header of incoming IP packets is converted to an ipasfrag structure (Figure 10.14) before it is placed on a reassembly list.
`
`
ip_var.h
66 struct ipasfrag {
67 #if BYTE_ORDER == LITTLE_ENDIAN
68     u_char  ip_hl:4,
69             ip_v:4;
70 #endif
71 #if BYTE_ORDER == BIG_ENDIAN
72     u_char  ip_v:4,
73             ip_hl:4;
74 #endif
75     u_char  ipf_mff;            /* XXX overlays ip_tos: use low bit
76                                  * to avoid destroying tos;
77                                  * copied from (ip_off&IP_MF) */
78     short   ip_len;
79     u_short ip_id;
80     short   ip_off;
81     u_char  ip_ttl;
82     u_char  ip_p;
83     u_short ip_sum;
84     struct ipasfrag *ipf_next;  /* next fragment */
85     struct ipasfrag *ipf_prev;  /* previous fragment */
86 };
ip_var.h

Figure 10.14 ipasfrag structure.
`
66-86
ip_reass collects fragments for a particular datagram on a circular doubly linked list joined by the ipf_next and ipf_prev members. These pointers overlay the source and destination addresses in the IP header. The ipf_mff member overlays ip_tos from the ip structure. The other members are the same.

Figure 10.15 illustrates the relationship between the fragment header list (ipq) and the fragments (ipasfrag).
Down the left side of Figure 10.15 is the list of reassembly headers. The first node in the list is the global ipq structure, ipq. It never has a fragment list associated with it. The ipq list is a doubly linked list used to support fast insertions and deletions. The next and prev pointers reference the next or previous ipq structure, which we have shown by terminating the arrows at the corners of the structures.
Each ipq structure is the head node of a circular doubly linked list of ipasfrag structures. Incoming fragments are placed on these fragment lists ordered by their fragment offset. We've highlighted the pointers for these lists in Figure 10.15.
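Keeping a fragment list ordered by offset amounts to an insertion into a circular doubly linked list whose head acts as a sentinel. This is the role the ipq structure plays when viewed as an ipasfrag. The sketch below is hypothetical (struct frag, frag_list_init, frag_insert_sorted). It deliberately omits the overlap trimming that a real reassembly routine must perform.

```c
/* Minimal fragment node: links plus the byte range it covers. */
struct frag {
    struct frag *next, *prev;
    unsigned off, len;
    int mf;
};

/* An empty list is a head that points to itself. */
static void frag_list_init(struct frag *head)
{
    head->next = head->prev = head;
}

/* Link f in front of the first fragment with a larger offset. If there
 * is none, q ends up at the head and f becomes the last fragment. */
static void frag_insert_sorted(struct frag *head, struct frag *f)
{
    struct frag *q;

    for (q = head->next; q != head; q = q->next)
        if (q->off > f->off)
            break;
    f->next = q;
    f->prev = q->prev;
    q->prev->next = f;
    q->prev = f;
}
```

Because the list is circular, "insert before the head" and "append at the tail" are the same operation, so no special case is needed for the end of the list.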
Figure 10.15 still does not show all the complexity of the reassembly structures. The reassembly code is difficult to follow because it relies so heavily on casting pointers to three different structures on the underlying mbuf. We've seen this technique already, for example, when an ip structure overlays the data portion of an mbuf.

Figure 10.16 illustrates the relationship between an mbuf, an ipq structure, an ipasfrag structure, and an ip structure.
`
`
[Figure: the global ipq{} heads the reassembly list; no fragments are ever linked to this structure. One ipq{} per datagram is joined to it by next and prev. Each per-datagram ipq{} uses ipq_next and ipq_prev to head a circular list of ipasfrag{} structures. These are the received fragments for one datagram, joined by ipf_next and ipf_prev and ordered by fragment offset.]

Figure 10.15 The fragment header list, ipq, and fragments.
`
[Figure: a single region of an mbuf's m_data[] viewed three ways: as an ipq{} (next, prev, ipq_next, ipq_prev), as an ipasfrag{}, and as an ip{} (ip_hl, ip_src, ip_dst).]

Figure 10.16 An area of memory can be accessed through multiple structures.
`
`
A lot of information is contained within Figure 10.16:

• All the structures are located within the data area of an mbuf.
• The ipq list consists of ipq structures joined by next and prev. Within the structure, the four fields that uniquely identify an IP datagram are saved (shaded in Figure 10.16).
• Each ipq structure is treated as an ipasfrag structure when accessed as the head of a linked list of fragments. The fragments are joined by ipf_next and ipf_prev, which overlay the ipq structures' ipq_next and ipq_prev members.
• Each ipasfrag structure overlays the ip structure from the incoming fragment. The data that arrived with the fragment follows the structure in the mbuf. The members that have a different meaning in the ipasfrag structure than they do in the ip structure are shaded.
`
Figure 10.15 showed the physical connections between the reassembly structures, and Figure 10.16 illustrated the overlay technique used by ip_reass. In Figure 10.17 we show the reassembly structures from a logical point of view: this figure shows the reassembly of three datagrams and the relationship between the ipq list and the ipasfrag structures.
`
ipq -> ip_id = 5:  [0-271 MF]       [272-543 missing]  [544-815 MF]  [816-1031]
       ip_id = 6:  [0-271 missing]  [272-543 MF]       [544-815 MF]  [816- missing]
       ip_id = 7:  [0-815 missing]                                   [816-1031]

Figure 10.17 Reassembly of three IP datagrams.
`
The head of each reassembly list contains the id, protocol, source, and destination address of the original datagram. Only the ip_id field is shown in the figure. Each fragment list is ordered by the offset field. A fragment is labeled with MF if the MF bit is set, and missing fragments appear as shaded boxes. The numbers within each fragment show the starting and ending byte offset for the fragment relative to the data portion of the original datagram, not to the IP header of the original datagram.
The example is constructed to show three UDP datagrams with no IP options and 1024 bytes of data each. The total length of each datagram is 1052 (20 + 8 + 1024) bytes,
`
`
which is well within the 1500-byte MTU of an Ethernet. The datagrams encounter a SLIP link on the way to the destination, and the router at that link fragments the datagrams to fit within a typical 296-byte SLIP MTU. Each datagram arrives as four fragments. The first fragment contains a standard 20-byte IP header, the 8-byte UDP header, and 264 bytes of data. The second and third fragments contain a 20-byte IP header and 272 bytes of data. The last fragment has a 20-byte header and 216 bytes of data (1032 = 272 × 3 + 216).
`In Figure 10.17, datagram 5 is missing a single fragment containing bytes 272
`through 543. Datagram 6 is missing the first fragment, bytes 0 through 271, and the end
`of the datagram starting at offset 816. Datagram 7 is missing the first three fragments,
`bytes 0 through 815.
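Whether a list like those in Figure 10.17 holds a complete datagram can be checked by walking the fragments in offset order. datagram_complete is a hypothetical helper that assumes non-overlapping fragments already sorted by offset. It is not the test ip_reass itself uses.

```c
#include <stddef.h>

/* One received fragment: data bytes [off, off + len) and its MF bit. */
struct frag_range {
    unsigned off, len;
    int mf;
};

/* Complete when the fragments cover the data contiguously from byte 0
 * and the last one has MF clear. */
static int datagram_complete(const struct frag_range *f, size_t n)
{
    unsigned next = 0;   /* first data byte not yet covered */

    for (size_t i = 0; i < n; i++) {
        if (f[i].off != next)
            return 0;    /* a hole (or an overlap) before this fragment */
        next += f[i].len;
    }
    return n > 0 && !f[n - 1].mf;
}
```

All three datagrams in Figure 10.17 fail the check. Datagram 5 would pass once the fragment covering bytes 272 through 543 arrives.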
`
Figure 10.18 lists ip_reass. Remember that ipintr calls ip_reass when an IP fragment has arrived for this host, and after any options have been processed.
`
ip_input.c
337 /*
338  * Take incoming datagram fragment and try to
339  * reassemble it into whole datagram.  If a chain for
340  * reassembly of this datagram already exists, then it
341  * is given as fp; otherwise have to make a chain.
342  */
343 struct ip *
344 ip_reass(ip, fp)
345 struct ipasfrag *ip;
346 struct ipq *fp;
347 {
348     struct mbuf *m = dtom(ip);
349     struct ipasfrag *q;
350     struct mbuf *t;
351     int     hlen = ip->ip_hl << 2;
352     int     i, next;
353     /*
354      * Presence of header sizes in mbufs
355      * would confuse code below.
356      */
357     m->m_data += hlen;
358     m->m_len -= hlen;

        /* reassembly code */

465   dropfrag:
466     ipstat.ips_fragdropped++;
467     m_freem(m);
468     return (0);
469 }
ip_input.c

Figure 10.18 ip_reass function: datagram reassembly.
`
343-358
When ip_reass is called, ip points to the fragment and fp either points to the matching ipq structure or is null.
Since reassembly involves only the data portion of each fragment, ip_reass adjusts m_data and m_len in the mbuf containing the fragment to exclude the IP header in each fragment.

465-469
When an error occurs during reassembly, the function jumps to dropfrag, which increments ips_fragdropped, discards the fragment, and returns a null pointer.
`Dropping frag