Do Big Packets work in IPv6? | Source CGP Grey, Wikimedia Commons

The design of IPv6 represented a relatively conservative evolutionary step of the Internet protocol. Mostly, it’s just IPv4 with significantly larger address fields. Mostly, but not completely, as there were some changes. IPv6 changed the boot process to use auto-configuration and multicast to perform functions that were performed by ARP and DHCP in IPv4. IPv6 added a 20-bit Flow Identifier to the packet header. IPv6 replaced IP header options with an optional chain of extension headers. IPv6 also changed the behaviour of packet fragmentation. This is what we will look at here.

We’ve looked at IP packet fragmentation earlier, and that article provides some useful background for what follows, so it may be worth reading before embarking on this one.

IPv4 set only a very small floor on the packet size that had to be passed un-fragmented through the network: RFC 791 requires every router to be able to forward a datagram of 68 octets without further fragmentation, which is just enough to hold a maximally sized 60-byte IPv4 header (which is replicated in each fragment packet, together with any header options) plus a minimum 8-byte fragment of the payload. The other limit defined by IPv4 was that a host had to be able to reassemble an IP packet that was up to at least 576 bytes in total.

The change implemented in IPv6 was that the specification effectively cemented the IPv4 “Don’t Fragment” bit to ON. IPv6 packets cannot be fragmented on the fly during transit across the network: each router can either forward an IPv6 packet or discard it. An early IPv6 specification, RFC 1883, published in 1995, required that any IPv6 packet of up to 576 bytes be passed across an IPv6 network without triggering packet fragmentation. This was a consistent translation of the IPv4 behaviour, which effectively says that IP packets of up to 576 octets have a high probability of successful delivery, as they will not trigger any structural size limitations, and all hosts must accept packets up to 576 bytes in size; with fragmentation on the fly removed from the network, this became a requirement to pass such packets un-fragmented end to end. This size was altered during the refinement of the protocol specification, and RFC 2460, published in 1998, raised the minimum from 576 bytes to 1,280 bytes.

This raises two questions: Why set this “Don’t Fragment” bit to always ON? Why use 1,280 bytes as the critical minimum packet size?

Why Set “Don’t Fragment”?

This was a topic of considerable debate in the late 1980s in packet networking and surfaced once more in the design of IPv6 a few years later.

Fragmentation was seen as highly inefficient. When a fragment is lost there is no way to signal that just the missing fragment needs to be resent; instead, the sender is required to resend the entire packet. Fragmentation also represents a window of opportunity for exploiting potential vulnerabilities. Trailing fragments carry no upper-layer transport protocol header, so firewalls have a problem in determining whether or not a fragment should be admitted. Packet reassembly consumes resources at the destination, so an attacker can generate packet fragments and force the intended victim to hold reassembly resources for a period, waiting for the remainder of the original packet’s fragments, which will never arrive.

This and more was written up in a 1987 paper, “Fragmentation Considered Harmful” by Kent and Mogul, and the conclusions from the paper are to avoid fragmentation wherever possible.

This paper recommended the use of the “Don’t Fragment” flag as a means for communicating hosts to discover the path MTU. The intended functionality was that a router that could not forward a packet because it was too large for the next-hop link would return the leading bytes of the packet, together with the MTU of the next-hop link, to the packet’s sender. The sender could assemble this information and discern the path MTU for each of the destinations it communicates with.
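The sender-side bookkeeping this implies can be sketched as a small per-destination cache. The structure and names below are ours, purely illustrative: the cache starts from the local MTU and is clamped downwards by each ICMPv6 Packet Too Big report, never below the IPv6 floor of 1,280 bytes.

```python
# A sketch of the per-destination path-MTU cache a sending host might keep.
# Assumes a local link MTU of 1,500 bytes; names are illustrative, not from any RFC.

IPV6_MIN_MTU = 1280
LOCAL_MTU = 1500

class PathMTUCache:
    def __init__(self):
        self._pmtu = {}  # destination address -> current path MTU estimate

    def pmtu(self, dest):
        # Until we hear otherwise, assume the path supports our local MTU.
        return self._pmtu.get(dest, LOCAL_MTU)

    def packet_too_big(self, dest, reported_mtu):
        # A PTB message lowers our estimate, but never raises it and
        # never takes it below the protocol floor of 1,280 bytes.
        new = max(IPV6_MIN_MTU, min(self.pmtu(dest), reported_mtu))
        self._pmtu[dest] = new
        return new

cache = PathMTUCache()
print(cache.pmtu("2001:db8::1"))                  # 1500, the default
print(cache.packet_too_big("2001:db8::1", 1400))  # 1400 after a PTB report
print(cache.packet_too_big("2001:db8::1", 900))   # clamped to 1280
```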

Why 1,280?

The online digital archives of the discussions at the time appear to be relatively incomplete, and I cannot find anything in the way of a design discussion about the selection of this value for IPv6. One view is that if not 576 then surely the next logical choice would be 1,500. The rationale for 1,500 was the predominance of Ethernet as an almost ubiquitous layer 2 media framework for digital transmission systems, and even if they were not Ethernet networks, Ethernet packet framing was extremely common.

It’s possible that considerations of encapsulation, and the related area of tunnelling, come into play here. The use of “shim” packet headers for PPP and MPLS would tend to imply that, as a universal constant, 1,492 was probably a safe choice, allowing 8 bytes for local shim headers. However, if you admit this form of packet encapsulation, then why not allow for IPv6-in-IPv4 (20 bytes), or even IPv6-in-IPv6 (40 bytes)? If you allow for this possibility, then perhaps it also makes sense to add a further 8 bytes for a UDP header, or 20 bytes to permit a TCP header. Even in the case of encapsulating IPv6-in-TCP-in-IPv6, the resultant payload packet size is a maximum of 1,440 octets.
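The encapsulation arithmetic above can be worked through directly, starting from a 1,500-byte Ethernet MTU and subtracting each candidate overhead:

```python
# Working through the encapsulation arithmetic, assuming a 1,500-byte
# Ethernet MTU as the starting point.
ETHERNET_MTU = 1500

overheads = {
    "PPPoE/MPLS shim": 8,        # leaves 1,492
    "IPv6-in-IPv4":    20,       # leaves 1,480
    "IPv6-in-IPv6":    40,       # leaves 1,460
    "IPv6 + UDP":      40 + 8,   # leaves 1,452
    "IPv6 + TCP":      40 + 20,  # leaves 1,440
}

for name, oh in overheads.items():
    print(f"{name}: {ETHERNET_MTU - oh} byte payload")

# By contrast, the RFC 2460 value of 1,280 allows a very generous
# headroom for encapsulation headers:
print(ETHERNET_MTU - 1280)  # 220
```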

It appears that the minimum unfragmented size of 1,280 bytes specified in RFC 2460 is the result of considering the extremely unlikely case of a 1,500-byte packet that carries 220 octets of encapsulation headers. Our intuition tends to the view that the Internet supports a 1,500-byte packet almost ubiquitously, and most forms of tunnelling would be happily accommodated by allowing far less than 220 bytes of tunnel encapsulation headers per packet.

So why was the value of 1,280 chosen? I’m afraid that I simply don’t know!

IPv6 Fragmentation Behaviour

How do we cope with variable packet size limits in the IPv6 network? When a router is passed a packet that is too large to be forwarded to the next hop, then the router is supposed to extract the source IPv6 address of the packet and generate an ICMP message addressed to this source address. The ICMP message includes the MTU value of the next hop and includes as a payload the leading bytes of the original packet.
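The on-the-wire shape of this ICMPv6 “Packet Too Big” message (type 2, code 0, per RFC 4443) is a 4-byte MTU field after the common ICMPv6 header, followed by as much of the invoking packet as fits. A minimal sketch of building one (the checksum is left as zero here; a real sender computes it over an IPv6 pseudo-header):

```python
import struct

# Sketch of an ICMPv6 Packet Too Big message (RFC 4443, type 2, code 0):
# type(1) | code(1) | checksum(2) | MTU(4) | leading bytes of invoking packet.

ICMP6_PACKET_TOO_BIG = 2

def build_ptb(next_hop_mtu, invoking_packet):
    header = struct.pack("!BBHI", ICMP6_PACKET_TOO_BIG, 0, 0, next_hop_mtu)
    # Truncate the quoted packet so the whole ICMPv6 message, plus the
    # 40-byte IPv6 header carrying it, stays within the 1,280-byte minimum MTU.
    room = 1280 - 40 - len(header)
    return header + invoking_packet[:room]

msg = build_ptb(1400, b"\x60" + b"\x00" * 1499)  # a 1,500-byte offending packet
mtype, code, cksum, mtu = struct.unpack("!BBHI", msg[:8])
print(mtype, code, mtu, len(msg))  # 2 0 1400 1240
```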

What should an IPv6 host do when receiving this ICMP message?

RFC 1981 proposed that hosts reduced the size of packets: “The node MUST reduce the size of the packets it is sending along the path.”

In the context of a TCP session, this can be achieved through an interaction between the upper-level protocol and the ICMP receiver on the sending host. The ICMP Packet Too Big message should cause the TCP session to drop its estimate of the maximum un-fragmented packet size that is supported on the path between this host and the remote end (the “PathMTU”) to the value provided in the ICMP message. This way the TCP session should not generate fragmented packets, but dynamically adjust its packetization size to match what is supported on the network path. In other words, for TCP sessions, the preferred behaviour is not to use IP fragmentation as a response, but instead to push this information into the TCP session and adjust the TCP Maximum Segment Size appropriately. When this occurs as per the plan, one should not see fragmented TCP packets at all.
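The MSS adjustment itself is simple arithmetic: assuming no extension headers and no TCP options, the sender subtracts the fixed 40-byte IPv6 header and the 20-byte TCP header from the reported path MTU. The function name here is illustrative:

```python
# How a TCP sender might translate a PTB-reported path MTU into a new MSS,
# assuming no extension headers and an option-free TCP header.
IPV6_HEADER = 40
TCP_HEADER = 20

def mss_from_path_mtu(path_mtu):
    return path_mtu - IPV6_HEADER - TCP_HEADER

print(mss_from_path_mtu(1500))  # 1440: the conventional IPv6 default MSS
print(mss_from_path_mtu(1280))  # 1220: the floor after maximal PTB clamping
```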

If we won’t (or shouldn’t) see TCP fragments in IPv6, should we expect to see UDP fragments? As with many questions about UDP behaviour, the answer is that “it depends!” What it depends on is the behaviour of the application that is emitting the UDP packets.

Assuming that the UDP application is a stateless one that operates in a simple query/response model, such as a DNS server, then the application has no remembered connection state and no buffer of previously sent messages. There is nothing for it to resend at a smaller size, which implies that the ICMP message is largely useless in any case!

So what value should an IPv6 host use for its local MTU?

If you set it low, such as 1,280 bytes, then attempts to send larger payloads in UDP will result in fragmentation of the payload.

If you set it higher, and 1,500 bytes is also common, then attempts to send a large payload, such as 1,400 bytes, may encounter path constraints and generate ICMP Packet Too Big messages. Assuming that the ICMP message makes it all the way back to the sender (which is by no means an assured outcome), a TCP application can react by adjusting its session MSS. A UDP-based application, on the other hand, may simply be unable to react.

This would tend to suggest that a conservative approach is to use 1,280 bytes as a local MTU, as this would minimize, or hopefully eliminate, the issues of UDP and ICMP PTB messages. However, relying on packet fragmentation, which is the inevitable consequence of using a small MTU with larger UDP payloads, may not be a good idea either. A soon-to-be Informational RFC, currently called draft-ietf-v6ops-ipv6-ehs-in-real-world, by Fernando Gont and colleagues, indicated a packet drop rate of up to 55% when passing IPv6 packets with Fragmentation Extension Headers through the network.

Experimenting in the DNS

Given that this issue of packet fragmentation is one that primarily concerns UDP, and the major user of UDP on the Internet today appears to be the DNS, we set up an experiment that tested the ability to pass variously sized IPv6 UDP DNS responses through the network.

We tested three size ranges of packets: small (below 1,280 bytes), medium (between 1,280 and 1,500 bytes) and large (above 1,500 bytes). The first size is the control point, and we do not expect packets of this size to encounter any network delivery issues. For the second size range we expect to see some form of interaction between these packets and network paths that generate ICMP PTB messages reporting MTUs below the particular packet size. The third size we used was 1,700 octets. The sending hosts use a local outbound MTU setting of 1,500 bytes, so the outbound packet is fragmented at the source into an initial segment and a trailing fragment, both of which carry a Fragmentation Extension Header.
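The source fragmentation of the large case can be sketched as follows, assuming a bare 40-byte IPv6 header with no other extension headers. Note that the fragment data must fall on an 8-byte boundary, so the leading fragment comes out just under the 1,500-byte MTU:

```python
# Sketch of IPv6 source fragmentation for a 1,700-byte packet and a
# 1,500-byte MTU, assuming a bare IPv6 header (no other extension headers).
IPV6_HEADER = 40
FRAG_HEADER = 8

def fragment_sizes(packet_size, mtu):
    payload = packet_size - IPV6_HEADER  # the fragmentable part
    # Each fragment repeats the IPv6 header, adds a Fragment header, and
    # carries a chunk of payload rounded down to a multiple of 8 bytes.
    per_frag = (mtu - IPV6_HEADER - FRAG_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        data = min(per_frag, payload)
        sizes.append(IPV6_HEADER + FRAG_HEADER + data)
        payload -= data
    return sizes

print(fragment_sizes(1700, 1500))  # [1496, 260]
```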

We used a modified DNS name server that ignored EDNS0 UDP buffer sizes and did not respond to TCP connection requests. The system was configured with a local MTU of 1,500 bytes and, in this case, listened exclusively on IPv6. The Linux kernel we used (running Debian GNU/Linux 8 “Jessie”) includes support for ICMPv6 Packet Too Big messages, and will hold in its FIB a cache of recent Path MTU values. For outgoing UDP messages, the kernel will perform fragmentation down to 1,280 bytes.

We expected to see a result where all IPv6 resolvers were capable of receiving packets up to 1,280 bytes in length. It’s unclear how many resolvers sit behind network paths that do not admit 1,500-byte packets, so we are unsure what the drop rate might be for packets sized between 1,280 and 1,500 bytes. The interaction is a little more complex here: while the local server will adjust its local cache of Path MTU values in response to received ICMP messages, this will only be effective if the remote end uses the same address for a subsequent query, and if that re-query occurs within the local cache lifetime.

For packets larger than 1,500 octets, the response will be fragmented at the source, and there are three potential reasons why the network will discard the packet. The initial fragment will be 1,500 octets in length, which implies that this leading fragment will encounter the same path MTU issues as the slightly smaller packets. Secondly, firewalls may reject trailing fragments as there is no transport level port information in the trailing fragment. And finally, there is the Extension Header drop issue where other observations report a drop rate of approximately 50% when an IPv6 packet has a fragmentation Extension Header.

If a resolver is able to follow the glueless delegation path for small packets, then we would expect to see some level of packet drop for packets larger than 1,280 bytes due to Path MTU mismatch issues, where the remote end does not perform its re-query within the cache lifetime. For the packet test larger than 1,500 bytes this would be compounded by a further level of packet drop due to firewall filtering of trailing fragments, and, in addition, there is the Extension Header packet drop problem noted above.

How can you tell if a resolver has received a response? The technique we used in this experiment is a combination of dynamic generation of DNS labels and the synthesis of “glueless delegation”. When a resolver attempts to resolve a unique terminal name, the “parent” sends a response indicating that the terminal name lies in a uniquely named delegated zone, and provides the DNS name of the authoritative name server for this delegated zone, but it deliberately omits the conventional inclusion of the IP address values of this name server (the “glue” records that are conventionally loaded into the Additional section of a DNS response). This means that the resolver has to separately resolve this DNS name to an IP address before it can resume its task of resolving the original name. We have made this name a synthetic unique name so that the resolver cannot use a cached value, and must perform the name resolution task each time. In our case, we have deliberately inflated the DNS response to the query for the address record of the authoritative name server, and the visible confirmation that the DNS resolver has successfully received this inflated response is its subsequent query to the delegated domain for the original name.



What we observed from this experiment is shown in Table 1.