Why use a non default packet scheduler?

Packet schedulers are a portion of the kernel that queues network data on a specific interface and governs how they are transmitted and received including buffers. Below I will breakdown a couple of the packet schedulers included in this kernel.



fq_codel

FQ_Codel (Fair Queuing Controlled Delay) is queuing discipline that combines Fair Queuing with the CoDel AQM scheme. FQ_Codel uses a stochastic model to classify incoming packets into different flows and is used to provide a fair share of the bandwidth to all the flows using the queue. Each such flow is managed by the CoDel queuing discipline. Reordering within a flow is avoided since Codel internally uses a FIFO queue.



pfifo_fast

The FIFO algorithm forms the basis for the default qdisc on all Linux network interfaces (pfifo_fast). It performs no shaping or rearranging of packets. It simply transmits packets as soon as it can after receiving and queuing them. This is also the qdisc used inside all newly created classes until another qdisc or a class replaces the FIFO.



A real FIFO qdisc must, however, have a size limit (a buffer size) to prevent it from overflowing in case it is unable to dequeue packets as quickly as it receives them. Linux implements two basic FIFO qdiscs, one based on bytes, and one on packets. Regardless of the type of FIFO used, the size of the queue is defined by the parameter limit. For a pfifo the unit is understood to be packets and for a bfifo the unit is understood to be bytes.



pie

PIE is designed to control delay effectively. First, an average dequeue rate is estimated based on the standing queue. The rate is used to calculate the current delay. Then, on a periodic basis, the delay is used to calculate the dropping probabilty. Finally, on arrival, a packet is dropped (or marked) based on this probability. PIE makes adjustments to the probability based on the trend of the delay i.e. whether it is going up or down.The delay converges quickly to the target value specified. alpha and beta are statically chosen parameters chosen to control the drop probability growth and are determined through control theoretic approaches. alpha determines how the deviation between the current and target latency changes probability. beta exerts additional adjustments depending on the latency trend. The drop probabilty is used to mark packets in ecn mode. However, as in RED, beyond 10% packets are dropped based on this probability. The bytemode is used to drop packets proportional to the packet size.



fq

A packet scheduler is charged with organizing the flow of packets through the network stack to meet a set of policy objectives. The kernel has quite a few of them, including CBQ for fancy class-based routing, CHOKe for routers, and a couple of variants on the CoDel queue management algorithm. FQ joins this list as a relatively simple scheduler designed to implement fair access across large numbers of flows with local endpoints while keeping buffer sizes down; it also happens to implement TCP pacing.



FQ keeps track of every flow it sees passing through the system. To do so, it calculates an eight-bit hash based on the socket associated with the flow, then uses the result as an index into an array of red-black trees. The data structure is designed, according to Eric, to scale well up to millions of concurrent flows. A number of parameters are associated with each flow, including its current transmission quota and, optionally, the time at which the next packet can be transmitted.



That transmission time is used to implement the TCP pacing support. If a given socket has a pace specified for it, FQ will calculate how far the packets should be spaced in time to conform to that pace. If a flow's next transmission time is in the future, that flow is added to another red-black tree with the transmission time used as the key; that tree, thus, allows the kernel to track delayed flows and quickly find the one whose next packet is due to go out the soonest. A single timer is then used, if needed, to ensure that said packet is transmitted at the right time.



The scheduler maintains two linked lists of active flows, the "new" and "old" lists. When a flow is first encountered, it is placed on the new list. The packet dispatcher services flows on the new list first; once a flow uses up its quota, that flow is moved to the old list. The idea here appears to be to give preferential treatment to new, short-lived connections — a DNS lookup or HTTP "GET" command, for example — and not let those connections be buried underneath larger, longer-lasting flows. Eventually the scheduler works its way through all active flows, sending a quota of data from each; then the process starts over.



There are a number of additional details, of course. There are limits on the amount of data queued for each flow, as well as a limit on the amount of data buffered within the scheduler as a whole; any packet that would exceed one of those limits is dropped. A special "internal" queue exists for high-priority traffic, allowing it to reach the wire more quickly. And so on.



One other detail is garbage collection. One problem with this kind of flow tracking is that nothing tells the scheduler when a particular flow is shut down; indeed, nothing can tell the scheduler for flows without local endpoints or for non-connection-oriented protocols. So the scheduler must figure out on its own when it can stop tracking any given flow. One way to do that would be to drop the flow as soon as there are no packets associated with it, but that would cause some thrashing as the queues empty and refill; it is better to keep flow data around for a little while in anticipation of more traffic. FQ handles this by putting idle flows into a special "detached" state, off the lists of active flows. Whenever a new flow is added, a pass is made over the associated red-black tree to clean out flows that have been detached for a sufficiently long time — three seconds in the current patch.



cake

The CAKE Principle:

(or, how to have your cake and eat it too)



This is a combination of several shaping, AQM and FQ

techniques into one easy-to-use package:



- An overall bandwidth shaper, to move the bottleneck away

from dumb CPE equipment and bloated MACs. This operates

in deficit mode (as in sch_fq), eliminating the need for

any sort of burst parameter (eg. token buxket depth).

Burst support is limited to that necessary to overcome

scheduling latency.



- A Diffserv-aware priority queue, giving more priority to

certain classes, up to a specified fraction of bandwidth.

Above that bandwidth threshold, the priority is reduced to

avoid starving other classes.



- Each priority class has a separate Flow Queue system, to

isolate traffic flows from each other. This prevents a

burst on one flow from increasing the delay to another.

Flows are distributed to queues using a set-associative

hash function.



- Each queue is actively managed by Codel. This serves

flows fairly, and signals congestion early via ECN

(if available) and/or packet drops, to keep latency low.

The codel parameters are auto-tuned based on the bandwidth

setting, as is necessary at low bandwidths.



The configuration parameters are kept deliberately simple

for ease of use. Everything has sane defaults. Complete

generality of configuration is not a goal.



The priority queue operates according to a weighted DRR

scheme, combined with a bandwidth tracker which reuses the

shaper logic to detect which side of the bandwidth sharing

threshold the class is operating. This determines whether

a priority-based weight (high) or a bandwidth-based weight

(low) is used for that class in the current pass.



This qdisc incorporates much of Eric Dumazet's fq_codel code,

customized for use as an integrated subordinate.