Two years ago we blogged about our love of BPF (BSD packet filter) bytecode.



CC BY 2.0 image by jim simonson

Then we published a set of utilities we are using to generate the BPF rules for our production iptables: the bpftools.

Today we are very happy to open source another component of the bpftools: our p0f BPF compiler!

Meet the p0f

p0f is a tool written by superhuman Michal Zalewski.

The main purpose of p0f is to passively analyze and categorize arbitrary network traffic. You can feed p0f any packet and in return it will derive knowledge about the operating system that sent the packet.

One of the features that caught our attention was the concise yet explanatory signature format used to describe TCP SYN packets.

The p0f SYN signature is a simple string of colon-separated values. This string cleanly describes a SYN packet in a human-readable way. The format is pretty smart: it skips the varying TCP fields and keeps only the essence of the SYN packet, extracting the interesting bits from it.

We use this on a daily basis to categorize the packets that we, at CloudFlare, see when we are the target of a SYN flood. To defeat SYN attacks we want to discriminate the packets that are part of an attack from legitimate traffic. One of the ways we do this uses p0f.

We want to rate limit attack packets, and in effect prioritize processing of other, hopefully legitimate, ones. The p0f SYN signatures give us a language to describe and distinguish different types of SYN packets.

For example, here is a typical p0f SYN signature of a Linux SYN packet:

4:64:0:*:mss*10,6:mss,sok,ts,nop,ws:df,id+:0

while this is a Windows 7 one:

4:128:0:*:8192,8:mss,nop,ws,nop,nop,sok:df,id+:0

Not getting into details yet, but you can clearly see that there are differences between these operating systems. Over time we noticed that the attack packets are often different. Here are two examples of attack SYN packets:

4:255:0:0:*,0::ack+,uptr+:0
4:64:0:*:65535,*:mss,nop,ws,nop,nop,sok:df,id+:0

You can have a look at more signatures in p0f's README and signatures database.

It's not always possible to perfectly distinguish an attack from valid packets, but very often it is. This realization led us to develop an attack mitigation tool based on p0f SYN signatures. With this we can ask iptables to rate limit only the selected attack signatures.

But before we discuss the mitigations, let's explain the signature format.



CC BY-SA 3.0 image by Hyacinth at the English language Wikipedia

Signature

As mentioned, the p0f SYN signature is a colon-separated string with the following parts:

IP version: the first field carries the IP version. Allowed values are 4 and 6.

Initial TTL: assuming that realistically a packet will not pass through more than 35 hops, we can specify an initial TTL ittl (usual values are 255, 128, 64 and 32) and check that the packet's TTL falls in the range (ittl - 35, ittl], that is, greater than ittl - 35 and at most ittl.

IP options length: length of IP options. Although it's not that common to see options in the IP header (and so 0 is the typical value you would see in a signature), the standard defines a variable length field before the IP payload where options can be specified. A * value is allowed too, which means "not specified".

MSS: maximum segment size specified in the TCP options. Can be a constant or *.

Window Size: window size specified in the TCP header. It can be expressed as:

a constant c, like 8192

a multiple of the MSS, in the mss*c format

a multiple of a constant, in the %c format

any value, as *

Window Scale: window scale specified during the three-way handshake. Can be a constant or *. In the signature it shares a field with the window size, separated by a comma (for example mss*10,6).

TCP options layout: list of TCP options in the order they are seen in a TCP packet.

Quirks: comma-separated list of unusual (e.g. ACK number set in a non-ACK packet) or incorrect (e.g. malformed TCP options) characteristics of a packet.

Payload class: TCP payload size. Can be 0 (no data), + (1 or more bytes of data) or *.
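To make the field layout above concrete, here is a minimal Python sketch that splits a signature into its parts and applies the TTL check. The names SynSignature, parse_signature and ttl_matches are made up for this illustration and are not part of bpftools or p0f. The sketch assumes the initial TTL is a plain integer, as in every signature shown in this post.

```python
from dataclasses import dataclass


@dataclass
class SynSignature:
    """Illustrative container for the fields of a p0f SYN signature."""
    ip_version: str
    initial_ttl: int
    ip_options_len: str
    mss: str
    window_size: str
    window_scale: str
    tcp_options: list
    quirks: list
    payload_class: str


def parse_signature(sig):
    """Split a p0f SYN signature into its colon-separated fields."""
    parts = sig.split(":")
    if len(parts) != 8:
        raise ValueError("expected 8 colon-separated fields, got %d" % len(parts))
    ver, ittl, olen, mss, win, olayout, quirks, pclass = parts
    # Window size and window scale share one field, separated by a comma.
    wsize, _, wscale = win.partition(",")
    return SynSignature(
        ip_version=ver,
        initial_ttl=int(ittl),  # assumption: plain integer TTL only
        ip_options_len=olen,
        mss=mss,
        window_size=wsize,
        window_scale=wscale,
        # Empty layout or quirks fields become empty lists.
        tcp_options=[o for o in olayout.split(",") if o],
        quirks=[q for q in quirks.split(",") if q],
        payload_class=pclass,
    )


def ttl_matches(sig, packet_ttl, max_hops=35):
    """True if packet_ttl lies in (ittl - max_hops, ittl]."""
    return sig.initial_ttl - max_hops < packet_ttl <= sig.initial_ttl


linux = parse_signature("4:64:0:*:mss*10,6:mss,sok,ts,nop,ws:df,id+:0")
attack = parse_signature("4:255:0:0:*,0::ack+,uptr+:0")
```

With the Linux signature, a packet arriving with TTL 52 would pass the TTL check, while one arriving with TTL 29 or 65 would not.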

TCP Options format

The following common TCP options are recognised:

nop: no-operation

mss: maximum segment size

ws: window scaling

sok: selective ACK permitted

sack: selective ACK

ts: timestamp

eol+x: end of options followed by x bytes of padding
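As a rough illustration of how such a layout string relates to the bytes on the wire, the following Python sketch walks the raw TCP options area and emits the names listed above. The function options_layout is hypothetical and is not how p0f or bpftools implement it. Rendering unknown option kinds as ?n is a choice made for this sketch only.

```python
# TCP option kind numbers mapped to the names used in the layout string.
OPTION_NAMES = {1: "nop", 2: "mss", 3: "ws", 4: "sok", 5: "sack", 8: "ts"}


def options_layout(options):
    """Return a p0f-style layout string for raw TCP option bytes."""
    names = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:
            # End of option list: whatever follows is padding.
            names.append("eol+%d" % (len(options) - i - 1))
            break
        if kind == 1:
            # NOP is a single byte with no length field.
            names.append("nop")
            i += 1
            continue
        # Every other option carries a length byte covering kind + length + data.
        if i + 1 >= len(options):
            raise ValueError("truncated TCP option at offset %d" % i)
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            raise ValueError("malformed TCP option at offset %d" % i)
        names.append(OPTION_NAMES.get(kind, "?%d" % kind))  # ?n is illustrative
        i += length
    return ",".join(names)


# Options as a Linux SYN typically carries them: MSS 1460, SACK permitted,
# timestamp, NOP, window scale 6.
linux_opts = bytes([2, 4, 0x05, 0xB4,
                    4, 2,
                    8, 10, 0, 0, 0, 1, 0, 0, 0, 0,
                    1,
                    3, 3, 6])
print(options_layout(linux_opts))
```

The option bytes in linux_opts are laid out so that each option starts at the same offset the compiled filter checks later in this post (20 bytes of options following a 20-byte TCP header).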

Quirks

p0f describes a number of quirks:

df: don't fragment bit is set in the IP header

id+: df bit is set and IP identification field is non-zero

id-: df bit is not set and IP identification is zero

ecn: explicit congestion flag is set

0+: reserved ("must be zero") field in IP header is not actually zero

flow: flow label in IPv6 header is non-zero

seq-: sequence number is zero

ack+: ACK field is non-zero but ACK flag is not set

ack-: ACK field is zero but ACK flag is set

uptr+: URG field is non-zero but URG flag not set

urgf+: URG flag is set

pushf+: PUSH flag is set

ts1-: timestamp 1 is zero

ts2+: timestamp 2 is non-zero in a SYN packet

opt+: non-zero data in options segment

exws: excessive window scaling factor (window scale greater than 14)

linux: match a packet sent from the Linux network stack (IP.id field equal to TCP.ts1 xor TCP.seq_num). Note that this quirk is not part of the original p0f signature format; we decided to add it since we found it useful.

bad: malformed TCP options
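A few of these quirks depend only on plain header fields, so they are easy to express directly. The Python sketch below covers that subset (df, id+, id-, 0+, seq-, ack+, ack-, uptr+, urgf+, pushf+). The function syn_quirks and its parameter names are invented for this example. The order of the returned list simply follows the list above and is not claimed to match p0f's own ordering.

```python
def syn_quirks(*, df, ip_id, seq, ack, urg_ptr=0, reserved_bit=False,
               ack_flag=False, urg_flag=False, push_flag=False):
    """Derive a subset of p0f quirks from already-parsed header fields."""
    quirks = []
    if df:
        quirks.append("df")
        if ip_id != 0:
            quirks.append("id+")   # DF set, yet IP ID is non-zero
    elif ip_id == 0:
        quirks.append("id-")       # DF not set and IP ID is zero
    if reserved_bit:
        quirks.append("0+")        # "must be zero" bit is set
    if seq == 0:
        quirks.append("seq-")
    if ack_flag:
        if ack == 0:
            quirks.append("ack-")  # ACK flag set but ACK number is zero
    elif ack != 0:
        quirks.append("ack+")      # ACK number set without the ACK flag
    if urg_flag:
        quirks.append("urgf+")
    elif urg_ptr != 0:
        quirks.append("uptr+")     # urgent pointer set without the URG flag
    if push_flag:
        quirks.append("pushf+")
    return quirks


# A Linux-like SYN: DF set, non-zero IP ID.
print(syn_quirks(df=True, ip_id=0x1234, seq=1000, ack=0))
# An attack-like SYN: no DF, ACK number and urgent pointer set without flags.
print(syn_quirks(df=False, ip_id=0x1234, seq=1000, ack=42, urg_ptr=7))
```

The two calls correspond to the quirk fields df,id+ and ack+,uptr+ seen in the signatures earlier in this post.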

Mitigating attacks

Given a p0f SYN signature, we want to pass it to iptables for mitigation. It's not obvious how to do so, but fortunately we are experienced in BPF bytecode since we are already using it to block DNS DDoS attacks.

We decided to extend our BPF infrastructure to support p0f as well, by building a tool to compile a p0f SYN signature into a BPF bytecode blob, which got incorporated into the bpftools project.

This allows us to use a simple and human readable syntax for the mitigations - the p0f signature - and compile it to a very efficient BPF form that can be used by iptables.

With a p0f signature running as BPF in iptables we are able to pick out attack packets at very high speed and react accordingly. We can either hard -j DROP them or rate limit them if we wish.

How to compile p0f to BPF

First you need to clone the cloudflare/bpftools GitHub repository:

$ git clone https://github.com/cloudflare/bpftools.git

Then compile it:

$ cd bpftools
$ make

With this you can run bpfgen p0f to generate a BPF filter that matches a p0f signature.

Here's an example where we take the p0f signature of a Linux TCP SYN packet (the one we introduced before), and by using bpftools we generate the BPF bytecode that will match this category of packets:

$ ./bpfgen p0f -- 4:64:0:*:mss*10,6:mss,sok,ts,nop,ws:df,id+:0
56,0 0 0 0,48 0 0 8,37 52 0 64,37 0 51 29,48 0 0 0,
84 0 0 15,21 0 48 5,48 0 0 9,21 0 46 6,40 0 0 6,
69 44 0 8191,177 0 0 0,72 0 0 14,2 0 0 8,72 0 0 22,
36 0 0 10,7 0 0 0,96 0 0 8,29 0 36 0,177 0 0 0,
80 0 0 39,21 0 33 6,80 0 0 12,116 0 0 4,21 0 30 10,
80 0 0 20,21 0 28 2,80 0 0 24,21 0 26 4,80 0 0 26,
21 0 24 8,80 0 0 36,21 0 22 1,80 0 0 37,21 0 20 3,
48 0 0 6,69 0 18 64,69 17 0 128,40 0 0 2,2 0 0 1,
48 0 0 0,84 0 0 15,36 0 0 4,7 0 0 0,96 0 0 1,
28 0 0 0,2 0 0 5,177 0 0 0,80 0 0 12,116 0 0 4,
36 0 0 4,7 0 0 0,96 0 0 5,29 0 1 0,6 0 0 65536,
6 0 0 0,
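The output above is less opaque than it looks: the first number is the instruction count, and every following comma-separated group is one classic BPF instruction written as "opcode jt jf k". Here is a small Python sketch that decodes it. The names Insn, parse_bytecode and jump_targets are made up for this example and are not part of bpftools.

```python
from collections import namedtuple

# One classic BPF instruction: opcode, jump-if-true, jump-if-false, constant.
Insn = namedtuple("Insn", "code jt jf k")

LINUX_SYN_BYTECODE = (
    "56,0 0 0 0,48 0 0 8,37 52 0 64,37 0 51 29,48 0 0 0,"
    "84 0 0 15,21 0 48 5,48 0 0 9,21 0 46 6,40 0 0 6,"
    "69 44 0 8191,177 0 0 0,72 0 0 14,2 0 0 8,72 0 0 22,"
    "36 0 0 10,7 0 0 0,96 0 0 8,29 0 36 0,177 0 0 0,"
    "80 0 0 39,21 0 33 6,80 0 0 12,116 0 0 4,21 0 30 10,"
    "80 0 0 20,21 0 28 2,80 0 0 24,21 0 26 4,80 0 0 26,"
    "21 0 24 8,80 0 0 36,21 0 22 1,80 0 0 37,21 0 20 3,"
    "48 0 0 6,69 0 18 64,69 17 0 128,40 0 0 2,2 0 0 1,"
    "48 0 0 0,84 0 0 15,36 0 0 4,7 0 0 0,96 0 0 1,"
    "28 0 0 0,2 0 0 5,177 0 0 0,80 0 0 12,116 0 0 4,"
    "36 0 0 4,7 0 0 0,96 0 0 5,29 0 1 0,6 0 0 65536,"
    "6 0 0 0,"
)


def parse_bytecode(text):
    """Decode 'count,code jt jf k,...' into a list of Insn tuples."""
    fields = [f.strip() for f in text.split(",") if f.strip()]
    count = int(fields[0])
    insns = [Insn(*(int(n) for n in f.split())) for f in fields[1:]]
    if len(insns) != count:
        raise ValueError("header says %d instructions, found %d"
                         % (count, len(insns)))
    return insns


def jump_targets(pc, insn):
    """Absolute targets of a conditional jump: offsets are relative to pc + 1."""
    return pc + 1 + insn.jt, pc + 1 + insn.jf


insns = parse_bytecode(LINUX_SYN_BYTECODE)
print(len(insns), insns[1], jump_targets(2, insns[2]))
```

Instruction 2 is the "TTL greater than 64" test. Its true branch resolves to instruction 55, the final return of 0 that rejects the packet.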

If this looks magical, use the -s flag to see an explanation of what's going on:

$ ./bpfgen -s p0f -- 4:64:0:*:mss*10,6:mss,sok,ts,nop,ws:df,id+:0
; ip: ip version
; (ip[8] <= 64): ttl <= 64
; (ip[8] > 29): ttl > 29
; ((ip[0] & 0xf) == 5): IP options len == 0
; (tcp[14:2] == (tcp[22:2] * 10)): win size == mss * 10
; (tcp[39:1] == 6): win scale == 6
; ((tcp[12] >> 4) == 10): TCP data offset
; (tcp[20] == 2): olayout mss
; (tcp[24] == 4): olayout sok
; (tcp[26] == 8): olayout ts
; (tcp[36] == 1): olayout nop
; (tcp[37] == 3): olayout ws
; ((ip[6] & 0x40) != 0): df set
; ((ip[6] & 0x80) == 0): mbz zero
; ((ip[2:2] - ((ip[0] & 0xf) * 4) - ((tcp[12] >> 4) * 4)) == 0): payload len == 0
;
; ipver=4
; ip and (ip[8] <= 64) and (ip[8] > 29) and ((ip[0] & 0xf) == 5) and (tcp[14:2] == (tcp[22:2] * 10)) and (tcp[39:1] == 6) and ((tcp[12] >> 4) == 10) and (tcp[20] == 2) and (tcp[24] == 4) and (tcp[26] == 8) and (tcp[36] == 1) and (tcp[37] == 3) and ((ip[6] & 0x40) != 0) and ((ip[6] & 0x80) == 0) and ((ip[2:2] - ((ip[0] & 0xf) * 4) - ((tcp[12] >> 4) * 4)) == 0)
l000: ld #0x0
l001: ldb [8]
l002: jgt #0x40, l055, l003
l003: jgt #0x1d, l004, l055
l004: ldb [0]
l005: and #0xf
l006: jeq #0x5, l007, l055
l007: ldb [9]
l008: jeq #0x6, l009, l055
l009: ldh [6]
l010: jset #0x1fff, l055, l011
l011: ldxb 4*([0]&0xf)
l012: ldh [x + 14]
l013: st M[8]
l014: ldh [x + 22]
l015: mul #10
l016: tax
l017: ld M[8]
l018: jeq x, l019, l055
l019: ldxb 4*([0]&0xf)
l020: ldb [x + 39]
l021: jeq #0x6, l022, l055
l022: ldb [x + 12]
l023: rsh #4
l024: jeq #0xa, l025, l055
l025: ldb [x + 20]
l026: jeq #0x2, l027, l055
l027: ldb [x + 24]
l028: jeq #0x4, l029, l055
l029: ldb [x + 26]
l030: jeq #0x8, l031, l055
l031: ldb [x + 36]
l032: jeq #0x1, l033, l055
l033: ldb [x + 37]
l034: jeq #0x3, l035, l055
l035: ldb [6]
l036: jset #0x40, l037, l055
l037: jset #0x80, l055, l038
l038: ldh [2]
l039: st M[1]
l040: ldb [0]
l041: and #0xf
l042: mul #4
l043: tax
l044: ld M[1]
l045: sub x
l046: st M[5]
l047: ldxb 4*([0]&0xf)
l048: ldb [x + 12]
l049: rsh #4
l050: mul #4
l051: tax
l052: ld M[5]
l053: jeq x, l054, l055
l054: ret #65536
l055: ret #0
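To see what this filter actually tests, the same checks can be written out in plain Python against a raw packet that starts at the IP header. This is only a reading aid that mirrors the expression and disassembly above, not code from bpftools. The functions matches_linux_syn and build_linux_syn, and the addresses and port numbers in the sample packet, are invented for this sketch.

```python
import struct


def matches_linux_syn(pkt):
    """Mirror of the compiled filter; pkt starts at the IPv4 header."""
    if len(pkt) < 60:                       # 20 bytes IP + 40 bytes TCP
        return False
    if not (29 < pkt[8] <= 64):             # TTL in (64 - 35, 64]
        return False
    ihl = pkt[0] & 0xF
    if ihl != 5:                            # no IP options
        return False
    if pkt[9] != 6:                         # protocol is TCP
        return False
    (flags_frag,) = struct.unpack_from("!H", pkt, 6)
    if flags_frag & 0x1FFF:                 # not a later fragment
        return False
    tcp = pkt[ihl * 4:]
    (window,) = struct.unpack_from("!H", tcp, 14)
    (mss,) = struct.unpack_from("!H", tcp, 22)
    doff = tcp[12] >> 4
    (total_len,) = struct.unpack_from("!H", pkt, 2)
    return (window == mss * 10              # win size == mss * 10
            and tcp[39] == 6                # win scale == 6
            and doff == 10                  # 40-byte TCP header
            and tcp[20] == 2 and tcp[24] == 4 and tcp[26] == 8
            and tcp[36] == 1 and tcp[37] == 3   # mss,sok,ts,nop,ws
            and pkt[6] & 0x40 != 0          # DF set
            and pkt[6] & 0x80 == 0          # reserved bit clear
            and total_len - ihl * 4 - doff * 4 == 0)  # no payload


def build_linux_syn(ttl=64):
    """Build a sample SYN shaped like the Linux signature (checksums left 0)."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 60, 0x1234, 0x4000, ttl, 6, 0,
                     bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
    tcp = struct.pack("!HHIIBBHHH", 40000, 80, 1000, 0, 0xA0, 0x02,
                      14600, 0, 0)
    opts = (bytes([2, 4]) + struct.pack("!H", 1460)   # mss 1460
            + bytes([4, 2])                            # sok
            + bytes([8, 10]) + struct.pack("!II", 1, 0)  # ts
            + bytes([1])                               # nop
            + bytes([3, 3, 6]))                        # ws 6
    return ip + tcp + opts


print(matches_linux_syn(build_linux_syn(ttl=52)))
print(matches_linux_syn(build_linux_syn(ttl=128)))
```

The sample packet with TTL 52 satisfies every clause, while the same packet with TTL 128 fails the first TTL comparison, just as the bytecode would jump straight to its rejecting return.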

Example run

For example, suppose we want to block SYN packets generated by the hping3 tool.

First, we need to work out the p0f SYN signature. Here it is; we know this one off the top of our heads:

4:64:0:0:*,0::ack+:0

(Note: unless you use the -L 0 option, hping3 sends SYN packets with the ACK number set. Interesting, isn't it?)

Now, we can use the bpftools to get BPF bytecode that will match the naughty packets:

$ ./bpfgen p0f -- 4:64:0:0:*,0::ack+:0
39,0 0 0 0,48 0 0 8,37 35 0 64,37 0 34 29,48 0 0 0,
84 0 0 15,21 0 31 5,48 0 0 9,21 0 29 6,40 0 0 6,
69 27 0 8191,177 0 0 0,80 0 0 12,116 0 0 4,
21 0 23 5,48 0 0 6,69 21 0 128,80 0 0 13,
69 19 0 16,64 0 0 8,21 17 0 0,40 0 0 2,2 0 0 3,
48 0 0 0,84 0 0 15,36 0 0 4,7 0 0 0,96 0 0 3,
28 0 0 0,2 0 0 7,177 0 0 0,80 0 0 12,116 0 0 4,
36 0 0 4,7 0 0 0,96 0 0 7,29 0 1 0,6 0 0 65536,
6 0 0 0,

This bytecode can then be passed to iptables:

$ sudo iptables -A INPUT -p tcp --dport 80 -m bpf --bytecode "39,0 0 0 0,48 0 0 8,37 35 0 64,37 0 34 29,48 0 0 0,84 0 0 15,21 0 31 5,48 0 0 9,21 0 29 6,40 0 0 6,69 27 0 8191,177 0 0 0,80 0 0 12,116 0 0 4,21 0 23 5,48 0 0 6,69 21 0 128,80 0 0 13,69 19 0 16,64 0 0 8,21 17 0 0,40 0 0 2,2 0 0 3,48 0 0 0,84 0 0 15,36 0 0 4,7 0 0 0,96 0 0 3,28 0 0 0,2 0 0 7,177 0 0 0,80 0 0 12,116 0 0 4,36 0 0 4,7 0 0 0,96 0 0 7,29 0 1 0,6 0 0 65536,6 0 0 0," -j DROP

And here's how it would look in iptables:

$ sudo iptables -L INPUT -v
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    6   240            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:80match bpf 0 0 0 0,48 0 0 8,37 35 0 64,37 0 34 29,48 0 0 0,84 0 0 15,21 0 31 5,48 0 0 9,21 0 29 6,40 0 0 6,69 27 0 8191,177 0 0 0,80 0 0 12,116 0 0 4,21 0 23 5,48 0 0 6,69 21 0 128,80 0 0 13,69 19 0 16,64 0 0 8,21 17 0 0,40 0 0 2,2 0 0 3,48 0 0 0,84 0 0 15,36 0 0 4,7 0 0 0,96 0 0 3,28 0 0 0,2 0 0 7,177 0 0 0,80 0 0 12,116 0 0 4,36 0 0 4,7 0 0 0,96 0 0 7,29 0 1 0,6 0 0 65536,6 0 0 0

Closing words

While defending against DDoS attacks is sometimes fun, most often it's a mundane, repetitive job. We are constantly working on improving our automatic DDoS mitigation system, but we do not believe there is a strong reason to keep it all secret. We want to help others fighting attacks. Maybe if we all worked together, one day we could solve the DDoS problem for all.

Releasing our code as open source is an important part of what we do at CloudFlare. This blog post and the p0f BPF compiler are part of our effort to open source our DDoS mitigations. We hope others affected by SYN floods will find them useful.

Do you enjoy playing with low level networking bits? Are you interested in dealing with some of the largest DDoS attacks ever seen?

If so, you should definitely have a look at the open positions in our London, San Francisco, Singapore, Champaign (IL) and Austin (TX) offices!