1 Notably, the data flow and the control plane are separated. A node can remove itself by notifying its peers without losing a single packet.

An increasingly popular design for a datacenter network is BGP on the host: each host ships with a BGP daemon to advertise the IPs it handles and receives the routes to its fellow servers. Compared to a L2-based design, it is very scalable, resilient, cross-vendor and safe to operate. Take a look at “L3 routing to the hypervisor with BGP ” for a usage example.

BGP on the host with a spine-leaf IP fabric. A BGP session is established over each link and each host advertises its own IP prefixes.

While routing on the host eliminates the security problems related to Ethernet networks, a server may announce any IP prefix. In the above picture, two of them are announcing 2001:db8:cc::/64 . This could be a legit use of anycast or a prefix hijack. BGP offers several solutions to improve this aspect and one of them is to reuse the features around the RPKI .

Short introduction to the RPKI #

On the Internet, BGP is mostly relying on trust. This contributes to various incidents due to operator errors, like the one that affected Cloudflare a few months ago, or to malicious attackers, like the hijack of Amazon DNS to steal cryptocurrency wallets. RFC 7454 explains the best practices to avoid such issues.

2 People often use AS sets, like AS-APPLE in this example, as they are convenient if you have multiple AS numbers or customers. However, there is currently nothing preventing a rogue actor to add arbitrary AS numbers to their AS set.

IP addresses are allocated by five Regional Internet Registries ( RIR ). Each of them maintains a database of the assigned Internet resources, notably the IP addresses and the associated AS numbers. These databases may not be totally reliable but are widely used to build ACLs to ensure peers only announce the prefixes they are expected to. Here is an example of ACLs generated by bgpq3 when peering directly with Apple:

$ bgpq3 -l v6-IMPORT-APPLE -6 -R 48 -m 48 -A -J -E AS-APPLE policy-options { policy-statement v6-IMPORT-APPLE { replace: from { route-filter 2403:300::/32 upto /48; route-filter 2620:0:1b00::/47 prefix-length-range /48-/48; route-filter 2620:0:1b02::/48 exact; route-filter 2620:0:1b04::/47 prefix-length-range /48-/48; route-filter 2620:149::/32 upto /48; route-filter 2a01:b740::/32 upto /48; route-filter 2a01:b747::/32 upto /48; } } }

The RPKI (RFC 6480) adds public-key cryptography on top of it to sign the authorization for an AS to be the origin of an IP prefix. Such record is a Route Origination Authorization (ROA). You can browse the databases of these ROAs through the RIPE’s RPKI Validator instance:

RPKI validator shows one ROA for 85.190.88.0/21

BGP daemons do not have to download the databases or to check digital signatures to validate the received prefixes. Instead, they offload these tasks to a local RPKI validator implementing the “ RPKI -to-Router Protocol” ( RTR , RFC 6810).

For more details, have a look at “ RPKI and BGP : our path to securing Internet Routing.”

Using origin validation in the datacenter#

While it is possible to create our own RPKI for use inside the datacenter, we can take a shortcut and use a validator implementing RTR , like GoRTR, and accepting another source of truth. Let’s work on the following topology:

BGP on the host with prefix validation using RTR. Each server has its own AS number. The leaf routers establish RTR sessions to the validators.

3 We are using 16-bit AS numbers for readability. Because we need to assign a different AS number for each host in the datacenter, in an actual deployment, we would use 32-bit AS numbers.

You assume we have a place to maintain a mapping between the private AS numbers used by each host and the allowed prefixes:

ASN Allowed prefixes AS 65005 2001:db8:aa::/64 AS 65006 2001:db8:bb::/64 ,

2001:db8:11::/64 AS 65007 2001:db8:cc::/64 AS 65008 2001:db8:dd::/64 AS 65009 2001:db8:ee::/64 ,

2001:db8:11::/64 AS 65010 2001:db8:ff::/64

From this table, we build a JSON file for GoRTR, assuming each host can announce the provided prefixes or longer ones (like 2001:db8:aa::­42:d9ff:­fefc:287a/128 for AS 65005):

{ "roas" : [ { "prefix" : "2001:db8:aa::/64" , "maxLength" : 128 , "asn" : "AS65005" }, { "…" : "…" }, { "prefix" : "2001:db8:ff::/64" , "maxLength" : 128 , "asn" : "AS65010" }, { "prefix" : "2001:db8:11::/64" , "maxLength" : 128 , "asn" : "AS65006" }, { "prefix" : "2001:db8:11::/64" , "maxLength" : 128 , "asn" : "AS65009" } ] }

This file is deployed to all validators and served by a web server. GoRTR is configured to fetch it and update it every 10 minutes:

$ gortr -refresh = 600 \ -verify = false -checktime = false \ -cache = http://127.0.0.1/rpki.json INFO[0000] New update (7 uniques, 8 total prefixes). 0 bytes. Updating sha256 hash -> 68a1d3b52db8d654bd8263788319f08e3f5384ae54064a7034e9dbaee236ce96 INFO[0000] Updated added, new serial 1

The refresh time could be lowered but GoRTR can be notified of an update using the SIGHUP signal. Clients are immediately notified of the change.

The next step is to configure the leaf routers to validate the received prefixes using the farm of validators. Most vendors support RTR :

Configuring JunOS#

JunOS only supports plain-text TCP. First, let’s configure the connections to the validation servers:

routing-options { validation { group RPKI { session validator1 { hold-time 60 ; # session is considered down after 1 minute record-lifetime 3600 ; # cache is kept for 1 hour refresh-time 30 ; # cache is refreshed every 30 seconds port 8282 ; } session validator2 { /* OMITTED */ } session validator3 { /* OMITTED */ } } } }

By default, at most two sessions are randomly established at the same time. This provides a good way to load-balance them among the validators while maintaining good availability. The second step is to define the policy for route validation:

policy-options { policy-statement ACCEPT-VALID { term valid { from { protocol bgp; validation-database valid; } then { validation-state valid; accept; } } term invalid { from { protocol bgp; validation-database invalid; } then { validation-state invalid; reject; } } } policy-statement REJECT-ALL { then reject; } }

The policy statement ACCEPT-VALID turns the validation state of a prefix from unknown to valid if the ROA database says it is valid. It also accepts the route. If the prefix is invalid, the prefix is marked as such and rejected. We have also prepared a REJECT-ALL statement to reject everything else, notably unknown prefixes.

4 This restriction also prevents the peer from prepending its own ASN to deprioritize a path. A modern alternative is to use the graceful shutdown community.

A ROA only certifies the origin of a prefix. A malicious actor can therefore prepend the expected AS number to the AS path to circumvent the validation. For example, AS 65007 could annonce 2001:db8:dd::/64 , a prefix allocated to AS 65006, by advertising it with the AS path 65007 65006 . To avoid that, we define an additional policy statement to reject AS paths with more than one ASN:

policy-options { as-path EXACTLY-ONE-ASN "^.$" ; policy-statement ONLY-DIRECTLY-CONNECTED { term exactly-one-asn { from { protocol bgp; as-path EXACTLY-ONE-ASN; } then next policy; } then reject; } }

The last step is to configure the BGP sessions:

protocols { bgp { group HOSTS { local-as 65100 ; type external; # export [ … ]; import [ ONLY-DIRECTLY-CONNECTED ACCEPT-VALID REJECT-ALL ]; enforce-first-as; neighbor 2001:db8:42::a10 { peer-as 65005 ; } neighbor 2001:db8:42::a12 { peer-as 65006 ; } neighbor 2001:db8:42::a14 { peer-as 65007 ; } } } }

5 Cisco routers and FRR enforce the first AS by default. It is a tunable value to allow the use of route servers: they distribute prefixes on behalf of other routers.

The import policy rejects any AS path longer than one AS , accepts any validated prefix and rejects everything else. The enforce-first-as directive is also pretty important: it ensures the first (and, here, only) AS in the AS path matches the peer AS . Without it, a malicious neighbor could inject a prefix using an AS different than its own, defeating our purpose.

Let’s check the state of the RTR sessions and the database:

> show validation session Session State Flaps Uptime #IPv4/IPv6 records 2001:db8:4242::10 Up 0 00:16:09 0/9 2001:db8:4242::11 Up 0 00:16:07 0/9 2001:db8:4242::12 Connect 0 0/0 > show validation database RV database for instance master Prefix Origin-AS Session State Mismatch 2001:db8:11::/64-128 65006 2001:db8:4242::10 valid 2001:db8:11::/64-128 65006 2001:db8:4242::11 valid 2001:db8:11::/64-128 65009 2001:db8:4242::10 valid 2001:db8:11::/64-128 65009 2001:db8:4242::11 valid 2001:db8:aa::/64-128 65005 2001:db8:4242::10 valid 2001:db8:aa::/64-128 65005 2001:db8:4242::11 valid 2001:db8:bb::/64-128 65006 2001:db8:4242::10 valid 2001:db8:bb::/64-128 65006 2001:db8:4242::11 valid 2001:db8:cc::/64-128 65007 2001:db8:4242::10 valid 2001:db8:cc::/64-128 65007 2001:db8:4242::11 valid 2001:db8:dd::/64-128 65008 2001:db8:4242::10 valid 2001:db8:dd::/64-128 65008 2001:db8:4242::11 valid 2001:db8:ee::/64-128 65009 2001:db8:4242::10 valid 2001:db8:ee::/64-128 65009 2001:db8:4242::11 valid 2001:db8:ff::/64-128 65010 2001:db8:4242::10 valid 2001:db8:ff::/64-128 65010 2001:db8:4242::11 valid IPv4 records: 0 IPv6 records: 18

Here is an example of accepted route:

> show route protocol bgp table inet6 extensive all inet6.0: 11 destinations, 11 routes (8 active, 0 holddown, 3 hidden) 2001:db8:bb::42/128 (1 entry, 0 announced) *BGP Preference: 170/-101 Next hop type: Router, Next hop index: 0 Address: 0xd050470 Next-hop reference count: 4 Source: 2001:db8:42::a12 Next hop: 2001:db8:42::a12 via em1.0, selected Session Id: 0x0 State: <Active NotInstall Ext> Local AS: 65006 Peer AS: 65000 Age: 12:11 Validation State: valid Task: BGP_65000.2001:db8:42::a12+179 AS path: 65006 I Accepted Localpref: 100 Router ID: 1.1.1.1

A rejected route would be similar with the reason “rejected by import policy” shown in the details and the validation state would be invalid .

Configuring BIRD#

BIRD supports both plain-text TCP and SSH. Let’s configure it to use SSH. We need to generate keypairs for both the leaf router and the validators (they can all share the same keypair). We also have to create a known_hosts file for BIRD:

(validatorX) $ ssh-keygen -qN "" -t rsa -f /etc/gortr/ssh_key (validatorX) $ echo -n "validatorX:8283 " ; \ cat /etc/bird/ssh_key_rtr.pub validatorX:8283 ssh-rsa AAAAB3[…]Rk5TW0= (leaf1) $ ssh-keygen -qN "" -t rsa -f /etc/bird/ssh_key (leaf1) $ echo 'validator1:8283 ssh-rsa AAAAB3[…]Rk5TW0=' >> /etc/bird/known_hosts (leaf1) $ echo 'validator2:8283 ssh-rsa AAAAB3[…]Rk5TW0=' >> /etc/bird/known_hosts (leaf1) $ cat /etc/bird/ssh_key.pub ssh-rsa AAAAB3[…]byQ7s= (validatorX) $ echo 'ssh-rsa AAAAB3[…]byQ7s=' >> /etc/gortr/authorized_keys

GoRTR needs additional flags to allow connections over SSH:

$ gortr -refresh = 600 -verify = false -checktime = false \ -cache = http://127.0.0.1/rpki.json \ -ssh.bind = :8283 \ -ssh.key = /etc/gortr/ssh_key \ -ssh.method.key = true \ -ssh.auth.user = rpki \ -ssh.auth.key.file = /etc/gortr/authorized_keys INFO[0000] Enabling ssh with the following authentications: password=false, key=true INFO[0000] New update (7 uniques, 8 total prefixes). 0 bytes. Updating sha256 hash -> 68a1d3b52db8d654bd8263788319f08e3f5384ae54064a7034e9dbaee236ce96 INFO[0000] Updated added, new serial 1

Then, we can configure BIRD to use these RTR servers:

roa6 table ROA6; template rpki VALIDATOR { roa6 { table ROA6; }; transport ssh { user "rpki" ; remote public key "/etc/bird/known_hosts" ; bird private key "/etc/bird/ssh_key" ; }; refresh keep 30 ; retry keep 30 ; expire keep 3600 ; } protocol rpki VALIDATOR1 from VALIDATOR { remote validator1 port 8283 ; } protocol rpki VALIDATOR2 from VALIDATOR { remote validator2 port 8283 ; }

Unlike JunOS, BIRD doesn’t have a feature to only use a subset of validators. Therefore, we only configure two of them. As a safety measure, if both connections become unavailable, BIRD will keep the ROAs for one hour.

We can query the state of the RTR sessions and the database:

> show protocols all VALIDATOR1 Name Proto Table State Since Info VALIDATOR1 RPKI --- up 17:28:56.321 Established Cache server: rpki@validator1:8283 Status: Established Transport: SSHv2 Protocol version: 1 Session ID: 0 Serial number: 1 Last update: before 25.212 s Refresh timer : 4.787/30 Retry timer : --- Expire timer : 3574.787/3600 No roa4 channel Channel roa6 State: UP Table: ROA6 Preference: 100 Input filter: ACCEPT Output filter: REJECT Routes: 9 imported, 0 exported, 9 preferred Route change stats: received rejected filtered ignored accepted Import updates: 9 0 0 0 9 Import withdraws: 0 0 --- 0 0 Export updates: 0 0 0 --- 0 Export withdraws: 0 --- --- --- 0 > show route table ROA6 Table ROA6: 2001:db8:11::/64-128 AS65006 [VALIDATOR1 17:28:56.333] * (100) [VALIDATOR2 17:28:56.414] (100) 2001:db8:11::/64-128 AS65009 [VALIDATOR1 17:28:56.333] * (100) [VALIDATOR2 17:28:56.414] (100) 2001:db8:aa::/64-128 AS65005 [VALIDATOR1 17:28:56.333] * (100) [VALIDATOR2 17:28:56.414] (100) 2001:db8:bb::/64-128 AS65006 [VALIDATOR1 17:28:56.333] * (100) [VALIDATOR2 17:28:56.414] (100) 2001:db8:cc::/64-128 AS65007 [VALIDATOR1 17:28:56.333] * (100) [VALIDATOR2 17:28:56.414] (100) 2001:db8:dd::/64-128 AS65008 [VALIDATOR1 17:28:56.333] * (100) [VALIDATOR2 17:28:56.414] (100) 2001:db8:ee::/64-128 AS65009 [VALIDATOR1 17:28:56.333] * (100) [VALIDATOR2 17:28:56.414] (100) 2001:db8:ff::/64-128 AS65010 [VALIDATOR1 17:28:56.333] * (100) [VALIDATOR2 17:28:56.414] (100)

Like for the JunOS case, a malicious actor could try to workaround the validation by building an AS path where the last AS number is the legitimate one. BIRD is flexible enough to allow us to use any AS to check the IP prefix. Instead of checking the origin AS , we ask it to check the peer AS with this function, without looking at the AS path:

function validated(int peeras) { if (roa_check(ROA6, net, peeras) != ROA_VALID) then { print "Ignore invalid ROA " , net, " for ASN " , peeras; reject; } accept; }

The BGP instance is then configured to use the above function as the import policy:

protocol bgp PEER1 { local as 65100 ; neighbor 2001:db8:42::a10 as 65005 ; connect delay time 30 ; ipv6 { import keep filtered; import where validated(65005); # export …; }; }

You can view the rejected routes with show route filtered , but BIRD does not store information about the validation state in the routes. You can also watch the logs:

2019-07-31 17:29:08.491 <INFO> Ignore invalid ROA 2001:db8:bb::40:/126 for ASN 65005

Currently, BIRD does not reevaluate the prefixes when the ROAs are updated. There is work in progress to fix this. If this feature is important to you, have a look at FRR instead: it also supports the RTR protocol and triggers a soft reconfiguration of the BGP sessions when ROAs are updated.