
Part 2: NSX-T routing deep dive - How a stateful service drastically changes routing

Quick recap of what we have covered so far:

In Part 1 we looked at two topologies:

T0 with ECMP + T1 with no stateful service

T0 with ECMP + T1 with a stateful service.

We looked at how northbound traffic changed before and after instantiating a T1-SR on our Edge Node cluster.

Now in Part 2 we will investigate what a southbound flow looks like with and without a T1 stateful service, and examine how this affects routing paths.

We will start by looking at southbound traffic flows when there is no T1 stateful service. Refer to Topology A below.

Topology A: T0 in ECMP and a T1 router with no stateful services (T1 connected to T0 but the T1 is NOT associated with an Edge Cluster).

Let’s take a look at the BGP routing tables on the TORs to see how they are forwarding traffic to our T0-SRs across our Edges.

TOR1 BGP Routing table:
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop         Metric  LocPrf  Weight  Path
*>i  0.0.0.0          192.168.1.100    0       100     0       ?
*m   10.10.0.0/19     10.10.11.11                      0       200 ?    <-- Edge-A T0-SR
*>                    10.10.11.12                      0       200 ?    <-- Edge-B T0-SR

TOR2 BGP Routing table:
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop         Metric  LocPrf  Weight  Path
*>i  0.0.0.0          192.168.1.100    0       100     0       ?
*m   10.10.0.0/19     10.10.12.11                      0       200 ?    <-- Edge-A T0-SR
*>                    10.10.12.12                      0       200 ?    <-- Edge-B T0-SR

We can see in the routing tables above that the TOR switches are installing multiple paths into their RIB to get to 10.10.0.0/19. Each TOR is peered with a T0-SR instance on Edge-A and Edge-B.
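To make the multipath behaviour concrete, here is a minimal Python sketch of per-flow ECMP selection across the two next hops TOR1 learned for 10.10.0.0/19. The hashing scheme (SHA-256 over the 5-tuple, then modulo) and the function name `pick_next_hop` are assumptions for illustration only. Real switches use their own vendor-specific hash, so this only shows the general idea that one flow sticks to one path while different flows can land on different T0-SRs.

```python
import hashlib

# Next hops TOR1 installed for 10.10.0.0/19 (taken from the BGP table above).
NEXT_HOPS = ["10.10.11.11", "10.10.11.12"]  # Edge-A T0-SR, Edge-B T0-SR


def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops=NEXT_HOPS):
    """Hypothetical per-flow ECMP: hash the 5-tuple and pick a path by modulo.

    This is NOT the hash a real TOR uses; it only illustrates that the
    choice is deterministic per flow.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    # The source IP and destination VM IP below are made-up example values.
    flow = ("192.168.50.10", "10.10.5.20", "tcp", 40000, 443)
    print(pick_next_hop(*flow))
```

Calling `pick_next_hop` repeatedly with the same 5-tuple always returns the same next hop, which is why a single flow does not get sprayed across both Edges.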

Keep in mind, these T0-SR/DR instances make up a single Tier-0 Gateway. See Figure A below.

Now that we have determined that the TOR switches will distribute southbound traffic across their T0-SR peers, let’s take a look at how the T0-SR forwards traffic.

Edge-A T0-SR Forwarding Table

Logical Router
UUID                                   VRF   LR-ID   Name           Type
33aa3f1f-c5cc-4f4f-9f34-37e800b0bbbd   3     8194    SR-T0-carrot   SERVICE_ROUTER_TIER0

IPv4 Forwarding Table
IP Prefix       Gateway IP    Type    UUID                                   Gateway MAC
0.0.0.0/0       10.10.11.1    route   f3a89727-7799-45ed-8ce3-19d4a0220911   00:50:56:90:27:5f
                10.10.12.2            d8c6e18b-cdc0-4a86-8133-8ef432ef3f3b   00:50:56:90:ab:34
10.10.5.0/24    100.64.16.3   route   b8b36c5d-c85e-4cb3-9d63-1f69b97cc397

In the above forwarding table, the route we are interested in is the 10.10.5.0/24 entry. This is a route to a segment on a T1 router which does not have a T1-SR. To understand exactly what this particular route is doing, we need the interface information for the T0 and T1. See interfaces below:

Edge-A T0-DR Interface:
Interface     : b8b36c5d-c85e-4cb3-9d63-1f69b97cc397
Ifuid         : 395
Name          : T0-carrot-T1-palpatine-t0_lrp
Internal name : linked-395
Mode          : lif
IP/Mask       : 100.64.16.2/31;fc25:93fe:3ac1:c801::1/64;fe80::50:56ff:fe56:4452/64
MAC           : 02:50:56:56:44:52
VNI           : 71695
LS port       : 8c925627-3de6-4cba-b099-f24d64bd446b
Urpf-mode     : PORT_CHECK
Admin         : up
Op_state      : up
MTU           : 1500

Edge-A T1-DR Interface:
Interface     : a58925eb-6130-4557-9d12-fa8582244971
Ifuid         : 398
Name          : T0-carrot-T1-palpatine-t1_lrp
Mode          : lif
IP/Mask       : 100.64.16.3/31;fe80::50:56ff:fe56:4455/64;fc25:93fe:3ac1:c801::2/64
MAC           : 02:50:56:56:44:55
VNI           : 71695
LS port       : 6d89b610-87ee-466c-b8d7-c307f6342813
Urpf-mode     : NONE
Admin         : up
Op_state      : up
MTU           : 1500

For the particular route we are interested in:

10.10.5.0/24 100.64.16.3 route b8b36c5d-c85e-4cb3-9d63-1f69b97cc397

Based on the forwarding table information, we can see that when traffic is destined for the 10.10.5.0/24 network, the T0-SR will forward it to 100.64.16.3 via interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397.

This is interesting behaviour, because interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397 is the downlink on the T0-DR. The T0-SR is aware that to reach 100.64.16.x (the inter-tier transit network), traffic must first be forwarded to the T0-DR. The traffic is therefore forwarded out of the T0-SR’s bp-sr0-port, across the intra-tier transit link, to the bp-dr-port on the T0-DR. Once received, the T0-DR forwards it out of interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397 to the destination IP 100.64.16.3. As we can see above, this IP belongs to the inter-tier LIF interface of the T1-DR (Interface: a58925eb-6130-4557-9d12-fa8582244971).
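The reason the gateway 100.64.16.3 resolves via the T0-DR downlink is simply that both addresses sit in the same /31. A small sketch using Python's standard `ipaddress` module shows the check. The helper name `is_directly_connected` is made up for illustration and does not correspond to anything in NSX-T.

```python
import ipaddress

# Addresses taken from the interface output above.
t0_dr_downlink = ipaddress.ip_interface("100.64.16.2/31")  # T0-DR side of the inter-tier link
gateway = ipaddress.ip_address("100.64.16.3")              # T1 side, gateway for 10.10.5.0/24


def is_directly_connected(gateway_ip, interface):
    """Return True if the gateway lies in the interface's connected subnet."""
    return gateway_ip in interface.network


if __name__ == "__main__":
    print(t0_dr_downlink.network)                     # the connected /31
    print([str(a) for a in t0_dr_downlink.network])   # both addresses in the /31
    print(is_directly_connected(gateway, t0_dr_downlink))
```

A /31 holds exactly two addresses, one for each end of the inter-tier transit link, so the route's gateway can only be the router on the other side of that T0-DR downlink.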

Now we understand how the traffic is getting to the T1-DR within the Edge Node. Let’s look at how the traffic gets to its final destination.

Edge-A T1-DR Forwarding Table:

Logical Router
UUID                                   VRF   LR-ID   Name               Type
3e7f938b-8947-4b4a-a5a6-98b8c5dd1a89   9     15371   DR-T1-palpatine    DISTRIBUTED_ROUTER_TIER1

IPv4 Forwarding Table
IP Prefix       Gateway IP    Type    UUID                                   Gateway MAC
0.0.0.0/0       100.64.16.2   route   a58925eb-6130-4557-9d12-fa8582244971
10.10.5.0/24                  route   1abf24d6-97af-4d1d-a764-048dd35a7aa1
10.10.5.1/32                  route   afa06616-5c43-5047-afb8-f41b18eac5bc

Edge-A T1-DR Interfaces:

Logical Router
UUID                                   VRF   LR-ID   Name               Type
3e7f938b-8947-4b4a-a5a6-98b8c5dd1a89   9     15371   DR-T1-palpatine    DISTRIBUTED_ROUTER_TIER1

Interfaces
Interface     : 1abf24d6-97af-4d1d-a764-048dd35a7aa1
Ifuid         : 408
Name          : infra-P-Segment-5-dlrp
Mode          : lif
IP/Mask       : 10.10.5.1/24
MAC           : 02:50:56:56:44:52
VNI           : 71697
LS port       : c8baa8e5-99b6-4bfd-91a4-387dd9982df7
Urpf-mode     : STRICT_MODE
Admin         : up
Op_state      : up
MTU           : 1500

We can see in the forwarding table above that the T1-DR has a directly connected interface on the destination network (10.10.5.0/24), so when the traffic is received, the T1-DR will switch it onto this network via interface 1abf24d6-97af-4d1d-a764-048dd35a7aa1. The traffic is then encapsulated and sent out of the Edge Node VM’s TEP to the specific Transport Node where the destination VM is running.
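The lookup the T1-DR performs here is an ordinary longest-prefix match. The sketch below replays it in Python against the three prefixes from the Edge-A T1-DR forwarding table. The dictionary layout and the function `longest_prefix_match` are illustrative assumptions, not how the Edge datapath is actually implemented, and the destination 10.10.5.20 is a made-up VM address.

```python
import ipaddress

# Prefix -> egress interface UUID, copied from the Edge-A T1-DR forwarding table.
T1_DR_FIB = {
    "0.0.0.0/0": "a58925eb-6130-4557-9d12-fa8582244971",     # default, towards the T0
    "10.10.5.0/24": "1abf24d6-97af-4d1d-a764-048dd35a7aa1",  # connected segment LIF
    "10.10.5.1/32": "afa06616-5c43-5047-afb8-f41b18eac5bc",  # the LIF's own address
}


def longest_prefix_match(dst, fib):
    """Return (prefix, interface) for the most specific prefix containing dst."""
    dst_ip = ipaddress.ip_address(dst)
    candidates = []
    for prefix in fib:
        network = ipaddress.ip_network(prefix)
        if dst_ip in network:
            candidates.append(network)
    # The table holds a default route, so there is always at least one candidate.
    best = max(candidates, key=lambda network: network.prefixlen)
    return str(best), fib[str(best)]


if __name__ == "__main__":
    # 10.10.5.20 is a hypothetical VM address on the segment.
    print(longest_prefix_match("10.10.5.20", T1_DR_FIB))
```

A destination on the segment matches both the default route and 10.10.5.0/24, and the /24 wins because it is more specific, which is why the traffic leaves via the segment LIF rather than heading back towards the T0.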

Now we understand North to South traffic when there is no stateful service. Let’s look at how a stateful service on a T1 changes the routing behaviour. See Topology B below.

Now we have met the following conditions on our T1:

– T1 must be connected to a T0

– T1 must be associated with an Edge Cluster

A T1 Service Router has been created in Active/Standby. The active T1-SR is running on Edge-B and the standby on Edge-A.

Even with the above conditions met, traffic from the TOR switches to the T0-SRs on our Edge Nodes is still routed in the manner described at the start of this post (ECMP distribution across the T0-SRs when heading southbound).

Let’s look at what will happen when the traffic comes in via the T0-SR on Edge-B:

Edge-B T0-SR Forwarding Table:

UUID                                   VRF   LR-ID   Name           Type
5c57b700-f54c-4182-a638-0b13346dc2ff   1     11266   SR-T0-carrot   SERVICE_ROUTER_TIER0

IPv4 Forwarding Table
IP Prefix       Gateway IP    Type    UUID                                   Gateway MAC
0.0.0.0/0       10.10.11.1    route   edf37e34-b6ac-4443-b033-7376d81217ad   00:50:56:90:27:5f
                10.10.12.2            ea488f49-8a9e-42f1-8602-2a001ea48dd1   00:50:56:90:ab:34
10.10.5.0/24    100.64.16.3   route   b8b36c5d-c85e-4cb3-9d63-1f69b97cc397   02:50:56:56:44:55

Edge-B T0-DR Interface
Interface     : b8b36c5d-c85e-4cb3-9d63-1f69b97cc397
Ifuid         : 404
Name          : T0-carrot-T1-palpatine-t0_lrp
Internal name : linked-404
Mode          : lif
IP/Mask       : 100.64.16.2/31;fc25:93fe:3ac1:c801::1/64;fe80::50:56ff:fe56:4452/64
MAC           : 02:50:56:56:44:52
VNI           : 71695
LS port       : c347a17a-0868-42e0-beb0-134e5b3031b4
Urpf-mode     : PORT_CHECK
Admin         : up
Op_state      : up
MTU           : 1500

The T0-SR’s route to 10.10.5.0/24 goes via 100.64.16.3, which is an IP on the inter-tier transit network between the T0-DR and the T1-SR. To get there, it will first forward the traffic out of its intra-tier backplane port (named bp-sr1-port in the output below) to the T0-DR’s bp-dr-port. See information for these two interfaces below:

Edge-B T0-SR bp-sr-port:
Interface     : 55efebf5-a9fc-4216-b346-5f685178bbea
Ifuid         : 302
Name          : bp-sr1-port
Mode          : lif
IP/Mask       : 169.254.0.4/25;169.254.0.3/25;fe80::50:56ff:fe56:5302/64;fe80::50:56ff:fe56:5301/64
MAC           : 02:50:56:56:53:01
VNI           : 71687
LS port       : 257e8903-6126-4290-9dc5-41aa9a04de3e
Urpf-mode     : NONE
Admin         : up
Op_state      : up
MTU           : 1500

Edge-B T0-DR bp-dr-port:
Interface     : 105bca47-9671-4079-89cb-e00936764916
Ifuid         : 305
Name          : bp-dr-port
Mode          : lif
IP/Mask       : 169.254.0.1/25;fe80::50:56ff:fe56:4452/64
MAC           : 02:50:56:56:44:52
VNI           : 71687
LS port       : 8d7dd4fc-b670-4448-a01f-2cd5c66a1ed1
Urpf-mode     : PORT_CHECK
Admin         : up
Op_state      : up
MTU           : 1500

Once the traffic is received from the T0-SR, the T0-DR will forward it out of its interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397, over the inter-tier transit network, to the T1-SR’s LIF IP: 100.64.16.3.

We can see the interface information for the destination T1-SR below.

Edge-B T1-SR Interface:

Logical Router
UUID                                   VRF   LR-ID   Name               Type
1e6a02ea-08ec-47a8-aeb4-3ee61b6b40b3   14    15373   SR-T1-palpatine    SERVICE_ROUTER_TIER1

Interfaces
Interface     : a58925eb-6130-4557-9d12-fa8582244971
Ifuid         : 407
Name          : T0-carrot-T1-palpatine-t1_lrp
Mode          : lif
IP/Mask       : 100.64.16.3/31;fe80::50:56ff:fe56:4455/64;fc25:93fe:3ac1:c801::2/64
MAC           : 02:50:56:56:44:55
VNI           : 71695
LS port       : 6d89b610-87ee-466c-b8d7-c307f6342813
Urpf-mode     : NONE
Admin         : up
Op_state      : up
MTU           : 1500

The traffic has been received on the T1-SR’s a58925eb-6130-4557-9d12-fa8582244971 interface. Let’s look at the T1-SR’s forwarding table below to see how it will handle the next hop:

Edge-B T1-SR Forwarding Table:

Logical Router
UUID                                   VRF   LR-ID   Name               Type
1e6a02ea-08ec-47a8-aeb4-3ee61b6b40b3   14    15373   SR-T1-palpatine    SERVICE_ROUTER_TIER1

IPv4 Forwarding Table
IP Prefix        Gateway IP    Type    UUID                                   Gateway MAC
0.0.0.0/0        100.64.16.2   route   a58925eb-6130-4557-9d12-fa8582244971   02:50:56:56:44:52
10.10.5.0/24                   route   1abf24d6-97af-4d1d-a764-048dd35a7aa1
10.10.5.1/32                   route   afa06616-5c43-5047-afb8-f41b18eac5bc
100.64.16.2/31                 route   a58925eb-6130-4557-9d12-fa8582244971
100.64.16.3/32                 route   54eb6c65-1d20-5930-a1a0-e168a47873b6
127.0.0.1/32                   route   588f2add-9d3e-4555-9739-69261609fb50
169.254.0.0/28                 route   e17fef2f-e383-4303-b619-beaed7bfe946
169.254.0.1/32                 route   afa06616-5c43-5047-afb8-f41b18eac5bc
169.254.0.2/32                 route   54eb6c65-1d20-5930-a1a0-e168a47873b6

Edge-B T1-DR Interface
Interface     : 1abf24d6-97af-4d1d-a764-048dd35a7aa1
Ifuid         : 417
Name          : infra-P-Segment-5-dlrp
Mode          : lif
IP/Mask       : 10.10.5.1/24
MAC           : 02:50:56:56:44:52
VNI           : 71697
LS port       : c8baa8e5-99b6-4bfd-91a4-387dd9982df7
Urpf-mode     : STRICT_MODE
Admin         : up
Op_state      : up
MTU           : 1500

Edge-B T1-SR Intra Tier Transit Interface:
Interface     : e17fef2f-e383-4303-b619-beaed7bfe946
Ifuid         : 443
Name          : bp-sr0-port
Mode          : lif
IP/Mask       : 169.254.0.2/28;fe80::50:56ff:fe56:5300/64

Edge-B T1-DR Intra Tier Transit Interface
Interface     : aca94d8e-684e-44cc-bc59-ac612a74a400
Ifuid         : 408
Name          : bp-dr-port
Mode          : lif
IP/Mask       : 169.254.0.1/28;fe80::50:56ff:fe56:4452/64

The T1-SR will forward the traffic out of its local bp-sr0-port across the intra-tier transit network, and the T1-DR will receive the traffic on its bp-dr-port. The T1-DR will switch this traffic onto its LIF Interface: 1abf24d6-97af-4d1d-a764-048dd35a7aa1 (directly connected interface on the 10.10.5.0/24 network). The traffic is then encapsulated and sent across the appropriate Geneve Tunnel to the specific transport node where the destination VM resides. See Topology B - Path B below for visualisation.

Transport Nodes (including Edge VMs) hold a MAC table per segment. When traffic is to be forwarded to another Transport Node, a MAC table lookup determines which tunnel the traffic will be forwarded across. I dumped an example MAC table below:

Edge-B Logical Switch "M-Seg-2" MAC-Table:
MAC    : 00:50:56:90:3d:05                      <--- MAC address of the destination virtual machine
Tunnel : a426178f-be9a-5c45-abe3-0840f6fd6205   <-- Unique Tunnel ID
IFUID  : 350
LOCAL  : 10.10.9.60                             <--- Edge-B TEP IP
REMOTE : 10.10.9.56                             <--- Destination Transport Node TEP IP
ENCAP  : GENEVE
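Conceptually, this per-segment MAC table is a lookup from destination MAC to tunnel endpoint. A minimal Python sketch of that idea follows, using the single entry from the dump above. The `MacEntry` class and `lookup_tunnel` function are hypothetical names for illustration, and the sketch does not model what the Edge does when a MAC is unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MacEntry:
    tunnel: str      # unique tunnel ID
    local_tep: str   # TEP IP of this Transport Node (Edge-B)
    remote_tep: str  # TEP IP of the destination Transport Node
    encap: str


# The one entry shown in the Edge-B "M-Seg-2" MAC table above.
MAC_TABLE = {
    "00:50:56:90:3d:05": MacEntry(
        tunnel="a426178f-be9a-5c45-abe3-0840f6fd6205",
        local_tep="10.10.9.60",
        remote_tep="10.10.9.56",
        encap="GENEVE",
    ),
}


def lookup_tunnel(dst_mac: str, table=MAC_TABLE) -> Optional[MacEntry]:
    """Return the tunnel entry for a destination MAC, or None if not learned."""
    return table.get(dst_mac.lower())


if __name__ == "__main__":
    entry = lookup_tunnel("00:50:56:90:3D:05")
    print(entry.remote_tep if entry else "MAC not in table")
```

The lookup result is what tells the Edge which remote TEP to place in the outer header of the Geneve-encapsulated frame.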

Now we understand what will happen when southbound traffic is received via the Edge Node with the active T1-SR instance. Let’s look at how this changes when southbound traffic is received on Edge-A while the active T1-SR instance resides on Edge-B.

We will start by looking at the forwarding table of the T0-SR on Edge-A:

Edge-A T0-SR

Logical Router
UUID                                   VRF   LR-ID   Name           Type
33aa3f1f-c5cc-4f4f-9f34-37e800b0bbbd   3     8194    SR-T0-carrot   SERVICE_ROUTER_TIER0

IPv4 Forwarding Table
IP Prefix       Gateway IP    Type    UUID                                   Gateway MAC
0.0.0.0/0       10.10.11.1    route   f3a89727-7799-45ed-8ce3-19d4a0220911   00:50:56:90:27:5f
                10.10.12.2            d8c6e18b-cdc0-4a86-8133-8ef432ef3f3b   00:50:56:90:ab:34
10.10.5.0/24    100.64.16.3   route   b8b36c5d-c85e-4cb3-9d63-1f69b97cc397

In the forwarding table above, the T0-SR is forwarding traffic via interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397 to IP 100.64.16.3 to get to the 10.10.5.0/24 network. Let’s take a look at the interfaces below:

Edge-A T0-DR Interface:
Interface     : b8b36c5d-c85e-4cb3-9d63-1f69b97cc397
Ifuid         : 395
Name          : T0-carrot-T1-palpatine-t0_lrp
Internal name : linked-395
Mode          : lif
IP/Mask       : 100.64.16.2/31;fc25:93fe:3ac1:c801::1/64;fe80::50:56ff:fe56:4452/64
MAC           : 02:50:56:56:44:52
VNI           : 71695
LS port       : c347a17a-0868-42e0-beb0-134e5b3031b4
Urpf-mode     : PORT_CHECK
Admin         : up
Op_state      : up
MTU           : 1500

Edge-B T1-SR (Active)
Interface     : a58925eb-6130-4557-9d12-fa8582244971
Ifuid         : 407
Name          : T0-carrot-T1-palpatine-t1_lrp
Mode          : lif
IP/Mask       : 100.64.16.3/31;fe80::50:56ff:fe56:4455/64;fc25:93fe:3ac1:c801::2/64
MAC           : 02:50:56:56:44:55
VNI           : 71695
LS port       : 6d89b610-87ee-466c-b8d7-c307f6342813
Urpf-mode     : NONE
Admin         : up
Op_state      : up
MTU           : 1500

From the information above, we know that the traffic is being forwarded from the T0-SR out of the T0-DR interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397 to reach IP 100.64.16.3 (the gateway IP for this particular route). To do this, the T0-SR forwards the traffic out of its bp-sr0-port across the intra-tier transit network, and the T0-DR receives the traffic on its bp-dr-port.

Once received, traffic is then forwarded out of the interface mentioned above (b8b36c5d-c85e-4cb3-9d63-1f69b97cc397), across the inter-tier transit network to the active T1-SR on Edge-B with LIF interface IP: 100.64.16.3.

Let’s take a look at the T1-SR forwarding table below to see how it will handle the traffic it has just received:

Edge-B T1-SR (Active)

Logical Router
UUID                                   VRF   LR-ID   Name               Type
1e6a02ea-08ec-47a8-aeb4-3ee61b6b40b3   14    15373   SR-T1-palpatine    SERVICE_ROUTER_TIER1

IPv4 Forwarding Table
IP Prefix       Gateway IP    Type    UUID                                   Gateway MAC
0.0.0.0/0       100.64.16.2   route   a58925eb-6130-4557-9d12-fa8582244971
10.10.5.0/24                  route   1abf24d6-97af-4d1d-a764-048dd35a7aa1
10.10.5.1/32                  route   afa06616-5c43-5047-afb8-f41b18eac5bc

The T1-SR’s forwarding table has a route to 10.10.5.0/24 via interface 1abf24d6-97af-4d1d-a764-048dd35a7aa1. If we look at the interface details below, we can see that this is the LIF interface on the T1-DR which connects to the 10.10.5.0/24 segment where our destination virtual machine resides.

Edge-B T1-DR
Interface     : 1abf24d6-97af-4d1d-a764-048dd35a7aa1
Ifuid         : 417
Name          : infra-P-Segment-5-dlrp
Mode          : lif
IP/Mask       : 10.10.5.1/24
MAC           : 02:50:56:56:44:52
VNI           : 71697
LS port       : c8baa8e5-99b6-4bfd-91a4-387dd9982df7
Urpf-mode     : STRICT_MODE
Admin         : up
Op_state      : up
MTU           : 1500

The T1-SR will forward the traffic out of its backplane interface (bp-sr0-port) to the T1-DR’s bp-dr-port. T1-DR will then switch the traffic onto the 10.10.5.0/24 network as it has a directly connected interface there.

A MAC table lookup then determines which Geneve tunnel the traffic will be sent across to reach the destination Transport Node. Below, Diagram D visualises the traffic flow we have just walked through.

TL;DR Summary of our investigation in Part 2:

Topology A - Path A and B: Southbound flows were distributed from the TOR switches across the T0-SR instances on the Edge Nodes. The routing for the T0 and T1 occurred locally within the edge on which the traffic was received. Traffic was then forwarded to the Transport Node where the destination VM was running. Visualisation below.

Topology B - Path B: Southbound traffic was received on the Edge Node where the active T1-SR was running (Edge-B). The T0 and T1 routing all occurred locally within the edge where the traffic was received. It was then forwarded to the transport node where the destination VM was running. Visualisation below.

Topology B - Path A: Southbound traffic was received on the Edge Node where the standby T1-SR was running (Edge-A). The T0 performed the appropriate routing and the traffic was then forwarded to the active T1-SR running on Edge-B. After a route lookup on the T1-SR, the traffic was forwarded to the T1-DR, switched onto the appropriate segment and then forwarded to the Transport Node where the destination VM was running. Visualisation below.
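The three cases summarised above can be captured in a small Python model. This is only a sketch of the hop sequences described in this post. The function names `southbound_path` and `crosses_edges` and the `Component@Edge` labels are invented for illustration and are not NSX-T terminology or an NSX-T API.

```python
def southbound_path(ingress_edge, active_t1_sr_edge=None):
    """Return the ordered list of routing hops for a southbound flow.

    ingress_edge      -- Edge Node whose T0-SR received the flow from the TOR
    active_t1_sr_edge -- Edge Node hosting the active T1-SR, or None when the
                         T1 has no stateful service (Topology A)
    """
    hops = [f"T0-SR@{ingress_edge}", f"T0-DR@{ingress_edge}"]
    if active_t1_sr_edge is None:
        # Topology A: T0 and T1 routing both happen on the ingress Edge.
        hops.append(f"T1-DR@{ingress_edge}")
    else:
        # Topology B: traffic must pass through the active T1-SR first.
        hops.append(f"T1-SR@{active_t1_sr_edge}")
        hops.append(f"T1-DR@{active_t1_sr_edge}")
    hops.append("destination transport node")
    return hops


def crosses_edges(path):
    """True if the routing hops are spread over more than one Edge Node."""
    edges = {hop.split("@", 1)[1] for hop in path if "@" in hop}
    return len(edges) > 1


if __name__ == "__main__":
    for ingress, active in [("Edge-A", None), ("Edge-B", "Edge-B"), ("Edge-A", "Edge-B")]:
        path = southbound_path(ingress, active)
        print(" -> ".join(path), "| crosses edges:", crosses_edges(path))
```

Only the last case, where the flow lands on the Edge holding the standby T1-SR, involves routing hops on two different Edge Nodes.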

In Part 3 we will take a look at an alternate design where T1-SRs are in use at scale.