© Hervé Bry — https://www.flickr.com/photos/setaou/2935943672/

Was it slow? Not at all. In 2014, Nikita Popov published an inspiring blog post entitled Fast request routing using regular expressions. The article explains how one can match HTTP routes at very high performance, by combining them together in bigger regular expressions. Its conclusion rightfully reminds us that routing will usually not be a bottleneck in your apps, but also hints some of us are building high throughput HTTP servers in PHP, ending with this sentence: “If you tried to put the Symfony router behind such a server, it would totally cripple your performance.” This doesn’t have to be true.

The Symfony Routing component is a general purpose URL matcher and generator. It has zero dependencies, and while it is well integrated into the Symfony Framework, many use it standalone in their PHP apps. Basically, you give it a URL and an HTTP verb, and it returns back an array that qualifies the route that matched (name, variables, controller, etc.), if any. Reciprocally, you give it a route name, and it gives back a URL you can put in your links.

What makes it apart from any other PHP routing packages is that it is by far the most used one, with 75k daily downloads, and 47M total downloads at the time of writing. This doesn’t come out of nowhere: along the years, by requests and contributions of the community, it gained a battle-tested feature set, with a proven track record on your real world apps. For this reasons, its performance focused the attention of many minds, which provided advanced optimizations, like code generation or same-prefix reordering. This made it fast enough for almost any apps. But if you happened to need an even faster URL matcher, you had to give up some features and move on to using e.g. Nikita’s FastRoute. Until now.

The lessons from FastRoute

Nikita’s blog post shows how grouping several regular expressions into a bigger one improves performance. Take for example these 3 regular expressions:

#^/foo/([^/]+)/(\d+)$#

#^/bar/(\d+)$#

#^/foi/([^/]+)$#

You can match them much faster in a single step using:

#^(? |/foo/([^/]+)/(\d+)

|/bar/(\d+)

|/foi/([^/]+)

)$#x

The (?| is no mistake: it defines a non-capturing group where capturing group numbers are reset in each alternatives. This means that the captured group at index 1 will be the one of the alternative that matched.

The article then goes into a lengthy discussion to find a way to know which alternative matched. But if you read FastRoute’s source code, you’ll find another PCRE gem: you can name your patterns using the special (*MARK:name) syntax. This fully solves the problem and works well up to several hundreds of routes:

The last lesson we can put in our bag reading FastRoute’s source code is that static routes (i.e. with no placeholders) are better outside the regular expression; in a plain simple PHP array. Matching a route is then basically a two steps process: first an isset() check on that array, then a preg_match() against our big regex. Done. That’s all we need to take from Nikita’s work, and that’s huge! How can we apply these to Symfony?

The Symfony router feature set

Solely matching a URL is not enough to route an HTTP query. The Symfony router allows adding extra constraints, using the HTTP verb of course (GET, POST, etc.), but also the protocol (http/https), the host (useful for i18n) or just a custom expression based on the incoming request. Last but not least, routes have priorities and must be matched in order. That’s the most serious difficulty we’ll have to deal with: it means the same URL can match the pattern of two or more routes. But when the first one has extra constraints that don’t match, we have to check the next ones, and so on.

Let’s merge these requirements with the ideas provided by Nikita.

Matching static routes first

The first optimization we want to do is move static routes out of any regex, to match them using a simple hash-map lookup. This can only be partially achieved, because of the ordering constraint: we loop through our route collection, and for any routes that have no placeholders, we check if they match any of their previous routes. If they don’t, we can move them outside the regex. If they do, we can’t reorder (this doesn’t have to be a dead route, as there might be extra requirements to the previous ones.)

Since several static routes can match the same URL, or have custom conditions, we need to split this set in two: the simplest independent routes are put in an array, the other ones in a switch statement. Here is a simplified example output:

Grouping same-URL routes in “case” statements allows checking each route’s extra constraints in order. For the other ones, generating this potentially big $routes array (line 16) perfectly leverages OPcache’s immutable array optimizations. Since PHP 7.2, the switch also triggers an hash-map lookup. There is no faster instruction to do this in PHP (on lower versions, the comparisons are still really faster than a regex check.)

We’re going to use a very similar “switch” statement for non-static routes.

Matching non-static routes

The remaining routes need to be checked using a regular expression. Actually, we may need several of them, because PCRE doesn’t allow infinitely long regular expressions, nor does it allow matching with the Unicode modifier alternatively on and off in the same regex (that’s another feature of the Symfony router: matching using Unicode requirements.) For these reasons, we may need to split some routing tables in several chunks. This should be pretty rare in practice. Let’s focus on the single regex case.

We can now reuse the grouping logic from Nikita to compute merged regular expressions. To jump to the next step immediately, when routes have a constraint on their HTTP host, we can match both the host and the pathinfo in the same regex. Omitting the start/end of the regex:

|(?i:my\.host\.com)/foo/([^/]+)/(\d+)(*MARK:pattern1)

|(?i:my\.host\.com)/bar/(\d+)(*MARK:pattern2)

|(?i:[^/]+)/foi/([^/]+)(*MARK:pattern3) # host-less route

If we match this regex against $host.$pathinfo , we check the host and the path, figure out which pattern matched, and extract placeholders in one single operation. Based on the MARK captured index, we can then “switch” to the appropriate “case” to check additional constraints and decide which route matched, if any. The generated switch is very similar to the previous one:

Exactly as for static routes, this allows grouping same-pattern routes together, and using an immutable array for the other ones. But there is additional complexity to non-static routes.

Dealing with second-try matching

While static routes are exclusive to each others, non-static ones are not: the same URL can match two different patterns. Here is an example:

|/foo(\d+)(*MARK:pattern1)

|/foo(\w+)(*MARK:pattern2)

Let’s say pattern1 maps to route1 , but route1 has an additional constraints to match only POST requests (or any other type of constraints). Let’s now match /foo123 with a GET : preg_match() will report that pattern1 matched, but the extra constraints will reject it. How can we run the regex again, but skip pattern1 , since we know it’s a miss? PCRE provides an alternative breadth-first matching algorithm that could be used to match several patterns at the same time. Unfortunately, this is out of reach: the related API is not exposed to PHP devs.

The solution is to blacklist pattern1 by replacing its named mark by an always-failing assertion. An always-matching negative assertion would do the job (?!) , but I preferred to use the more tailored (*FAIL) one. So, basically, we search-and-replace the blacklisted pattern mark at runtime. Then we run preg_match() again, until we run out of potential matches:

|/foo(\d+)(*FAIL)

|/foo(\w+)(*MARK:pattern2)

In practice, this search-and-replace operation should be as fast as possible. Instead of using str_replace() , we’re going to use substr_replace() . This requires us to know beforehand at which offset exactly is the blacklisted mark in the regex string. Here is the trick: instead of using “pattern1” as the name of our mark, we’re going to use the offset:

|/foo(\d+)(*MARK:123)

|/foo(\w+)(*MARK:136)

Here, 123 is also the offset of the first M in the regex. 136 — 123 = 13 is the number of bytes between this M and the second one. If we happen to need to blacklist pattern 123 , what we need to do is:

$m = $matches['MARK'];

$regex = substr_replace($regex, 'FAIL', $m, 5 + strlen($m));

Let’s be clear: such second-try matching are quite rare in practice. We just have to deal with them to fulfill our feature set.

End of the story — part 1/2

We just explained how Nikita’s work can be ported to the Symfony router while keeping all our advanced features. We know how to build a URL matcher that uses a really fast hash-map lookup for static routes, and a single combined regular expression for non-static ones most of the time, while preserving in-order matching.

Did we improve anything? To have some reference while working on this, I created a simple benchmark, with 800 routes, half of them static, the other half non-static:

Before any optimizations, matching 10.000x took 777ms for the last static route, and 897ms for the last non-static one. For reference, the same with FastRoute takes 2ms (!) and 155ms respectively. After the described optimizations, these numbers dropped to 2ms also, and 65ms respectively. That’s 390x and 12x faster. Wow!

How did we achieve being 2x faster than FastRoute? Chunking. FastRoute matches no more than 30 routes per regex, while we match in one call. Could be fixed on their side for sure.

Does it work on real world apps? That’s when I love the Symfony community. You come up with supposedly clever ideas, and someone quickly reminds you the world is more complex than you’d expect. This was provided by David Maicher, who tried this new router on his 540-routes app, and experienced a slow down. Fixing this slow down is what provided us the additional optimizations to fill the gap with the title of this article: matching 77.7x times faster.

See Making Symfony router lightning fast - 2/2 if you want to know how.

P.S.: please Retweet if you liked this article, and don’t hesitate to follow me on Twitter, GitHub and/or here on Medium. ;-) Thanks for reading.