Update: The version of the code snippet above now uses functools' lru_cache (least recently used cache), which is not only more concise than my original but also provides the option to control the size of the cache. Thanks to cyanydeez for the suggestion.

On a recent project I was asked to help optimise the server-side render time for a page displaying 200+ carpet swatches (I live on the edge, I'm telling you).

The tech stack involved in this case was Flask > MongoFrames > Jinja2, and the page that was slow to render can be seen here: https://www.brintons.co.uk/carpets

I expected the performance issues to be shared somewhat between querying the database (via MongoFrames) and rendering the template (Jinja2). However, after adding some basic profiling to the view I was surprised to see that several hundred calls to url_for were adding up to a significant amount of time - the bottleneck was actually with Flask.
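For anyone wanting to try the same kind of check, here is one way basic profiling could be done with the standard library's cProfile. This is only a sketch and not necessarily how I profiled the real view: profile_call, build_url and render_page are hypothetical names, with build_url and render_page standing in for url_for and the view.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


def build_url(endpoint):
    # Hypothetical stand-in for an expensive helper such as url_for.
    return "/" + endpoint.replace(".", "/")


def render_page():
    # Hypothetical stand-in for a view that builds a few hundred URLs.
    return [build_url(f"carpets.swatch_{i}") for i in range(200)]


if __name__ == "__main__":
    _, report = profile_call(render_page)
    print(report)
```

The ncalls and cumtime columns in the report are the ones to look at: a helper that is cheap per call but called hundreds of times shows up with a large call count.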

To resolve this issue I implemented the code above to provide a version of url_for that caches the output.
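As a rough illustration of the idea (not necessarily the exact snippet referred to above), a cached wrapper built on functools.lru_cache might look like the sketch below. make_cached_url_for and slow_url_for are hypothetical names; slow_url_for is a stand-in for flask.url_for so the example runs without an application context.

```python
from functools import lru_cache


def make_cached_url_for(url_for, maxsize=1024):
    """Wrap a url_for-style callable in an LRU cache of the given size."""

    @lru_cache(maxsize=maxsize)
    def cached_url_for(endpoint, **values):
        # All values must be hashable, and keyword order is part of the key.
        return url_for(endpoint, **values)

    return cached_url_for


calls = []


def slow_url_for(endpoint, **values):
    # Hypothetical stand-in for flask.url_for; records each real call.
    calls.append(endpoint)
    query = "&".join(f"{k}={v}" for k, v in sorted(values.items()))
    return f"/{endpoint}" + (f"?{query}" if query else "")


cached_url_for = make_cached_url_for(slow_url_for, maxsize=256)

if __name__ == "__main__":
    for _ in range(200):
        cached_url_for("carpets", page=1)
    print(len(calls))                   # how often the wrapped function ran
    print(cached_url_for.cache_info())  # hit/miss counts kept by lru_cache
```

In a Flask app the wrapped callable would be flask.url_for itself, and the cached version could be exposed to templates via app.jinja_env.globals. Bear in mind the cache key is only the endpoint and values, so this assumes the generated URL doesn't otherwise vary between requests (for example by host when building external URLs).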

Benchmarks

I put together some basic benchmarks; however, take these with a pinch of salt as the implementation used to generate them was very simple (I've included the code at the end of the article).

Function                          Avg. time (in secs) for 200 calls
url_for                           0.0189
cached_url_for                    0.0006
cached_url_for (pre-populated*)   0.0003

* For this result I first made sure the cache was populated.

Benchmark results

The results show that the cached_url_for function is (once populated) approximately 60x faster: 0.0003 secs versus 0.0189 secs for 200 calls.

But wait, we're only saving a little under two hundredths of a second here! My guess is that this is because I'm running the benchmark against a site with just a handful of registered rules. On a site with tens or hundreds of rules (like Brintons.co.uk) the performance difference should be greater because, from my interpretation of the code, rules are built/matched by looping over the registered rules until one matches. This would also mean some endpoints are found faster than others.

On a large and busy website with lots of concurrent visitors this seems like a simple-to-implement and worthwhile optimisation. However, if anyone can spot a flaw in my thinking/code here I'd love to hear from you.

Benchmark code