Symfony Benchmarks: Scaling PHP by adding CPU & RAM

In the previous article in this series we took a look at how different runtimes affect Symfony performance, by comparing PHP 5.6, HHVM 3.11 and PHP 7.0.1. The conclusion was that both HHVM and PHP 7 offer significant improvements in performance without adding server resources. In this article we'll look at how adding those resources affects performance.

The simplest way to improve processing performance is to make a single CPU ever faster, but diminishing increases in single core performance have been the norm for over a decade now. This is why hosting lingo focuses on the number of CPU cores, rather than the clock speed.

Luckily PHP is, by nature, well prepared for scaling by adding CPU and other resources. It is relatively straightforward to scale just by throwing more resources at it:

The shared-nothing architecture of PHP where each request is completely distinct and separate from any other request leads to infinite horizontal scalability in the language itself.

-- http://techpatterns.com/forums/about567.html

Adding multiple servers does add complexity, but adding CPU and RAM to a single virtual server is nowadays drop-dead simple. Doubling the available CPU won't make individual requests run twice as fast, but in theory will allow you to serve twice as many requests at the same time.

That is the theory, but let's see how that holds in practice with some tests.

Scaling PHP by adding CPU cores

We will use the same base eZ Platform setup as in the earlier benchmarks, with an added 4 GB of RAM. A similar round of benchmarks is run to see how the number of CPU cores relates to actual application performance.

The server is operating with PHP 7.0.1 on PHP-FPM and Nginx 1.9.9 for all of the tests. For added detail I ran passes with different max_children values for PHP-FPM to see if, and how, that affects performance. As before, tests are repeated three times and average values are reported.
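The test setup described above can be sketched as a simple matrix. This is a minimal illustration, not the original benchmark script: the core counts, max_children values and concurrency levels are the ones that appear in the results of this article, but the assumption that exactly these lists were used, and the names `build_matrix` and `average`, are mine.

```python
from itertools import product

# Values mentioned in this article; that these are the complete lists is an assumption.
CPU_CORES = [1, 2, 4, 8]
MAX_CHILDREN = [5, 10, 20]
CONCURRENCY = [1, 10, 50]
REPEATS = 3  # each test is repeated three times and averaged


def build_matrix():
    """Return every (cores, max_children, concurrency) combination."""
    return list(product(CPU_CORES, MAX_CHILDREN, CONCURRENCY))


def average(samples):
    """Average the req/s figures from the repeated runs of one combination."""
    if len(samples) != REPEATS:
        raise ValueError(f"expected {REPEATS} samples, got {len(samples)}")
    return sum(samples) / len(samples)


matrix = build_matrix()
print(len(matrix), "combinations,", len(matrix) * REPEATS, "runs per test target")
```

Under these assumptions each test target (front page or API, with or without the proxy) needs 36 combinations and 108 individual runs.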

Front Page without Symfony Proxy

[Charts: 1 CPU Core, 2 CPU Cores, 4 CPU Cores, 8 CPU Cores; Average requests per CPU core (Concurrency of 10)]

For unproxied full page loads the results are as expected, growing rather linearly with added CPU resources. A maximum of 5 PHP-FPM child processes seems to be the best option until hitting 8 cores, at which point 10 child processes is the clear winner. A maximum of 20 PHP-FPM children offers no advantage.

Performance peaks at 161 req/s with 8 CPU cores, 10 child processes and 50 concurrent users. Excluding concurrency of 1, per CPU core results range from 24 req/s at 1 core to 19.625 req/s at 8 cores. This illustrates that there is added overhead, other bottlenecks or room for configuration optimisations.
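The per-core figures can be turned into a rough scaling efficiency. The sketch below uses only the two per-core numbers quoted above; the function names are made up for illustration. The implied 8-core total differs slightly from the 161 req/s peak, presumably because the per-core averages are taken at a different concurrency than the peak, which is an assumption on my part.

```python
def ideal_throughput(single_core_rps, cores):
    """Throughput if every added core contributed as much as the first."""
    return single_core_rps * cores


def implied_throughput(per_core_rps, cores):
    """Total throughput implied by a reported per-core average."""
    return per_core_rps * cores


# Figures quoted in the text for the unproxied front page.
ONE_CORE_RPS = 24.0      # req/s per core with 1 core
EIGHT_CORE_RPS = 19.625  # req/s per core with 8 cores
CORES = 8

ideal = ideal_throughput(ONE_CORE_RPS, CORES)
measured = implied_throughput(EIGHT_CORE_RPS, CORES)
efficiency = measured / ideal

print(f"ideal: {ideal:.0f} req/s")
print(f"implied by measurements: {measured:.0f} req/s")
print(f"scaling efficiency: {efficiency:.1%}")
```

With these inputs perfectly linear scaling would give 192 req/s, while the measurements imply 157 req/s, or roughly 82% of the ideal.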

Front Page with Symfony Proxy

[Charts: 1 CPU Core, 2 CPU Cores, 4 CPU Cores, 8 CPU Cores; Average requests per CPU core (Concurrency of 10)]

Proxied page results are in line with high-level expectations. On single and dual core setups the highest output is delivered at a concurrency of 50 requests, whereas with 4 and 8 cores this evens out. A max children value of 10 is the best on average, though with 8 cores 20 children offer a slight advantage.

Performance peaks at 8912 req/s with 8 CPU cores, 20 child processes and 50 concurrent users. Excluding concurrency of 1, per CPU core results range from 1323 req/s at 2 cores to 936.75 req/s with 8 cores and 10 concurrent requests.

At a concurrency of 10, 8 cores offer significantly worse performance per core, whereas at 50 a dual core setup proves more efficient than a single core.

API without Symfony Proxy

[Charts: 1 CPU Core, 2 CPU Cores, 4 CPU Cores, 8 CPU Cores; Average requests per CPU core (Concurrency of 10)]

Unproxied API calls are, again, largely in line with expectations. Even on one CPU core increasing concurrency yields the best performance, while 10 child processes are optimal. The same applies to dual core setups, but at 4 cores all child process settings offer virtually identical results. Naturally, at 8 cores the 5 child setting falls behind.

Performance peaks at 567 req/s with 8 CPU cores, 20 child processes and 50 concurrent requests. Excluding concurrency of 1, per CPU core results start at 84 req/s with a single core setup and fall to a mere 66.375 req/s per core in an 8 core setting, which indicates significant bottlenecks elsewhere in the setup.

API with Symfony Proxy

[Charts: 1 CPU Core, 2 CPU Cores, 4 CPU Cores, 8 CPU Cores; Average requests per CPU core (Concurrency of 10)]

Results for proxied API calls remain largely consistent. Notably, a concurrency of 50 yields the highest results regardless of core count. For child processes 10 is the best overall value, with a curiously significant drop for 20 child processes with 8 cores. This likely indicates an underlying issue that now surfaces because of the tiny transfer payload and short processing time.

A combination of 8 CPU cores, 10 child processes and a concurrency of 50 takes the performance crown with 9758 req/s. Excluding concurrency of 1, per CPU core results range from 1364 req/s with 1 core to 1032.125 req/s at 8 cores. Again, this is a significant drop, which shows that CPU scaling is not linear by default.
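To compare the four workloads side by side, the per-core figures quoted in the sections above can be put through the same calculation. This is only a summary sketch: the `RESULTS` list copies the numbers from the text, and the `retention` helper is a name I made up. Note that for the proxied front page the highest per-core figure quoted is for 2 cores rather than 1.

```python
# (workload, core count of the best per-core figure, req/s per core there, req/s per core at 8 cores)
RESULTS = [
    ("Front page, no proxy", 1, 24.0, 19.625),
    ("Front page, proxy", 2, 1323.0, 936.75),
    ("API, no proxy", 1, 84.0, 66.375),
    ("API, proxy", 1, 1364.0, 1032.125),
]


def retention(baseline_rps, eight_core_rps):
    """Share of the best per-core throughput that is still delivered at 8 cores."""
    return eight_core_rps / baseline_rps


for name, base_cores, base_rps, eight_rps in RESULTS:
    kept = retention(base_rps, eight_rps)
    print(f"{name:<22} {base_cores} -> 8 cores: keeps {kept:.1%} of per-core throughput")
```

Across the four workloads the 8 core setup keeps between roughly 71% and 82% of the best per-core throughput, with the proxied, high throughput workloads losing the most.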

Scaling PHP by adding RAM

Adding processing capacity is quite straightforward to understand when your application is CPU bound with enough RAM at its disposal. For memory-strapped environments, adding RAM is expected to give a significant speedup thanks to less swapping.

What about excess RAM? Can you have too much of it? Linux servers utilise memory efficiently, as unused memory is wasted memory. Let's see how our example application behaves when adding more RAM in the UpCloud environment.

The test runs from a mere 512 megabytes up to a total of 8 gigabytes, all while keeping the CPU core count at 8 and the PHP-FPM process count at 10.

Front Page without Symfony Proxy

With unproxied, CPU heavy page loads the 0.5 GB setup falls behind once concurrency goes higher, likely because it is starved of memory. After remaining rather stable for 1, 2 and 4 GB the numbers get a small but noticeable boost at 8 GB for one reason or another.

Front Page with Symfony Proxy

With proxied page loads with a large payload the 0.5 GB setup stays closer to the setups with more generous amounts of RAM. Curiously the 8 GB setup consistently falls behind the 1, 2 and 4 GB counterparts. I'm speculating that this has something to do with the hosting environment architecture rather than the Linux server itself.

API without Symfony Proxy

Similar to the unproxied page loads, the unproxied API results clearly indicate that the 0.5 GB setup falls behind. Again the 1, 2 and 4 GB setups are very stable, whereas there is a noticeable bump upwards at 8 GB.

API with Symfony Proxy

For the low processing, high throughput proxied API calls the results are somewhat unexpected. The 0.5 GB setup falls somewhat behind at higher concurrencies, but for a concurrency of 10 the results are rather strange. This could indicate something in the underlying hosting architecture, as the numbers even out again at fifty concurrent requests.

Conclusions

As expected, scaling CPU bound performance by adding CPU cores works. Throughput doesn't grow linearly, but adding cores is an easy way forward when PHP runtime tweaks and application optimisations still have untried paths.

Increasing the PHP-FPM max_children value has only a limited effect on performance. Tweaks can be made, but unless your number of children is below the number of CPU cores, don't expect significant improvements from increases.

It is fine to get warnings such as this one in your logs occasionally:

[28-Dec-2015 21:59:39] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

There is no need to avoid them completely by increasing the max_children setting. A greedy max_children can lead to PHP-FPM processes hogging all the available RAM and the kernel starting to kill processes, such as the MySQL daemon in this case:

/var/log/kern.log:Dec 28 22:39:09 xmasbench kernel: [ 2097.228319] Out of memory: Kill process 806 (mysqld) score 173 or sacrifice child
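One way to reason about a sensible range for max_children is to treat the CPU core count as a floor and the available memory as a ceiling. The sketch below is a common rule of thumb, not something measured in these benchmarks: the function name and all input numbers are hypothetical, and the per-process memory figure in particular has to be measured on your own server.

```python
def max_children_bounds(cpu_cores, total_ram_mb, reserved_mb, per_child_mb):
    """Return a (floor, ceiling) pair for pm.max_children.

    floor   - at least one worker per CPU core
    ceiling - as many workers as fit in the RAM not reserved for other services
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    floor = cpu_cores
    ceiling = max(0, (total_ram_mb - reserved_mb) // per_child_mb)
    return floor, ceiling


# Hypothetical numbers for illustration only, not measurements from these tests.
floor, ceiling = max_children_bounds(
    cpu_cores=8,
    total_ram_mb=4096,
    reserved_mb=1536,  # assumed share for MySQL, Nginx and the OS
    per_child_mb=64,   # assumed average memory use of one PHP-FPM worker
)
print(f"keep pm.max_children between {floor} and {ceiling}")
```

If the ceiling comes out below the floor, memory rather than CPU is the limiting resource, and raising max_children risks the kind of out of memory kill shown in the log above.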

Scaling PHP by adding more memory is more of a mixed bag. Obviously, if you don't have enough memory to run your PHP application alongside the available CPU, you'll need more. In our example case memory usage remains rather low and 1 GB seems enough according to the benchmarks.

Adding memory beyond requirements did seem to give a boost at the highest CPU and RAM counts (8 GB, 8 CPU cores) when the application is CPU bound with low transfers. But anomalies in the results surfaced when low processing requirements generated higher data throughput.

The tests were done in a shared hosting environment, which can behave unexpectedly, and these RAM anomalies might be completely absent when benchmarking a controlled physical server environment.

In the next article we'll take a look at how caching with Varnish compares to the Symfony Proxy.

Written by Jani Tarvainen on Friday January 1, 2016

Permalink - Tags: php, php-fpm, ram, cpu, core, scaling, benchmark