7 minute read

When trying to speed up your PHP web applications, it sure helps to know where the bottle necks are. That’s where Xdebug and Q Cache Grind can help: they can tell you which functions and class methods are taking up the most time and memory, how many times they’re called, and who called them. With this information, it’s far easier to speed up your code than “shooting in the dark” by just trying to optimize everything.

In this post, I’ll show you how I setup Xdebug and Q Cache Grind on my Windows system running Laragon. Then I’ll show how I go about actually using it to identify a bottleneck, and modify the code to improve performance.

My Environment: Laragon on Windows

Given I need a Windows machine in order to play StarCraft 2, I’m on Windows. I used to use a virtual machine so I could pretend to be a unix machine, but found it painfully slow. Now I rock Laragon. If you haven’t heard of it, it’s basically XAMPP except awesomely fast, comes with tons of the cool tools the kids are using these days (like node.js and NPM), and its just so easy to use. Here’s my video showing how to set it up with WordPress in under 2 minutes.

2 Minute setup of Laragon for running PHP applications locally.

Having said how much I appreciate Laragon, most of these instructions will probably work with other setups.

One gotcha with setting up Laragon: it’s best to install somewhere “higher up” in your directory structure, like “C:/laragon”, instead of “C:/software/mysoftware/extrafolder/php/laragon.” That’s because someday you’ll probably need to use NPM, and it creates tons of nested sub-folders, which is a problem in Windows. Because while Linux and MacOS have no problem with super long folder paths (those lucky devils), Windows dies if a folder’s path is longer than about 250 characters. So just put Laragon in a folder that’s higher up.

Setting up XDebug

After setting up Laragon, you need to enable Xdebug. It’s like two steps (if everything goes according to plan). That’s what will record your application’s performance.

Now, you don’t want to run Xdebug all the time. While it’s handy, it also slows everything by about half. So we’ll only want to run it when we’re actually wanting to use it.

FYI here’s the relevant part of my php.ini file:

xdebug.remote_enable=1 xdebug.remote_autostart = 1 ;zend_extension = E:\Software\laragon\bin\php\php-7.1.19-Win32-VC14-x64\ext\php_xdebug-2.6.0-7.1-vc14-x86_64.dll ;xdebug.profiler_enable=1 xdebug.profiler_output_dir=E:\Software\laragon\data\cachegrind xdebug.profiler_output_name=callgrind.%t_%R.out

That bit of php.ini says to enable Xdebug, start it automatically, don’t enable profiling just yet, but when profiling is enabled, put the generated files into E:\Software\laragon\data\cachegrind\ (you can put them anywhere else you like) and name them something like callgrind...out .

Setup Q Cache Grind

Next, download Q Cache Grind. That’s the program we’ll use to analyze Xdebug’s output regarding application performance.

(In full disclosure: Q Cache Grind might be the same thing a K Cache Grind. Their documentation seems to use the term intercheangeably. Go figure.)

Unzip it somewhere you’ll remember. I have a folder called E/Software where I put all my stuff regarding software development, and that’s where I put it.

Remember where you put it, you’ll use this in a bit for reading the cache grind files.

Run the Profiler and Generate The Cache Grind File

Ok, now that everything’s setup, start the profiler.

By the way, expect everything to run about several times slower than usual when you activate the profiler. And it will generate a TON of data. So only use it sparingly. (For example, in WordPress it’s probably good to temporarily disable the heartbeat API.)

Open up php.ini again, and remove the ; in at the beginning of the line ;xdebug.profiler_enable=1 . That tells PHP to enable the profiler and start making Cache Grind files on every request it receives. Restart your webserver (with Laragon, just click the cute elephant twice, or “Stop” then “Start”.

Laragon management window.

Next, make sure Xdebug is active for you. In Laragon, you can toggle it on or off by clicking Menu, PHP, Quick Settings, then making sure “xdebug” is checked.

Enabling Xdebug in Laragon

Once Xdebug with the profiler is enabled, just then visit your webpage, or submit the form, or make your webserver/script/website do whatever it is you want it to do.

The request should take a lot longer than usual. And a new cache grind file should have been created in E:\Software\laragon\data\cachegrind (or whereever you specified in the php.ini file.)

Once you see the file, turn Xdebug off through Laragon’s UI, and put that ; back in the php.ini file from earlier.

View the Cache Grind File

Remember where you unzipped Q Cache Grind? Open that up and find the file qcachegrind.exe . Run it.

Q Cache Grind’s initial screen.

Then go File, then Open and navigate to where the cache grind file was that you just generated. Open it. You’ll see something like this.

Initially crazy confusing Cache Grind output

It’s pretty overwhelming at first. But here’s what’s going on…

The first column, “Incl.” .is the “Inclusive Wall Time”. That the percent of the request’s total time is spent inside that function, more-or-less. So the “main” process takes up about 100%, which makes sense (ok it’s off by 0.5%… it’s free software, you can’t complain.) The next one, “cycle 3” was probably some type of PHP internal function call too. The next few all show how long it took to execute require_once for a particular file. And finally, at EventEspresso\core\services\loaders\CachingLoader->load we see a regular function call.

The second column, “Self”, shows how long that particular function took just operating on basic PHP types, not calling any other functions you’ve defined (ie, how long it, itself, took, as opposed to time it spent waiting on functions that it called.)

The third column, “Called”, shows how many times that function wsa called.

And the fourth column, “Function”, is self-explanatory.

Analyzing the Cache Grind Data

Now the important question: so what? How do we use this information to figure out how to speed up our code? Here’s what I do:

Look for the first few function that have the largest “Self” value. Those are your CPU hogs. The ones with the largest “Inclusive Wall Time”, but low “Self” time, can blame that they spend all their time waiting on other functions to complete, so they’re not really the problem makers.

Here’s the problem makers I see in the data:

The two biggest “problem functions” that are eating up the most CPU time.

The functions EventEspresso\core\services\collections\LooseCollection->has ,which is using up 8% of the total request’s time all by itself, not including the time it’s waiting on other functions to return, and EventEspresso\core\services'collections\LooseCollection->get , which is using up about 5.5% of the total request’s time. What are these functions doing that’s so time consuming? Let’s look at the code…

Here’s has :

public function has($identifier) { $this->rewind(); while ($this->valid()) { if ($identifier === $this->getInfo()) { $this->rewind(); return true; } $this->next(); } return false; }

To fully understand what the function is doing you’d need to open up the Event Espresso codebase, but basically it’s looping over data until it finds what it’s looking for, $identifier .

Here’s get :

public function get($identifier) { $this->rewind(); while ($this->valid()) { if ($identifier === $this->getInfo()) { $object = $this->current(); $this->rewind(); return $object; } $this->next(); } return null; }

It’s basically the same: looping over data until it finds what it’s looking for.

We might be inclined to blame the functions they’re both calling, rewind , valid , current , or next , but remember these functions have long “Self” time, meaning they take a long time to execute WITHOUT taking into account all the function calls they wait on.

Besides, we can look at the “Callees” tab at the bottom. That tells us how much of the total request the functions it calls take up.

Interesting data from the “callers” and “callees” tabs

The functions it calls, EventEspresso\core\services\collections\LooseCollection->valid , EventEspresso\core\services\collections\LooseCollection->getInfo , and EventEspresso\core\services\collections\LooseCollection->next all have times under 0.1% of the request. So none of those are the problem, despite being call over 11,000 times each. (Note, the time shown for the callees is the total amount of time they take for the ENTIRE request too— it’s not the time taken only when they’re called from the has function we were analyzing.)

Improving Our Code

If we could bring down the “self” time for the has or get functions mentioned, we could shave off almost 14% of the total request time. How could we do that?

We need to try to make sure it’s not executing any unnecessary code. For example, if the functions weren’t already return ing right after they found the item they were looking for, that could improve performance quite a bit. But they’re good that way, they return as soon as they found the data they wanted.

Another idea is to see if you can do any type of caching, even during the lifetime of the request. For example, if we could use a standard PHP array to “remember” entries when they’re found, and looking up entries in an indexed PHP array would surely be much faster than looping over an entire collection.

Really, I would just use different data structure. A PHP array instead of this LooseCollection class.

Conclusion

Profiling your applications actual performance is very enlightening. There’s often a lot of developer time wasted optimizing code that only accounts for a couple milliseconds of the request; meanwhile there’s lots of users’ time wasted waiting for unimportant functions doing unnecessary tasks, or doing them in an unnecessary way.

For example, when we at Event Espresso added the perpetrating class EventEspresso\core\services\collections\LooseCollection , we actually replaced code that was using normal PHP arrays with these Collection classes as we had heard they were much more efficient. Well maybe they were in some theoretical benchmarks, but in this application they turned out to be many orders of magnitude slower. Oups!

Anyways, best luck profiling your PHP! Let me know if you have any questions!

Share this: Reddit

Twitter

Facebook

LinkedIn

