With the 5.2 release basically done, I decided to do some performance investigation and optimization on KWin last week. From time to time I run KWin through valgrind’s callgrind tool to see whether we have any expensive code paths. So far I hadn’t done that for the 5.x series. Now, after the switch to kdecoration2, I was really interested in the results, as in the past rendering the decoration used to be a bottleneck in our compositing rendering loop.

Unfortunately callgrind doesn’t give us a complete picture of KWin’s performance, as it includes neither GPU times nor roundtrips to the X server. Nevertheless it gives us a good look at our own CPU usage. I was rather surprised by the result, as I didn’t find anything which looked bad. Still, I was able to slightly optimize one method which is called whenever the X11 stacking order changes, by improving an internal algorithm which didn’t scale well with the larger than expected number of child windows of the root window.

But callgrind output wasn’t the only performance-relevant thing I looked into. I investigated a really interesting bug report about the screen freezing for a short time when a new window opened. While I wasn’t able to reproduce the issue as is, I was able to reproduce a small freeze whenever a Qt 5 application opened. Interestingly, only with a Qt 5 application: I ran the same application in a Qt 4 and a Qt 5 variant, and only the latter gave me a freeze. Investigation showed pretty quickly that KWin is not to blame: for one, I got the freeze before KWin started to manage a window for the application, and I was able to reproduce it with different window managers. With the help of xtrace I finally found the culprit and we found the appropriate bug report on the Qt side. Our KDE domain experts have also started to look into the issue on the Qt side.

But still, others were able to get a small freeze whenever KWin started to manage a window. And indeed, further investigation showed that the method handling the managing of a new window can take some time and can cause the compositor to drop frames. Ideally this would be solved by moving the compositor into a dedicated rendering thread, but that’s quite a lot of work and might not help in this case, as KWin’s main thread grabs the X server while managing a window. So the better solution was to investigate why the method takes so long. To not drop any frames the method must not take longer than 16 msec, and the shorter the better.
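To make the 16 msec figure concrete, here is a minimal sketch of the arithmetic, assuming a 60 Hz display (the refresh rate is my assumption for illustration, and the helper names `frameBudgetMs` and `fitsInOneFrame` are hypothetical, not KWin functions):

```cpp
#include <cassert>

// Time available to produce a single frame, in milliseconds.
// Assumption: the compositor presents one frame per display refresh.
inline double frameBudgetMs(double refreshRateHz)
{
    assert(refreshRateHz > 0.0);
    return 1000.0 / refreshRateHz;
}

// True if a blocking piece of work on the main thread still fits
// into one frame, i.e. would not by itself cause a dropped frame.
inline bool fitsInOneFrame(double workMs, double refreshRateHz)
{
    return workMs <= frameBudgetMs(refreshRateHz);
}
```

At 60 Hz `frameBudgetMs(60.0)` is roughly 16.7, so a method that blocks for a few milliseconds still fits, while one that blocks for several tens of milliseconds does not.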

While managing a window KWin needs to read quite a lot of properties. Most of them are nicely read in a non-blocking way through the KWindowSystem framework, but some properties are KWin internal and were read in a blocking way. Most expensive was reading the icons, which triggered several roundtrips, especially if the window did not specify the icons in a NETWM compliant way. This could easily cause a delay of 50 to 100 msec while managing a window. Overall the method could trigger up to 14 roundtrips to the X server which were not needed at all in the case of KWin. Our KWindowSystem framework got an adjustment to prevent the roundtrips if the user of the framework has already fetched all the required information. The result is that reading the icons now takes significantly less than one msec. For other roundtrip-causing methods I introduced two new methods: one to perform the request, one to later read the result. This allowed me to remove another set of roundtrips. My measurements showed that each roundtrip takes about half a millisecond on my system. Half an msec here, half an msec there easily adds up. Unfortunately there are still some Xlib calls (one to read motif hints and one to read WM_SIZE_HINTS) which ideally would get ported to XCB; as long as they are not ported, they delay the managing of a new Client.
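The idea of splitting a blocking read into "perform the request" and "read the result later" can be sketched with a toy model. This is not KWin's code and not the real XCB API: `FakeConnection`, `readBlocking` and `readSplit` are hypothetical names, and the model simply assumes that one wait on the server delivers the replies to everything requested so far.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an X connection (NOT the real XCB API).
// It only counts how often the caller has to block on the server.
class FakeConnection
{
public:
    using Cookie = std::size_t;

    // Queue a request without waiting; the cookie identifies its reply.
    Cookie request(const std::string &property)
    {
        m_replies.push_back("value-of-" + property);
        return m_replies.size() - 1;
    }

    // Read the reply for a cookie. Blocks (one roundtrip) only if that
    // reply has not arrived yet; a single wait is assumed to deliver the
    // replies to all requests queued so far.
    std::string reply(Cookie cookie)
    {
        assert(cookie < m_replies.size());
        if (cookie >= m_arrived) {
            ++m_roundTrips;
            m_arrived = m_replies.size();
        }
        return m_replies[cookie];
    }

    int roundTrips() const { return m_roundTrips; }

private:
    std::vector<std::string> m_replies;
    std::size_t m_arrived = 0;
    int m_roundTrips = 0;
};

// Blocking style: every property is requested and read back immediately.
inline int readBlocking(FakeConnection &c, const std::vector<std::string> &props)
{
    for (const auto &p : props) {
        c.reply(c.request(p));
    }
    return c.roundTrips();
}

// Split style: send all requests first, read the replies afterwards.
inline int readSplit(FakeConnection &c, const std::vector<std::string> &props)
{
    std::vector<FakeConnection::Cookie> cookies;
    for (const auto &p : props) {
        cookies.push_back(c.request(p));
    }
    for (const auto cookie : cookies) {
        c.reply(cookie);
    }
    return c.roundTrips();
}
```

In this model, reading three properties costs three waits in the blocking style and a single wait in the split style. A real connection is less tidy than that, but the shape of the saving is the same: the requests travel together instead of each paying its own roundtrip.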

Nevertheless, this brings some nice improvements for our development version, which will become 5.3 in a few months. Of course, none of that would have been possible without the switch from Xlib to XCB.