
Let me first introduce our company. We are a small Czech agency specializing in the small business market. We provide a full package of services for our customers – network design and configuration, server installation and administration, development of information systems and websites, and online marketing (SEO, PPC, data analytics). We are also a certified Google Partner. You can meet us as speakers at many conferences in the Czech Republic: WordPress community conferences, WordCamps, Barcamps, marketing workshops, etc. I’m very interested in network security, but I often work on WordPress development and its security and performance.

I’m sorry this is currently the only article available in English. If you have any questions about online business in the Czech Republic, feel free to contact us.

Automated tools analyzed almost 4 GB of source code. The analysis was performed in the first week of April 2015, so the data is valid as of that date.

The first task of the crawler was to determine the WP version. It also collected information about templates and plugins that can be detected from the HTML source of the homepage, usage of Google Analytics, Facebook components (Like buttons, Fan walls, etc.) and some other details. All sites were checked for backlinks via Majestic SEO and for mentions on social networks (homepage only). I also tried to determine the webhosting provider of each explored site from its IP address.

I used 3 main methods to identify WP version:

From the “generator” meta tag in the source code

From the “/readme.html” file

From the RSS feed “/feed”

If these failed, I tried to identify the WP version from MD5 hashes of some core static files, and I took advantage of the fact that some plugins add the WP version as a parameter to their static resources (?ver=xy). If the same value appeared in at least 60 % of the resources carrying this parameter, I considered it to be the WordPress version. Despite these methods I wasn’t able to determine the exact WP version in about 4,000 cases. The reason is probably the usage of security plugins, which hide the WP version.
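The generator check and the ?ver= fallback described above can be sketched roughly as follows. This is a minimal illustration, not the crawler actually used for the research: the function names and regular expressions are my own assumptions, and the readme.html, RSS feed and MD5 checks are left out.

```python
import re
from collections import Counter

# Assumed patterns; real-world markup varies (attribute order, quoting, etc.).
GENERATOR_RE = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']WordPress\s+([\d.]+)',
    re.IGNORECASE,
)
VER_PARAM_RE = re.compile(r'\.(?:css|js)\?ver=([\d.]+)', re.IGNORECASE)


def version_from_generator(html):
    """Return the version announced in the generator meta tag, or None."""
    match = GENERATOR_RE.search(html)
    return match.group(1) if match else None


def version_from_ver_params(html, threshold=0.6):
    """Return the most common ?ver= value if it covers at least
    `threshold` of all resources that carry the parameter."""
    versions = VER_PARAM_RE.findall(html)
    if not versions:
        return None
    value, count = Counter(versions).most_common(1)[0]
    return value if count / len(versions) >= threshold else None


def detect_wp_version(html):
    """Try the generator tag first, then fall back to the ?ver= heuristic."""
    return version_from_generator(html) or version_from_ver_params(html)
```

The threshold matters because plugins often put their own version number in ?ver=, so a single value is only trusted when it clearly dominates.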

Plugins and themes were detected from links to their CSS and JS files in the wp-content folder (e.g. /wp-content/themes/twentytwelve/style.css = Twenty Twelve theme).
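As a rough sketch of this kind of detection, the folder names can be pulled out of the homepage HTML with one regular expression. The function name and the pattern are assumptions for illustration; plugins that load no static files on the homepage stay invisible to this approach.

```python
import re

# Captures the kind ("themes" or "plugins") and the folder name (slug).
ASSET_RE = re.compile(r'/wp-content/(themes|plugins)/([A-Za-z0-9_-]+)/')


def detect_themes_and_plugins(html):
    """Return the theme and plugin slugs referenced by static resources."""
    found = {"themes": set(), "plugins": set()}
    for kind, slug in ASSET_RE.findall(html):
        found[kind].add(slug.lower())
    return found
```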

Now we have come to the research results. Numbers are slightly rounded for better readability.

WP versions

The main point of the research was to determine which WP versions are actually used.

The good news is, thanks to easy updates most sites use the most recent version available at the time of research – version 4.1. Some sites (8) experiment with the beta version of 4.2 (which is stable at the time of writing this article and the patch 4.2.1 was already released).

On the other hand there are 7 sites which use version 1.5 from 2005, so they haven’t been updated for more than 10 years.

WordPress version 2.x is also very old – its last release was in 2009. There are almost 2,500 sites which use this version and I think they are very risky.

If we define versions 4.x as “Updated”, versions 3.9 as “Slightly outdated”, the rest of 3.x as “Outdated” and lower versions as “Archaic”, we can see this distribution:

The number of outdated 3.x installations presents a serious security risk. It is more serious than the archaic versions, because many more vulnerable plugins were available for version 3.

A closer look at the still actively developed minor/patched versions may also be interesting:
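The four categories defined above translate directly into a small classification function. This is only a sketch: the function name is an assumption, and the input is expected to be a plain dotted version string such as "3.9.3".

```python
def classify_wp_version(version):
    """Map a WordPress version string to the category used in this article."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major >= 4:
        return "Updated"
    if (major, minor) == (3, 9):
        return "Slightly outdated"
    if major == 3:
        return "Outdated"
    return "Archaic"
```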

It is obvious that autoupdates work well – most of the minor versions are the most recent.

Themes

After a short excursion to the WordPress core, we will focus on the most used Themes.

Many web developers prepare their own themes, either child themes or themes made from scratch. The enormous diversity is therefore not surprising – I found more than 23,000 different themes. Despite this, there are many popular themes used on hundreds of sites.

Number of sites using the theme    Number of different themes (rounded)
1                                  16000
2                                  3600
3                                  1300
4                                  600
5-10                               1100
11-99                              580
100+                               28

As you can see, most themes are unique (used on only one site).

On the other hand, there are 28 very popular themes used on more than 100 sites each – they are deployed on 20 % of the explored sites.

11 % of WordPress sites use default themes.

Plugins

We have already checked core and visual appearance, so the next step is to add some extra functionality and security holes through plugins.

Lots of themes (mainly the premium ones) also include some plugins. Problems come when somebody makes changes in the original (premium) theme and “breaks” the possibility of updates. If a bundled plugin is vulnerable, it will never be fixed. This is why you should always use a child theme or your own plugin for customization.

I found 160,000 plugins in total (6,500 different kinds) and made a chart of the 50 most used WordPress plugins in the Czech Republic. These 50 different plugins represent almost 50 % of all detected plugins.

* obsolete plugins

You may notice the dominance of the first 3 plugins:

All in One SEO Pack

This plugin enhances some basic on-page SEO factors. Its main purpose is to canonicalize links, set up titles and metas (social networks included) globally and per post, disable archive indexing and enable sitemap generation. The plugin also provides a connection to Google Analytics and verifies site ownership for some other tools (e.g. Google Webmaster Tools). It allows editing of robots.txt. But be careful here – this plugin (still?) disallows access to the wp-includes folder, which is something that Google doesn’t like.

I prefer WordPress SEO by Yoast instead.

Contact Form 7

CF7 allows contact form creation. You can use simple codes to design your own contact form with an unlimited number of fields. Outputs from this form are sent to chosen email addresses. I often use two other extensions: Contact Form 7 Modules (hidden fields) and Contact Form 7 Honeypot (simple, but efficient antispam). The main advantage of this plugin for me is its “extension readiness”. It is pretty simple to link it to other systems, e.g. CRM.

NextGen Gallery

This plugin was virtually the only reasonable gallery solution in older WP versions. In modern WP versions the integrated gallery is very usable. But if you need a more complex gallery solution, this plugin can be a good choice. There are lots of extensions for it, too.

The popularity of this plugin is probably related to the number of older WP versions.

WordPress is often used for simple linkbuilding SEO (I don’t want to use the word “blackhat”) and lead generation pages, so the reason for the popularity of the first two plugins is obvious. I found a network of 1,600 sites made for this purpose which were owned by a single company.

I divided detected plugins into categories to determine the main reasons why users want to install plugins. The distribution also shows which features are missing by default.

Users look for advanced contact forms frequently.

There is also no lightbox to show pictures in the default configuration. We can see the Fancybox for WordPress plugin in 27th place. A serious vulnerability was found in this plugin recently. I detected an unpatched version on almost 400 sites!

Almost 50 % of websites using the Fancybox for WordPress plugin are vulnerable.

Users love image sliders, but I hate them. The most popular slider plugins are Slider Revolution and Layer Slider (both premium). A very serious vulnerability was found in the first one last year. Thousands of sites were infected. I think the main reasons are its integration into various premium themes, which lost their update capability because the original theme was edited, and frequent illegal usage of this plugin…

The Slider Revolution plugin was detected on more than 2,500 sites. Almost 600 of these sites use a vulnerable version and allow an attacker to get full control over the website.

More than 20 % of sites using Slider Revolution run a version containing a serious vulnerability.

Many users want to translate their site to other languages. WPML is the second most popular premium plugin (there were also security problems recently). There are other popular plugins for localization: qTranslate (I don’t like the way it translates content using comment blocks) and Polylang, which is strong competition to WPML. But there is also a new guy among localization plugins which looks promising – Babble by Automattic.

Another common task is to add Google Analytics tracking code. Lots of modern themes have an option to add these codes, but special plugins are still very popular.

Plugins which recommend related posts are also installed often. These plugins usually cost a lot of performance. It is no surprise that caching plugins for boosting performance are also very popular.

The number of installations of the 2 main caching plugins (WP Super Cache and W3 Total Cache) is comparable. WP Super Cache is my favorite one due to its simplicity – it does only page caching, but it does it perfectly. For further optimization I always use an appropriate Object Cache drop-in to reduce DB queries caused by transients, and Autoptimize to minify and combine CSS and JS.

An Object Cache Backend is a simple way to enhance performance thanks to object caching technology available on the server (e.g. APC, Xcache, APCu, Memcached, Redis). The administrator needs to select an appropriate drop-in by hand, so I found only 670 sites that use this.

Only 1 % of sites use an Object Cache Backend.

Some users want to extend basic features, such as numbered pagination or post rating. Users also want to embed Google Maps, social sharing buttons and videos. Some miss the possibility to create tables in the default WP editor (although TinyMCE allows it). We can see quite a big number of outdated installations here – more sites use WP-Table Reloaded than its successor TablePress.

Some users also appreciate the ability to create their own layouts without HTML coding. This is possible thanks to various Page Builders. These plugins are often included in premium templates to gain a commercial advantage.

Many sites are built for commercial purposes, so we can find various eCommerce plugins and plugins for sending newsletters.

Security Plugins

Security plugins are a separate chapter. It is not possible to detect these plugins in the HTML source code. Fortunately there are only a few common plugins of this kind, so I wrote a test tailored to them. I detected a security plugin on 6 % of sites.

The most popular security plugin is iThemes Security. This plugin allows blocking access to files which contain sensitive information (like readme.html). It also blocks readme.txt files in plugin folders. I used these files to get more information about sites, so I also checked if this feature is enabled. It was enabled on 20 % of sites using iThemes Security.

Almost all security plugins allow hiding the WP version, so I suppose they caused lots of failures in determining WP version.

Where are they hosted?

I tried to identify the webhoster based on the IP address. I used whois to get the owner of the IP subnet (address range). This method is not 100 % accurate – some bigger webhosters use several subnets registered under different names.
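Once the subnet owners are known, assigning a site to a webhoster is a simple range lookup. The sketch below uses Python’s standard ipaddress module. The subnet table is entirely hypothetical (documentation address ranges and made-up owner names); in practice it would be filled from whois results. It also shows how several subnets can be mapped to one owner to soften the accuracy problem mentioned above.

```python
import ipaddress

# Hypothetical data: documentation ranges and invented owner names.
SUBNET_OWNERS = {
    "192.0.2.0/24": "Example Hosting A",
    "198.51.100.0/24": "Example Hosting B",
    "203.0.113.0/25": "Example Hosting A",  # second subnet of the same owner
}

NAMED_NETWORKS = [
    (ipaddress.ip_network(cidr), owner) for cidr, owner in SUBNET_OWNERS.items()
]


def owner_for_ip(ip):
    """Return the owner of the subnet containing `ip`, or None if unknown."""
    address = ipaddress.ip_address(ip)
    for network, owner in NAMED_NETWORKS:
        if address in network:
            return owner
    return None
```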

I wasn’t able to find all existing Czech WordPress sites, so real numbers can be different.

Chart of the Czech webhosting companies by number of hosted WordPress sites:

Rank   Webhoster        Number of sites
1      Wedos            13970
2      Savana           3590
3      Active24         2940
4      Český Hosting    2270
5      Stable.cz        2110
6      Forpsi           2090
7      Gransy           2000
8      Gigaserver       1480
9      Web4U            1120
10     Hosting90        980
11     cz-hosting       900
12     Ignum            760
13     Tele3            730
14     Pípni            700
15     Angelhosting     680
16     Zoner            580

Wedos is clearly the leader. This is due to the low cost of their services and strong marketing. The company has also sponsored several WordPress conferences, so their name is connected to this CMS.

There are quite a lot of datacenters in the Czech Republic. It is hard to determine the relationship between an IP address and its exact location, but I tried. You can see approximate numbers in the following table.

Rank   Datacenter                    Number of sites
1      Wedos                         14000
2      Master Internet (4D)          9200
3      Casablanca                    8700
4      VSHosting (TTC/ServerPark)    6700
5      SuperNetwork (TTC)            4700
6      Active24 (Tower)              2800
7      CoolHousing                   2200
8      Forpsi (CZ1)                  2100
9      DialTelecom (Nagano)          1600
10     Coprosys (Nagano)             770

HTTP servers

I also tried to detect the HTTP servers in use. Most webhosters use Apache exclusively because users expect it. Apache serves 51,000 sites. My personal favorite HTTP server is Nginx because of its performance and straightforward configuration. Nginx serves 11,500 WP sites in the Czech Republic. There aren’t many other players in this field.

HTTP server    Number of sites
Apache         51000
Nginx          11500
IIS            1100
OpenResty      200
lighttpd       50
LiteSpeed      40

I had never heard of OpenResty before – it is an extended Nginx.

I also tried to detect the exact version of the HTTP server from the response headers, but this information is often hidden.

More than 30,000 Apache servers didn’t disclose their version, 18,000 use 2.2 and 1,400 use 2.4.

In the case of Nginx the situation is similar:

4,500 didn’t disclose their version, 3,500 use 1.2.1, 1,600 use 1.7.1 and 1,400 use 1.6.2+3.
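Server products and versions of this kind are usually read from the Server response header, which looks like “Apache/2.2.22 (Debian)” or just “nginx” when the version is hidden. A minimal parsing sketch follows; the function names and the regular expression are my assumptions, not the code used for the research.

```python
import re

# Product name, optionally followed by "/<dotted version>".
SERVER_RE = re.compile(r'^\s*([A-Za-z][\w-]*)(?:/(\d+(?:\.\d+)*))?')


def parse_server_header(value):
    """Split a Server header into (product, version); version may be None."""
    match = SERVER_RE.match(value or "")
    if not match:
        return None, None
    return match.group(1), match.group(2)


def minor_branch(version):
    """Reduce a full version such as '2.2.22' to its branch '2.2'."""
    if version is None:
        return None
    return ".".join(version.split(".")[:2])
```

Counting the results by product and minor_branch gives numbers comparable to the ones above, with the (product, None) group representing servers that hide their version.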

PHP versions

Many HTTP servers hide their exact version. It is no surprise that they hide the PHP version, too – 34,000 sites didn’t disclose their PHP version.

The PHP version affects performance: newer versions are noticeably faster.

Disclosed versions:

PHP version    Number of sites
PHP/4.3        20
PHP/4.4        60
PHP/5.0        2
PHP/5.1        50
PHP/5.2        7000
PHP/5.3        14300
PHP/5.4        7500
PHP/5.5        2800
PHP/5.6        300

Several sites also experiment with HHVM.

Web charts

Everybody loves “top charts”, so I prepared a few. Keep in mind that all values except “Trust flow” may be artificially inflated by massive purchases of backlinks and fans.

Top 10 sites by Trust flow (Majestic SEO):

www.radegast.cz

www.pamatnik-terezin.cz

www.mediatel.cz

www.cscope.cz

www.corro.cz

www.mirc.cz

www.ancr.cz

www.neternity.cz

www.bonipueri.cz

www.zdravaprsa.cz

Top 10 sites by Citation flow (Majestic SEO):

www.autoskola-praha-ridicak.cz

web.etronic.cz

www.internetprofi.cz

www.hostivarskaprehrada.cz

new.rampusak-stity.cz

www.czech-production.cz

www.profilamas.cz

sd.kralovstilvi.cz

sstepanhon.cz

www.mediatel.cz

Top 10 sites by number of external backlinks (Majestic SEO):

www.geosense.cz

www.profilamas.cz

www.radegast.cz

www.neternity.cz

www.sperky-sw.cz

www.ftonline.cz

www.drosera.cz

www.a2b.cz

www.nsko.cz

www.internetprofi.cz

Top 10 sites by Facebook likes and shares:

www.artex-pokladny.cz

www.hubnutihrou.cz

www.milionaremdoroka.cz

www.revolucnimarketing.cz

www.darujvajicko.cz

www.elitevideoacademy.cz

www.akademieretoriky.cz

www.komunikacikuspechu.cz

www.moje-sebeduvera.cz

www.pragulic.cz

This list contains many suspicious “infoproduct” sites. I’m not convinced their high ranks arose organically.

Top 10 sites by tweets:

www.luciesvarcova.cz

www.test2014.cz

www.fotoseminar.cz

www.neurra.cz

www.stvanci.cz

www.hubnutihrou.cz

www.companyconsults.cz

www.ceskycmelak.cz

www.cafedu.cz

www.mocslov.cz



Top 10 sites by LinkedIn mentions:

www.tqtest.cz

www.hubnutihrou.cz

www.taichiresort.cz

www.superprijem.cz

blog.emailkampane.cz

www.laserfoto.cz

www.kompetenz.cz

www.navykybohatych.cz

www.mediatel.cz

www.inside.cz

Top 10 sites by +1 on GooglePlus:

www.xindlx.cz

www.doperin.cz

www.antelli.cz

www.rodinne-konstelace.cz

www.studiocamo.cz

www.artex-pokladny.cz

www.test2014.cz

www.oezentrum.cz

www.neurra.cz

www.probuzenyslon.cz

Interesting data

I found some interesting data during the analysis, so let’s take a look.

Size of HTML

I monitored the size of HTML response (raw HTML – without JS, CSS, images).

50 % of sites contain less than 28 kB of HTML.

80 % of sites contain less than 45 kB of HTML.

I found almost 80 sites with more than 0.5 MB of raw HTML code = virtually useless sites.

Google Analytics

I also checked if sites use Google Analytics. Currently there are 3 types of tracking code:

Old tracking code (ga.js) – obsolete

Universal Analytics – the new one

Google Tag Manager – a whole system for managing codes; it uses Universal Analytics
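The three variants can be told apart by the script each snippet loads: ga.js, analytics.js or gtm.js. Here is a rough sketch of such a check; the function name and result labels are assumptions, and a plain substring test like this will miss tracking code that is injected only at runtime.

```python
def detect_google_tracking(html):
    """Return the set of Google tracking variants referenced in the HTML."""
    markers = {
        "classic_ga": "google-analytics.com/ga.js",
        "universal_analytics": "google-analytics.com/analytics.js",
        "tag_manager": "googletagmanager.com/gtm.js",
    }
    return {name for name, marker in markers.items() if marker in html}
```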

More than half of websites don’t use Google Analytics.

27 % of sites use the old GA code.

GTM is used on only 1 % of sites (about 650 sites in total).

Remarketing isn’t related to Google Analytics, but I include it in this report as well. Remarketing code appears mainly on commercially focused sites.

2 % of sites use remarketing (1,200 sites).

The same applies to AdSense:

Almost 9 % of sites use Google AdSense.

3,800 sites use the synchronous variant of the ad code and 1,800 sites use the asynchronous one.

Facebook

The influence of Social networks is increasing, so I also tested if sites use Facebook components, e.g. Like and Share buttons or Fan walls.

38 % of sites use Facebook Social plugins.

HTTPS

Security is a very important aspect of every website, so I was interested in usage of HTTPS with valid certificates.

Only 0.24 % of sites use HTTPS with a valid certificate.
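A check for “HTTPS with a valid certificate” can be sketched with Python’s standard ssl module: the default context verifies both the certificate chain and the hostname, so a successful handshake means the certificate is accepted. This is only an illustration under those assumptions, not the test used for the research; the function name and timeout are my own choices.

```python
import socket
import ssl


def has_valid_https(hostname, port=443, timeout=5.0):
    """Return True if a TLS handshake with full certificate and hostname
    verification succeeds, False on any connection or TLS error."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except (OSError, ValueError):
        # Covers refused connections, timeouts, DNS failures and TLS errors.
        return False
```

Self-signed, expired and mismatched certificates all end up as False here, which matches the strict definition used in the statistic above.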

What does Google PageSpeed say?

All sites were examined by Google PageSpeed Insights. I obtained data on counts of static resources (CSS, JS, Images) and their sizes.

The total PageSpeed score takes several factors into account: optimization of static content (code minification, possibility of lossless image compression), proper caching, transfer compression, existence of useless redirects and server response time. The most common problems were the size of images and missing caching headers.

Half of sites achieved 75+ points in Google PageSpeed.

The data shows that most sites reach pretty good scores. Some sites at both ends of the range were examined further to find out why they got such a high or low score. The main reason for low scores (360 sites got 0 points) was huge unoptimized images – Google PageSpeed found that it was possible to save a few megabytes without loss of quality. There were some extreme cases – 2 sites had images larger than 70 MB on their homepages! The worst rated sites naturally had many more problems. High-scoring sites were mostly older and simple ones – there is virtually no chance to screw something up. Modern professional sites usually got scores around 90.

Almost half of all tested sites contain more than 1 MB of images on their homepage.

Before you upload an image to your website, edit it in a graphics editor – lower the resolution and JPG quality settings. An image larger than the physical size of a screen won’t be appreciated by your visitors at all.

Tip: Personally I recommend IrfanView for quick image editing – its plugin pack includes a good tool for web image optimization, RIOT (a newer version is also available as a standalone download and contains more tools for PNG).

In addition to the previous test of HTML code size, we can also calculate the total page size including images, scripts and style sheets. The total size of all homepages was almost 120 GB. Let’s take a look at the homepage size distribution:

Almost half of tested sites have a homepage size under 400 kB.

Per resource type distribution is also interesting:

The fact that the biggest part belongs to images is no surprise. It is a much bigger surprise that 30 % belongs to JavaScript.

The total count of static resources loaded by a site is visible in the following graph:

Many sites load a huge number of other files, sometimes more than 100. Some of them use various photo galleries so many requests are understandable. On the other hand many requests are caused by plugin over-usage.

Almost half of all tested sites contain more than 30 static resources on their homepage. There are more than 2,000 sites which have more than 100 static resources.

The number of static resources on homepages is visible in the following graphs:

Sites with few requests are usually older, simple sites, but many modern sites use plugins to minify and combine static resources (e.g. the Autoptimize plugin).

We have discussed the number of resources and now it is time to look at their size.

50 % of sites include less than 60 kB of CSS and 350 kB of JavaScript. 10 % of sites use more than 300 kB of CSS and 4 % of sites use more than 2 MB of JavaScript.

According to the data I acquired, I daresay 25 % of transferred data are unnecessary.

Conclusion

My research shows that the majority of Czech WordPress sites are outdated and lots of them are vulnerable. Performance is also an issue. WordPress is often used for infomarketing sites due to its simplicity and user friendliness – anybody can make their own site without any knowledge of code. Unfortunately these users do not accept the fact that websites need constant care.

My research also shows the popularity of default themes and reveals the most used plugins. The webhosting company Wedos clearly dominates this market.

Lots of sites can be described as “infomarketing sites with default templates, which use plugins to turn on SEO and to administer contact forms, run at minimum cost and are outdated.” On the other hand there are lots of professional sites focused on security, speed and content quality. Simplicity is the main advantage of this CMS and with a little effort it is possible to make a professional website.

What about the next steps?

A lot of vulnerable sites were found. Access to many of them was even blocked by my antivirus. I would like to contact the owners and creators and help them repair/improve their sites.

You can follow us on social networks (mostly in Czech):

Other articles about WordPress (in Czech).

You may also be interested in the slides from my talk at WordCamp Prague about WordPress Security or my slides about WordPress Performance.

The current state 08/2017: