Usage of Perl for websites fell below 1%

Posted by Matthias Gelbmann on 17 January 2012 in News, Perl, Server-side Languages

Summary: The popularity of Perl as a website scripting language is decreasing. Nevertheless, we expect this language to be around for a long time to come.

In the early days of the Web, Perl was the dominant scripting language. In those days, website programming was often done via the Common Gateway Interface (CGI) of web servers. Although that interface is language-independent, the term "CGI program" was used as a synonym for "Perl program" in that context. Even today, if you happen to see a file with the extension .cgi on a web server, it's a safe bet that this is a Perl script.

The fact is, however, that we don't see them too often nowadays. At the beginning of this year, usage of Perl on web servers fell below 1%. It's at 0.997% at the moment.

1% of all websites is actually not too bad. It still means that millions of websites rely on Perl. It is still the number 5 language on web servers, and it also means that Perl is used on more sites than Ruby and Python combined.

Nevertheless, there is a trend. Our technology change report reveals that although Perl is losing market share to all other languages, there is also a smaller, but still remarkable, trend in the other direction. For example, 4.1% of all Perl sites switched to PHP recently, but at the same time 2.7% of all Perl sites were using PHP until recently.

Another notable fact is that Perl is used a bit more by high-traffic sites, with a market share of 1.5% of the top 1,000 sites and 1.8% of the top 10,000 sites.

One element in the declining use of Perl may be the trend towards content management systems. 79% of all Perl sites don't use any content management system, compared to 58% of PHP sites. The most popular Perl-based CMSs (Movable Type, Imperia and WebGUI) are far behind their PHP-based counterparts.

91.1% of all Perl sites run a Unix-like operating system. Perl is used on 1.6% of Unix servers, but only on 0.2% of Windows servers.

The distribution per top-level domain shows that Perl is used by 3.3% of Japanese sites and by 2.2% of UK sites, but also by 2.4% of .edu (US education) sites and by 3.6% of .gov (US government) sites. On the other hand, that percentage is only 0.1% in China, South Korea and Turkey.

Looking at the versions of Perl used on websites, we find that 76% use version 5.8. This version was released in 2002 and was last updated in 2008. The latest version of Perl, 5.14, released in May 2011, is used by only 0.4% of all Perl sites. Whatever the reason for the reluctance of webmasters to switch to newer versions, it remains to be seen whether the work on Perl 6, which started more than a decade ago, will change that.

Perl is certainly not a dying language; there is still a large and active community. It may be a bit out of fashion, but I can't imagine Perl disappearing from our surveys in the next 10 or even 20 years.

_________________

Please note that all trends and figures mentioned in this article are valid at the time of writing. Our surveys are updated frequently, and these trends and figures are likely to change over time.

28 comments

So, something I see missing here is any explanation as to how you collected this data, since nowadays I find it hard to tell what is being used for most sites in the first place. The lack of this explanation frankly also makes me wonder how seriously this can be taken. Lastly, do you have a graph that weights the occurrence of each language by the usage of the actual site?

Additionally: What do the absolute numbers look like? These are only relative numbers. Since the internet as a whole is constantly growing, you cannot conclude that Perl's popularity is sinking. It might as well have risen and simply been outpaced by PHP so much that it dropped in relation.

Hello Christian, I appreciate your feedback. You can find information about our methodology here https://w3techs.com/technologies and here https://w3techs.com/faq. We know that surveys such as this one cannot be perfect; please see also our disclaimer: https://w3techs.com/disclaimer. Our goal is for our surveys to be as accurate as possible. We actually do find concrete and credible indications about the usage of server-side languages on most sites, therefore we believe we are not too far away from that goal.

I'm not sure what you mean by weighting the usage of a language by a site. If you mean distinguishing on how many pages of a site a certain language is used, then the answer is no, we don't do that. If we find a language being used on any page, we count it for the site.

Regarding the absolute figures: we don't measure "the size of the Internet", so we don't have these numbers. You are right: the absolute usage might still increase. I'm sorry if the wording in the article is too vague here.

Matthias, thanks for the answer and the links. I would suggest adding such links directly to the article in order to avoid confusion upfront. :) As for usage, that was a bit of a language mixup; I meant traffic.
For example, in crime statistics it is customary to compare countries on a per capita basis, because comparisons based on absolute numbers would be meaningless. In the same way I would like to see a chart that compares the languages not by the number of sites they occur on, but by the number of hits or users per [month, week, day] each language drives/reaches.

As for absolute figures: You may not have absolute figures for the entire internet, but you can still publish them as graphs for the slice of the internet you sample. That would at least allow meaningful interpretation of the presented data. :)

Christian, I agree, weighting the technologies by the number of hits would be an interesting statistic, but we don't know the number of hits of other sites. The closest we have is the breakdown by ranking: https://w3techs.com/technologies/breakdown/pl-perl/ranking, which shows that this weighted percentage for Perl would probably be around 1.5% or a bit less.

Ah, bit of a shame that that data isn't available. I'm not entirely sure that going by ranking is very useful, considering that the top site of the internet has a reach of 47%, while site 500 already has only 0.2% and the reach down at site 110000 is only 0.001%, meaning that the bulk of your data probably barely accounts for 1% of the world-wide traffic.

I can't see anything in the links you provided that indicates how you determine what language builds a web page beyond some fuzzy (and pretty unreliable for non-off-the-shelf platforms) heuristics. If you provided a list of Alexa 500 or 1000 sites and what your guesses were for what they used, that would be interesting, and would back up your claims of accuracy. As it stands it just looks like "we have clever regular expressions". I would suspect that easier-to-spot languages and content management systems do well because they're easy to spot rather than because they're widely used.

"We actually do find concrete and credible indications about the usage of server-side languages on most sites" ... so publish them. As it stands we just have your word that your tech is really clever and you think it works.

Hello Aaron, I can understand that you are skeptical. That's a perfectly reasonable attitude towards any statistic that is published anywhere. Your reasoning that some technologies may be easier to spot than others is valid. This is actually one of the points we mention in our disclaimer, and you will find a few more points there. These are the limitations we have to deal with when we collect our data. One of our principles is not to use any technique that would favor one technology over another competing technology. Another principle is to balance the remaining false positives and false negatives as much as possible. If we succeed in that, we get statistics that might not be 100% accurate, but that are as accurate as they can possibly be. That is our goal.

You can check our results not only for the Alexa top 1000 sites, but for all sites. Enter any domain here https://w3techs.com/sites and you will see the technologies that we detect. If you do this for a number of sites, you might find errors or omissions. In that case, we would actually like to hear about it, as the feedback from our visitors is valuable information for us to improve our algorithms.

Hi Matthias, I tried out your tool. I got 0 correctly detected and 1 false positive for PHP on the following sites:

lovefilm.com
slando.com
imdb.com
catalystframework.org
socialtext.net
cpan.org
metacpan.org (misreported as PHP)
hiveminder.com
Bestpractical.com

So again, show us your methodology and a sample of your results, because right now your ability to detect server-side technology beyond (advertised and first tier) web servers is not looking credible.

A user on another site tested a few famously Perl-driven sites, and for these you either failed to detect Perl or misdetected it as PHP:

lovefilm.com
slando.com
imdb.com
catalystframework.org
socialtext.net
cpan.org
metacpan.org (misreported as PHP)
hiveminder.com
Bestpractical.com

Sorry, but that indicates that basically your entire data set is bogus for websites that were written in 2000 or later. PS: Why does your graph not include a bar for "we don't know"? You do have this data and to exclude it is just very bad statistics.

@Aaron, @Mithaldu: I'm sorry your comments overlapped due to the delay in approving them. Perhaps the only thing we can agree on today is my statement about Perl having an active community. If some day I feel like having an emotional discussion again, I will post an article about the advantages and disadvantages of Perl.

Why do we not include a "don't know" bar in the charts: This figure is 17.6% for server-side languages. We include these figures in the charts of those categories where we believe it provides useful information. For example, we see this figure is falling in the content management systems survey, and we conclude that more and more sites are using a CMS. On the other hand, we do not believe that 17.6% of the websites consist of static HTML pages nowadays. Most of them probably use a server-side language but don't give us a hint which one it is. That means a trend in that figure would not measure a change in the usage of languages, but it would primarily indicate a change in our ability to detect them, which would be less useful and potentially misleading in a market survey.

Why do we not detect the server-side languages on 17.6% of the sites, including the ones on your list: We do not have any magic access to the servers; we can only use the information they provide. We put a lot of effort into finding ways to use this information so that we get an unbiased data set that is as close as possible to the real data. I will give you two examples: If we see "WordPress 3.0.3" in the "generator" tag of a site, we believe that.
We know that it's very easy for a webmaster to put this on a page without using WordPress, but we think that the number of webmasters doing that would not lead to a statistically significant distortion of our figures. Therefore we use the generator tag as one of the CMS indicators. Furthermore, we assume that they did not re-implement WordPress in a language other than PHP, so we count this as a PHP site. On the other hand, if we see mod_perl or mod_php in the "Server" HTTP header line, we find this alone is not a strong enough indication that these languages are really used, so we don't count them. In total we have a few thousand indicators for technology usage, and we decide on a case-by-case basis whether they are reliable enough. I mentioned earlier two of the principles that guide us in making that decision.

There are sites where we don't find any strong enough indicators for a server-side language, and your list above can get much longer if you put more effort in finding them. As long as the percentage of undetected languages is relatively low, and as long as we have no indication that this leads to a bias in our statistics, I believe that our surveys are useful.

As I have said before, we know that our surveys can never be 100% accurate. I tried to explain why I still don't think that our "entire data set is bogus", but if this is your conclusion, you might be better off not using our data. In that case I can recommend another statistics site, which I enjoy reading regularly: https://twitter.com/madeupstats

Hi Matthias, I wasn't being emotional, I was looking for any evidence to back your claims. Being able to detect and measure server-side tech when they aren't advertising it in headers and/or filenames would be a useful tool.
Firstly, for being able to measure usage of older styles of programming vs newer frameworks (and thus the impact of new frameworks like Mojolicious or Catalyst), and secondly, for being able to ensure that you don't give away such clues when locking down a site that you anticipate will attract hostile attention.

"There are sites where we don't find any strong enough indicators for a server-side language, and your list above can get much longer if you put more effort in finding them." Umm... I was making an effort to find a site where your tool could correctly detect the backend technology. I was just working through partners, clients and employers of mine that I know wouldn't make an effort to hide the technology they used. Furthermore, you seem to be using mod_perl to detect the Perl version; that means your data is going to be hugely unreliable, as several of those on the list I provided still use mod_perl.

I'm disappointed that what looked like it could be a useful and interesting tool and news item turned out to be almost useless - none of the statements made in your news item can be backed up, and you won't provide details of your methodology or even a sample of the results. Adding insult to injury, you won't admit that your ability to detect the backend technology of any modern professional website (i.e. MVC/REST paths instead of filename.ext and not advertising platform and version in its headers) is pretty much non-existent.

Your caveats are misleading too - your only available data is filenames, extensions and headers - that's nowhere near enough to draw any conclusions except for off-the-shelf platforms. Also, by ignoring subdomains you're not seeing large parts of the internet - subdomains are often entirely different sites within an organisation, often entirely different platforms on separate hardware (or even separate continents).

Thank you for taking the time to express your reservations. I mean that.
Having the best statistics is not enough if users don't understand well enough how they are made. We have extended our methodology description and FAQ page several times in the last year. I take your input as advice that we have to keep adding to that information. We are ready to do that; it's just a question of how to describe several thousand indicators in a way that is easy to understand and provides real insight.

I'm sorry if anything on our site or in the article made you believe that we have means to detect a site's server-side language that go beyond close examination of whatever information the site provides.

The only thing I honestly don't understand in your comment is that we don't provide a sample of our results. We provide all our results in the site information page. You have access to our whole data set via that page.

Regarding your remark on subdomains: we don't ignore subdomains, but we don't count them as separate sites. If we see some technology used on sub.example.com, we count that technology for example.com. The reason for that is simple: many sites provide webspace for users in the form username.example.com. Some of these sites have millions of users, and in many cases they all share the same technology. Counting them millions of times would have a severe, and in our opinion misleading, impact on our statistics. There are, of course, instances of subdomains that could well be justified to be counted as a separate site, but we have no means to detect them automatically, and we don't have the resources to classify subdomains manually.

I would be lying if I said I wasn't motivated by my emotions. Nevertheless, this does not let me set aside my integrity. The issue I take with your presentation is that it is simply bad science. You are displaying statistics and, as you admit, your data is not perfect.
It is still perfectly fine to display such statistics, but as you will see in any given scientific publication, it is necessary to also display the error range. In your case the error range is 17.6%. As such your graph would need to look like this: http://dl.dropbox.com/u/10190786/language_use_on_websites.png

Similarly, your copy should state that Perl is now used on 1%-18.6% of all websites, and the same goes for all of these:

PHP: 77.30% - 94.90%
ASP.NET: 21.70% - 39.30%
Java: 4.00% - 21.60%
ColdFusion: 1.20% - 18.80%
Perl: 1.00% - 18.60%
Ruby: 0.60% - 18.20%
Python: 0.30% - 17.90%

To omit this data does the reader a disservice, since it could very well be that Ruby on its own has long surpassed Perl, ColdFusion, Java, and Python together and you simply cannot detect it.

> We provide all our results in the site information page. You have access to our whole data set via that page.

Since you do not provide a list of the sites you have data for, you provide only the data for all guessable sites on that page. If I were to try and download ALL your data for ALL sites you have available, I'd need to write a brute force script that goes through all possible domain + TLD combinations and tries to download every single one of them. I'm sure your sysadmin wouldn't be too happy if I tried that.

Hi Matthias, I misunderstood how you handle subdomains - that does seem the optimal way to do it. The problem you haven't resolved is the bold claims you've made about how widely used any of the platforms you've surveyed are. Despite your efforts to balance, most modern web development doesn't advertise details of backend technology in its HTTP response headers, filenames or paths: there is simply no way you can come to the conclusions you have based on such an unreliable and sparse sample. Your statement that only 17% of domains don't have any backend technology dynamically serving content doesn't account for the fact that many of those that do happen to have a couple of .php or .asp or .cgi filenames lying around will predominantly be using something unrelated for most of the site. Your limited ability to detect server-side frameworks etc., as well as the low proportion that advertise themselves, means that your survey only covers a particularly self-advertising tip of the iceberg.
On that basis the news story here is that "we can't reliably detect most modern Perl backends" rather than any bold and unsubstantiated claims about its actual usage. The same applies for many of the other technology platforms - PHP and ASP/.NET are massively over-represented, others massively under-represented.

Hi Mithaldu, hi Aaron, I just made an action point for our project: "Investigate ways to show the error margin in our surveys". The proper way to do this is probably giving an interval of percentages within which the real distribution lies with 99% probability or so, as is sometimes done in opinion polls (the probability that all the undetected sites use the same technology is close to zero). Thank you for contributing to that improvement.

Cheers, looking forward to seeing that. :)

Hi Matthias, I'd give up altogether on detecting server-side development languages outside of the low-hanging fruit like PHP and ASP/.NET that tend to have very obvious filenames. The web isn't developed using CGI scripts or index.php anymore - and even when those types of filenames are used, it's usually behind mod_rewrite or similar URL rewriting, so you'd never see them. You've created a story based on the noise that now drowns out the signal from the data in your survey here. I'd look at different things to find out. This survey technique may have been appropriate up until 7 or 8 years ago, but now it's woefully out of date, and the fact that I've been unable to detect server-side tech with your tool on any of the sites I've worked on in the past 5 years (a lot longer than the list I've already posted) shows that.
There are some good stories you could be researching and publishing, but until you realise that your backend tech survey isn't working as you use it, you won't see them. For instance, you could use exactly the results we've all seen to chart the growth of MVC/REST URLs and SEO-optimised paths; the decline of the old cgi-bin directory is interesting, as is how we now do things in different ways. But you need to get yourself up to date on how different platforms are developed, and then you will find interesting and worthwhile results. This story about Perl (and Python / Ruby) is a total red herring though - it just highlights that it's time you tried a new approach.

Hi Aaron, Certainly, the web keeps changing every day. Rest assured that we are well aware of the frameworks you mentioned and the differences they make in technology detection, including the very valid points that you made. I don't think we need a new approach, but we need to refine and extend our approach continually, and that is what we are doing. You might be surprised, for instance, to see how low the Ruby usage would be if we didn't include the Ruby-on-Rails sites. Also, I don't think that this is a language-specific problem; there are just as many PHP-based MVC frameworks used (proportionally) as for other languages. You called our 82.4% detection rate the tip of the iceberg; I would call it the iceberg, but we keep working to cover more and more of the remaining tip as well. Discussions such as this one keep us aware that our work never stops.

Hi Matthias, Your 82.4% detection rate is based on finding an indication of an instance of a language being used across all the subdomains of a domain - that is very much the tip of the iceberg - finding a formmail.pl or index.php on one site out of maybe 10 or 40 on a domain is certainly the tip of the iceberg.
As I've said, I've used your tool. I even looked at why gumtree showed as having used Perl but slando not (they were both started by the same people, and use a very similar platform to do roughly the same thing) - it came down to a /cgi-bin/login.pl script. Out of thousands of dynamic pages, you found one or two CGI scripts - that's hardly representative. You're also failing to deal with the fact that most large websites are multi-tier, so they will use a mix of Java and other tech that won't be exposed in the pages you crawl. Several of those on the list I posted used different languages to build parts of the site, such as Solr for searching, so you didn't just fail to spot the Perl building the frontend but also the Java providing search, and sometimes even PHP tools on the same site.

Sorry, guys, but we use the .asp extension for our Apache::ASP (Perl) based websites. This is possibly a small number compared to Windows-based sites using that file extension, but still non-zero. I think it is safe to suppose that .asp on UNIX is not ASP.NET. :)

Nevertheless - without looking at your counts and numbers - it's evident that PHP is more common on the web. You can run a PHP-based CMS without even knowing the basics of data exchange and server functions (ever set up Joomla or OSCommerce?). And it's the customers who want to have these programs installed. But I see in my practice that server-based work, i.e. DB clearing and all that server-based functionality, is done in Perl - but: where's the problem? It's the way it is.

With regard to Perl 5.8 being popular, CentOS and Red Hat both, at least in the versions I worked with this past year, still distribute this version, so it may be that the admins have just used whatever was installed as part of their OS installation. Perhaps if the distros themselves would upgrade we'd see more sites using newer versions.

This is quite the anticlimax of what I was thinking.
From the number of projects posted about Perl I was assuming that Perl is a growing language, but the stats are totally opposite. Or maybe there are not that many developers for Perl, which makes these projects live long on freelance sites.

After I showed my embedded software team how much easier things are in Perl, everybody started to learn Perl. I am trying to help them in every aspect of using Perl, and up to now they have been able to handle all scripting-related problems successfully and quickly. We made Perl our team's formal scripting language to be able to speak the same language in the team. Recently I checked the web capabilities of Perl other than CGI, and I saw amazing frameworks like Catalyst, Mojolicious, Dancer, etc. and also the PSGI/Plack implementation. I think that Perl is absolutely ready to compete with other web languages. Client = HTML5+JavaScript+CSS3, Server = Perl+Plack+Mojolicious: a good combination for powerful sites in my opinion.
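
The error ranges proposed in the comments above follow from simple arithmetic: the lower bound is the published share, and the upper bound adds the 17.6% of sites on which no server-side language was detected. Below is a minimal Python sketch of that calculation. It assumes, as that comment does, that the published shares are percentages of all surveyed sites; the names `UNDETECTED`, `PUBLISHED`, `bounds` and `all_bounds` are illustrative only and are not part of any W3Techs tooling.

```python
from decimal import Decimal

# Share of sites with no detected server-side language, as quoted in the thread.
UNDETECTED = Decimal("17.6")

# Published shares quoted in the comment (percent of sites).
PUBLISHED = {
    "PHP": Decimal("77.3"),
    "ASP.NET": Decimal("21.7"),
    "Java": Decimal("4.0"),
    "ColdFusion": Decimal("1.2"),
    "Perl": Decimal("1.0"),
    "Ruby": Decimal("0.6"),
    "Python": Decimal("0.3"),
}


def bounds(share, undetected=UNDETECTED):
    """Return (low, high) worst-case bounds in percent.

    low:  no undetected site uses the language.
    high: every undetected site uses it (capped at 100%).
    """
    return share, min(share + undetected, Decimal("100"))


def all_bounds(published=PUBLISHED, undetected=UNDETECTED):
    """Apply bounds() to every language in the published table."""
    return {lang: bounds(share, undetected) for lang, share in published.items()}


if __name__ == "__main__":
    for lang, (low, high) in all_bounds().items():
        print(f"{lang}: {low}% - {high}%")
```

These are worst-case bounds rather than a confidence interval. As noted in the thread, the probability that all undetected sites use the same technology is close to zero, so a statistically derived interval would likely be much narrower.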



This entry is closed for comments.