DNS comes with a set of rules defining valid domain names. A domain name cannot exceed 255 octets (RFC 1034) and each label cannot exceed 63 octets (RFC 1035). It can contain any character (RFC 2181), but extra rules apply to hostnames (A and MX records, data of SOA and NS records): labels may contain only alphanumeric ASCII characters and hyphens (we’ll talk about IDNs at the end of this post), and they can neither start nor end with a hyphen.
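To make these rules concrete, here is a minimal userland sketch of them. The function name isValidDomainName() and its $hostname parameter are made up for illustration; this is not the code that ships in PHP, and it assumes PHP 7 for the type declarations.

```php
<?php
// Hypothetical userland helper illustrating the rules above.
// It is a sketch, not PHP's internal implementation.
function isValidDomainName(string $name, bool $hostname = false): bool
{
    // A single trailing dot (the root label) is allowed; drop it.
    if (substr($name, -1) === '.') {
        $name = substr($name, 0, -1);
    }

    // 255 octets on the wire (length bytes and root label included)
    // leaves at most 253 characters in dotted text form.
    if ($name === '' || strlen($name) > 253) {
        return false;
    }

    foreach (explode('.', $name) as $label) {
        // Each label must hold between 1 and 63 octets.
        $length = strlen($label);
        if ($length < 1 || $length > 63) {
            return false;
        }

        // Hostname labels: ASCII letters, digits and hyphens only,
        // with no hyphen in first or last position.
        if ($hostname && !preg_match('/^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$/D', $label)) {
            return false;
        }
    }

    return true;
}

var_dump(isValidDomainName('mandrill._domainkey.mailchimp.com'));       // bool(true)
var_dump(isValidDomainName('mandrill._domainkey.mailchimp.com', true)); // bool(false)
var_dump(isValidDomainName('a-.bc.com', true));                         // bool(false)
```

The underscore passes the plain domain name check but fails the hostname check, which is exactly the distinction between the two sets of rules.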

Until now, PHP had no filter to validate that a given string is a valid domain name (or hostname). Worse, FILTER_VALIDATE_URL did not fully enforce domain name validity (which is mandatory for schemes such as http and https) and accepted invalid URLs. FILTER_VALIDATE_URL also lacked IPv6 host support.

<?php
// PHP 5.6.3

// Label ends with a hyphen
var_dump(filter_var('http://a-.bc.com', FILTER_VALIDATE_URL));
// string(16) "http://a-.bc.com"

// Label is more than 63 octets
var_dump(filter_var('http://toolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolong.com', FILTER_VALIDATE_URL));
// string(81) "http://toolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolong.com"

// Lack of IPv6 support
var_dump(filter_var('http://[2001:0db8:0000:85a3:0000:0000:ac1f:8001]', FILTER_VALIDATE_URL));
// bool(false)

These limitations will be fixed in PHP 7. I’ve introduced a new FILTER_VALIDATE_DOMAIN filter that checks domain name validity and, when combined with the FILTER_FLAG_HOSTNAME flag, hostname validity. This new filter is now used internally by the URL validator. I also added IPv6 host support in URL validation:

<?php
// PHP 7.0.0-dev

// Validate a domain name
var_dump(filter_var('mandrill._domainkey.mailchimp.com', FILTER_VALIDATE_DOMAIN));
// string(33) "mandrill._domainkey.mailchimp.com"

// Validate a hostname (here, the underscore is invalid)
var_dump(filter_var('mandrill._domainkey.mailchimp.com', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME));
// bool(false)

// Label ends with a hyphen
var_dump(filter_var('http://a-.bc.com', FILTER_VALIDATE_URL));
// bool(false)

// Label is more than 63 octets
var_dump(filter_var('http://toolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolong.com', FILTER_VALIDATE_URL));
// bool(false)

// IPv6 host is now accepted
var_dump(filter_var('http://[2001:0db8:0000:85a3:0000:0000:ac1f:8001]', FILTER_VALIDATE_URL));
// string(48) "http://[2001:0db8:0000:85a3:0000:0000:ac1f:8001]"
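Conceptually, the URL validator now checks the host part of the URL against the hostname rules, or accepts it as a bracketed IPv6 literal. A rough userland analogue could look like the sketch below. The helper name urlHasValidHost() is hypothetical, and this only approximates the idea: the real check lives in C inside the filter extension and does more than this.

```php
<?php
// Hypothetical helper: a rough userland analogue of the host check,
// not the actual code of PHP's URL validator. Requires PHP 7.
function urlHasValidHost(string $url): bool
{
    // parse_url() returns null when there is no host, false on malformed input.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    // IPv6 literals are wrapped in square brackets inside URLs.
    if ($host[0] === '[' && substr($host, -1) === ']') {
        $address = substr($host, 1, -1);
        return filter_var($address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
    }

    // Everything else must satisfy the hostname rules.
    return filter_var($host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false;
}

var_dump(urlHasValidHost('http://a-.bc.com'));                                 // bool(false)
var_dump(urlHasValidHost('http://example.com/path'));                          // bool(true)
var_dump(urlHasValidHost('http://[2001:0db8:0000:85a3:0000:0000:ac1f:8001]')); // bool(true)
```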

There is still a big gap in PHP’s handling of domain names and URLs: internationalized domain names are not supported at all in the core. I’ve already blogged about a userland workaround, but as IDNs become more and more popular, core support in PHP’s streams and validation is necessary. For instance, almost all French registrars support them, and even TLDs, such as the Chinese one, are available in the wild in a non-ASCII form. I’ve started a patch enabling IDN support in PHP’s streams. It works on Unix but still lacks a Windows build system. As it requires making ICU a dependency of PHP, I’ll publish a PHP RFC on this topic soon!
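In the meantime, one possible userland approach (not necessarily the workaround from my earlier post, and unrelated to the streams patch) is to convert the name to its ASCII form with the intl extension before validating it. The sketch below assumes PHP 7 with intl loaded; idnToValidHostname() is a hypothetical name.

```php
<?php
// Hypothetical helper: converts an IDN to its ASCII (Punycode) form,
// then validates the result as a hostname. Returns the ASCII hostname
// on success and false on failure. Needs the intl extension (ICU).
function idnToValidHostname(string $host)
{
    if (!function_exists('idn_to_ascii')) {
        return false; // intl is not loaded, so no conversion is possible here
    }

    // Unicode labels become ASCII labels starting with "xn--".
    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($ascii === false) {
        return false;
    }

    // The ASCII form can then go through the regular hostname check.
    return filter_var($ascii, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME);
}

// "\u{00FC}" is the letter u with diaeresis (PHP 7 Unicode escape syntax).
var_dump(idnToValidHostname("b\u{00FC}cher.de"));
```

With intl available, the call above should yield the Punycode form of the name; without it, the helper simply reports failure, which is part of why core support would be preferable.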