What is a URL?

A URL (Uniform Resource Locator) most commonly references web pages (HTTP) but can also apply to file transfer (FTP), email (mailto), database access (JDBC) and many other applications. An HTTP URL can also describe the location of external resources such as an image, style sheet, script and even a specific section within an HTML document. Complete or at least partial URLs appear in the address bar of most web browsers.

This article focuses on HTTP URLs, also known as web addresses. To help explain the various parts of a URL, I have put together the following reference diagram.
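As a text companion to the diagram, here is a minimal Python sketch that pulls the eight parts out of a hypothetical URL using the standard library's `urllib.parse.urlsplit`. The three-way split of the hostname is a simplifying assumption that only holds for hosts shaped like `www.example.com`.

```python
from urllib.parse import urlsplit

# Hypothetical example URL containing all eight parts discussed below
url = "http://www.example.com:80/path/to/page.html?first_name=Alfred&last_name=Pennyworth#foo"

parts = urlsplit(url)

# Naive split: assumes exactly three labels (subdomain.domain.tld)
subdomain, domain, tld = parts.hostname.split(".")

anatomy = {
    "1. scheme": parts.scheme,
    "2. subdomain": subdomain,
    "3. domain": domain,
    "4. top-level domain": tld,
    "5. port": parts.port,               # int, or None when the URL omits it
    "6. path": parts.path,
    "7. query string": parts.query,      # without the leading "?"
    "8. fragment identifier": parts.fragment,  # without the leading "#"
}

for name, value in anatomy.items():
    print(f"{name}: {value}")
```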

1. Scheme

The scheme is usually the name of a protocol, which defines how the resource will be obtained. However, schemes like "file" and "mailto" do not specify a protocol. Clients such as web browsers connect to websites over the Hypertext Transfer Protocol, otherwise known as HTTP. To learn more about HTTP and its secure alternative HTTPS, please read my previous articles HTTPS Fundamentals and HTTPS Everywhere.

2. Subdomain

While www is the most common subdomain, a subdomain can be configured with any value consisting of case-insensitive alphanumeric ASCII characters (a-z, 0-9). Hyphens are also permitted, even several in a row, but a hyphen cannot be the first or last character of any domain or subdomain. Subdomains can also be omitted entirely, shortening and simplifying the entire URL.
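The naming rules above fit in a single regular expression. This is a minimal Python sketch that checks one label (one dot-separated piece such as `www` or `example`); the 63-character limit is an addition taken from the DNS specification (RFC 1035), not from the rules stated above, and the function name `is_valid_label` is hypothetical.

```python
import re

# Letters and digits, with hyphens allowed anywhere except the first or last
# position. {0,61} plus the two end characters caps a label at 63 characters
# (RFC 1035 limit).
LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)


def is_valid_label(label: str) -> bool:
    """Return True if `label` is a valid domain or subdomain label."""
    # fullmatch avoids the trailing-newline quirk of "$"
    return LABEL.fullmatch(label) is not None


for candidate in ["www", "blog-2", "my--site", "-blog", "blog-", "blog_1"]:
    print(candidate, is_valid_label(candidate))
```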

www Commentary

Unless a website already ranks high in search engines, I do not recommend using www subdomains. Because many people have formed a habit of typing www prior to any domain, the server should be configured with a permanent 301 redirect to the non-www domain. This way no traffic is lost and the complete URL is more concise.
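In practice this redirect is usually set in web server configuration, but the logic is small enough to sketch with Python's built-in `http.server`. The helper name `non_www_location` is hypothetical, and the sketch assumes the site is served over HTTPS.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def non_www_location(host: str, path: str, scheme: str = "https") -> Optional[str]:
    """Return the non-www URL to redirect to, or None if the host has no www prefix."""
    hostname = host.lower()
    if hostname.startswith("www."):
        return f"{scheme}://{hostname[4:]}{path}"  # drop the leading "www."
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        location = non_www_location(self.headers.get("Host", ""), self.path)
        if location is not None:
            # 301 tells browsers and search engines the move is permanent
            self.send_response(301)
            self.send_header("Location", location)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"Served from the non-www domain\n")


# To try it locally (port 8080 is an arbitrary choice):
# HTTPServer(("", 8080), RedirectHandler).serve_forever()
```

The path and query string are carried over unchanged, so a bookmarked `www` link lands on the same page rather than the home page.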

Allowing both the www and non-www domains to load without redirecting one to the other is also something I do not recommend. To me, this practice implies a lack of attention to detail, but beyond my own opinion, most search engines treat each subdomain (or the lack of one) as a separate entity. This means that any domain authority blog.example.com has built up may not transfer to the rest of example.com. Instead, the blog should be published to example.com/blog.

3-4. Domain + Top-level

In October 1984, the original set of top-level domains was defined as:

.com

.edu

.gov

.mil

.org

.net

Since 1984, a handful of new top-level domains have been created, but the most significant change came in 2012, when ICANN (Internet Corporation for Assigned Names and Numbers) opened its new generic top-level domain program and received applications for nearly two thousand new top-level domains. A complete list can be found on icann.org. Several hundred of these new top-level domains are now available for registration and more will become available in the near future.
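Because top-level domains are no longer limited to a short, fixed list, code that splits a hostname cannot hard-code them. This minimal Python sketch (the name `split_host` is hypothetical) treats the last label as the top-level domain, which works for `.com` or `.photography` but not for multi-label suffixes such as `co.uk`; those need a lookup against the Public Suffix List.

```python
def split_host(hostname: str) -> tuple:
    """Naively split a hostname into (subdomain, domain, top-level domain).

    Assumes the top-level domain is only the last label, so suffixes like
    "co.uk" are split incorrectly.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least a domain and a TLD: {hostname!r}")
    tld = labels[-1]
    domain = labels[-2]
    subdomain = ".".join(labels[:-2])  # empty string when there is no subdomain
    return subdomain, domain, tld


print(split_host("blog.example.com"))        # ('blog', 'example', 'com')
print(split_host("example.photography"))     # ('', 'example', 'photography')
```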

5. Port

The purpose of a port is to uniquely identify different processes or applications running on a single server. Port numbers allow multiple processes or applications to share the same network address. Public URLs often do not include a port number. In those cases, the scheme's default port is used. Default ports for HTTP and HTTPS are 80 and 443, respectively.
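The "default port" rule can be expressed in a few lines of Python. This sketch, with the hypothetical helper name `effective_port`, only knows the two schemes discussed here and raises `KeyError` for any other scheme that omits a port.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed in this article
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port in `url`, or the scheme's default port if none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # KeyError for schemes not listed above


print(effective_port("https://example.com/"))       # 443
print(effective_port("http://example.com:8080/"))   # 8080
```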

6. Path

A path describes a specific location within a file system. In the context of an HTTP URL, a path describes a specific resource such as an HTML document. A path describing a static HTML resource (no server-side scripts) reflects the publicly accessible file structure of the server. A path describing resources generated by server-side scripts written in languages like Ruby or PHP does not necessarily reflect the server's file structure. This is because the application itself decides which content each path produces, often building the page on the fly.
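To illustrate why a dynamic path need not match any file, here is a minimal Python routing sketch. The route `/articles/<slug>` and the names `route`, `show_article` and `dispatch` are hypothetical; real frameworks in Ruby, PHP or Python follow the same idea of mapping path patterns to code.

```python
import re
from typing import Optional

# Hypothetical route table: each pattern maps a URL path to a function, not to a file
ROUTES = []


def route(pattern: str):
    """Register a handler for paths that fully match `pattern`."""
    def register(handler):
        ROUTES.append((re.compile(pattern), handler))
        return handler
    return register


@route(r"/articles/(?P<slug>[a-z0-9-]+)")
def show_article(slug: str) -> str:
    # A real application would load the article from a database here
    return f"<h1>Article: {slug}</h1>"


def dispatch(path: str) -> Optional[str]:
    """Return the generated page for `path`, or None if no route matches (a 404)."""
    for pattern, handler in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return handler(**match.groupdict())
    return None


print(dispatch("/articles/create-seo-friendly-url"))
```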

7. Query String

The query string contains data that will be sent to the server for additional processing. It may contain name/value pairs separated by an ampersand (&), like so:

?first_name=Alfred&last_name=Pennyworth

In the example above, the data would be:

"First Name: Alfred"

"Last Name: Pennyworth"

This data is then passed to server-side scripts for processing. If this is a valid query, the server will return information pertinent to Alfred Pennyworth.
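Server-side code typically decodes the query string into name/value pairs before using it. In Python, for example, `urllib.parse.parse_qs` does this (it expects the text after the `?`), and `urlencode` builds a query string in the other direction.

```python
from urllib.parse import parse_qs, urlencode

query = "first_name=Alfred&last_name=Pennyworth"  # the part after "?"

# parse_qs maps each name to a list, because a name may appear more than once
data = parse_qs(query)
print(data)  # {'first_name': ['Alfred'], 'last_name': ['Pennyworth']}

# urlencode performs the reverse: it builds a query string from name/value pairs
rebuilt = urlencode({"first_name": "Alfred", "last_name": "Pennyworth"})
print(rebuilt)  # first_name=Alfred&last_name=Pennyworth
```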

8. Fragment Identifier

Within an HTML document, individual sections can be assigned unique names by way of an ID attribute. An example would look like this:

<section id="foo"> <!-- section content --> </section>

If the fragment identifier #foo is attached to the end of a URL, web browsers will jump directly to that section. This specificity can be very helpful on web pages with a significant amount of content. IDs and fragment identifiers also form the basis of navigation options within single-page applications.

As a technical note, ID attributes can be used with any valid HTML tag. They are not limited to section tags as illustrated above. Regardless of the HTML tag, the fragment identifier uses the same syntax within the URL.
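The fragment identifier is resolved by the browser and is not sent to the server in the HTTP request, so server-side code usually only sees the part before the `#`. A short Python sketch using the standard `urllib.parse.urldefrag`, with a hypothetical URL:

```python
from urllib.parse import urldefrag

url = "https://example.com/about#foo"

# urldefrag separates the fragment from the rest of the URL
base, fragment = urldefrag(url)
print(base)      # https://example.com/about
print(fragment)  # foo
```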

SEO Friendly URLs

If SEO (Search Engine Optimization) is a goal of your website, here is how to make the most of URLs.

HTTPS Everywhere

Use the HTTPS protocol instead of HTTP. Besides protecting the privacy of your visitors, HTTPS can also help rankings: Google has publicly announced that it gives a small SEO bonus to websites that correctly employ HTTPS on all pages. To learn more about this, please read HTTPS Everywhere.

Use Subdomains with Caution

If you choose to use subdomains, be careful because most search engines treat each subdomain as a separate entity. This dilutes the effectiveness of any domain authority your website has earned.

Keyword Separators

Use hyphens (-) to separate keywords, not underscores (_). This is because most search engines treat alfred-pennyworth-biography as three separate words and alfred_pennyworth_biography as a single word.

By using hyphens, queries for "alfred pennyworth", "alfred biography" and "alfred" will all be considered relevant. Otherwise, only queries for exactly "alfred_pennyworth_biography" will be considered relevant.
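Hyphenated paths are usually generated from page titles rather than typed by hand. This is a minimal Python sketch of such a "slugify" step; the function name is hypothetical, and dropping accents via Unicode normalization is one design choice among several.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated URL path segment."""
    # Decompose accented characters (é -> e + accent) and drop the non-ASCII parts
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Replace each run of non-alphanumerics (spaces, underscores, punctuation) with one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    return slug.strip("-")


print(slugify("Alfred Pennyworth: Biography"))  # alfred-pennyworth-biography
print(slugify("alfred_pennyworth_biography"))   # alfred-pennyworth-biography
```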

Reduce URL Length

Keyword effectiveness decreases as URL length increases. Keep a URL keyword rich but concise. Minimizing URL length is another good reason not to use a www subdomain.

Remove Stop Words

Words that carry little to no keyword value are known as "stop words". Grammatically, the best keywords are generally nouns, verbs and adjectives. The worst keywords are generally function words, which should be removed from URLs. Here are some examples:

articles (the, an, a)

auxiliary verbs (am, is, can)

conjunctions (and, or, but, while)

particles (then, thus)

prepositions (of, on, for)

pronouns (he, her, we, which)
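Removing stop words from an existing slug can be automated. This Python sketch uses only the example words listed above, so the `STOP_WORDS` set is illustrative rather than complete, and `remove_stop_words` is a hypothetical name.

```python
# Example stop words taken from the categories listed above; a real list would be longer
STOP_WORDS = {
    "the", "an", "a",             # articles
    "am", "is", "can",            # auxiliary verbs
    "and", "or", "but", "while",  # conjunctions
    "then", "thus",               # particles
    "of", "on", "for",            # prepositions
    "he", "her", "we", "which",   # pronouns
}


def remove_stop_words(slug: str) -> str:
    """Drop stop words from a hyphen-separated slug, keeping the remaining order."""
    kept = [word for word in slug.split("-") if word and word not in STOP_WORDS]
    return "-".join(kept)


print(remove_stop_words("the-biography-of-alfred-pennyworth"))  # biography-alfred-pennyworth
```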

Avoid Query Strings

Search engines prefer clean, human-readable URLs. If markup is generated by server-side scripts, some URLs might contain query strings like this:

https://example.com/index.php?article=1234

This URL would be far more effective in search engines if it read:

https://example.com/articles/create-seo-friendly-url

The second example is longer, but it describes the content far more effectively.
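When moving from query-string URLs to descriptive ones, the old addresses can answer with a permanent 301 redirect to the new ones so existing links keep working. This Python sketch shows only the mapping step; the lookup table `ARTICLE_SLUGS` and the function `clean_url` are hypothetical stand-ins for whatever the site's database provides.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical lookup table from legacy article IDs to descriptive slugs
ARTICLE_SLUGS = {"1234": "create-seo-friendly-url"}


def clean_url(legacy_url: str) -> Optional[str]:
    """Map a query-string URL to its descriptive equivalent, or None if unknown."""
    parts = urlsplit(legacy_url)
    article_ids = parse_qs(parts.query).get("article")
    if not article_ids:
        return None
    slug = ARTICLE_SLUGS.get(article_ids[0])
    if slug is None:
        return None
    return f"{parts.scheme}://{parts.netloc}/articles/{slug}"


print(clean_url("https://example.com/index.php?article=1234"))
# https://example.com/articles/create-seo-friendly-url
```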

Consider Fragment Identifiers

Use of fragment identifiers may help search engines better understand page structure. If a fragment identifier contains more than one word, separate the words with a hyphen.

Content is Key

As a parting SEO thought, remember that technical optimizations like these are helpful, but the key to ranking higher in search engines is to publish high-quality content on a regular basis. With quality content come inbound links, which lead to higher search engine rankings.

If you have any URL / SEO questions or comments, I can be reached on Twitter @BenjaminPatch. Until next time, take pride in crafting great URLs!