A secret note to Bug hunters about URL structure and its parsers.

Simgamsetti Manikanta · Mar 14

I wrote this article because I could not find a proper resource about URL structure on the internet.

Introduction: Uniform Resource Locator (URL)

We are all familiar with the internet, so we are also familiar with URLs. We can easily tell whether a string is a URL by looking for the ‘protocol scheme’ followed by “://” and then a sequence of characters separated by dots.

#examples

https://example.com/resource/test.img

https://abcd.example.com/resource/index.html

Q. OK! But how does the browser recognize that the string typed into the address bar is a URL?

A. Actually, the browser first looks for a URI rather than a URL.

Oh wait! What is a URI?

Before we can understand the URL, we should know about the Uniform Resource Identifier (URI). It is worth going straight to the URI syntax instead of digging into its characteristics and rules.

[Figure: RFC 3986 URL structure]

URI Components

scheme: names the protocol (e.g. http, ftp, ldap)

hier-part: the hierarchical part, which holds the authority (this is the complicated bit) followed by the path

path: the location of the resource

query: the query parameters
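To make the components above concrete, here is a minimal sketch that splits a URL with Python's standard urllib.parse module. The URL is one of the examples from earlier with a made-up query string (size=large) added purely for illustration. Note that Python calls the authority "netloc".

```python
from urllib.parse import urlsplit

# Example URL from above, with a hypothetical query string added for illustration.
parts = urlsplit("https://example.com/resource/test.img?size=large")

print(parts.scheme)  # https
print(parts.netloc)  # example.com  (the authority)
print(parts.path)    # /resource/test.img
print(parts.query)   # size=large
```

This is only how one parser sees the string; other parsers may split the same input differently.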

Now you may be wondering: what is the difference between a URI and a URL?

Actually, the secret is in the Authority part.

Authority:

[Figure: RFC 3986 URI components]

This is the part I missed when I first learned about URLs.

Userinfo: this is nothing but the username and password fields, separated by “:” (ex: username:password)

The remaining parts we already know.
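As a rough sketch of how the authority breaks down, the snippet below again uses Python's urllib.parse. The credentials and the port 8443 are placeholders invented for this example, not values from any real site.

```python
from urllib.parse import urlsplit

# Placeholder credentials and port, invented for illustration only.
authority_parts = urlsplit(
    "https://username:password@abcd.example.com:8443/resource/index.html"
)

print(authority_parts.netloc)    # username:password@abcd.example.com:8443
print(authority_parts.username)  # username
print(authority_parts.password)  # password
print(authority_parts.hostname)  # abcd.example.com
print(authority_parts.port)      # 8443
```

Everything before the “@” is the userinfo, and the host starts only after it. That is the detail that is easy to miss when reading a URL by eye.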

So the actual URI looks like the one below.