The OS controls the execution of the program by means of virtual memory, a mapping onto the physical memory that lets each process behave as if it were the only program running. The OS also divides the virtual memory into segments, each of which plays its part in the execution of the program. For example, to call functions properly, the stack is used, where memory space for the function arguments and local variables is allocated and deallocated automatically.

We write programs in high-level programming languages, such as C++. But the program, in the end, has to be translated into machine code to be run by the CPU. The translation process (known as compilation) is simplified by going through intermediate stages, i.e. the program is translated into an assembly language and then into machine code. Let’s take a look at the low-level representation of the logic mentioned above.

Let’s suppose we have the function request(), which makes either a Google search request or an HTTP request based on the result of parsing the text typed in the address bar. We introduce the following simplified blah-code (a blah-code is a mix of pseudocode and whatever the author wants). Let’s say we set the url property to 1 if the address is a valid URL and set it to 0 if it’s not (a search term).

void request(address) {
    if (address.url == 1) {
        makeHTTPRequest(address);
    } else { // url == 0
        makeGoogleSearch(address);
    }
}

(Obviously, the program also defines makeHTTPRequest and makeGoogleSearch functions somewhere in the code.)

The CPU executes instructions sequentially, one by one, and each instruction is a simple command that does exactly one thing. We can write complex expressions in a single line in a high-level programming language such as [obviously] C++, while assembly instructions each perform only one simple operation (remember the chewing?): move, add, subtract, XOR, and so on. The CPU fetches the instruction from the code segment of the memory, decodes it to find out what exactly it should do (move data, add numbers, subtract them, etc.), and executes the command. In order to run at its fastest, the CPU stores the operands and the result of the execution in registers (think of registers as temporary variables of the CPU). Registers are physical memory units located within the CPU, so access to them is much faster than access to the RAM. To access the registers from an assembly language program, we use their specified names such as rax, rbx, rdx, etc. In this simplified picture the CPU commands operate on registers rather than on RAM cells; that’s why the CPU has to copy the contents of a variable from the memory to a register, execute the operations, keep the result in a register, and then copy the value of the register back to the memory cell.

For example, the high-level expression:

a = b + 2 * c - 1;

that takes just a single line of code will have the following assembly representation (a blah-code, again). Comments follow after semicolons.

mov rax, b     ; copy the contents of "b" located in the memory to the register rax
mov rbx, c     ; the same for "c" to be able to calculate 2 * c
mul rbx, 2     ; multiply the value of the rbx register by the immediate value 2 (2 * c)
add rax, rbx   ; add rbx (2 * c) to rax (b) and store the result back in rax
sub rax, 1     ; subtract 1 from rax
mov a, rax     ; copy the contents of rax to "a" located in the memory

A conditional statement suggests that a portion of the code should be “skipped”; for example, calling request("world better place how"); means the if block will be skipped. To express this in assembly language, the idea of jumps is used. We compare two values and, based on the result, jump to a specified portion of the code. We label that portion to make it possible to “find” the set of instructions. For example, to skip adding 99 to the register rbx, we can “jump” to the portion labeled “MEH” using the unconditional jump instruction jmp.

    mov rax, 2
    mov rbx, 0
    jmp MEH
    add rbx, 99   ; will be skipped
MEH:
    add rax, 1
    ...

The jmp instruction performs an unconditional jump, i.e. execution continues from the first instruction at the specified label without any condition check. The good news is that the CPU provides conditional jumps as well.
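To see the fetch, decode, execute loop and the jump in action, here is a minimal sketch of a toy “CPU” in Python. Everything about it is an assumption made for illustration: it only knows mov, add and jmp, it only accepts immediate values, and the run function is a made-up name. A real CPU works on binary machine code, not on text.

```python
def run(program):
    """Execute a tiny made-up instruction set and return the final registers."""
    # First pass: remember which instruction index each label points to.
    labels = {}
    instructions = []
    for line in program:
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line)

    registers = {"rax": 0, "rbx": 0}
    pc = 0  # program counter: index of the next instruction to fetch
    while pc < len(instructions):
        # Fetch the instruction and decode it into an operation and operands.
        op, _, rest = instructions[pc].partition(" ")
        args = [a.strip() for a in rest.split(",")]
        pc += 1
        # Execute it.
        if op == "jmp":
            pc = labels[args[0]]              # unconditional jump
        elif op == "mov":
            registers[args[0]] = int(args[1])
        elif op == "add":
            registers[args[0]] += int(args[1])
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers


program = [
    "mov rax, 2",
    "mov rbx, 0",
    "jmp MEH",
    "add rbx, 99",
    "MEH:",
    "add rax, 1",
]
print(run(program))  # {'rax': 3, 'rbx': 0}
```

The rbx register stays 0 because the jump moves the program counter past the add rbx, 99 line.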

The body of the request() function will translate into the following assembly code (simplified), where the je is interpreted as “jump if equal to” and jne as “jump if is not equal to” (based on the results of the comparison using the cmp instruction):

    mov rax, address.url   ; copy "address.url" into the rax register
    cmp rax, 1             ; compare it with 1
    je IS_URL              ; jump if url is 1
    jne IS_SEARCH_TERM     ; jump if url is 0
IS_URL:
    call makeHTTPRequest
    jmp DONE               ; skip the search branch
IS_SEARCH_TERM:
    call makeGoogleSearch
DONE:
    ret                    ; return from request()

The browser executable file consists of thousands of lines of similar code, but in the form of zeroes and ones, where each combination specifies a command or data (say, 10001 means add, 01101 means move, etc.).

Typing the address and pressing the “Enter”!

The OS handles user events such as a mouse click or a key press and passes them to the running application. When you click on the address bar of the browser, type the address of the website and hit the “enter” button, all of those events are passed to the browser by the OS. The OS in its turn learns about them via interrupts: the keyboard or the mouse signals the CPU, which pauses what it was doing and runs the OS code that handles the event. An interrupt is the “hey” of the digital world. Imagine you take a walk and think over the pitch of your startup, the aim of which is to convince the investors that all you want to do is to make the world a better place (instead of making a sh*tload of money). Someone interrupts you with a “hey!”; you take a pause from your obviously fantastic thoughts, turn to that someone with a “why C++ is better than C#” face, and that someone asks you how to get to the nearest station blah blah. You answer them and get back to your obviously fantastic thoughts. The “hey!” interrupted you from your “default” routine the same way the CPU is interrupted when the user clicks or types something. These events are then passed to the browser, which reacts to them properly, for example by rendering letters in the address bar while you are typing.

An interrupt is a signal to the processor emitted by hardware or software indicating an event that needs immediate attention (Wikipedia).

After you type the address, for example “facebook.com”, and hit “enter”, which signals the browser to start loading the contents of the website, the browser starts parsing the address. Most browsers allow you to type a search term rather than the complete address; for example, in Chrome (I use Chrome), typing “facebook” without a “.com” performs a Google search and results in a list of websites that match the query the most, and obviously the first result would be the link to facebook.com. To achieve this and also to properly request the website, the browser parses the contents of the address bar to find out what exactly it is. Is it an “asdf”, or is it an IP address (138.201.20.123), or is it a fully typed address of the website (“http://facebook.com”)? In any situation, the browser “must” perform a network request to the target server to retrieve the contents of the website to be rendered.
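The parsing decision can be sketched in a few lines of Python. This is only an illustration under loud assumptions: classify_address is a made-up name, and the rule “no spaces and at least one dot means a website” is a guess at the idea, not what Chrome actually does (real browsers use far more involved heuristics).

```python
import ipaddress
from urllib.parse import urlparse


def classify_address(text):
    """Return 'ip', 'url', or 'search' for the text typed in the address bar."""
    text = text.strip()
    # A bare IP address such as 138.201.20.123.
    try:
        ipaddress.ip_address(text)
        return "ip"
    except ValueError:
        pass
    # An address with an explicit scheme, such as http://facebook.com.
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # A bare host name: no spaces and at least one inner dot (facebook.com).
    if " " not in text and "." in text.strip("."):
        return "url"
    # Anything else is treated as a search term.
    return "search"


for typed in ["asdf", "138.201.20.123", "http://facebook.com", "facebook"]:
    print(typed, "->", classify_address(typed))
```

With this sketch, “facebook” ends up as a search term while “facebook.com” is treated as a website address, which matches the behavior described above.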

Name lookup

Let’s suppose that you’ve typed “http://facebook.com” and now you are waiting for the browser’s response. The browser has to somehow find the one and only server that contains the contents of the website and ask that server to send the contents over. It’s the same as if you look for the guy named Valod that lives on the planet Earth and has a tattoo of a mermaid on his left shoulder.

This might be our Valod. (image source: TNW)

To do so, the browser has to look up the actual IP address of the server that is mapped to the name “facebook.com”. The browser first performs a DNS lookup. DNS stands for Domain Name System, a network of name servers that hold the names mapped to the IP addresses. The lookup usually goes from a resolver at your local internet provider to the so-called root name servers, then to the name servers for the “.com” top-level domain, and finally to the name servers for “facebook.com”.

The browser caches the response so that next time it can access the same website faster by skipping the DNS lookup phase. It usually takes milliseconds to retrieve the IP address of the website. The browser then makes an HTTP request to the server. This is where the hard stuff begins.
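Here is a rough sketch of the caching idea in Python. The CachingResolver class, its 60-second lifetime and its parameter names are assumptions made for this example; a real browser honors the lifetime (TTL) that comes with the DNS answer. The actual lookup is delegated to socket.gethostbyname from the standard library, which needs network access when called.

```python
import socket
import time


class CachingResolver:
    """Remember name-to-IP answers for a while to skip repeated lookups."""

    def __init__(self, resolve=socket.gethostbyname, ttl_seconds=60.0):
        self._resolve = resolve      # the function doing the real lookup
        self._ttl = ttl_seconds      # how long a cached answer stays valid
        self._cache = {}             # name -> (ip, expiry time)

    def lookup(self, name):
        cached = self._cache.get(name)
        if cached and cached[1] > time.monotonic():
            return cached[0]         # cache hit: no network traffic at all
        ip = self._resolve(name)     # cache miss: ask the DNS
        self._cache[name] = (ip, time.monotonic() + self._ttl)
        return ip
```

Calling CachingResolver().lookup("facebook.com") twice in a row would hit the network only the first time; the second call is answered from the dictionary.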

Network requests

Both the DNS lookup and the HTTP request to the server have to go over the internet. To be able to do so, the browser uses a concept called sockets. Sockets are abstractions provided by the operating system that allow a program to access a remote computer, even one on the other side of the planet.

Sockets are a way of accessing the other world. In order to send data to or receive data from the other world, a stable connection should be established between the worlds using sockets. On Unix-like systems, sockets look like files that are treated differently by the OS. The program states its intention to access the internet by telling the OS to create a socket for it, a special file the program can write data to, and that data is transferred via the network (which could be a local network as well). Whenever the program writes something to the socket, the OS transfers it to the specified endpoint (another socket on another computer). So the browser creates a socket (asks the OS to create one) after the user types the address of the website, and fills the socket with data regarding the request. It doesn’t matter whether the request is a DNS lookup or an HTTP request to Facebook’s server; it happens using a socket. The type of socket the browser creates is a client socket because it is used to “ask” for something. The one that serves the results (the Facebook server) creates its own “server” socket, listens on it for incoming network requests, and serves data.
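A minimal sketch of a client socket talking to a server socket, using Python’s standard socket module. Both ends run on the same machine (127.0.0.1) so that it works without internet access; the serve_once function and the messages are made up for the example.

```python
import socket
import threading


def serve_once(server_socket):
    """Accept one connection, read the request, and send a reply."""
    connection, _client_address = server_socket.accept()
    with connection:
        request = connection.recv(1024)
        connection.sendall(b"you asked for: " + request)


# The "server" socket: bound to a local address and listening.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
server.listen()
port = server.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

# The "client" socket: connects to the server and asks for something.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"news feed")
    chunks = []
    while chunk := client.recv(1024):   # read until the server closes
        chunks.append(chunk)
    reply = b"".join(chunks)

worker.join()
server.close()
print(reply)  # b'you asked for: news feed'
```

Writing to the socket (sendall) and reading from it (recv) is all the program does; moving the bytes between the two ends is the job of the OS.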

How would the OS cope with several programs that need network access? For example, suppose you are opening facebook.com in your browser and chatting with your friend at the same time using the Skype desktop application. Both applications require a network connection and both ask the OS to create sockets for them. Eventually, the OS creates two sockets for two different applications. To tell these sockets apart, it uses port numbers. For example, the socket created for Skype uses port 5678, and the socket created for the browser uses port 8765, and no other application can take over those ports while they are in use. This way the OS can distinguish the data sent by the browser from the data sent by Skype and can also pass each response to the proper application. For a client socket, the OS usually picks a free port number by itself; for a server socket, the port number is specified by the program developer and should be chosen wisely.
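A tiny sketch of the port idea: two sockets on the same machine, each getting its own port number. The variable names are made up, and binding to port 0 is simply a way to ask the OS for any free port, so the actual numbers will differ from run to run.

```python
import socket

browser_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
chat_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Binding to port 0 asks the OS to pick any free port.
browser_socket.bind(("127.0.0.1", 0))
chat_socket.bind(("127.0.0.1", 0))

browser_port = browser_socket.getsockname()[1]
chat_port = chat_socket.getsockname()[1]
print(browser_port, chat_port)  # two different numbers chosen by the OS

browser_socket.close()
chat_socket.close()
```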

Why you can’t access the internet from your pocket calculator

It doesn’t have a network adapter, a device that makes it possible for bits to flow from one computer to another over wires (or radio signals). This is the lowest level of network communication, called the physical layer. Data is sent over the wires as a sequence of bits, which are then collected into something meaningful by the receiver (the beauty of the bits). The agreed form of communication is known as a protocol. The protocol defines the form of the data, the exact location of the data, and the headers describing the data in the packet. The packet is the unit of information passed through the network.

To transfer the packet, the sender should mark it somehow for the receiver, the same way mail is marked with stamps, the delivery address, and the sender address. Protocols exist for that purpose. Each layer of the network model adds layer-specific metadata to the packet. The packets formed by the browser are HTTP (HyperText Transfer Protocol) documents sent over TCP (Transmission Control Protocol).

An HTTP document consists of two parts: the header and the body. The header contains the metadata related to the request and to the document body. For example, it may contain the details of the browser that made the request, the size of the data located in the body of the document, and so on.

Google Chrome debug tools

When you request the contents of facebook.com, the request document doesn’t contain a body; it only contains a header describing what the browser needs from the server. The server responds with a document that contains the full contents of the requested web page in the body of the HTTP document.
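To make the header/body split concrete, here is a small sketch that builds a body-less request document and takes a response document apart. The function names and the “blah-browser/0.1” User-Agent are invented for the example, and the response text is hand-written, not something a real server sent.

```python
def build_request(host, path="/"):
    """Build a body-less HTTP/1.1 GET request as text."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: blah-browser/0.1",   # made-up browser name
        "Connection: close",
    ]
    # An empty line marks the end of the header; a GET has no body after it.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(raw):
    """Split a raw HTTP response into status line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 14\r\n"
    "\r\n"
    "<h1>Hello</h1>"
)
status_line, headers, body = parse_response(response)
print(build_request("facebook.com"))
print(status_line, headers["content-type"], body)
```

The empty line (two consecutive line breaks) is the only thing separating the header from the body, in requests and responses alike.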

So the following happens during a network communication:

the browser creates a socket (specifying the port number and the transmission protocol);

the browser creates an HTTP request document;

the browser writes the HTTP document into the socket and tells the OS to make the request;

the OS splits the data into TCP packets and passes them to the network adapter;

the network adapter sends the bits of the packets to the network;

the packets are received by the server’s adapter, which passes them to the higher levels of the network model;

the OS of the server extracts the data from the received packets and passes it to the web server;

the web server parses the contents as an HTTP document and creates the response HTTP document;

the web server sends the response the same way the client sent the request;

the client receives the response and renders the contents as a web page.

Serving the request

A single computer that runs special software called a web server might be considered a server. The web server creates a server socket and listens for incoming connections on port 80 (the default HTTP port). You can create your own web server by designing an application that creates a socket, listens on port 80, accepts incoming connections, and serves data following the rules of HTTP.
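As a rough illustration, here is about the smallest “web server” one can write with a plain socket in Python, plus a client that talks to it. It is a sketch under assumptions: it answers exactly one request with a fixed page, it ignores what was actually requested, and it listens on a free port picked by the OS instead of port 80 (binding to port 80 usually needs administrator rights). The names PAGE and handle_one_request are made up.

```python
import http.client
import socket
import threading

PAGE = b"<html><body><h1>A heading</h1></body></html>"


def handle_one_request(server_socket):
    """Accept a single connection and answer it with a fixed HTML page."""
    connection, _address = server_socket.accept()
    with connection:
        # Read until the blank line that ends the request header.
        request = b""
        while b"\r\n\r\n" not in request:
            chunk = connection.recv(1024)
            if not chunk:
                break
            request += chunk
        header = (
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/html\r\n"
            b"Content-Length: " + str(len(PAGE)).encode() + b"\r\n"
            b"Connection: close\r\n"
            b"\r\n"
        )
        connection.sendall(header + PAGE)


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # a real web server would bind to port 80
server.listen()
port = server.getsockname()[1]
worker = threading.Thread(target=handle_one_request, args=(server,))
worker.start()

# The client side: the standard library plays the role of the browser.
client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
client.request("GET", "/")
response = client.getresponse()
status, body = response.status, response.read()
client.close()

worker.join()
server.close()
print(status, body)
```

A production web server does the same thing in a loop, for many connections at once, and actually looks at the request before deciding what to send back.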

Facebook (like its lookalikes) runs on thousands of servers arranged in a complex architecture to serve millions of users as fast as possible.

The request comes to one of the front servers, which decides which processing server should serve it: typically one that has the minimum load and is [geographically] nearest to the user.
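The decision made by the front server could be sketched like this. Everything here is hypothetical: the Server fields, the 0.8 load threshold and the “nearest first, then least loaded” rule are assumptions for illustration, not how Facebook balances its load.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    load: float          # fraction of capacity in use, 0.0 to 1.0
    distance_km: float   # distance from the user


def pick_server(servers, max_load=0.8):
    """Prefer the nearest server that is not overloaded."""
    candidates = [s for s in servers if s.load < max_load]
    if not candidates:
        # Everything is busy: fall back to the least loaded server.
        return min(servers, key=lambda s: s.load)
    return min(candidates, key=lambda s: (s.distance_km, s.load))


servers = [
    Server("near-but-busy", load=0.95, distance_km=10),
    Server("far-and-idle", load=0.10, distance_km=900),
    Server("in-between", load=0.50, distance_km=200),
]
print(pick_server(servers).name)  # in-between
```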

Processing the request most of the time requires accessing databases. Let’s suppose you’ve opened facebook.com in your browser and didn’t log out the last time you visited it. The browser stored the information regarding your last authorization in its “memory”, and now it requests facebook.com by passing that information (usually a request key of the form “kjhlkjhjhl1234324WEASDASF34534FDFGDfhhf877”) along with the HTTP request (in the HTTP header). The Facebook servers authorize the request and fetch the user data that is mapped to the request key specified in the HTTP request.

Authorization involves accessing in-memory databases in order to efficiently retrieve user information.
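A minimal sketch of the request-key idea, with a plain Python dictionary standing in for the in-memory database. The cookie name “session”, the function names and the user id are all made up; real services add expiry, signing and much more.

```python
import secrets

sessions = {}   # request key -> user id; stands in for an in-memory database


def log_in(user_id):
    """Create a random request key for the user and remember it."""
    key = secrets.token_urlsafe(32)
    sessions[key] = user_id
    return key


def authorize(headers):
    """Return the user id behind the request key, or None if unknown."""
    cookie = headers.get("Cookie", "")
    for part in cookie.split(";"):
        name, _, value = part.strip().partition("=")
        if name == "session":
            return sessions.get(value)
    return None


key = log_in("jon")
print(authorize({"Cookie": f"theme=dark; session={key}"}))  # jon
print(authorize({"Cookie": "session=bogus"}))               # None
```

The lookup is a single dictionary access, which is why keeping the keys in memory makes the authorization step fast.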

Finally, having the user information (and having authorized the request at the same time), the server can fetch the news feed (the wall, Mr. Snow) from the database. To make this retrieval faster, a database index is used, which is usually implemented as a B-tree. The final content of the wall and other user-related data (which didn’t fit in this simplified version of the story) is then sent as an HTTP-response document.

Rendering the server response

When the browser receives the HTTP response from the server, it starts by looking at the header of the document. The header specifies the type of the content: whether it’s HTML text, binary data to download, an image to render on the screen, and so on. Usually, the response contains the HTML text of the web page. The browser discovers this by examining the HTTP document for the Content-Type header.

Content-Type: text/html
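The decision based on the Content-Type header can be sketched as follows. The decide_action function and the three outcomes are simplifications invented for this example; the header value is parsed with the standard library’s email.message.Message, which understands the same “type/subtype; parameters” syntax.

```python
from email.message import Message


def decide_action(content_type_header):
    """Map a Content-Type header value to what the browser does next."""
    message = Message()
    message["Content-Type"] = content_type_header
    media_type = message.get_content_type()   # e.g. "text/html"
    if media_type == "text/html":
        return "parse as HTML and render the page"
    if message.get_content_maintype() == "image":
        return "decode and draw the image"
    return "offer the data as a download"


print(decide_action("text/html; charset=utf-8"))
print(decide_action("image/png"))
print(decide_action("application/octet-stream"))
```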

Now the browser has to parse the HTML and show the buttons, input boxes, tables, links, images, etc. on the page. The browser constructs the so-called DOM to conveniently manage and render the web page. The DOM (Document Object Model) defines a model for the document that can be pictured as a tree consisting of HTML elements.

(image source: Wikipedia)

The image above illustrates the DOM constructed from the following HTML code:

<html>
  <head>
    <title>My title</title>
  </head>
  <body>
    <h1>A heading</h1>
    <a href="">Link text</a>
  </body>
</html>
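To show how such a tree could be built from the HTML text, here is a toy sketch based on Python’s standard html.parser module. The Node and DomBuilder classes and the outline function are made up for the example; the sketch assumes well-formed HTML and keeps only tags and text, whereas a real browser’s DOM holds attributes, events and a lot more.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], ""


class DomBuilder(HTMLParser):
    """Build a tree of Node objects from well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        self.current = node                  # descend into the new element

    def handle_endtag(self, tag):
        self.current = self.current.parent   # climb back to the parent

    def handle_data(self, data):
        self.current.text += data.strip()


def outline(node, depth=0):
    """Return the tree as indented lines, one element per line."""
    label = node.tag + (f": {node.text}" if node.text else "")
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = DomBuilder()
builder.feed(
    "<html><head><title>My title</title></head>"
    '<body><h1>A heading</h1><a href="">Link text</a></body></html>'
)
print("\n".join(outline(builder.root)))
```

The printed outline has html at the top, head and body as its children, and title, h1 and a one level deeper, each with its text.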

The HTML is the declaration of how the content should be rendered by the browser. Besides HTML, the browser deals with CSS and JavaScript as well. The CSS specifies the styling of the elements.

JavaScript

JavaScript adds dynamic behavior to elements. The browser acts as a mini version of the OS that runs a JavaScript program inside of it. The OS doesn’t actually know about the JavaScript, or even about the browser’s capability to run a “program”. It’s like renting an apartment from someone and renting a room out to someone else without the knowledge of the landlord.

For example, Facebook’s main page won’t make a login request unless you type the login and the password in the input fields. There must be some JS code that checks that the fields are not empty before enabling the “Submit” button. Take a look at the following JS blah-code (a blah-code can be ill-formed, never mind):

function checkFields() {
    let login = document.getElementById("login");
    let pass = document.getElementById("password");
    if (login.value !== "" && pass.value !== "") {
        // both fields are filled in: enable the "Submit" button
        document.getElementById("submit_button").disabled = false;
    } else {
        // something is missing: keep the button disabled
        document.getElementById("submit_button").disabled = true;
    }
}

The browser takes care of the code. It creates an environment for the JS to be executed in and provides access to the elements rendered from the HTML text fetched from the server (the JS code is fetched from the server as well). To achieve the finest result, the browser contains a fully functional JavaScript virtual machine, such as Google’s V8. In this simplified picture, the virtual machine runs the JS by interpreting it, which means that the instructions are executed one by one by the virtual machine (modern engines such as V8 also compile frequently executed code to machine code on the fly). To execute each command, it should be translated into machine code, because the final executor of the command must be the CPU. So the translation is done by the virtual machine, and the browser runs the translated code as its own. Similar to our apartment renting example, it’s like asking the landlord for a spare key and giving it to our guest who rents the room from us.