What are Indexes?

Overall, databases are for storing data, queries for fetching data and indexes are a way of bridging the gap and speeding up the queries to return the data faster.

Under the hood, indexes help with this by organising the data in ways you define.

A good example of this would be your fridge.

If we have 10 items in our fridge, they're probably put in somewhat randomly or to a specific order based on how you would like it.

Now, if I was to look for something in your fridge, I would have to look at each item randomly, our brains all do this slightly differently, but the general purpose works here.

So, now I want to cook you a meal, using the item with the shortest shelf life. As I wasn't the person who purchased, or put away the groceries I have no way of knowing what item that will be without sorting through them all and checking them one at a time.

Now, wouldn't it be a lot fast and easier if there was a list on the front of the fridge, that said the item and then when it expires in chronological order?

(Yes, I am aware I failed to post this until after the theoretical gammon is out of date :P)

From here, I can then quickly see, “Oh I should have the gammon today as its got the shortest life span.”

In short, this is what an index is.

A lot of you are now thinking “Damn, if I have to make a list each time I go shopping ordered by the items expiry date it is going to take forever.” This is precisely why we don’t have a list of whats in the fridge on the fridge.

So, manually creating a list would probably take 5 minutes which might not be worth it. Since you’re fridge is a limited space, searching through all of the items doesn’t generally take too long.

Now, imagine your fridge is a massive warehouse, or a library and you need to know where every book is kept. This is going to take a massive amount of time to write a list for. Luckily for us, computers are super fast at this sort of thing and for them its generally faster to add an index / ordered list than manually searching.

To summarise my terrible example:

Indexes are similar to ordered lists to make searching through large data sets easier and faster.

Indexes, like maintaining an ordered list in real life costs more time to keep updated (Adding items, removing items). However, it makes searching considerable faster.

Due to the above, on small data sets it might not always be necessary to create indexes. Aside from the primary key.

How does MySQL actually do this?

Generally most tables don’t store results in the order we want them. Data is commonly stored as it arrives at the database and the most common example of sorting you will see is alphabetical.

As such, we will look at two queries

Query 1 (On average this takes 17 seconds):

Query 2 (On average this takes 57 seconds):

Now, this query returns the same amount of results, but takes 57 seconds on average, this is over 3 x longer.

Why is this?

I’m not sure if you have ever wondered about what happens under the hood when you save a record or query a database. But I have spent many sleepless nights wondering what happens under the hood of SQL Servers.

So, the first query we don’t have an order, as such we are returning the database in the order it was inserted into the database. This is easy as you take from top to bottom.

Now, if i want to order by the ‘tablename’ I need to sort this.

Firstly, there is 500,000 results. As such I cant put all the data in memory and sort them. So I need to take first 50,000 results for instance, sort them and store a reference somewhere. Taking my first 50,000 records I create a temp file on disk of order number, and then a reference to the row / data. I then repeat this until i have gone through my entire result set. After this I then merge the first 2 files on disk, this gives me 1 larger file in the correct order. I then repeat until I am done and have the full result set sorted. Now I've got the result set in the correct order I can then return it back to the query itself.

Now, if I add an index on ‘tablename’ to our database table and run the following query again:

My results now return in 17 seconds.

As you can see this dramatically speeds up the query.

Hopefully, this gives you a little bit of insight on how indexes and ordering work.