In computer science, there is the concept of linear data structure, which means that the datas are structured in a linear way in which order matters. There are Array and Linked Lists, but today I'll be talking mostly about arrays only, and a little about linked lists.

Most object-orientated languages come with Arrays, whereas most functional languages comes with Linked Lists (see why in another one of my article, mentioned at the bottom of the article).

There's a good reason for this differentiation which we will dive into later, but for now let's just have a quick revision about what are the differences between the two data structures, and to know this, we'll need to take a trip down memory lane.

Rewind Time

Objects and functions and everything that we know about computers, are fundamentally stored in bits and bytes in computer.

In languages like Java and C, you have to explicitly declare the size of an array beforehand.

Hold up, but Ruby does not do that?

In Ruby, we use Array for our linear data structure, we can add seemingly infinite stuffs into an Ruby array and it would not matter as far as we are concerned.



array = [] while true array << 0 puts " #{ array . length } " end

That's great isn't it?! That means arrays are infinitely large right? And that Ruby is the superior language? Lucky us!

But not so fast. *pops your bubble*

There are no infinite size array, what you see in Ruby is what we call a Dynamic Array, and it comes with a cost.

To understand what dynamic arrays are, let's first take a look at how arrays are represented in memory. Since MRI Ruby (Matz' Ruby Interpreter) compiles down to C code, we'll look at how arrays are represented in C.

C-ing is believing

We'll dive into a little bit of C code to help us C a little better.

In lower level languages like C, you have to deal with pointers and memory allocation yourself, and even if you had not dealt with C before (Disclaimer, neither have I), you may have C-een one of the most (in)famous example below:



/* Example */ ptr = ( cast - type * ) malloc ( byte - size ) /* Usage */ ptr = ( int * ) malloc ( 100 * sizeof ( int ));

taken from https://www.programiz.com/c-programming/c-dynamic-memory-allocation

Let's break down this bits of code:

malloc doesn't have any magic meaning behind it, it literally stands for memory allocation

doesn't have any magic meaning behind it, it literally stands for malloc returns a pointer

returns a pointer malloc takes in an argument, which is the size the memory you want the program to allocate for you.

takes in an argument, which is the size the memory you want the program to allocate for you. 100 * sizeof(int) tells the program that we want to store 100 integers, so allocate us 100 * the size of what each integer would occupy.

tells the program that we want to store 100 integers, so allocate us 100 * the size of what each integer would occupy. ptr /pointer stores reference to the memory address.

Timmy stores luggage!

If the above example did not really make sense, try this analogy. Think of memory allocation as like, a luggage concierge. It works like this:

Timmy goes to the counter, tells the concierge that he have 2 pieces of luggages, about this big, and that that he'd like to store it in the storeroom.

The concierge takes a look over the store room and go "Yes, we've got some room at the designated B4 area and will allocate that space to store your luggages".

area and will allocate that space to store your luggages". They hand Timmy a pick-up card with the designated area on it, B4 in our case.

with the designated area on it, in our case. Timmy is happy, goes around doing whatever, and when he wants to pick-up his luggages, he goes back to the counter and show them his pick-up card. "Have you C-een my luggages?"

In our example, Timmy's luggage is the data, the pick-up card is the pointer (it states where Timmy's bag is stored). The place that the concierge is storing Timmy's luggage is the memory block, and the counter is the program.

By showing the counter (the program) Timmy's card (pointer/memory address), Timmy can retrieve his luggage (data). Bonus? Because they know exactly where Timmy's bag is stored -- B4 , this means that they can retrieve all Timmy's luggages relatively quickly!

Also, ever wondered why you access elements in array with index, like so?



array [ 0 ] # => first element array [ 5 ] # => sixth element array [ 99 ] # => hundredth element

This is because array holds the references to the memory block, and the index is telling it the offset.

An analogy of that is if I ask you to look for Timmy in a queue of 20 people, you would logically have to ask each of them if they were Timmy -- But, if I told you Timmy is the 6th (index) from the first guy (your original pointer), you know exactly where to look.

Retrieving elements in arrays are fast exactly because of this - the program doesn't have to look through all 100 elements to find what you're looking for - if you've got the index, it just has to add the offset to the original memory address, and the droid that you were looking for will be right there!

What are dynamic arrays then?

So I've told you a little about how array is represented in memory, but now it's time to talk about some cons.

Remember how you have to explicitly declare the amount of memory you need? This means that the array will find a spot that will fit exactly your size; there's no guarantee that it will be able to fit more than what you have (because the memory block right behind it might be occupied).

Back to our luggage analogy, think of it as if Timmy was to store 2 pieces of luggages, and B4 can store exactly 2 luggages, so they allocate that to Timmy. Now for some reason Timmy wants to store another piece of luggage, but B4 can't store 3 luggages, only 2, so what do they do?

They take all of his existing luggages, move it into a new spot that can fit more than 3 luggages, and then store all of them together.

That is an expensive operation but it's exactly how memory works too!

In Ruby, you don't have to declare a specific size before hand, but that's because Ruby handles it for you automagically through dynamic array.

What dynamic array does is that if the array is nearing its full capacity, it'll automatically declare a new, bigger array and move all existing elements into it, and the old array is then garbage collected. How much bigger? The growth factor is usually 2; double the size of the current array.

In fact, don't take my word for it.

Ruby has a ObjectSpace module that allows us to interact with current living object in memory. We can use this module to take a peek at the memory usage of our dynamic array, sounds exactly like what we want!

I have written a small Ruby script that calculates the growth factor of the dynamic array. Feel free to take a look at it here, and if you do, you can see that Ruby arrays have a 1.5x growth factor (that is, they make an array that's 50% bigger on every copy)

I know what arrays are, what are linked lists?

Keep in mind that although arrays and linked lists are both considered as linear data structures, they have one big difference amongst them.

Elements in an array are stored literally right next to each other in memory (so we can have index for fast lookups), but nodes in linked lists have no such restriction (which is why there's no index lookup for linked lists) - each and every item can be stored anywhere on the memory block and it'd be fine.

It's almost as if Timmy's trying to store 5 luggages, and the concierge does not have a space and decides to leave 5 of them all over the place. Sounds unorganized?

Also if they are stored in different places, how do you know which bags are Timmy's? Hint: Just keep track of the next node/bag! In our case, the concirege keeps them separately but with a tag on each of them that points to the next bag.

A node in linked list consists of two parts - the data part and a pointer to the next node. This is how they're able to maintain the linear part of it -- they still have the concept of order, they just don't have to be stored in order literally!

node = [ data | pointer ]

For example, given the following example stored in memory:

[C | D] [A | B] [B | C] [D | nil]

This bits look like it's out of order - but if I had told you that the first element is A , you would be able to tell me the exact order of the list:

list = [A -> B -> C -> D -> nil]

There's a lot of interesting things that you can do with linked lists which I'm not going to dive into (also a lot on Big O that I didn't talk about), there are already a tonne of good articles out there on data structures. If you made it here, I suggest you to read from Ali's blogpost here.

You can also read more about List/Cons on Wiki here

Closing Note

I initially wrote this article for a slightly different topic - [ Elixir | Why Linked List?], but found it took way too long to explain how arrays work before I was able to explain explore why Elixir uses linked lists, hence I have separated them into two separate articles.

In that article I talk about why functional langauges use linked lists as their linear data structure. Do check it out!

Sources