Introduction

Hi! This is an explorable explanation of Python dictionaries. This page is dynamic and interactive: once the JavaScript loads, you can plug in your own data and see how the algorithms work on it. To start with, let's say we have a simple list of distinct integers, simple_list = [1, 56, 50, 2, 44, 25, 17, 4] (change it if you want; the page will update).

Python lists are actually arrays: contiguous chunks of memory. The name "list" may be misleading to people who know about doubly linked lists but are unfamiliar with Python. You can picture a Python list as a row of slots, where each slot can hold a Python object:

    1 56 50 2 44 25 17 4

To check if an element is present in a list, we can use the in operator like this: number in simple_list, which returns either True or False. Under the hood, this short snippet does a linear scan, which can be a lot of work. To see this, let's reimplement it in Python.

Let's say we're looking for the number 25 in the original list:

    def simple_search(simple_list, key):
        # Start from the beginning of the list
        idx = 0
        # While idx < len(simple_list), some elements have not been processed yet,
        # and the number may be there (e.g. on try #6: 5 < 8)
        while idx < len(simple_list):
            # e.g. 25 == 25: the wanted number is found, so return True
            if simple_list[idx] == key:
                return True
            idx += 1
        return False

    simple_list: 1 56 50 2 44 25 17 4

(The visualization is interactive. The buttons let you step through the code, and the time slider is draggable, so feel free to rewind or fast-forward. You can also change the input and the original list; the visualization will update automatically.)

What's not so great about linear scans? If we have a million distinct numbers, in the worst case we may need to scan the whole list. Scanning over a few elements, on the other hand, is no big deal. To make the search fast, we need some order and predictability: we need some idea of where the searched element is located.

A Python dict implementation is basically a scan of a list (but a pretty weird scan). We'll build the actual algorithm and data structure inside the Python dictionary step by step, starting with the code above, which is intentionally verbose.

Chapter 1: searching efficiently in a list

A Python dict is a collection of key-value pairs, and the most important part of it is handling keys. Keys need to be organized in such a way that efficient searching, inserting and deleting is possible. In this chapter, to keep things simple, we won't have any values, and "keys" will just be plain integers. So the simplified problem is to check if a number is present in a list, but we have to do this fast. We'll tackle the real problem in the following chapters. But for now, bear with me.

Accessing a single element by index is very fast. Accessing only a few elements would be fast too. We don't want to do a linear scan over the whole list every time we look up a number, so we need to organize our data in a clever way. Here's how.

Let's begin by creating a new list of slots. Each slot will either hold a number from the original list or be empty (empty slots will hold None). We'll use the number itself to compute the index of a slot. The simplest way to do this is to take the remainder of the number divided by the length of the list, number % len(the_list), and put our number in the slot with this index. To check if the number is there, we could compute the slot index again and see whether that slot holds the number.

Would this approach work, however? Not entirely. For example, 50 will get the same slot index (2) as 2, and it will be overwritten. Situations like these are called collisions.

    def build_not_quite_what_we_want(original_list):
        # Create a new list of empty slots (8 for our list)
        new_list = [None] * len(original_list)

        for number in original_list:
            # Compute the slot index, e.g. 4 == 4 % 8
            idx = number % len(new_list)
            # On a collision the old number is overwritten,
            # e.g. 4 overwrites 44 in slot 4
            new_list[idx] = number
        # For our list, the result is missing 50, 1, 25 and 44
        return new_list

    original_list: 1 56 50 2 44 25 17 4
    new_list: 56 17 2 None 4 None None None
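To see at a glance which numbers compete for the same slot, here is a small sketch that groups the numbers by their slot index. The helper name slots_by_index is made up for this illustration and is not part of the code we are building.

```python
from collections import defaultdict


def slots_by_index(numbers):
    """Group numbers by the slot index number % len(numbers)."""
    groups = defaultdict(list)
    for number in numbers:
        groups[number % len(numbers)].append(number)
    return dict(groups)


original_list = [1, 56, 50, 2, 44, 25, 17, 4]
groups = slots_by_index(original_list)
for idx in sorted(groups):
    # More than one number in a group means a collision in that slot
    status = "collision" if len(groups[idx]) > 1 else "ok"
    print(idx, groups[idx], status)
```

Any slot with more than one number in its group is a collision. In the naive version above, only the last number of each group survives.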

To make this approach viable, we need to somehow resolve collisions. Let's do the following: if the slot is already occupied by some other number, we'll check the slot that comes right after it. If that slot is empty, we'll put the number there. But what if that slot is also occupied? Once again, we'll go ahead and check the next slot. We'll keep repeating this process until we finally hit an empty slot. This process is called probing, and because we do it linearly, it is called linear probing. In code, moving to the next slot is written as idx = (idx + 1) % len(new_list), so the index wraps around to the beginning after the last slot.
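As a quick illustration of the wrap-around, the sketch below lists the order in which linear probing would visit the slots. The generator probe_sequence is a hypothetical helper used only for this example.

```python
def probe_sequence(start, size):
    """Yield every slot index once, starting at `start` and wrapping around."""
    idx = start
    for _ in range(size):
        yield idx
        idx = (idx + 1) % size


# Starting at slot 6 in a list of 8 slots
print(list(probe_sequence(6, 8)))  # [6, 7, 0, 1, 2, 3, 4, 5]
```

After the last index (7), the modulo brings the index back to 0, so every slot gets visited before probing returns to its starting point.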

If we make the new list the same size as the original list, we'll have too many collisions. If we make it 10x larger, we'll have very few collisions, but we'll waste a lot of memory. So what size should it be? We want to hit the sweet spot where we don't use up too much memory but also don't have too many collisions. Twice the size of the original list is reasonable. Let's transform the original list using this method (when reading this code, keep in mind that original_list is a list of distinct numbers, so we don't need to handle duplicates just yet).

    def build_insert_all(original_list):
        # Create a new list with twice as many empty slots (16 for our list)
        new_list = [None] * (2 * len(original_list))

        for number in original_list:
            # Compute the slot index, e.g. 4 == 4 % 16
            idx = number % len(new_list)
            # Linear probing: move to the next slot until an empty one is found
            # (for 4: after 1 collision, an empty slot is found at 5)
            while new_list[idx] is not None:
                idx = (idx + 1) % len(new_list)
            # Put the number in the empty slot, e.g. 4 in slot 5
            new_list[idx] = number
        # All original numbers are present
        return new_list

    original_list: 1 56 50 2 44 25 17 4
    new_list: None 1 50 2 17 4 None None 56 25 None None 44 None None None
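To get a feel for the trade-off between memory and collisions, here is a rough sketch that counts how many occupied slots are stepped over while inserting into lists of different sizes. The function count_collisions and its factor parameter are assumptions made for this illustration only.

```python
def count_collisions(numbers, factor):
    """Insert numbers with linear probing into factor * len(numbers) slots.

    Returns the total number of occupied slots stepped over during insertion.
    """
    slots = [None] * (factor * len(numbers))
    collisions = 0
    for number in numbers:
        idx = number % len(slots)
        while slots[idx] is not None:
            collisions += 1
            idx = (idx + 1) % len(slots)
        slots[idx] = number
    return collisions


original_list = [1, 56, 50, 2, 44, 25, 17, 4]
for factor in (1, 2, 10):
    print(factor, count_collisions(original_list, factor))
```

For the example list, this reports 13 collisions at the same size, 5 at twice the size, and 0 at ten times the size. The exact numbers depend on the data, but the trend matches the reasoning above.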

To search for a number, we retrace the steps taken to insert it: we start from the slot number % len(new_list) and do linear probing. We either end up finding the number or hitting an empty slot. The latter means that the number is not present. Let's say we want to search for 2:

    def has_number(new_list, number):
        # Compute the slot index, e.g. 2 == 2 % 16
        idx = number % len(new_list)
        # Keep probing while the slots are occupied
        # (for 2: slot 2 holds 50, so try #2 checks slot 3)
        while new_list[idx] is not None:
            # The number is found, e.g. 2 == 2, so return True
            if new_list[idx] == number:
                return True
            idx = (idx + 1) % len(new_list)
        # An empty slot was hit: the number is not present
        return False

    new_list: None 1 50 2 17 4 None None 56 25 None None 44 None None None
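As a sanity check, the sketch below repeats the two functions from this chapter in compact form and compares has_number with Python's built-in in operator. The range of candidate numbers is an arbitrary choice for this example.

```python
def build_insert_all(original_list):
    new_list = [None] * (2 * len(original_list))
    for number in original_list:
        idx = number % len(new_list)
        while new_list[idx] is not None:
            idx = (idx + 1) % len(new_list)
        new_list[idx] = number
    return new_list


def has_number(new_list, number):
    idx = number % len(new_list)
    while new_list[idx] is not None:
        if new_list[idx] == number:
            return True
        idx = (idx + 1) % len(new_list)
    return False


original_list = [1, 56, 50, 2, 44, 25, 17, 4]
new_list = build_insert_all(original_list)

# Numbers for which the two membership tests disagree (there should be none)
mismatches = [
    n for n in range(-5, 100)
    if has_number(new_list, n) != (n in original_list)
]
print(mismatches)
```

The search always terminates here because half of the slots are empty, so probing is guaranteed to hit None for a number that is not present.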