Last month, I had a programming interview. It didn't go as well as I would've liked, but I did get asked a question that I found interesting. The question is deceptively simple, but has a lot of depth to it. Since I failed to solve the problem correctly in the interview, I decided to explore the ways in which I could optimize my initial O(n²) solution.

After a few attempts at solving the problem from different angles, I've come to appreciate the importance of understanding complexity, as well as its limitations.

This is wrong on so many levels, I don't even know where to begin.

The naive solution to this will give you an O(n²) algorithm. Be warned, the following code may burn your eyes.
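The original snippet didn't survive here, but a typical quadratic approach (a hypothetical reconstruction, not the actual interview code) looks something like this:

```python
def anagram_finder_naive(word_list):
    # Compare every word against every other word: O(n^2) pairs,
    # each comparison sorting both words to test for an anagram.
    result = []
    for i, word in enumerate(word_list):
        for j, other in enumerate(word_list):
            if i != j and sorted(word) == sorted(other):
                result.append(word)
                break  # one match is enough to know this word has an anagram
    return result

print(anagram_finder_naive(["pool", "loco", "cool", "stain",
                            "satin", "pretty", "nice", "loop"]))
# ['pool', 'loco', 'cool', 'stain', 'satin', 'loop']
```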

So, how can we turn this problem from an O(n²) solution into an O(n) solution? By using hash maps correctly. Unfortunately, I did not come up with the brilliant idea of using a hash map myself; my interviewer told me that the way to get O(n) was to use one.
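The reason a hash map helps is that lookups and insertions are average-case O(1), so a single pass over the list suffices instead of comparing every pair. A tiny illustration with Python's built-in `set`, which hashes its members the same way a `dict` hashes its keys:

```python
words = ["pool", "loco", "cool"]

# A list membership test scans every element: O(n).
# A set membership test hashes the key once: O(1) on average.
seen = set(words)
print("cool" in seen)  # True, found via one hash lookup rather than a scan
print("cola" in seen)  # False
```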

## Hashing The Right Way

In general, whenever you hear the word "hash", you think MD5 or SHA. But in reality, a hash is just a way to map data in a uniform way. Think of it like this: if you have the words pool and loop, in the eyes of the anagram solver, they are the same. Why? Because both words use the same characters. In other words, there has to be a uniform way of converting these two words into the same thing. If we simply sort the characters in the word, we get exactly what we're looking for. Here's a demonstration:

```python
>>> sorted("loop")
['l', 'o', 'o', 'p']
>>> sorted("pool")
['l', 'o', 'o', 'p']
>>> "".join(sorted("loop"))
'loop'
>>> "".join(sorted("pool"))
'loop'
```

With that, I had my hashing function, and with it, my linear solution.

```python
def hasher(w):  # 1
    return "".join(sorted(w))

def anagram_finder(word_list):
    hash_dict = {}  # 2
    for word in word_list:
        h = hasher(word)  # 3
        if h not in hash_dict:  # 4
            hash_dict[h] = []  # 5
        hash_dict[h].append(word)  # 6
    return [anagram for l in hash_dict.values() for anagram in l if len(l) > 1]  # 7

if __name__ == '__main__':
    print(anagram_finder(["pool", "loco", "cool", "stain",
                          "satin", "pretty", "nice", "loop"]))
```

In 1, we create the `hasher` function. The hash is simple: it sorts the string's characters using `sorted`, which returns a list, which we then use as an iterable for `"".join` to create a string. We do this because Python lists are not hashable (because they are mutable). In 2, inside the `anagram_finder` function, we create `hash_dict`, a dictionary for all our hashes. It must be pointed out that the dictionary, when adding new keys, hashes those keys as well. The cost of `hasher` is O(k log k), where k is the length of the word in question (the sort dominates), so it does not grow with the size of the list we're given. In 3, we actually call `hasher` to hash the string. In 4, we check whether this hash already exists among the keys of `hash_dict`. If not, we create a new list so that we can append words to it in 5. In the end, we always append the word to the list under that key in 6. This means that every key will always have at least one word stored in its list, and the keys with only one word are the ones we don't want. The simplified version of 7 is as follows:

```python
_ret = []
for l in hash_dict.values():
    if len(l) > 1:
        _ret += l
```

## The Pythonic Version

The above is great for explanation, but the Pythonic version is much smaller:

```python
from collections import defaultdict

def hasher(w):
    return "".join(sorted(w))

def anagram_finder(word_list):
    hash_dict = defaultdict(list)
    for word in word_list:
        hash_dict[hasher(word)].append(word)  # 1
    return [anagram for l in hash_dict.values() for anagram in l if len(l) > 1]

if __name__ == '__main__':
    print(anagram_finder(["pool", "loco", "cool", "stain",
                          "satin", "pretty", "nice", "loop"]))
```

We've made the code significantly smaller by using a `defaultdict` in 1. `defaultdict` allows us to give it a factory, in this case `list`, that automatically creates a new list under a key if that key does not exist. If it does exist, it returns the existing list, and we can append to it. But wait: we forgot about the ordering.
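To make the `defaultdict` factory behavior concrete, here is a tiny standalone illustration:

```python
from collections import defaultdict

d = defaultdict(list)      # `list` is the factory
d["loop"].append("pool")   # missing key: the factory creates [] first, then we append
d["loop"].append("loop")   # existing key: the same list is returned
print(d["loop"])           # ['pool', 'loop']
```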

## Ordering Done Right

We have a quick fix for the ordering: loop through all the words in the initial list and include only those that belong to a group of anagrams. The solution is still O(n), although a naive membership check against the result list would be wasteful. One thought might be to use the `collections.OrderedDict` class. But although that might seem to work, the ordering will still not match the original in the case where anagrams are not next to each other. For example, the following piece of code:

```python
from collections import OrderedDict

def hasher(w):
    return "".join(sorted(w))

def anagram_finder(word_list):
    hash_dict = OrderedDict()
    for word in word_list:
        hash_dict.setdefault(hasher(word), []).append(word)
    return [anagram for l in hash_dict.values() for anagram in l if len(l) > 1]

if __name__ == '__main__':
    print(anagram_finder(["nala", "pool", "loco", "cool", "stain",
                          "satin", "pretty", "nice", "loop", "laan"]))
```

will return:

```python
['nala', 'laan', 'pool', 'loop', 'loco', 'cool', 'stain', 'satin']
```

`nala` and `laan` should not be next to each other. This is because `collections.OrderedDict` remembers the order in which the keys were added, not the order of the words themselves. So, in the end, I stuck with the following:

```python
from collections import defaultdict

def hasher(w):
    return "".join(sorted(w))

def anagram_finder(word_list):
    hash_dict = defaultdict(list)
    for word in word_list:
        hash_dict[hasher(word)].append(word)
    return [word for word in word_list if len(hash_dict[hasher(word)]) > 1]

if __name__ == '__main__':
    print(anagram_finder(["nala", "pool", "loco", "cool", "stain",
                          "satin", "pretty", "nice", "loop", "laan"]))
```

With that, we solved the ordering problem and still managed to keep it linear.
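As a quick sanity check (repeating the final functions so the snippet runs on its own), the output now follows the input order exactly:

```python
from collections import defaultdict

def hasher(w):
    return "".join(sorted(w))

def anagram_finder(word_list):
    hash_dict = defaultdict(list)
    for word in word_list:
        hash_dict[hasher(word)].append(word)
    # Iterate the original list so the input order is preserved.
    return [word for word in word_list if len(hash_dict[hasher(word)]) > 1]

words = ["nala", "pool", "loco", "cool", "stain", "satin",
         "pretty", "nice", "loop", "laan"]
print(anagram_finder(words))
# ['nala', 'pool', 'loco', 'cool', 'stain', 'satin', 'loop', 'laan']
```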