
In our daily lives as data scientists, we constantly work with various Python data structures like lists, sets, or dictionaries, or, more generally, with iterables and mappings. Transforming or manipulating these data structures can become quite code-intensive, which leads to unreadable code and increases the chance of introducing errors. Fortunately, there is a neat Python module called funcy that facilitates these tasks. In this short article, I show you both plain Python code and the corresponding funcy functions that allow you to achieve five different tasks more efficiently and with more readable code. After reading this post, I am sure you’ll become funkier :).

Omit / Project

Sometimes you are given a dictionary, and you want to continue working with only a subset of it. For example, imagine you are building a REST API endpoint, and you only want to return a subset of your model’s attributes. To that end, funcy provides two functions named omit and project.

```python
from funcy import project, omit

data = {"this": 1, "is": 2, "the": 3, "sample": 4, "dict": 5}

# FUNCY
omitted_f = omit(data, ("is", "dict"))
# PLAIN PYTHON
omitted_p = {k: data[k] for k in set(data.keys()).difference({"is", "dict"})}

# FUNCY
projected_f = project(data, ("this", "is"))
# PLAIN PYTHON
projected_p = {k: data[k] for k in ("this", "is")}
```

With funcy, you not only type fewer characters, but you also get more readable and less error-prone code. But why do we need two functions? If the number of keys you want to keep is smaller than the number of keys you want to remove, choose project; otherwise, choose omit.

Flattening Nested Data Structures

Assume you have a nested data structure, like a list of lists of lists, and you want to flatten it into a single list.

```python
from funcy import lflatten

data = [1, 2, [3, 4, [5, 6]], 7, [8, 9]]

# FUNCY
flattened_f = lflatten(data)

# PLAIN PYTHON
def flatter(in_):
    for e in in_:
        if isinstance(e, list):
            yield from flatter(e)
        else:
            yield e

flattened_p = list(flatter(data))
```

As you can see, the funcy version is just a single line, while the plain Python version looks pretty complex. It also took me some time to come up with that solution, and I am still not 100% confident about it. So, I would stick with the funcy one :). Apart from the list version lflatten, funcy also offers a more generic one for iterables, called flatten, without the l prefix. You will find this pattern for various funcy functions.

Dividing into Chunks

Assume you have an iterable of n items and you want to divide it into chunks of k < n elements each. The last chunk may be smaller than k if n is not divisible by k. This is like having a training set of n samples that you want to split into batches of size k for batch processing.

```python
from typing import Any, Iterable, Iterator, List

from funcy import lchunks

data = list(range(10100))

# FUNCY
for batch in lchunks(64, data):
    # process the batch
    pass

# PLAIN PYTHON
def my_chunks(batch_size: int, data: Iterable[Any]) -> Iterator[List[Any]]:
    res = []
    for i, e in enumerate(data):
        res.append(e)
        if (i + 1) % batch_size == 0:
            yield res
            res = []
    if res:
        yield res

for batch in my_chunks(64, data):
    # process the batch
    pass
```

Note that I have used the lchunks version for list partitioning, and not the more general chunks version for partitioning iterables. All you have to do is pass in the desired batch/chunk size and the iterable you want to partition. There is another funcy function, called partition, which returns only those chunks that have exactly k items. Hence, it leaves out the last one if n is not divisible by k.

Combining Multiple Dictionaries

Assume you have multiple dictionaries that hold data about the same object but from different dates. Your goal is to combine all these dictionaries into one, merging the data of identical keys with a specific function. Here, funcy’s merge_with function comes in handy: you just pass in the merging function and all the dictionaries you want to merge.

```python
from itertools import chain
from typing import Any, Callable, Dict, Iterable

from funcy import lcat, merge_with

d1 = {1: [1, 2], 2: [4, 5, 6]}
d2 = {1: [3], 2: [7], 3: [8, 9]}

# FUNCY VERSION
merged_f = merge_with(lcat, d1, d2)

# PYTHON VERSION
def _merge(func: Callable[[Iterable], Any], *dics: Dict) -> Dict:
    # Get unique keys
    keys = {k for d in dics for k in d.keys()}
    return {k: func(d[k] for d in dics if k in d) for k in keys}

merged_p = _merge(lambda l: list(chain(*l)), d1, d2)
```

There is also the function join_with, which works like merge_with, but instead of passing each dictionary as a separate argument, you pass in an iterable of dictionaries. Oh, and I have “accidentally” sneaked in another funcy function, lcat, which combines multiple lists into one.

Cached Property

Last but not least, something completely different but very useful: the cached_property decorator. As the name says, it enables you to create properties that are executed only once; the result of that execution is cached and returned in all subsequent calls. I use it often when building Dataset classes, as it gives me very clean and readable interfaces while reducing loading times.

```python
import pandas as pd

from funcy import cached_property

# FUNCY VERSION
class DatasetsF:
    @cached_property
    def movies(self) -> pd.DataFrame:
        return pd.read_csv("the_biggest_movie_file.csv")

# PYTHON VERSION
class DatasetsP:
    def __init__(self):
        self._movies = None

    @property
    def movies(self) -> pd.DataFrame:
        if self._movies is None:
            self._movies = pd.read_csv("the_biggest_movie_file.csv")
        return self._movies
```
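As a side note: since Python 3.8, the standard library ships an equivalent decorator, functools.cached_property, so for newer Python versions you do not even need funcy for this task. A minimal sketch with hypothetical sample data:

```python
from functools import cached_property

class DatasetsStd:
    @cached_property
    def movies(self):
        print("loading ...")  # executed only on the first access
        return ["Alien", "Blade Runner"]

ds = DatasetsStd()
first = ds.movies   # prints "loading ..." and caches the result
second = ds.movies  # served from the cache, no print, same object
```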

Conclusion

In this post, I introduced you to funcy, showing you a very small yet handy subset of the functionality it offers; it has much more in store than what I have shown you here. To get a quick overview of all the functionality, check out this cheat sheet. I hope this article motivates you to learn some funcy moves. Thank you for following along, and feel free to contact me with questions, comments, or suggestions.