In the Python courses I teach, I always talk about generators and decorators. The students, who often come from (non-applied) data science or data analysis, have trouble understanding why these two concepts are important in their daily work.

I talk, of course, about how generators will free your mind from presumptions about use and return type, i.e.:

- If your goal is to return a long-ish sequence, with generators you can decide how much of that sequence is produced (presumption about use);
- The caller can decide whether it wants to store the results in a list, set, or tuple (presumption about return type).
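Here is a small sketch of both points. The `squares` generator is a hypothetical example, not code from the course. The caller uses `itertools.islice` to take only as many items as it needs, and it chooses the container the results go into.

```python
from itertools import islice
from typing import Iterator


def squares() -> Iterator[int]:
    """Yield 0, 1, 4, 9, ... forever; nothing is computed until asked for."""
    n = 0
    while True:
        yield n * n
        n += 1


# Presumption about use: the caller decides how much of the sequence is produced.
first_five = list(islice(squares(), 5))

# Presumption about return type: the caller picks the container.
as_tuple = tuple(islice(squares(), 3))
as_set = set(islice(squares(), 4))
```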

The examples I showed, however, were always geared towards math-y computations and never really grabbed the students' attention... and inevitably the students would resort to NumPy to make a faster version. Bad teacher!

Decorators suffer a bit from the same fate among data scientists: they know how to use them (hopefully) by virtue of libraries such as Flask, but they struggle to see their utility in simplifying their own code.

Last week, however, I came across a problem that could be elegantly solved using both.

API, the boon and bane of the modern data scientist

Whenever I talk about applied data science, I cite APIs as a new data source (or sink) that data scientists and data analysts don't normally handle. What makes them so different is that they introduce a new paradigm, and with it a new way of thinking.

The API I was dealing with is no exception. To gain access to it you need a key/secret, but you don't use them directly: with the secret you get a token from a token service, and the token lasts for 30 minutes.

If you could always assume that the token is valid for 30 minutes, caching requests to the token service would be easy. However, somebody else may already have requested a token using your key/secret. That could be another user in your organization, if the key/secret is shared, or a different process of your own program. In that case the token you get back can be valid for anywhere between 0 and 30 minutes.

Luckily, this particular token service tells us when the token expires, so we can store the expiration time somewhere.

There are two ways to store this state. You can instantiate a class for this particular purpose and let the class do the work, or you can use a... drum roll... generator!

The particular piece of code is the following:

```python
import time
from typing import Dict, Iterator

import requests as r

Authenticator = Iterator[Dict[str, str]]


def get_authenticator(server: str, key: str, secret: str) -> Authenticator:
    endpoint = server + "/auth"
    header_auth = {"key": key}  # 1
    body_auth = {"secret": secret}  # 1
    expire = 0  # 2
    while True:  # 3
        if expire < time.time():  # 4
            response_auth = r.post(endpoint, json=body_auth, headers=header_auth).json()
            access_token = response_auth.get('accessToken')  # 1
            # compared with time.time() above, so this is a Unix timestamp
            expire = response_auth.get('expiration')
        yield {**header_auth, 'authorization': f"Bearer {access_token}"}  # 5, # 1


authenticator = get_authenticator(server, key, secret)
auth_header = next(authenticator)  # 3
# do stuff
new_auth_header = next(authenticator)  # 6
```

The code needs some explanation:

1. The field names (key, secret, accessToken) are specific to the token service I'm interacting with.
2. When the generator starts for the first time it needs to make the request, so the `if` at point 4 has to pass. `expire = 0` is a convenient way to accomplish this.
3. I'm deciding that people will be able to get new tokens indefinitely using my generator, so it makes sense to let it run indefinitely.
4. In principle I could add some wiggle time here, for example `expire < time.time() + 5`. But i) I don't like magic numbers, and ii) things can still go wrong, so 5 (or whatever magic number you choose) might not be enough. More on this later.
5. Here `yield` will, well, yield the authorization header, which is hopefully always valid. `{**header_auth, ...}` is also beautiful syntax for creating new dictionaries (dictionary unpacking, available since Python 3.5). Note that I'm using the `typing` module too, so some things might seem magical if you're not used to it.
6. `new_auth_header` could be identical to `auth_header`. As the caller, I don't need to care, nor do I want to.
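`get_authenticator` talks to a real token service, which makes its caching logic hard to try out in isolation. The sketch below keeps the same loop but changes two things, purely for testing. The token request is passed in as a `fetch_token` callable and the clock as a `clock` callable. `cached_auth_headers`, `fetch_token`, `clock` and `fake_fetch` are all hypothetical names. With them you can check that the service is only called once the stored expiration has passed.

```python
import time
from typing import Callable, Dict, Iterator, Tuple

# (token, expiration as a Unix timestamp): hypothetical shape of the service reply
TokenFetcher = Callable[[], Tuple[str, float]]


def cached_auth_headers(
    fetch_token: TokenFetcher,
    key: str,
    clock: Callable[[], float] = time.time,
) -> Iterator[Dict[str, str]]:
    """Yield auth headers, requesting a new token only after the old one expired."""
    expire = 0.0  # forces a request on the first next()
    token = ""
    while True:
        if expire < clock():
            token, expire = fetch_token()
        yield {"key": key, "authorization": f"Bearer {token}"}


# Fake token service and fake clock, so the example runs without a network.
calls = []
now = [1000.0]


def fake_fetch() -> Tuple[str, float]:
    calls.append(now[0])
    return f"token-{len(calls)}", now[0] + 1800  # valid for 30 minutes


headers = cached_auth_headers(fake_fetch, "my-key", clock=lambda: now[0])
first = next(headers)   # requests token-1
second = next(headers)  # still valid: no new request
now[0] += 1801          # jump past the expiration
third = next(headers)   # requests token-2
```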

This is nice. I've solved with a lightweight generator what otherwise would have been solved with a ponderous class.
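For comparison, the class-based alternative could look roughly like this hypothetical `TokenCache` sketch. The token request is again injected as a callable so the code runs offline. The state that the generator keeps in local variables (`token`, `expire`) has to live on `self` here.

```python
import time
from typing import Callable, Dict, Tuple


class TokenCache:
    """Class-based equivalent of the authenticator generator (illustrative only)."""

    def __init__(self, fetch_token: Callable[[], Tuple[str, float]], key: str,
                 clock: Callable[[], float] = time.time) -> None:
        self.fetch_token = fetch_token
        self.key = key
        self.clock = clock
        self.token = ""
        self.expire = 0.0  # forces a request on first use

    def header(self) -> Dict[str, str]:
        """Return an auth header, refreshing the token if it has expired."""
        if self.expire < self.clock():
            self.token, self.expire = self.fetch_token()
        return {"key": self.key, "authorization": f"Bearer {self.token}"}


cache = TokenCache(lambda: ("abc", time.time() + 1800), "my-key")
header = cache.header()
```

Both versions work. The difference is that a generator object already implements the iterator protocol and keeps its local variables between `next()` calls, so you get that bookkeeping for free.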

So, how would I use my generator?

Well, the first step would be to write a function like this:

```python
def get_endpoints(server: str, auth: Authenticator):
    "Get the available API endpoints"
    endpoint = server + "/apis"
    response = r.get(endpoint, headers=next(auth)).json()
    return response.get('apis')
```

This seems like a nice function. It also assumes a perfect world (data scientists like myself often do that) where all network requests are instantaneous and where errors never occur.

One of the things that could go wrong is the network request to the apis endpoint. For instance, it could take too much time, in which case the token might expire or the whole request might fail. Let's handle the first case here and defer the second until later. One way to do so is the following:

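```python
def get_endpoints(server: str, auth: Authenticator, num_retries: int = 3):
    "Get the available API endpoints"
    endpoint = server + "/apis"
    for _ in range(num_retries):
        # each attempt asks the authenticator for a (possibly refreshed) header
        response = r.get(endpoint, headers=next(auth)).json()
        apis = response.get('apis')
        if apis is not None:  # only stop retrying once we actually got the endpoints
            return apis
```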

Ok, this is already nice. But what if our authenticator is using the wrong key? We will never get the proper response, and we won't know what we're up against. The endpoint I'm using tells us this in response['error']. We can then change the code in the following way:

```python
def get_endpoints(server: str, auth: Authenticator, num_retries: int = 3):
    "Get the available API endpoints"
    endpoint = server + "/apis"
    for _ in range(num_retries):
        response = r.get(endpoint, headers=next(auth)).json()
        error = response.get('error')
        if not error:
            return response.get('apis')
    # this code will only execute if we always get an 'error' after retrying num_retries times
    code = error.get('code')
    message = error.get('message')
    raise AuthenticationFailed(f"http code {code} with message {message}")
```

Aside: what's the story with AuthenticationFailed?

The AuthenticationFailed class is not something that exists in Python itself. It's a custom class I created to segregate my errors in try/except statements. I always encourage students to create exceptions specific to their module or package, because custom exceptions give much finer control over the exception-handling flow. In principle it is a good idea to create a root exception for your code (inheriting from Exception) and then have all your specific exceptions inherit from that one. In my case only a single exception is possible, so it inherits directly from Exception.

That said, the definition of AuthenticationFailed is really simple:

```python
class AuthenticationFailed(Exception):
    "Class to indicate that an authentication request failed"
    def __init__(self, message):
        self.message = message
        super().__init__(message)  # so that str(exc) shows the message
```
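If your package grows more than one failure mode, the root-exception pattern described above could look like the following sketch. `ApiClientError`, `EndpointNotFound` and `classify` are hypothetical names. Only `AuthenticationFailed` comes from the code above.

```python
class ApiClientError(Exception):
    """Root exception for everything this (hypothetical) API client raises."""


class AuthenticationFailed(ApiClientError):
    """An authentication request failed."""

    def __init__(self, message: str) -> None:
        self.message = message
        super().__init__(message)


class EndpointNotFound(ApiClientError):
    """A requested endpoint does not exist (hypothetical second failure mode)."""


def classify(exc: Exception) -> str:
    """Show how callers can catch narrowly or broadly."""
    try:
        raise exc
    except AuthenticationFailed:
        return "auth"    # most specific handler first
    except ApiClientError:
        return "client"  # any other error from our package
    except Exception:
        return "other"   # not ours at all
```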

Incorporating more exception handling in our flow

But a failed authentication is not the only thing that could go wrong. A request made with the requests library can raise many different exceptions, including timeouts and DNS issues. We want to distinguish these network errors from authentication failures. The main reason is that if the network is down there's not much the user can do about it, while a failed authentication probably means you are using the wrong parameters somewhere.

The code to do so is:

```python
def get_endpoints(server: str, auth: Authenticator, num_retries: int = 3):
    "Get the available API endpoints"
    endpoint = server + "/apis"
    exception = error = None
    for _ in range(num_retries):
        try:
            response = r.get(endpoint, headers=next(auth)).json()
            error = response.get('error')
            if not error:
                return response.get('apis')
        except r.exceptions.RequestException as e:  # timeouts, DNS issues, ...
            exception = e
    if not error:
        # no API error was recorded, so the attempts failed at the network level
        raise exception
    code = error.get('code')
    message = error.get('message')
    raise AuthenticationFailed(f"http code {code} with message {message}")
```

Much better! The code is very explicit about what goes wrong.

But wait a second! This code is now pretty complex for very limited functionality. All we wanted to do, at the end of the day, was r.get(endpoint, headers=next(auth)).json().get('apis')!

If we need different functions to call different APIs, maybe POSTing or PUTting in each case, we will end up with tons of duplicate code.

Decorators to the rescue

This is where decorators come in handy. We can encapsulate most of the logic in a "helper" function and decorate our API call with it. The decorator function could look like this:

```python
from functools import wraps


def retry(f):
    """
    Decorator to retry functions that fail because of authentication/network issues
    """
    @wraps(f)  # 1
    def wrap(*args, **kwargs):
        """Decorator to retry network calls"""
        n_times = 3
        authenticator = get_authenticator(SERVER, KEY, SECRET)  # three globals
        exception = error = None  # 2
        for _ in range(n_times):
            auth = next(authenticator)  # 3
            try:
                ret, error = f(*args, **kwargs, auth=auth)  # 4, #5
                if not error:
                    return ret
            except r.exceptions.RequestException as e:
                exception = e
        if not error:  # *
            raise exception
        code = error.get('code')
        message = error.get('message')
        raise AuthenticationFailed(f"http code {code} with message {message}")
    return wrap
```

There's a lot to digest here:

1. `wraps` ensures that the decorated function keeps its own name and docstring, instead of inheriting those of `wrap`.
2. We initialize `error` and `exception` to `None`. If every attempt in the for-loop raises an exception, `error` is then still defined, and the line marked with `*` will not fail.
3. Here we leave it to the decorator to generate a new authenticator, because the API call shouldn't need to care about these details.
4. Here we augment the function call with `auth`. In other words, the function we decorate does accept `auth` as a parameter, but if we decorate it with `retry` we never have to pass it ourselves (see below for how to call it).
5. The return value(s) of the decorated function are slightly different from the previous version, so we will have to update `get_endpoints` to reflect this change.
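Points 1 and 4 are easy to check in isolation. The sketch below uses a hypothetical `inject_auth` decorator, with no retries and no network. It shows that `functools.wraps` keeps the decorated function's name and docstring, and that the caller never passes `auth` itself.

```python
from functools import wraps
from typing import Callable, Dict


def inject_auth(f: Callable) -> Callable:
    """Pass a fixed (fake) auth header to f as a keyword argument."""
    @wraps(f)  # copies __name__, __doc__, etc. from f onto wrap
    def wrap(*args, **kwargs):
        """Docstring that would leak through without @wraps."""
        return f(*args, **kwargs, auth={"authorization": "Bearer fake"})
    return wrap


@inject_auth
def whoami(server: str, *, auth: Dict[str, str]) -> str:
    "Return the server and the auth header we were given"
    return f"{server} {auth['authorization']}"
```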

As mentioned in point 5, we need to change our get_endpoints function to make it work with the decorator. How? The answer is simple:

```python
@retry
def get_endpoints(server: str, *, auth: Dict[str, str]):
    "Get the available API endpoints"
    endpoint = server + "/apis"
    response = r.get(endpoint, headers=auth).json()
    return response.get('apis'), response.get('error')
```

This is much better. We now always return the error (which might be None), and the decorator uses it to decide whether to return the apis response or to retry. Nice!

Are we happy with our code, though? Maybe. First, I'm bothered that every call to a decorated function creates its own authenticator, and therefore requests a brand-new token, which defeats the purpose of a generic authenticator. Second, the number of retries is the same for every function, and we would like more flexibility. Decorators can be turned into second-order decorators, which allows them to accept parameters. The syntax can be a bit scary at first, but here's how it looks:

```python
from functools import wraps


def retry(n_times: int, authenticator: Authenticator):
    """
    Decorator to retry functions that fail because of authentication/network issues
    """
    def fun_wrapper(f):
        "Wrapper"
        @wraps(f)  # 1
        def wrap(*args, **kwargs):
            """Decorator to retry network calls"""
            exception = error = None  # 2
            for _ in range(n_times):
                auth = next(authenticator)  # 3
                try:
                    ret, error = f(*args, **kwargs, auth=auth)  # 4, #5
                    if not error:
                        return ret
                except r.exceptions.RequestException as e:
                    exception = e
            if not error:  # *
                raise exception
            code = error.get('code')
            message = error.get('message')
            raise AuthenticationFailed(f"http code {code} with message {message}")
        return wrap
    return fun_wrapper
```

Now we can write get_endpoints as:

```python
authenticator = get_authenticator(SERVER, KEY, SECRET)

# [...]

@retry(3, authenticator)
def get_endpoints(server: str, *, auth: Dict[str, str]):
    "Get the available API endpoints"
    endpoint = server + "/apis"
    response = r.get(endpoint, headers=auth).json()
    return response.get('apis'), response.get('error')
```
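Here is a way to watch the second-order decorator at work without a real API. This self-contained sketch has the same structure as `retry(n_times, authenticator)`, under a few assumptions for the example. The built-in `ConnectionError` stands in for `r.exceptions.RequestException`. A hypothetical `fake_authenticator` replaces `get_authenticator`. `flaky_endpoint` is a made-up function that reports an "expired token" error on its first call and succeeds on the second.

```python
from functools import wraps
from itertools import count
from typing import Callable, Dict, Iterator


class AuthenticationFailed(Exception):
    """Raised when every attempt came back with an API error."""


def retry(n_times: int, authenticator: Iterator[Dict[str, str]]) -> Callable:
    """Retry f up to n_times, passing a fresh auth header on each attempt."""
    def fun_wrapper(f: Callable) -> Callable:
        @wraps(f)
        def wrap(*args, **kwargs):
            exception = error = None
            for _ in range(n_times):
                auth = next(authenticator)
                try:
                    ret, error = f(*args, **kwargs, auth=auth)
                    if not error:
                        return ret
                except ConnectionError as e:  # stand-in for RequestException
                    exception = e
            if not error:
                raise exception
            code = error.get("code")
            message = error.get("message")
            raise AuthenticationFailed(f"http code {code} with message {message}")
        return wrap
    return fun_wrapper


def fake_authenticator() -> Iterator[Dict[str, str]]:
    """Yield a new fake token every time (hypothetical, no token service)."""
    for i in count(1):
        yield {"authorization": f"Bearer token-{i}"}


attempts = []


@retry(3, fake_authenticator())
def flaky_endpoint(server: str, *, auth: Dict[str, str]):
    "Fail with an 'expired token' error on the first call, then succeed"
    attempts.append(auth["authorization"])
    if len(attempts) == 1:
        return None, {"code": 401, "message": "token expired"}
    return [server + "/apis/a", server + "/apis/b"], None


result = flaky_endpoint("https://api.example.com")
```

The second attempt received a different header from the shared authenticator, and the caller never passed `auth`.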

At this point we could go a bit further, and rewrite our function like this:

```python
@retry(3, authenticator)
def get(endpoint: str, *, response_part: str = None, auth: Dict[str, str]):
    "GET an endpoint, returning (part of) the response and the error, if any"
    response = r.get(endpoint, headers=auth).json()
    if response_part:
        return response.get(response_part), response.get('error')
    else:
        return response, response.get('error')


def get_endpoints(server: str):
    "Get the available API endpoints"
    return get(server + "/apis", response_part='apis')
```

This makes it easier to have a generic get function, but beware: I'm assuming every endpoint places the error in response['error']. That might not always be the case!

Ok, this kind-of concludes the long rant. I hope you got an idea of how generators and decorators can be useful beyond classic textbook examples!

If I may, here are my last remarks:

Real code should have some kind of backoff built in. I didn't include it, to avoid distracting you from generators and decorators, but you just need to call a backoff(_, t) function inside the decorator's for-loop, where backoff can be defined as:

```python
import time


def backoff(n: int, t: float):
    # n is the attempt counter: sleeps t/2, t, 2t, ... seconds for n = 0, 1, 2, ...
    time.sleep(t * 2 ** (n - 1))
```

The initial POST to get the token should use the Retry class from urllib3. Again, for simplicity this has been left out, but see here to get an idea of how to implement it.

Since my token service doesn't renew tokens before they expire, there's no reason to call it in the background before expiration. This is the whole reason I need the for-loop in the decorator instead of just using the Retry object: the token service needs to be called after expiration to get a new token.
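The backoff remark could be put into practice along the lines of the sketch below, which rests on a few assumptions. The sleep happens only between failed attempts, not before the first one. The attempt counter is named `attempt` instead of `_` and starts at 1, so the first wait is `t` rather than `t/2`. `sleep` is injectable, so the delays can be checked without actually waiting. `call_with_backoff` is a hypothetical helper, and the built-in `ConnectionError` stands in for the network exceptions raised by requests.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff(n: int, t: float, sleep: Callable[[float], None] = time.sleep) -> None:
    """Sleep t, 2t, 4t, ... seconds for n = 1, 2, 3, ..."""
    sleep(t * 2 ** (n - 1))


def call_with_backoff(f: Callable[[], T], n_times: int, t: float,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call f up to n_times, waiting exponentially longer after each ConnectionError."""
    for attempt in range(1, n_times + 1):
        try:
            return f()
        except ConnectionError:
            if attempt == n_times:
                raise  # out of attempts: let the caller see the error
            backoff(attempt, t, sleep)
    raise ValueError("n_times must be at least 1")


# Record the delays instead of sleeping, and fail twice before succeeding.
waits: List[float] = []
outcomes = iter([ConnectionError("down"), ConnectionError("down"), "ok"])


def sometimes_down() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_backoff(sometimes_down, n_times=5, t=0.5, sleep=waits.append)
```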

I know you know it, but just in case you don't: we're hiring. So if you love to do data science and write good software, ping us!

Improve your Python skills, learn from the experts!

At GoDataDriven we offer a host of Python courses from beginner to expert, taught by the very best professionals in the field. Join us and level up your Python game: