Python decorators, especially ones defined in another library, can seem somewhat magical. Take Flask’s routing mechanism, for example. If I put a statement like @app.route("/") above my logic, then poof, suddenly that code is executed when I go to the root URL on the server. And sure, decorators make sense when you read the many tutorials out there that describe them. But for the most part, those tutorials just explain what’s going on, usually by printing out some text, not why you might want to use a decorator yourself.
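To take some of the magic out of it up front: a registry-style decorator like Flask’s route can be sketched in a few lines. This toy `App` class and its `routes` dict are invented here purely for illustration; Flask’s real implementation is considerably more involved.

```python
# A toy sketch of a registry-style routing decorator, not Flask's actual code.

class App:
    def __init__(self):
        self.routes = {}  # maps a URL path to its handler function

    def route(self, path):
        def register(func):
            self.routes[path] = func  # remember the handler for this path
            return func               # hand the function back unchanged
        return register

app = App()

@app.route("/")
def index():
    return "hello"

# A server loop could now look up and call the handler for "/":
print(app.routes["/"]())  # prints "hello"
```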

I was of that opinion too, until recently, when I realized I had the perfect use for a decorator in a project of mine. To get the content for Product Mentions, I have Python scrapers that go through Reddit looking for links to Amazon products; once I find one, I save the link and use the Amazon Product API to get information on the product. Once that’s in the database, I use Rails to display the items to the user.

While doing the scraping, I also wanted a web interface so I could check for errors, see how long the jobs were taking, and generally confirm that I hadn’t missed anything. So along with the actual Python script that grabs the HTML and parses it, I created a table in the database for logging the scraping runs, and I update it for each job. Simple, and it does the job I want.

The issue I came across here, and where decorators come into play, is code reuse. After some refactoring, I have a few different jobs, all of which have the following format: create an object for this job, commit it to the db so I can see that it’s running in real time, try some code that depends on the job (excepting and logging any error so we don’t crash the process), and then record the end time of the job.

```python
def gather_comments():
    scrape_log = ScrapeLog(start_time=datetime.now(), job_type="comments")
    session.add(scrape_log)
    session.commit()
    try:
        rg = RedditGatherer()
        rg.gather_comments()
    except Exception as e:
        scrape_log.error = True
        scrape_log.error_message = str(e)
    scrape_log.end_time = datetime.now()
    session.add(scrape_log)
    session.commit()


def gather_threads():
    scrape_log = ScrapeLog(start_time=datetime.now(), job_type="threads")
    session.add(scrape_log)
    session.commit()
    try:
        rg = RedditGatherer()
        rg.gather_threads()
    except Exception as e:
        scrape_log.error = True
        scrape_log.error_message = str(e)
    scrape_log.end_time = datetime.now()
    session.add(scrape_log)
    session.commit()
```

If you know a bit about how decorators work, you can already see how perfect an opportunity this is, because decorators let you extend and reuse functionality on top of functions you already have. I want to log, time, and error-check my scraping, and copying the same code around is not ideal. A decorator is. Here’s how to write one.

Decorator Time

The first thing to do is write a function that takes another function as a parameter and calls it at the appropriate time. Since the functions above all share the same format, this turns out really nicely.

```python
def gather_comments():
    rg = RedditGatherer()
    rg.gather_comments()


def log_and_time(function, job_type):
    scrape_log = ScrapeLog(start_time=datetime.now(), job_type=job_type)
    session.add(scrape_log)
    session.commit()
    try:
        function()  # run the code specific to this job
    except Exception as e:
        scrape_log.error = True
        scrape_log.error_message = str(e)
    scrape_log.end_time = datetime.now()
    session.add(scrape_log)
    session.commit()


log_and_time(gather_comments, "comments")
```

Very nice, and we could even stop here if we wanted to. But let’s not, because to run the gathering functions as they are now, we’d have to remember to wrap them in that logging functionality every time. What we really want is to just define the gather_XXXX functions and know that whenever we use them, we get the logging built in.

Functions That Return a Function

Since the end goal is a function like gather_comments that I can use anywhere without having to worry about the logging wrapper, let’s try something different. Here, log_and_time still takes a function as a parameter and runs it in the same place in the logging code as before. But this time, it returns a callable function. So in the last line below, we set gather_comments to be that returned function.

```python
def gather_comments_wrapped():
    rg = RedditGatherer()
    rg.gather_comments()


def log_and_time(function):
    def log():
        scrape_log = ScrapeLog(start_time=datetime.now(), job_type="comments")
        session.add(scrape_log)
        session.commit()
        try:
            function()  # run the code specific to this job
        except Exception as e:
            scrape_log.error = True
            scrape_log.error_message = str(e)
        scrape_log.end_time = datetime.now()
        session.add(scrape_log)
        session.commit()
    return log


gather_comments = log_and_time(gather_comments_wrapped)
```

So now if we call gather_comments(), we get the functionality we want!
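To see the pattern run anywhere, here’s a self-contained version of the same wrap-and-reassign idea: the database logging (ScrapeLog, session) and RedditGatherer from the real project are replaced here by appends to a plain list.

```python
# Self-contained sketch of the wrap-and-reassign pattern; list appends
# stand in for the database logging so the example runs anywhere.
events = []

def gather_comments_wrapped():
    events.append("gathered comments")  # stands in for the RedditGatherer work

def log_and_time(function):
    def log():
        events.append("job started")
        try:
            function()  # run the job-specific code
        except Exception as e:
            events.append("error: %s" % e)
        events.append("job finished")
    return log

gather_comments = log_and_time(gather_comments_wrapped)
gather_comments()
# events is now ["job started", "gathered comments", "job finished"]
```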

Syntax for Decorators

Now that we have the code set up this way, we can use the fancy decorator syntax to avoid the extra reassignment line from the block above. Nice and simple.

```python
def log_and_time(function):
    def log():
        scrape_log = ScrapeLog(start_time=datetime.now(), job_type="comments")
        session.add(scrape_log)
        session.commit()
        try:
            function()  # run the code specific to this job
        except Exception as e:
            scrape_log.error = True
            scrape_log.error_message = str(e)
        scrape_log.end_time = datetime.now()
        session.add(scrape_log)
        session.commit()
    return log


@log_and_time
def gather_comments():
    rg = RedditGatherer()
    rg.gather_comments()
```

Passing in Arguments

But wait! you say. That job_type parameter is hard-coded to “comments” in the decorator function, and what if I have (as I do) a gather_threads function that searches for threads with Amazon links? Luckily, we can also pass arguments into the decorator, with a little modification and another wrapper function.

```python
def log_and_time(job_type):
    def log_decorator(function):
        def log_work():
            print(job_type)
            scrape_log = ScrapeLog(start_time=datetime.now(), job_type=job_type)
            session.add(scrape_log)
            session.commit()
            log_info = None
            try:
                log_info = function()
            except Exception as e:
                scrape_log.error = True
                scrape_log.error_message = str(e)
            scrape_log.end_time = datetime.now()
            session.add(scrape_log)
            session.commit()
            return log_info  # return what the decorated function returned
        return log_work
    return log_decorator  # return the decorator function


@log_and_time("comments")
def gather_comments():
    rg = RedditGatherer()
    rg.gather_comments()


@log_and_time("threads")
def gather_threads():
    rg = RedditGatherer()
    rg.gather_threads()
```

Now think about what’s going on here for a second, and you can see why this makes sense. Python expects whatever comes after that @ to be a function that takes a function as a parameter, and in the cases above, it was. Here, because of the parentheses, we’re first calling the outermost function with an argument, so that call needs to return a function that takes a function as its argument; the modified version does exactly that.
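Another way to see it: the decorator line is just shorthand for two explicit calls. Here’s a minimal sketch with the logging stripped down to tagging the return value, so the equivalence is easy to check.

```python
# Minimal sketch: @log_and_time("comments") is shorthand for two explicit calls.
def log_and_time(job_type):
    def log_decorator(function):
        def log_work():
            return (job_type, function())  # tag the result with its job type
        return log_work
    return log_decorator

@log_and_time("comments")
def gather_comments():
    return "comments gathered"

def gather_threads():
    return "threads gathered"

# The @ syntax above is equivalent to wiring up the calls by hand:
gather_threads = log_and_time("threads")(gather_threads)

print(gather_comments())  # ('comments', 'comments gathered')
print(gather_threads())   # ('threads', 'threads gathered')
```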

Running as Background Job

As I wrote in my last post, I’m running all this scraping as background jobs. But as written, this code will fail. Why? When the enqueuer enqueues a job, it doesn’t send over the code itself; it just sends the name of the function the worker should run, specifically its __name__. That’s fine for most things, except that decorated functions have their __name__ changed. So the first time I ran this with the worker set up, I got an error saying “AttributeError: ‘module’ object has no attribute ‘log_work’”. That’s because Python is actually executing the log_work function whenever we call gather_XXXX from now on. Not ideal. Luckily, there’s a workaround, which involves another decorator.
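You can see the renaming in a tiny standalone example, along with the fix, functools.wraps, which copies __name__ (and __doc__ and friends) from the wrapped function onto the wrapper:

```python
# Demo of the __name__ problem and the functools.wraps fix.
from functools import wraps

def plain_decorator(func):
    def log_work():
        return func()
    return log_work

def wraps_decorator(func):
    @wraps(func)  # copies __name__, __doc__, etc. from the wrapped function
    def log_work():
        return func()
    return log_work

@plain_decorator
def job_a():
    pass

@wraps_decorator
def job_b():
    pass

print(job_a.__name__)  # "log_work" -- a worker would look up the wrong name
print(job_b.__name__)  # "job_b"   -- @wraps preserves the original name
```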

```python
from functools import wraps


def log_and_time(job_type):
    def log_decorator(function):
        @wraps(function)  # preserve the wrapped function's __name__
        def log_work():
            # same logging code as above
            return log_info  # return what the decorated function returned
        return log_work
    return log_decorator  # return the decorator function


@log_and_time("threads")
def run_gather_threads():
    rg = RedditGatherer()
    rg.gather_threads()
```

And now when I run this with a background worker, the code knows to look for run_gather_threads and not log_work, but the functionality is still there.

If you’ve made it all the way here, follow me on Twitter, and, like I linked above, poke around Product Mentions for a bit and shoot me an email or a tweet with what you think!