Data science is only useful when it is actionable. If no one reads an analyst’s report or a recommendation engine sits untouched on an engineer’s laptop, all that was achieved was a high-cost training exercise. At a startup lacking defined specialist teams and established processes, I’ve found that the responsibility to realize impact from data science lies first with me, the data scientist. So in that light, I have been eager to see how AWS Lambda could be used to bring data projects into the real world.

Motivation. Or: keep going till you’re no longer blocked

A couple months ago I came across a fantastic tutorial on using AWS Lambda to create a callable model. Seriously, it’s worth a read.

But…I couldn’t get it to work. Sure, I could get the ‘hello, world’ version running. But I couldn’t deploy the model as an API. Something was wrong with my environment.

Error message #291 I came across

Every change required zipping a folder, uploading to S3, and repointing the Lambda to pick it up. It would work fine locally, in fresh environments, and on a brand-new EC2 instance; hell, I even bit the bullet and learned Docker to get a perfect environment. But every zip -> S3 upload -> rebuild Lambda cycle resulted in an error, rarely the same one. I was stuck in a frustrating loop that required spending more time clicking through multiple services than actually debugging.

Just when I had given up, I heard about AWS Cloud9: a cloud-based IDE with more features than I’m aware of. My main draw was that it could be used to build and test local versions of Lambda functions.

Basic tutorial. Or: “hello, world”

Let’s start with a simple example.

Here’s what we’ll be doing

1. Create a Cloud9 environment
2. Create a Lambda function
3. Test the Lambda function in Cloud9
4. Deploy the Lambda function
5. Set up API Gateway to call the Lambda function

Let’s get to it

Creating an environment from cloud9 console

Navigate to “Cloud9” in the console and click “Create environment”. Create an environment named “anotherLambdaTest” and leave all the settings on their defaults.

Hello, cloud9

On the right tab, go to “AWS Resources” > “Local Functions” and click the λ⁺ icon (“Create a new Lambda function”). Let’s call the function “simpleTutorial” in the application “anotherLambdaTest”, and set it up as empty-python with no trigger.

This should pop open lambda_function.py in the IDE with a barebones function. At this point, we just copy over the simple tutorial script:

def lambda_handler(event, context):
    print(event)
    result = 'Hello from ' + event['queryStringParameters']['msg']
    return { "body": result }

Save it! There’s no autosave!

Ok, so now let’s test it. No need to zip and upload and deploy. Just…click test.

Wow that was painless

A payload window pops up where you can create a custom event. The function runs from the IDE, and you get back results and any print messages.
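You can mimic that same test in a plain Python shell. The event below imitates the shape of an API Gateway request for ?msg=TheWeb; only queryStringParameters is actually used by our handler (a real API Gateway event carries many more fields).

```python
# Local sanity check for the "hello, world" handler, using an
# event shaped like what API Gateway sends for ?msg=TheWeb.

def lambda_handler(event, context):
    print(event)
    result = 'Hello from ' + event['queryStringParameters']['msg']
    return {"body": result}

# Simulate the query-string portion of an API Gateway event.
event = {"queryStringParameters": {"msg": "TheWeb"}}
response = lambda_handler(event, None)
print(response["body"])  # Hello from TheWeb
```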

Ready to make this an API? Click “Deploy the selected Lambda function”.

Et voilà! Your Lambda function exists.

Head over to the Lambda console and select your newly deployed Lambda (it’ll be prefixed “cloud9…” and have been last modified less than a minute ago). From here, it’s pretty much the same as the original tutorial: add an API Gateway trigger and configure it (create a new API, make it open, and save the Lambda). Then throw a test event against it via the API Gateway test.

Naming things has never been AWS’s strong suit

Finally, hit the actual API by first deploying (“Actions” > “Deploy” > “default”) to generate your API URL. You can copy that URL, append your Lambda’s name, and then append the URL params. In my case: https://atkgc42ggg.execute-api.us-west-2.amazonaws.com/default/cloud9-anotherLambdaTest-simpleTutorial-X4VIU30USH5C?msg=TheWeb
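Calling it from a script is just URL assembly plus an HTTP GET. A minimal sketch, using the invoke URL generated above (yours will differ), with the network call left commented out:

```python
from urllib.parse import urlencode

# Base invoke URL from the API Gateway "default" stage, plus the
# deployed function's resource name. This specific URL is the one
# from the walkthrough; substitute your own.
base = ("https://atkgc42ggg.execute-api.us-west-2.amazonaws.com"
        "/default/cloud9-anotherLambdaTest-simpleTutorial-X4VIU30USH5C")

# urlencode handles escaping of the query-string parameters.
url = base + "?" + urlencode({"msg": "TheWeb"})
print(url)

# Uncomment to actually hit the endpoint (requires network access):
# from urllib.request import urlopen
# print(urlopen(url).read().decode())
```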

Deploying a model. Or: adding data “science”

Here are the steps (this should look familiar)

1. Create a model in Cloud9
2. Create another Lambda function that uses that model
3. Test the Lambda function in Cloud9
4. Deploy the Lambda function
5. Set up API Gateway to call the Lambda function

Let’s get to it

Instead of building a new environment from scratch, we can just add on.

Back in Cloud9, add another Lambda function and application. Let’s call both “logitTutorial”.

Our model will need some additional modules. This is not quite as straightforward as one might like, but relative to the normal virtualenv, pip, zip, S3, deploy cycle, it’s a cakewalk. In the application folder, open a terminal and navigate to the function’s folder ( ~/environment/logitTutorial ).

In this newly opened terminal you can install the modules locally by specifying the target:

python3 -m pip install --target=./ numpy pandas scipy sklearn

Now, we need to create the model file (‘logit.pkl’). To do that, I’m just going to create a new file in the environment ( anotherLambdaTest/logitTutorial/build_logit_pkl.py ), copy over the code Ben Weber provided in his tutorial, and run that from Cloud9’s terminal. This will drop logit.pkl right in the same folder.

import pandas as pd
from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("https://github.com/bgweber/Twitch/raw/master/Recommendations/games-expand.csv")
y_train = df['label']
x_train = df.drop(['label'], axis=1)

model = LogisticRegression()
model.fit(x_train, y_train)
joblib.dump(model, 'logit.pkl')

The last python-ic step is to update lambda_function.py with code to call the model (again, see Ben Weber’s tutorial for full code).

from sklearn.externals import joblib
import pandas as pd

model = joblib.load('logit.pkl')

def lambda_handler(event, context):
    p = event['queryStringParameters']
    print("Event params: " + str(p))
    x = pd.DataFrame.from_dict(p, orient='index').transpose()
    pred = model.predict_proba(x)[0][1]
    result = 'Prediction ' + str(pred)
    return { "body": result }

Then we’re ready for testing. Note: unlike in the original tutorial, you must use double quotes in the test event here.
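The reason is that the test event is parsed as JSON, and JSON only accepts double quotes. A quick illustration with the standard library (the G1/G2 parameter names are just examples):

```python
import json

# Valid JSON: double quotes around keys and string values.
good = '{"queryStringParameters": {"G1": "1", "G2": "0"}}'
event = json.loads(good)
print(event["queryStringParameters"]["G1"])  # 1

# Single quotes are not valid JSON and fail to parse.
bad = "{'queryStringParameters': {'G1': '1'}}"
try:
    json.loads(bad)
except json.JSONDecodeError:
    print("single-quoted payload rejected")
```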