Automate Python scripts with AWS Lightsail

Follow along as I work my way through automating a Python script on an AWS Lightsail Ubuntu instance.

In my work as a data scientist, I have come to realize how valuable it is to automate as much of the workflow as possible. When most people hear the words data science, they think of machine learning and AI, but much of a data scientist’s time goes to other work, such as collecting and preparing data. In this blog I will focus on automating data collection using AWS Lightsail.

The project outlined below is hosted on my GitHub.

1. Create an AWS Lightsail Ubuntu instance

2. Apply a dedicated IP address to the instance

3. Install Python 3.7 and pip on the Ubuntu instance

4. Clone the Python repository to the instance

The Python script will call Reddit’s API and store all submissions from reddit.com/r/learnpython in a CSV file

5. Create a cron job that will run every hour

Create an Ubuntu Lightsail Instance on Amazon Web Services

If you’re an individual developer or hobbyist working on a personal project, Lightsail can help you deploy and manage basic cloud resources. Amazon Lightsail is the easiest way to get started with AWS if you just need virtual private servers. Lightsail includes everything you need to launch your project quickly — a virtual machine, SSD-based storage, data transfer, DNS management, and a static IP. After you create your instance, you can easily connect to it. You can manage your instances using the Lightsail console, Lightsail API, or Lightsail command line interface (CLI). (https://lightsail.aws.amazon.com/)

To begin you will need to sign up at Amazon Lightsail. The first month is free, which gives you plenty of time to decide whether this service is what you need.

Once you have logged in, you should see the Lightsail dashboard.

Lightsail dashboard

Create an Ubuntu Instance

1. Click on the Create instance button (circled above).

2. Under pick your instance image, select Linux/Unix

3. Select OS only

4. Select Ubuntu 18.04

creating ubuntu instance

5. Choose your instance plan: For this project I will be using the cheapest option ($3.50 per month), which is more than sufficient to run most Python scripts. Also, don’t forget the first month is free!

6. Name your instance: For this project I named the instance “Ubuntu-Automation”

7. Select Create Instance

After selecting Create Instance you will be returned to the Lightsail dashboard. It will take a few minutes for the Ubuntu instance to be created. While the instance is being created, the status will be “Pending”, as in the screenshot below:

Pending creation

The status will change to “Running” once the instance has been created. You will also see the IP address assigned to the instance; for my instance it is 3.227.241.208. This IP address is dynamic and will change every time you stop and start the instance. Depending on the project you plan on hosting, it may be necessary to set a static IP address.

Ubuntu instance created and running
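The same steps can also be driven through the Lightsail API instead of the console. The sketch below is only an illustration, not part of the original project: it assumes the boto3 library and AWS credentials are available, and the zone, blueprint ID (ubuntu_18_04), and bundle ID (nano_2_0) are assumed values that should be checked against what get_blueprints and get_bundles return for your account. The helper takes the client as a parameter, so it carries no hard dependency on boto3.

```python
import time


def create_instance_and_wait(client, name, zone="us-east-1a",
                             blueprint_id="ubuntu_18_04", bundle_id="nano_2_0",
                             poll_seconds=5, max_polls=60):
    """Create one Lightsail instance and poll until its state is 'running'.

    The zone, blueprint_id, and bundle_id defaults are assumptions for
    illustration. Returns the public IP address reported by the API.
    """
    client.create_instances(
        instanceNames=[name],
        availabilityZone=zone,
        blueprintId=blueprint_id,
        bundleId=bundle_id,
    )
    for _ in range(max_polls):
        instance = client.get_instance(instanceName=name)["instance"]
        # The state name mirrors the "Pending" / "Running" status in the dashboard.
        if instance["state"]["name"] == "running":
            return instance["publicIpAddress"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"{name} did not reach the running state")


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("lightsail", region_name="us-east-1")
#   print(create_instance_and_wait(client, "Ubuntu-Automation"))
```

For a one-off instance the console is simpler; a script like this mainly pays off when you create instances repeatedly.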

Create a Static IP Address

Creating a static IP is optional and only necessary if your project requires it. I will be creating a static IP address because I open my SQL server only to this IP address for security reasons. After the initial setup I prefer to SSH into the Ubuntu instance from my local machine and having a static IP makes this process easier.

1. Select the Networking tab in your Lightsail dashboard

2. Click on “Create static IP”

Networking dashboard

3. Select your Ubuntu Instance server under “Attach to an instance”

4. Give the Static IP a name

5. Click “Create”

You should then see your new static IP Address. This IP address will not change.

Moving forward, my static IP address will be 18.213.119.58, which is what I’ll be using for the remainder of this project.
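If you prefer the Lightsail API over the dashboard, the static IP steps could look roughly like the following. This is a hedged sketch rather than part of the original project: it assumes a boto3 Lightsail client is passed in, and the static IP name "Ubuntu-Automation-IP" is a made-up example.

```python
def allocate_and_attach_static_ip(client, static_ip_name, instance_name):
    """Allocate a static IP, attach it to an instance, and return the address.

    `client` is expected to behave like boto3.client("lightsail").
    """
    # Reserve a new static IP under the chosen name.
    client.allocate_static_ip(staticIpName=static_ip_name)
    # Attach it to the instance; this replaces the dynamic public IP.
    client.attach_static_ip(staticIpName=static_ip_name, instanceName=instance_name)
    # Read the address back so it can be used for SSH or firewall rules.
    return client.get_static_ip(staticIpName=static_ip_name)["staticIp"]["ipAddress"]


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("lightsail", region_name="us-east-1")
#   print(allocate_and_attach_static_ip(client, "Ubuntu-Automation-IP", "Ubuntu-Automation"))
```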

Python Automation

For this project, I will be using a Python script that calls the Reddit API and collects all of the new submissions from reddit.com/r/learnpython. Reviewing how this particular script works is outside the scope of this article; however, you can view all of the code at GitHubLink.
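For readers who want a rough picture of what such a script does, here is a minimal sketch. It is not the script from the repository: the column names, the function names, and the output file name are assumptions chosen for illustration. The writer accepts any iterable of objects with the listed attributes, which is the shape PRAW submissions have.

```python
import csv
from datetime import datetime, timezone

# Hypothetical column selection; the real script may store different fields.
FIELDS = ["id", "title", "author", "created_utc", "url"]


def submission_to_row(submission):
    """Flatten one submission-like object into a dict of CSV columns."""
    created = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)
    return {
        "id": submission.id,
        "title": submission.title,
        "author": str(submission.author),
        "created_utc": created.isoformat(),
        "url": submission.url,
    }


def write_submissions(submissions, csv_path):
    """Write submissions to csv_path and return the number of rows written."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for submission in submissions:
            writer.writerow(submission_to_row(submission))
            count += 1
    return count


# With PRAW installed and Reddit API credentials, the input could come from:
#   import praw
#   reddit = praw.Reddit(client_id="YOUR_ID", client_secret="YOUR_SECRET",
#                        user_agent="YOUR_USER_AGENT")
#   write_submissions(reddit.subreddit("learnpython").new(limit=100), "learnpython.csv")
```

Keeping the CSV writing separate from the API call makes the writer easy to try out locally with stand-in objects before any credentials are set up.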

Connecting to Ubuntu instance using SSH

From the Lightsail dashboard you can connect to your Ubuntu instance using the web-based SSH tool. After the initial setup I prefer to use SSH from my local terminal as it is simpler, but I did find the web-based tool easier to work with for the setup reviewed in this blog.

Terminal SSH Connection

In the upper right-hand corner select Account > Account. This will bring you to the account dashboard where you can download your SSH Key.

Once in the Lightsail account dashboard select “SSH keys” and then Download.

On your local computer navigate to ~/.ssh by running the command cd ~/.ssh

cd ~/.ssh

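Copy the downloaded key to this location and restrict its permissions so that only your user can read it (for example with chmod 400), because ssh refuses to use a private key file that other users can read.

To check that the key has been copied to this location, run the command ls to list all files. (Note that this method only works on Unix-based operating systems.)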

ls

To connect via SSH run the following command

ssh -i ~/.ssh/lightsail.pem -T ubuntu@{your_lightsail_IP_address}

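My Ubuntu server’s IP address is 18.213.119.58, so I substitute that value for {your_lightsail_IP_address} in the command above.

The first time you connect, you will see a message asking whether you want to continue connecting to a host that is not yet known.

Type yes to connect to your Ubuntu instance.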

Once connected you will see the following:

Both the web-based SSH connection and the local terminal SSH connection are valid and work. I just prefer connecting via terminal.

Getting your Python script on the Ubuntu Instance

My preferred method of downloading my Python script onto the Ubuntu instance is via Git. Running git clone with the repository’s URL from the home directory creates the AWS-Lightsail folder that the cron job later points to.

(If there are config files you need on your Ubuntu instance that you don’t want hosted on GitHub, you can use Amazon’s S3 to transfer them.)

Installing Python 3.7 and pip

For installing Python and pip I would recommend using the web-based SSH through the Lightsail dashboard.

Once in the repository folder, run the following command, which runs the install_python.sh script and installs Python 3.7 and pip.

bash install_python.sh

Installing Python Libraries

Next, install the Python libraries praw and pandas by running the shell script python_libraries.sh.

PRAW: Python Reddit API Wrapper

pandas: Data manipulation and analysis

bash python_libraries.sh
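After the install script finishes, it can be worth confirming that the libraries are importable. The snippet below is a small sketch using only the standard library; the function name is made up for illustration. Run it with the same interpreter you plan to use in the cron job, since packages installed for one Python version are not visible to another.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report which of the two libraries used by this project are still missing.
missing = missing_modules(["praw", "pandas"])
print("Missing:", ", ".join(missing) if missing else "none")
```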

Setting a Cron Job

The software utility cron is a time-based job scheduler in Unix-like computer operating systems. Users that set up and maintain software environments use cron to schedule jobs (commands or shell scripts) to run periodically at fixed times, dates, or intervals. It typically automates system maintenance or administration, though its general-purpose nature makes it useful for things like downloading files from the internet and downloading email at regular intervals (Wikipedia).

In order to fully automate this process, the last step is to have a cron job run at a regular interval.

For this project I am going to have my script run every hour on the 15th minute. The cron command will look like this:

15 * * * * /usr/bin/python3 /home/ubuntu/AWS-Lightsail/learnpython_to_csv.py >> ~/cron.log 2>&1

If you would like to play around with setting different intervals for your cron job I recommend first taking a look at https://crontab.guru/ .
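To see how the five schedule fields (minute, hour, day of month, month, day of week) translate into run times, here is a deliberately simplified sketch. It only understands * and plain integers, so ranges, lists, step values, and cron's special handling when both day fields are restricted are out of scope; the function names are made up for illustration and this is not how cron itself is implemented.

```python
from datetime import datetime, timedelta


def field_matches(field, value):
    """Match one cron field; this sketch supports only '*' and plain integers."""
    return field == "*" or int(field) == value


def cron_matches(expression, moment):
    """Return True if a five-field cron expression fires at the given minute."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = (moment.weekday() + 1) % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


def next_runs(expression, start, count, max_minutes=366 * 24 * 60):
    """List up to `count` run times strictly after `start`, stepping by minute."""
    runs = []
    moment = start.replace(second=0, microsecond=0)
    for _ in range(max_minutes):  # bounded so an impossible schedule cannot loop forever
        if len(runs) == count:
            break
        moment += timedelta(minutes=1)
        if cron_matches(expression, moment):
            runs.append(moment)
    return runs


# The schedule used in this project: minute 15 of every hour.
for run in next_runs("15 * * * *", datetime(2020, 1, 1, 9, 30), 3):
    print(run.isoformat())
```

Starting from 09:30 on 1 January 2020, the three printed times are 10:15, 11:15, and 12:15 on the same day.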

Creating a Cron Job

Set the editor to vim using the following command

export EDITOR=vim

Enter vim and edit the cron jobs

crontab -e

At this point VIM will launch and you will be able to edit your cron jobs.

1. Press i to enter insert mode

2. Once in insert mode, copy and paste the cron job into the editor

3. Press Escape

4. Press : (colon), w (write), q (quit) to save and exit from vim

:wq

You are now finished and your script will run at the interval given in the cron job.

To check your cron jobs you can run the command crontab -l to see all current cron jobs.

crontab -l

For logging purposes, print statements and errors will be stored in the file cron.log. From the home directory run the following command.

cat cron.log

If you see similar output that means everything is working!

cron.log
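If you would rather check the log from Python than read it by eye, a small helper can pull the last few lines and flag a failed run. This is a sketch with made-up function names; it assumes errors show up in cron.log as standard Python tracebacks, which is what the 2>&1 redirect in the cron job would capture.

```python
from collections import deque


def tail(path, line_count=20):
    """Return the last `line_count` lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=line_count)]


def has_traceback(lines):
    """Report whether any line marks the start of a Python traceback."""
    return any(line.startswith("Traceback (most recent call last)") for line in lines)


# Usage on the instance:
#   import os
#   recent = tail(os.path.expanduser("~/cron.log"))
#   print("\n".join(recent))
#   print("Errors found" if has_traceback(recent) else "No tracebacks in recent output")
```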
