Then hit create droplet, and after a minute or so, you’ll be all set!

3. Initial Setup and Snapshots

When you’re done, you’ll get an email from Digital Ocean with an IP address, username and password. Copy the IP address, fire up your command line and enter:

ssh root@[IP Address]

If this hangs and nothing happens, you can try:

ping [IP Address]

and see if you have a connection (hit Ctrl+C to stop it after 10 or so packets). If you get no response, you’re probably being blocked by an external firewall, such as your work or university network. This happened to me, and there’s not a lot you can do about it.

If you do get a response from ssh root@[IP Address], there will be a prompt asking if you want to proceed. Type yes and press enter.

You’ll then be asked to enter the password. Copy and paste it from the email and hit enter. You’ll then need to change the root password. Make this pretty secure: you won’t be logging into root that often, but if someone else does, they can totally wreck your droplet.

Next, you want to create a user:

adduser [username]

where [username] is anything you want.

You’ll have to create a password for this user, and it’ll ask for a bunch of personal information. Enter this if you like, or skip it (press enter), it doesn’t matter.
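If you’d rather skip the personal-information prompts entirely, adduser accepts a --gecos flag that fills them in for you (a sketch; "alice" is a placeholder username):

```shell
# Create the user but skip the name/phone/room prompts
# by supplying empty GECOS data ("alice" is a placeholder)
adduser --gecos "" alice
```

You’ll still be prompted to set the password, which is the part that matters.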

Next, give your new account permission to run commands with sudo. Either of the following commands works (both add the user to the sudo group):

usermod -aG sudo [username]

gpasswd -a [username] sudo

This gives you the ability to make changes without having to be logged into root.
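Before logging out of root, it’s worth a quick sanity check that sudo actually works for the new account (a sketch; "alice" is a placeholder username):

```shell
# Switch to the new user and confirm sudo privileges
su - alice
sudo whoami   # if everything is set up, this prints "root"
exit          # drop back to the root session
```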

Finally, change the time zone:

sudo dpkg-reconfigure tzdata

and select your current city.
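If you’d rather set the time zone non-interactively, timedatectl (available on Ubuntu 16.04 and later) does the same job in one line; the city here is just an example:

```shell
# Set the time zone directly, then confirm it took effect
sudo timedatectl set-timezone Europe/London
date
```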

Nice! We’re all set up on root. Now we can exit and check out your user account.

exit

ssh [username]@[IP Address]

Once you’ve logged in, the first step is to configure your bash profile to make your life easier.

nano .profile

If you haven’t used Nano before, it’s a simple text editor. Use the arrow keys to move, Ctrl+K to cut a line, Ctrl+U to paste it, Ctrl+O to save, and Ctrl+X to exit.

You can add aliases here (shortcuts) to make life easier. I use Python 3 and Pip 3, so I generally avoid confusion by setting:

alias python="python3"

alias pip="pip3"

I also hate typing nano ~/.profile when I want to change or check my bash profile, so I alias that too:

alias obp="nano ~/.profile"

alias sbp="source ~/.profile"

You can also add any other aliases here. Save the file with Ctrl+O and exit with Ctrl+X.

To implement these changes, you need to source the profile:

source .profile

(As a side note, you can’t use a new alias before you’ve sourced the profile, so simply entering sbp won’t work yet, but it will after you’ve sourced it the long way once.)

apt-get is Ubuntu’s package manager, a bit like pip but for system packages, and its package lists should be updated before we go any further.

sudo apt-get update

Python 3 is already installed, but pip isn’t, so let’s install and upgrade it:

sudo apt-get install python3-pip

pip install --upgrade pip

Now you can pip install all your regular Python packages! Well, almost. Matplotlib and a few other packages depend on system libraries that Ubuntu doesn’t ship by default, but you can solve this by running:

sudo apt-get install libfreetype6-dev libxft-dev

sudo apt-get install python3-setuptools

Sweet, now you can go nuts with pip3 install. If you have a requirements.txt file which you use for all your installations, you can run it now.
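If you don’t have a requirements.txt yet, you can generate one on your local machine and replay it on the droplet. A rough sketch (the filename is the conventional one, but any name works):

```shell
# On your local machine: record the packages you have installed
pip3 freeze > requirements.txt

# On the droplet, after uploading the file: install everything in one go
pip3 install -r requirements.txt
```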

Finally, AG is a great search tool for the command line which doesn’t come as standard, so you can install that too if you’d like:

sudo apt install silversearcher-ag
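A quick taste of how ag works once it’s installed (the search pattern here is made up for illustration):

```shell
# Search recursively for a pattern from the current directory down
ag "read_csv" .

# Restrict the search to Python files only
ag --python "read_csv"
```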

Perfect, you’re all set up! Now, this was quite a lot of work, and it would suck to have to do this each time. That’s where Digital Ocean snapshots come in!

This is a way to ‘save-as’ your current droplet, so that when you create a new droplet next time, you can just load that image and you’ll start exactly where you left off! Perfect.

Power down your droplet using sudo poweroff and head over to the Snapshots menu item on your Digital Ocean droplet page:

Snapshots aren’t free: they cost $0.05/GB/month. My snapshot, taken with Pandas, NumPy, Matplotlib and a few other packages installed, was 1.85GB, so that’s about $1.11 per year. Not bad considering it’ll save you 10 minutes each time you set up a droplet!

As your droplet is already powered down, hit the ‘Take Live Snapshot’ button and Digital Ocean will do the rest.

A side note here: it doesn’t appear you can restore a snapshot onto a cheaper droplet pricing plan. Just something to keep in mind.

If you ever want to upgrade your pricing scheme, head to the Resize menu item. Again, you can’t downgrade here and if you want to, you’ll need to create a droplet from scratch and repeat the above steps.

To turn your droplet back on, head to Digital Ocean and in the top-right hand corner of the droplet page there’s an on/off toggle. Turn it back on and you’re ready to upload data.

4. Uploading Data via SFTP

Now, if you have the data on your personal computer and want to send it up to the cloud, you’ll need to do so via SFTP (Secure File Transfer Protocol). This is fairly easy, albeit not lightning fast.

To start, make an SFTP connection:

sftp [username]@[IP Address]

Now, you can use the normal command line functions like cd, ls and pwd to move around the file system on the server. However, for some reason, autocomplete (tab) doesn’t work in SFTP. For that reason, I suggest uploading all your files and code into the home directory, then logging back in with ssh and moving things around using mv and autocomplete. Trust me, it’s much easier.
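If you find yourself repeating the same transfers, sftp can also run a list of commands from a batch file, which avoids the clunky interactive mode entirely. A sketch (filenames are placeholders; note that batch mode requires key-based authentication, since it can’t prompt for a password):

```shell
# commands.txt contains one sftp command per line, e.g.:
#   put data.csv
#   put -r scripts
sftp -b commands.txt [username]@[IP Address]
```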

That’s on the server side, but how about moving around on your personal computer? The same commands work, but you need to prefix them with an additional l , that is lcd , lls and lpwd . Manoeuvre your way to the directory where the files are held on your local, and then upload them to the server using:

put [localFile]

or for a whole directory:

put -r [localDirectory]
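An alternative to the interactive put is scp, which copies files or directories to the server in a single command from your local machine (the paths here are placeholders):

```shell
# Copy one file into the home directory on the droplet
scp data.csv [username]@[IP Address]:~/

# Copy a whole directory recursively
scp -r myProject [username]@[IP Address]:~/
```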

This isn’t fast: my 1.5GB data files took about 20 minutes each. Annoying, but there’s not much you can do. You can upload in parallel, but this means opening a new command line tab and logging in with SFTP again, and both connections slow down by half, so the total time is the same :(

You can see how much of your storage you’ve used by running:

df -h
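df -h reports usage per filesystem; if you want to see which directories are eating the space, du is handy too (a sketch):

```shell
# Total, used and available space on the root filesystem
df -h /

# Size of each item in your home directory, largest first
du -sh ~/* | sort -rh | head
```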

For future reference, downloading files from the server is as simple as:

get [remoteFile]

or

get -r [remoteDirectory]

Now, once your data files are up on the server, you can upload your Python scripts too. If your local directory structure differs from the remote one, make sure to update any file paths in your code.

Deleting files from the server is also slightly different from doing it locally, as rm -rf directory doesn’t work over SFTP. You need rm filename to delete a file, or to empty a directory and then run rmdir directory to delete the folder.
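To be clear, that restriction only applies inside an sftp session. From a normal ssh session, recursive deletion works as usual:

```shell
# Over ssh (not sftp), remove a directory and everything inside it
rm -r oldDirectory

# Add -f to skip any confirmation prompts (be careful!)
rm -rf oldDirectory
```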

This is enough to get you started, but if you’re interested in more advanced options, there’s a good tutorial here.

I thoroughly recommend testing all your code locally (I use PyCharm, which is great and has a free student license too) on small subsets of the data, then, when it’s working, SFTP’ing the code to the server and running it once. Troubleshooting over a remote connection can be annoying.

5. Running Computations

Now, you’re all set up to run some computations. Log back in with ssh , organise all your files, and run any code with python filename.py as you would on your local computer.
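One thing to watch out for: if your ssh connection drops, any script running in that session is killed with it. A common workaround (one of several; screen and tmux are alternatives) is nohup, which lets the job survive a disconnect:

```shell
# Run the script immune to hangups, log its output, and return to the prompt
nohup python3 filename.py > output.log 2>&1 &

# Check on it later
tail -f output.log
```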

If you get a Killed response when you try to run some code, it means you don’t have enough memory to complete the job. You have two options: either power down and upgrade your pricing plan, or create a swapfile (more below).

Printing error logs will give you some more information on what has gone wrong:

sudo cat /var/log/kern.log
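The kernel log can get long, so a filtered view of just the out-of-memory killer’s entries is usually what you want (a sketch, assuming the standard Ubuntu log location):

```shell
# Show only the OOM killer's recent activity
sudo grep -i "killed process" /var/log/kern.log | tail
```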

You can monitor job progress on the Digital Ocean dashboard that shows you some nice graphs of usage: