Introduction

I notice a lot of people these days using Heroku to host applications they’ve written with Django or Rails. This may save them some time and effort, but they are spending way more money on this specialized hosting than they would pay for a generic Linux host. They are also giving up control of a large portion of their application. I consider the entire stack to be part of my application, and complete control over all the pieces is mandatory.

Because I encourage people to administer their own servers, I have written this tutorial to remove just one more of their excuses not to do so. This tutorial covers everything you need to do to configure a blank Ubuntu Linux 12.04 LTS 64-bit server to run a Django application properly. I will not be covering anything about writing the application itself. I will also only provide minimal discussion of some topics, like database configuration. If you can finish this tutorial, you can Google the necessary documentation for other things. The only thing I completely skip is configuration of search using Haystack and Solr since it is not needed by most applications.

If you want to use a different Linux distribution, most of this tutorial will still be correct. You will have to translate all of the apt-get statements and package names to your distribution of choice. Configuration file paths and layouts may also differ. I chose Ubuntu 12.04 because it is very popular, available on almost every host, and its long term support will keep this tutorial relevant for a longer period of time.



Formatting

In this tutorial all commands and lines of code that you should actually enter are displayed in preformatted text. Within the preformatted text there will be lines that begin with an octothorpe (#). Those lines contain my personal comments on the surrounding lines, and should not be typed. Lines beginning with the dollar symbol ($) are commands that should be typed on the shell. In cases where there are variables that you should replace with appropriate values, I will surround the variable with <> symbols and name it with all capital letters.

# This text is describing the command below it. Don't type it. $ this is a shell command, type it # Type the following command, but replace the <USER> with your username $ sudo adduser <USER>

In cases where edits must be made to a file I will begin that section with a shell command to open the appropriate file in vim, which is my text editor of choice. You may use whichever text editor you wish. I will also describe the edits to be made in the comment text.

$ vim test.txt # add this line to the end of the file TEST = True

Tutorial

Beginning Steps

Begin by installing Ubuntu on a machine you own, or getting a new machine from a hosting provider. I recommend Linode or Amazon EC2. If you choose EC2 the Ubuntu AMI finder should be your starting point.

Once the machine is installed, you will need to login, mostly likely using SSH. I can not more strongly recommend you use SSH Agent Forwarding. I will not cover how to set it up in this tutorial, as it varies based on your client machine. If you haven’t done this already, do it now.

Once you are logged into the machine, you will probably notice it is telling you that a lot of packages need updating. You should update packages whenever they need updating, especially if there are security patches. Just be aware this may disrupt your site for a few minutes. If your site were important enough to where a minute of downtime was a problem, you would not need this tutorial. Here is how you update all packages on an Ubuntu or Debian system.

# update the local package index $ sudo apt-get update # actually upgrade all packages that can be upgraded $ sudo apt-get dist-upgrade # remove any packages that are no longer needed $ sudo apt-get autoremove # reboot the machine, which is only necessary for some updates $ sudo reboot

You may notice frequent use of sudo. Sudo is a program that allows us to execute other commands with root privileges. If you are bothered having to type your password when using sudo, then you can edit your sudoers file to make it easier in the future.

distribute and pip

Many of the Python packages we need are available through Ubuntu and apt-get. However, we do not want to use them. The packages provided by Ubuntu are very likely to be old and out of date. Also, it is only possible to install them globally, whereas we would like to install them in an isolated fashion. If we ran two sites on this server that used different versions of Django, this method would not work.

Many of the Python packages are wrappers around C libraries that will need to be available for linking. Those C libraries must be installed using apt-get before we install the associated Python packages. There are two such packages that are needed for just about everything, and should be installed now. They are build-essential and python-dev. Build-essential is the Ubuntu package which provides all the standard C compilation tools. Python-dev provides the necessary files for compiling Python/C modules.

$ sudo apt-get install build-essential python-dev

Now, there are various tools to install Python packages, and there is much drama and debate in the community about them. I will ignore all that, and say to use Distribute and pip. I have used them for years now, and have had no issues at all.

# download distribute $ curl -O http://python-distribute.org/distribute_setup.py # install distribute $ sudo python distribute_setup.py # remove installation files $ rm distribute* # use distribute to install pip $ sudo easy_install pip

Virtualenv(wrapper)

Now that we have used distribute to install pip, we will use pip to install every other Python package. This solves the problem of getting packages, but it does not solve the problem of isolating them. For that we must use virtualenv. Because virtualenv itself is a little clumsy, we will also use virtualenvwrapper to make things a little easier on ourselves.

# install virtualenv and virtualenvwrapper $ sudo pip install virtualenv virtualenvwrapper # edit the .bashrc file $ vim .bashrc # to enable virtualenvwrapper add this line to the end of the file source /usr/local/bin/virtualenvwrapper.sh #save and quit your editor # exit and log back in to restart your shell $ exit

You will notice after exiting and logging back into the shell that virtualenvwrapper will do a bunch of magic. That is a one-time setup that you will never see again. If you don’t see a bunch of stuff happen, then there is a problem you should fix.

Now, how do you use virtualenv? You create as many virtualenvs as you like, each with a unique name. Only one of these virtualenvs will be active at a given time. Any time you use pip the packages will only be installed into the active virtualenv. In order for a Python program to make use of those Python packages, it must be executed with that virtualenv activated. Let’s create a virtualenv for our project and activate it. You should just keep it active for the rest of this tutorial for ease. You can tell when a virtualenv is active because it will appear in your shell prompt.

# create a virtualenv, I usually give it the same name as my app $ mkvirtualenv <VIRTUALENV_NAME> # The virtualenv will be activated automatically. # You can deactivate it like this $ deactivate # to activate a virtualenv, or change which one is active, do this $ workon <VIRTUALENV_NAME>

One very helpful command you can use now is pip freeze. Pip freeze lists all available Python packages and their versions. It will let you easily see what has been installed. Try these commands out. You will see just how many packages have already been installed system-wide by Ubuntu. You will also see that your virtualenv is isolated from the rest of the system, and doesn’t have much installed in it yet.

$ deactivate $ pip freeze $ workon <VIRTUALENV_NAME> $ pip freeze

Django

Installing Django is very simple. Make sure the virtualenv is activated. This is the last time I will remind you to activate.

# install the newest version of Django $ pip install django # also install docutils, which is used by the django admin $ pip install docutils # If you want to test django, start a new project and run it $ django-admin.py startproject <APP_NAME> $ cd <APP_NAME> # make manage.py executable $ chmod +x manage.py # start the dev server $ ./manage.py runserver 0.0.0.0:8000

PIL / Pillow

PIL is the Python imaging library. It is needed if your application is going to do anything involving at all involving images. Even if you simply allow user to upload images, you will need it. There is a drop in replacement available called Pillow. I’ve never had problems with either one, but some people have problems with PIL. If you don’t know, just choose Pillow. Whichever one you choose, you will need the same set of prerequisite C libraries. If you don’t install these, then PIL might fail if a user tries to upload a jpg, or you try to perform certain image operations.

# install libraries $ sudo apt-get install libjpeg8-dev libfreetype6-dev zlib1g-dev # choose ONE of these $ pip install pillow $ pip install pil

Databases MySQL / PostgreSQL

It’s up to you to choose your database. I prefer MySQL because it is easier. The Django devs prefer PostgreSQL because it is more robust. You can learn to configure the databases using their respective documentation. It’s also up to you to edit the Django settings.py file to point at the appropriate database server. If you choose MySQL, you should configure it so that utf8 is always the default character set, utf8_general_ci is the default collation, and that InnoDB is the default storage engine.

# do this to go with Postgres $ sudo apt-get install postgresql libpq-dev $ pip install psycopg2 # do this to go with MySQL $ sudo apt-get install mysql-server libmysqlclient-dev $ pip install MySQL-Python

Whichever database you choose, you will have to create a database and a user for your application to use. Then you will have to make sure that the user has full permissions on that database.

South

South is a mandatory part of every Django project as far as I am concerned. It manages schema changes to your database about as well as can be expected, and makes life much easier in general.

$ pip install south # add south to the INSTALLED_APPS in your settings.py $ vim <YOUR_APP>/settings.py INSTALLED_APPS = ... 'south', ...

memcached

Just about every web site can benefit from having memcached. It creates a key:value store directly in memory that is very fast and easy. This can be used to store information to reduce database queries and increase performance. There have been many different libraries and modules for memcached, but PyLibMC is presently the preferred choice.

# install memcached server and library $ sudo apt-get install memcached libmemcached-dev # install pylibmc $ pip install pylibmc # edit the CACHES setting in your project's settings.py $ vim <YOUR_APP>/settings.py CACHES = { 'default': { 'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache', 'LOCATION': '127.0.0.1:11211', } }

Asynchronous Task Execution

Almost every site can benefit from asynchronous task execution. Even if your site isn’t big enough to make it a necessity for scaling and performance, why not do it? It’s very easy to set it up, and requires only free software.

What is asynchronous task execution? Let’s say you have a simple contact form. If you call send_email from the contact form view, then the user who submitted the form will not receive an HTTP response until the email has been sent. Their browser will sit there spinning while the server works. It will also tie up one of your web server threads. To make matters worse, if there is an email error, it will return that error to the user.

What you really want to do is send the user to a thank you page immediately after they submit. Then you can send the email some time later without tying up a web server. If there is an error, you can keep retrying it without bothering the user. To do this we use a Python package called celery.

Celery requires a few pieces to work. First, it needs celery workers. These are the programs that actually do the work, such as sending the emails. Next it needs a message queuing server. This is where the celery workers look to see if there is any work they should be doing. For this you should use RabbitMQ.

There is also celerycam, which will monitor celery by taking snapshots every few seconds. This will allow you to see the state of what celery is doing from the Django admin interface. Lastly, we need a place to store results if any celery tasks produce them. For that we will just use the existing MySQL or PostgreSQL database.

RabbitMQ

When you install RabbitMQ you also have to create a user and grant them permissions on a virtual host. You can pick whatever username, password, and vhost name you like, but I always use the name of my app for all three.

$ sudo apt-get install rabbitmq-server $ sudo rabbitmqctl add_user <RABBIT_USER> <RABBIT_PASSWORD> $ sudo rabbitmqctl add_vhost <RABBIT_VHOST> $ sudo rabbitmqctl set_permissions -p <RABBIT_VHOST> <RABBIT_USER> ".*" ".*" ".*"

Celery

Installing celery itself is a one liner, but it requires a lot of modifications to the Django settings file to get it working properly.

$ pip install django-celery $ vim <YOUR_APP>/settings.py # add djcelery to INSTALLED_APPS INSTALLED_APPS = ... 'djcelery', ... # add these settings BROKER_URL = "amqp://<RABBIT_USER>:<RABBIT_PASSWORD>@localhost:5672/<RABBIT_VHOST>" CELERY_RESULT_BACKEND = "database" # choose the setting that matches your database of choice CELERY_RESULT_DBURI = "mysql://<DB_USER>:<DB_PASSWORD>@localhost/<DB_NAME>" CELERY_RESULT_DBURI = "postgresql://<DB_USER>:<DB_PASSWORD>@localhost/<DB_NAME>" # put these two lines at the very bottom of the settings file import djcelery djcelery.setup_loader()

To test celery you can start up the celery daemon using a management command. If your application is going to have recurring tasks, you should enable events and celerybeat. Otherwise, you can ignore those options.

# start celery with Beat and Event $ ./manage.py celeryd -B -E # start celery normally $ ./manage.py celeryd # press Ctrl+C to quit celery once you see it is working

gunicorn

When you want to serve a modern Python web application, you use a thing called the web server gateway interface (WSGI). That is how your application will talk to the web server. There are a lot of choices when it comes to using WSGI. Apache with mod_wsgi is the old trustworthy option. uWSGI is also very popular. I personally prefer Gunicorn because it works extremely well, and runs great out of the box without any configuration.

$ pip install gunicorn # add gunicorn to INSTALLED_APPS $ vim <YOUR_APP>/settings.py INSTALLED_APPS = ... 'gunicorn', ...

Like celery, you can test Gunicorn by running it from a management command. The Gunicorn management command has many command line parameters. You can read about the options in the Gunicorn documentation. Personally, the only options I use are to set Gunicorn to use four workers and to enable gevent. You will see more about this in the next section.

$ ./manage.py run_gunicorn -w 4 -k gevent # pres Ctrl+C to exit Gunicorn once you see it is working

Supervisor

You may have noticed by now that Ubuntu is very clever in the way it handles servers. When you install MySQL with apt-get it immediately starts it with a default configuration. The same goes for RabbitMQ, memcached, SSH, and almost every other such server. If the machine reboots, these services will start themselves automatically. We can trust the distribution to take care of this for us.

But we have a problem now. We need to start Gunicorn and celery as services, but Ubuntu won’t help us with things we didn’t install with apt-get. One way to solve this problem is to write upstart or init.d scripts for these services. That will work, but it’s a pain in the ass. Those kinds of scripts are not simple to write, and are worse to maintain. Luckily supervisor exists.

Supervisor is installed with apt-get, so it will start automatically. We give supervisor very simple configuration files for any further services we would like it to manage, and it will start those up for us. It has the added benefit that it can restart any service it is managing in the case of failure. If Gunicorn crashes, it will come right back without us having to do anything. Because of this feature, some people choose to manage all their services under supervisor. I do not do this because it is extra work, and it is very rare that it will help you. If something like MySQL crashes, odds are it will not be successfully or safely restarted by supervisor.

Installing supervisor is as simple as apt-get, but configuring it is another matter. We need to create a configuration file for each service that supervisor manages. These configuration files must be located in /etc/supervisor/conf.d/ and must have a name with the conf extension. I have included here the full contents of my three supervisor configuration files, but have only documented the first one, as they are so similar.

$ sudo apt-get install supervisor

# contents of /etc/supervisor/conf.d/celeryd.conf # the name of this service as far as supervisor is concerned [program:celeryd] # the command to start celery command = /home/<USERNAME>/.virtualenvs/<VIRTUALENV_NAME>/bin/python /home/<USERNAME>/<APP_NAME>/manage.py celeryd -B -E # the directory to be in while running this directory = /home/<USERNAME>/<APP_NAME> # the user to run this service as user = <USERNAME> # start this at boot, and restart it if it fails autostart = true autorestart = true # take stdout and stderr of celery and write to these log files stdout_logfile = /var/log/supervisor/celeryd.log stderr_logfile = /var/log/supervisor/celeryd_err.log

# contents of /etc/supervisor/conf.d/celerycam.conf [program:celerycam] command = /home/<USERNAME>/.virtualenvs/<VIRTUALENV_NAME>/bin/python /home/<USERNAME>/<APP_NAME>/manage.py celerycam directory = /home/<USERNAME>/<APP_NAME> user = <USERNAME> autostart = true autorestart = true stdout_logfile = /var/log/supervisor/celerycam.log stderr_logfile = /var/log/supervisor/celerycam_err.log

# contents of /etc/supervisor/conf.d/gunicorn.conf [program:gunicorn] command = /home/<USERNAME>/.virtualenvs/<VIRTUALENV_NAME>/bin/python /home/<USERNAME>/<APP_NAME>/manage.py run_gunicorn -w 4 -k gevent directory = /home/<USERNAME>/<APP_NAME> user = <USERNAME> autostart = true autorestart = true stdout_logfile = /var/log/supervisor/gunicorn.log stderr_logfile = /var/log/supervisor/gunicorn_err.log

You will notice the command for each of the supervisor configurations is strange. Instead of just running manage.py directly, we are specifically calling the python interpreter located inside of our virtualenv. There is no way for supervisor to easily activate itself the way we activate in our shell with the workon command. By specifically using the python interpreter from the virtualenv instead of the system version, celery and gunicorn will have the correct path settings, and will use the packages within the the virtualenv. Also, note the celeryd and gunicorn command line parameters we had discussed earlier to enable beat, event, gevent, etc.

To make supervisor recognize the new configuration files, we must restart it. It’s also pretty obvious from the example blow how to manually control the services that supervisor manages.

# restart supervisor itself $ sudo service supervisor restart # restart/stop/start all services managed by supervisor $ sudo supervisorctl restart all $ sudo supervisorctl stop all $ sudo supervisorctl start all # restart just celeryd $ sudo supervisorctl restart celeryd # start just gunicorn $ sudo supervisorctl start gunicorn

Static and Media

There is a great deal of confusion about static and media in Django. It is partially because the separation of the two was not very clear in earlier releases. This has since been fixed, and the static files system has been completely reworked. While the problem is solved in the current versions, the big changes may have added to the confusion. Allow me to help.

Static files are files that you would put into your repository that are not code. These are almost always css, jss, and image files. Your favicon would also be a static file. This is a file that you make, must be served to web browsers, but does not change unless you update your application.

Media files are files that are also served statically, but they were not made by you. They were uploaded by users. So on a site like YouTube, the YouTube logo in the top left would be a static file, but the actual videos themselves would be media. On Flickr, the css and the login buttons would be static, but the photos uploaded by users are media.

For media files we must simply take files that users upload, put them in a folder, and then serve the files in that folder. For static files we have a slightly trickier issue because they are not so easily located. The django admin keeps its static files in one place. You put your static files in another place. Maybe you pip install another third party module that includes some static files. In this case, you run a management command to collect the static files to a single folder from which they will be served.

# Create directories to hold static and media sudo mkdir /var/www sudo mkdir /var/www/static sudo mkdir /var/www/media # set permissions on these directories sudo chown -R <USERNAME>:www-data /var/www $ vim <YOUR_APP>/settings.py # these settings are explained below MEDIA_ROOT = '/var/www/media/' MEDIA_URL = '/media/' STATIC_ROOT = '/var/www/static/' STATIC_URL = '/static/' # run this whenever static files are changed $ ./manage.py collectstatic

In the django settings file there are four settings. MEDIA_ROOT, MEDIA_URL, STATIC_ROOT, and STATIC_URL. The ROOT settings are the local directories where the files are located. The URL settings are the URLs where these files will be served. Assume we have a file named css/main.css Using the above example, the file will be located in /var/www/static/css/main.css after it is collected. It will be served from the URL http://yourdomain.com/static/css/main.css.

One more benefit of the static system is that you can use pip to add static files to your project. For example, you can pip install django-staticfiles-jquery and django-staticfiles-bootstrap to automatically put jquery and boostrap into your static files when you run collectstatic.

Nginx

This brings us to the final piece of the puzzle. How exactly are these static files served? Gunicorn handles serving all the dynamically generated pages of our application, but it doesn’t know anything about these static files. The Django runserver will do some magic to handle these files to make development easier, but once you enter production and set DEBUG=False, you no longer have this luxury. That is why we put Nginx as our web server out in front of Gunicorn. It will serve all the static files, and also ask Gunicorn to handle any requests it doesn’t know what to do with.

# install nginx $ sudo apt-get install nginx

Much like Apache, Nginx uses two folders for configuration. There is /etc/nginx/sites-available/ and /etc/nginx/sites-enabled/. You put all your nginx configuration files into the sites-available directory. Then if you want a site to be active, you create a symlink to that file from the sites-enabled directory. You have to restart nginx after any configuration changes. Because Ubuntu starts nginx with a default configuration when you install it, you must first remove the symlink to that default configuration.

# remove the default symbolic link $ sudo rm /etc/nginx/sites-enabled/default # create a new blank config, and make a symlink to it $ sudo touch /etc/nginx/sites-available/<YOUR_APP> $ cd /etc/nginx/sites-enabled $ sudo ln -s ../sites-available/<YOUR_APP> # edit the nginx configuration file $ vim /etc/nginx/sites-available/<YOUR_APP>

Here is what the nginx configuration file should look like

# define an upstream server named gunicorn on localhost port 8000 upstream gunicorn { server localhost:8000; } # make an nginx server server { # listen on port 80 listen 80; # for requests to these domains server_name <YOUR_DOMAIN>.com www.<YOUR_DOMAIN>.com; # look in this directory for files to serve root /var/www/; # keep logs in these files access_log /var/log/nginx/<YOUR_APP>.access.log; error_log /var/log/nginx/<YOUR_APP>.error.log; # You need this to allow users to upload large files # See http://wiki.nginx.org/HttpCoreModule#client_max_body_size # I'm not sure where it goes, so I put it in twice. It works. client_max_body_size 0; # THIS IS THE IMPORTANT LINE # this tries to serve a static file at the requested url # if no static file is found, it passes the url to gunicorn try_files $uri @gunicorn; # define rules for gunicorn location @gunicorn { # repeated just in case client_max_body_size 0; # proxy to the gunicorn upstream defined above proxy_pass http://gunicorn; # makes sure the URLs don't actually say http://gunicorn proxy_redirect off; # If gunicorn takes > 5 minutes to respond, give up # Feel free to change the time on this proxy_read_timeout 5m; # make sure these HTTP headers are set properly proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; } }

Restart nginx and your server is now running and serving your django application perfectly!

# restart nginx $ sudo service nginx restart

Conclusion

I’m sure there are going to be errors in this no matter how much I proofread. I also fully admit that there is always more to learn. I welcome any suggested fixes or improvements to any part of this. Just leave a comment below.

I also hope this saves someone some money by letting them run their site on a cheap Linux host instead of a managed host. Most of all I hope it gets some people to view their application as not just the one piece they wrote, but as a whole made up of many separate working parts. And that those other parts are not to be outsourced, but to be embraced. When you control them, you can solve problems with them.