Earlier this year, I inherited responsibility for the website of the Knight Digital Media Center at UC Berkeley’s Graduate School of Journalism. The site is built with Django, a web application framework written in Python. The J-School has primarily been a PHP shop, using a mixture of open-source apps — lots of WordPress, Smarty templates and piles of home-brew code. Because it’s grown organically over time with no clear underlying architecture and a constantly changing array of publications to support, the organization sits on top of dozens of unrelated databases.

These are my notes and observations on how the J-School got into this mess, why we’ve fallen in love with Django, and how we plan to dig ourselves out.

Baling Wire

I personally have no formal CS or programming background. Over the course of the six years I managed most of the J-School web properties, I learned just as much PHP and shell scripting as required, on an as-needed basis. Job pressures and constant deadlines never allowed time for pure training.

Even without training, I’ve been extremely productive with ad-hoc PHP. I ended up developing and/or implementing some very effective web apps, such as a course and instructor review system, events database, story submission database, application and review system, online quiz system, etc. – all more or less stitched together with baling wire and duct tape.

But everything people say about PHP systems not scaling well eventually came around to bite us – our systems are sprawling. Login systems are disconnected, and disparate databases that should be unified are spread all over the place. Instead of a centralized content management system, we have dozens of loosely connected CMSs. From a certain perspective, this made sense, as much of what we do is not about running a single site, but a loose federation of barely related sites. At a journalism school, students, classes, and organizations demand one publication site after another, few of which have anything to do with one another.

But at a certain point, non-unified systems simply fail to scale. It’s not about traffic, but about consistency. You can’t require students and faculty to create new logins all over the place, and you can’t expect web developers to master dozens of different platforms that work in dozens of different ways. Over the past year, we’ve known it was time to consolidate as much as possible into a single, centralized framework or CMS. The scaling problem is organizational — too many parties, too many tools, too many different ways of doing things, too many cooks in the kitchen. A classic example of PHP spaghetti.

Quite a while ago, we realized that something had to give. We needed to rebuild everything in a single system – one that came with a foundation flexible enough to model everything the organization is and does, and to build any kind of web-based representation or tools we needed on top of that model. We needed something more than a typical CMS. We didn’t want to have to shoehorn parts into places where they don’t belong. We wanted a system to work for us, rather than against us.

Framework vs. CMS

What’s the difference between a content management system and a framework? Look at it this way: If you write in a raw language, like PHP or Perl, it’s like going to Home Depot and buying lumber and nails. You have absolute flexibility, but you’re responsible for every square centimeter of the house you build. If you deploy a content management system, it’s like buying a house – you can rearrange the furniture, but you’re pretty much stuck with the floor plan.

A framework lies somewhere between those two poles – as if you’re buying parts of a pre-fab house. “Now I need walls. Now plumbing. Now kitchen appliances. Now air conditioning.” The nitty gritty stuff is taken care of, but you have full control over which pieces you use and how they fit together. “Now I need a commenting system. Now I need RSS. Now I need search. Now I need form validation.”

CMSs have a nasty habit of becoming inflexible. Editorial staff get tired of hearing “we can’t do x, y, or z because the CMS won’t let us.” CMSs make assumptions about how you work, and about how a site is structured. As long as you work within that set of assumptions, all is well. But try to bend a CMS to do things it wasn’t meant to do, and you open the door to failure.

For example, we do a ton of work with WordPress. WordPress assumes that the basic content type is the story, and that that story has a headline, summary, article body, and maybe a bit of metadata. WordPress is an amazing tool, ideally suited to publication-oriented sites such as blogs and focused publications. But building a custom database application involving arbitrary arrays of fields, or building custom workflow tools, would be a matter of bending WordPress in ways it wasn’t meant to be bent. You may or may not be successful, but you’ll probably never call your solution graceful. You’ll always be working against, not with, WordPress’ assumptions about the structure of your content. It’s true that WordPress is much more than just a blogging system – it’s a darn good CMS for a lot of site types. But you wouldn’t use it to build an equipment checkout system, or an alumni database, or an application processing system. Its limitations come from its assumptions.

In contrast, a framework is a toolkit you use to build the CMS that matches your organization’s specific needs. No assumptions, just tools and conventions you can use to model data structures and create a workflow based on those structures. A framework takes care of much of the heavy lifting involved in the creation of highly customized web apps, so you don’t have to pour countless hours into things like fighting spam, managing fine-grained permissions, building CRUD tools, etc.

Because a framework works at a lower level than a CMS, it generally has a steeper learning curve. In exchange for that learning curve, you get more flexibility as your organization changes, grows, and scales. It’s a middle ground between (on one hand) raw code that makes no assumptions but leaves you responsible for every bit, and (on the other) a full-blown CMS that makes too many assumptions about content types, and often feels convoluted and/or limiting (not to mention bloated, which most major CMSs are).

There are web application frameworks written in many languages. PHP has Cake, Symfony, Zend and others. Ruby is famous for Rails. Perl has Catalyst and Gantry. Python has Zope, TurboGears, and, in recent years, Django. I haven’t done enough work with frameworks to be able to offer meaningful side-by-side comparisons, but I can tell you what it’s been like for a PHP guy to learn Django over the past six months.

Starting From Scratch

Here’s why PHP is so popular: You don’t have to learn much up-front. Start with an HTML page, drop in one measly function call, and you get immediate gratification. Need to do something else? Look up another function and drop it in. Over time, you get a head-full of function calls you can use in a pinch, and get to call yourself a programmer. It’s a double-edged sword. The fact that PHP takes no real study to become productive means there are millions of PHP developers who don’t have a grasp of “real” programming concepts. Like me. That doesn’t mean we aren’t productive, but it does mean we tend to build things in non-optimal ways under deadline pressure. And we tend to build things that turn around and bite us in the ass when organizations and data models grow, change, and evolve, because our code is littered with secrets and mysteries, and because we lack the foundations needed to be “real” programmers. I’m generalizing grossly (and being unnecessarily self-deprecating) to make a point.

For me, learning a framework has been another story. There is no “instant gratification” entry point. You need to have a grasp of the language the framework is written in. For Django, that means learning some Python – at the very least lists, dictionaries and tuples. You need to learn that language’s programming structures – how it handles conditionals, loops, I/O, etc. You need to have some basic grounding in object-oriented programming principles – something I’ve simply never gotten around to. In most cases – certainly in Django – you need to learn MVC (model view controller) or in Django’s case MTV (model template view) principles. And you need to learn the framework itself. How it thinks, how data moves through the system, where to look when things go wrong.
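For a PHP developer, the three Python structures mentioned above look roughly like this. It’s only a minimal sketch; the course titles, names, and values are invented for illustration.

```python
# A list: an ordered, changeable sequence
# (closest to a numerically indexed PHP array).
courses = ["Reporting 101", "Multimedia", "Radio"]
courses.append("Data Journalism")

# A dictionary: key/value pairs (closest to a PHP associative array).
instructor = {"name": "Pat Example", "office": "Room 101"}
instructor["email"] = "pat@example.edu"

# A tuple: an ordered sequence that cannot be changed after creation.
term = ("Fall", 2008)
season, year = term  # tuples unpack neatly into separate variables

# Loops and conditionals combine in a "list comprehension":
# keep only the titles longer than five characters.
long_titles = [title for title in courses if len(title) > 5]
```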

Finally, you need to know how to deploy. Because PHP is built into virtually every Apache host out there, deployment is something most PHP devs don’t have to think about. For Rails or Django, it’s a different story. Suddenly you’re compiling Apache modules, building database bindings, configuring the web server… or looking around for a host who’s done all of that for you.

I signed up for an online Python course through University Extension. I spent many, many hours reading documentation, blog posts, and watching tutorial videos. I went through the official tutorials, and struggled to get development environments working both locally and on production servers.

Nothing about it was easy. Which is ironic for a system designed to make you more productive. After years of extreme productivity, I felt like I had been thrown back to Square One, and was starting my web development education all over again. Painful, but much-needed. Only after I started to assimilate and internalize the absolute elegance of the Django weltanschauung did I realize just how bad the systems I’d built over the years really were – both from data modeling and coding perspectives. I needed to completely re-think the way I was doing things. It was time for a hard reset.

After several months of diving in, experimenting, failing, getting frustrated, and finally succeeding, I realized:

It’s all about the data model. Get this right and you win. Think long and hard about the data model your organization is built around, and build web apps around that. Get this right and you can build any view, any tool that will do right by the organization. Get it wrong and you’ll struggle forever.

Amateur approaches will only get you so far. Learning on a catch-as-catch-can basis will only get you halfway there. You may be a hero in the short term, but in the long run, real accomplishments come from real programmers. It takes time to learn the bigger picture, but if you do it, you win.

As a PHP dev, I wasn’t used to having to spend long periods of time reading documentation. I read just enough to get each little job done. Django made me sit back and read everything. To grasp the gestalt of what I was dealing with before trying to move forward.

As long as you expect it to be easy, you’ll be disappointed. As with all things in life, there’s the easy way, and there’s the right way. The right way takes extra effort, but pays off in the end. Yes, Django will make you more productive and enable you to solve hard problems quickly – but only after you’ve put in the effort to wrap your head around completely new ways of doing things. The paradox is that Django (and Rails) claim to make web development easier. And it’s true – they do. But only after slogging through a long ramping-up period. More work up front in order to do less work later.
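To make the first of those points concrete, here is a rough sketch of what “modeling the organization” might look like. It uses plain Python dataclasses rather than Django’s model classes so that it runs anywhere. The Person, Course, and Publication names and their fields are invented for illustration, not taken from our actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    # One record per human being, shared by every site and tool.
    name: str
    email: str


@dataclass
class Course:
    title: str
    instructor: Person  # each course has exactly one instructor
    students: List[Person] = field(default_factory=list)  # many students per course


@dataclass
class Publication:
    name: str
    course: Course  # each student publication belongs to a course

    def contributors(self) -> List[Person]:
        # Derived from the model, instead of a separate login table per site.
        return [self.course.instructor] + self.course.students


pat = Person("Pat", "pat@example.edu")
sam = Person("Sam", "sam@example.edu")
course = Course("Multimedia", instructor=pat, students=[sam])
site = Publication("Class Site", course)
names = [person.name for person in site.contributors()]
```

The point is that once people, courses, and publications are described in one place, any new tool can ask the model who belongs where instead of keeping its own copy.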

I’ve been incredibly lucky to be allowed a big stretch of time to learn Python/Django on the boss’ clock. The key is in getting people on board, convincing them that moving the organization forward means training time is required. You can’t just change languages/systems in mid-stream and expect everything to carry on at the same pace it has been. The larger organization has to feel the pain of the current systems as well, and therefore to support the time/expense of real training.

Why Django?

Since we were already a PHP shop, why wouldn’t we choose one of the existing PHP frameworks? Yes, we could have gone that route, but none of the PHP frameworks have the credibility or momentum of Rails or Django. I don’t want to say that PHP is a bad language, but it wasn’t built as an object-oriented language from the ground up. The web framework momentum today is in cleaner, more O-O languages. And from what we could see, the PHP-based frameworks themselves didn’t look as clean as Rails or Django. We had learned the hard way, and were ready to move on.

To be fair, I’ve only experimented briefly with Ruby on Rails, and am not qualified to say a lot about it. What I do know is that there have been a lot of complaints about RoR scalability and stability over the past few years. I’m sure the Rails community would refute those claims as myths, but for whatever reason, they have been prevalent. And in my reading of half a dozen comparisons between the two frameworks, I haven’t yet found one that showed faster development times or better scalability for RoR over Django. When you don’t have the resources to compare things side-by-side for yourself (how long would it take to learn enough about every framework on the market in order to do a truly objective comparison/analysis?), you have to lean on the opinion of developers who actually have. And from what I can tell, most developers who have tried both prefer Django over Rails.

The fact that the RoR community is much larger than Django’s was certainly a point in Rails’ favor, but not enough to tip the scales for us.

One of the real “wins” for Django is in its automatically generated “admin” interface. Once your data models have been defined, a mini-CMS is built for you around that data model, with all of the form widgets properly reflecting the models’ relationships. If a field is a ForeignKey to another model, you get a picklist of that foreign model’s instances. If a field is a DateTime, you get slick little date/time pickers. If a field describes a many-to-many relationship with another model, the admin gives you a combo box. If a field is designated required, the right database constraints and field validations are put in place automatically.
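The idea of deriving form widgets from field types can be mimicked in a few lines of plain Python. To be clear, this is a toy, not Django’s code or API: the `WIDGET_FOR_FIELD` table, the `build_admin_form` function, and the “Event” model are all hypothetical, and only the field-type names and widget descriptions come from the paragraph above.

```python
# Toy illustration only: this is NOT Django's code or API.
# It mimics the idea of choosing a form widget from a field's declared type.
WIDGET_FOR_FIELD = {
    "ForeignKey": "picklist of related records",
    "DateTime": "date and time pickers",
    "ManyToMany": "combo box",
    "Char": "text input",
}


def build_admin_form(model_fields):
    """Return one form row per field as (name, widget, required)."""
    rows = []
    for name, (field_type, required) in model_fields.items():
        # Unknown field types fall back to a plain text input.
        widget = WIDGET_FOR_FIELD.get(field_type, "text input")
        rows.append((name, widget, required))
    return rows


# A hypothetical "Event" model described as field name -> (type, required).
event_fields = {
    "title": ("Char", True),
    "starts_at": ("DateTime", True),
    "venue": ("ForeignKey", False),
    "speakers": ("ManyToMany", False),
}

form_rows = build_admin_form(event_fields)
```

In the real admin you never write this mapping yourself; the sketch only shows why a well-described data model is enough information to generate a usable editing form.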

The Django admin isn’t perfect and isn’t suitable for all tasks, but for the most part it “just works,” and saves you tons of time. In most cases, you can forget about ever having to write CRUD apps. In the journalism context, it means the developer can sit down with the reporter, get a good handle on the data models that describe the feature, and the journalist can start entering data immediately, while the developer continues work on the public-facing site.

People will tell you “Django is a framework, not a CMS.” That’s true, but the presence of the admin means you essentially get a CMS for free, alongside the framework. The admin is a huge win for Django. If the admin doesn’t serve a particular purpose, no problem – build your own custom workflow on top of the model you’ve defined, outside of the admin. Rails doesn’t have anything comparable, as far as I know.

On the flip side, Django is currently missing an equivalent of Rails’ official “migrations” system, which helps keep database schemas in sync when an application’s data models change. Right now, it’s a mostly manual affair. However, there are currently three separate Django projects in the works to address this need. Thankfully, all three developers are now working together to bring their projects into a single unified solution, eventually to be bundled in Django core (listen to Russell Keith-Magee on episode 44 of This Week in Django to learn more about why model migration is an inherently hard problem).
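The “mostly manual affair” amounts to something like the following. This sketch uses Python’s built-in sqlite3 module and an invented `story` table; the table and column names are assumptions for illustration, and a real project would run the equivalent SQL against its own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table as originally created from the first version of the model.
conn.execute(
    "CREATE TABLE story (id INTEGER PRIMARY KEY, headline TEXT NOT NULL)"
)
conn.execute("INSERT INTO story (headline) VALUES ('First post')")

# The model later gains a "summary" field. Without a migration tool,
# someone has to write and run the matching ALTER TABLE by hand.
conn.execute("ALTER TABLE story ADD COLUMN summary TEXT NOT NULL DEFAULT ''")

# Existing rows survive and pick up the default value.
columns = [row[1] for row in conn.execute("PRAGMA table_info(story)")]
rows = conn.execute("SELECT headline, summary FROM story").fetchall()
```

Multiply that by every model change on every server, and it becomes clear why people want the framework to track schema changes for them.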

Unlike with Rails, no one complains about Django not scaling to very high traffic situations, or about ongoing stability problems. And no one complains about the project being run/controlled by difficult people. Django grew out of the real publishing needs of a real-world news publication trying to do difficult things on tight deadlines (hence “The web framework for perfectionists with deadlines.”) It may look like a small player, but it’s battle-hardened. According to Google Trends, interest in Django over the past year has surpassed interest in Rails. Something’s going on.

(The Google Trends search excludes “Reinhardt” to eliminate results on jazz guitarist Django Reinhardt.)

To reiterate: I am not dissing on Rails. I haven’t used it enough to have an opinion, and I know it’s chock-full of excellent code and good practices. I just have the strong impression Django has an edge in terms of development speed, scalability, quality docs, and no-bull OSS leadership.

The open sourcing of Django and the creation of the Django Software Foundation opened the doors to wider adoption, and the September release of Django 1.0 – a version guaranteed to be production-ready and stable – will give bean counters confidence.

But nothing speaks to market confidence in Django more loudly than its blessing by Google. Google engineers have been Python-centric for years (Python creator Guido van Rossum works at Google). But when Google launched AppEngine – a cloud-based web app framework – they released it with just one deployment option: A somewhat modified version of Django. In their eyes, Django was not only good enough to run in heavy production, but also deemed easy enough to learn that they were willing to ask developers to learn it if they wanted to use AppEngine. Any other company would have looked for the lowest common denominator and made AppEngine work with the most popular languages/frameworks first, in order to snare the most developers possible. Not Google – they let their love for the language and the framework speak for itself with the offering (of course AppEngine will support more languages in the future, but it’s been all-Django so far).

And oh yeah – Django has a magical pony.

Digging In

In September, two of us spent a few days at Google headquarters in Mountain View for the first annual DjangoCon – an international conference bringing together heroes of the Django world with developers who traveled from all corners for the privilege. Heard a lot of great talks, learned heaps about Django best practices, met some great people, etc. Drank the Kool-Aid and got all fired up (video from the complete session is available on YouTube).

Shortly after, I realized that part of my Django education needed to involve creating my own pluggable Django application, so I set to work on a multi-user, multi-group task list management system called django-todo (yes, I could have built a blog engine from scratch like most Django newbies do, but the truth is I have no intention of ditching WordPress any time soon, so doing that would only have been an academic exercise).

I was able to finish django-todo in about a week, and eventually got it up onto Google Code, where it’s gotten some good feedback and a few dozen downloads. Even got a mention in episode 40 of This Week in Django!

Django is renowned for its excellent documentation. It’s true that the documentation is incredibly thorough and well-presented, but from a newbie’s perspective, I often found it frustrating, since you don’t know what to search on to find answers to specific problems. Trying to figure out how to make global variables available to my templates, I found nothing, because the answer is in Django’s “context processors.” All well and good if you already know that that’s what you need to search on. Similarly, quite a few of the code examples in the docs show interactions with the Django API through Python shell examples. Those examples do a good job of showing how the API interaction is occurring, but offer no help for the newbie trying to figure out how to actually implement things.
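For anyone else hunting for “global template variables”: a context processor turns out to be nothing more than a function that takes the request and returns a dictionary. Here is a minimal sketch. The `site_info` function, its keys, and the `FakeRequest` stand-in are hypothetical names; in a real project the function is registered in the settings file and Django supplies the request object.

```python
# A context processor is just a function: it receives the current request
# and returns a dictionary that is merged into the template's context.
def site_info(request):
    # "site_info", SITE_NAME and CONTACT_EMAIL are hypothetical names.
    return {
        "SITE_NAME": "Example Site",
        "CONTACT_EMAIL": "webmaster@example.edu",
    }


# Stand-in for Django's request object so the sketch runs without Django.
class FakeRequest:
    path = "/about/"


extra_context = site_info(FakeRequest())
```

Once such a function is registered, a template can refer to `SITE_NAME` without every view having to pass it in.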

For example, I found the docs on Django’s pagination functionality completely inscrutable. Shell examples? Why should users be submerged in that level of abstraction? Show me a web-based example – with working snippets from model, view, and template – applicable to common scenarios, which I can use as a starting point. Fortunately, I found an excellent blog post on Django pagination in the wild. After corresponding with the author a bit, I was able to modify the official documentation (which is bundled with every Django checkout), submit a Trac ticket, re-build the docs, submit an svn diff patch, and get it accepted into the official docs – which is why you see it there now.
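The underlying idea is simple enough to sketch in plain Python. This is a stand-in, not Django’s paginator: the `paginate` function and the story list are invented for illustration, and the dictionary keys only loosely echo the kind of information a page object hands to a template.

```python
import math


def paginate(items, per_page, page_number):
    """Return the slice of items for one page plus the facts a template needs."""
    num_pages = max(1, math.ceil(len(items) / per_page))
    # Clamp out-of-range page numbers instead of raising an error.
    page_number = min(max(1, page_number), num_pages)
    start = (page_number - 1) * per_page
    return {
        "object_list": items[start:start + per_page],
        "number": page_number,
        "num_pages": num_pages,
        "has_previous": page_number > 1,
        "has_next": page_number < num_pages,
    }


# What a view might do for ?page=3 on a list of 25 stories, 10 per page.
stories = ["Story %d" % n for n in range(1, 26)]
page = paginate(stories, per_page=10, page_number=3)
```

A view would compute something like `page` from the query string and hand it to the template, which loops over `object_list` and uses `has_previous` and `has_next` to decide which links to draw.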

Hosting on Birdhouse

If there’s one big downside to working with a framework based on a language that lacks the widespread support of PHP, it’s in deployment. When you work in PHP, you never have to wonder whether your host will support your app or framework of choice. PHP is like air or water – it’s just there.

If you want to deploy a site with Ruby on Rails, it’s a bit more difficult – you’ve either got to run/manage your own server, or find a web host that supports RoR out of the box. Luckily for RoR developers, cPanel, the world’s most popular web hosting control panel, supports RoR natively in recent builds.

Django folks don’t have it so good. cPanel doesn’t yet support Django (though I’ve filed a feature request). So if you’re not managing your own server, you’ve got to choose from among a fairly limited selection. Trouble is, I already run a cPanel-based web hosting service. If I was going to be able to deploy any of the sites I was already starting to build, I wasn’t going to be able to wait around for official Django support in cPanel – I’d have to roll my own.

Fortunately, this problem had already been largely solved by @mandric, who has been the hidden mentor/kick-starter behind Django activity at the J-School, and who wrote up an excellent summary on setting up Django with cPanel. Based on his notes, I was able to get Django working on a cPanel host myself, and took my own notes on the process. Today, Birdhouse Hosting officially supports Django hosting. Looks like the first official Django user on Birdhouse will be a commercial wine database with a fairly sophisticated data model. More on that as the project comes along.

A Django Shop

Long story short, we finally came to a decision: The J-School is now officially a Django shop. For better or worse, we’re embarking on the long road toward unifying all of our toolsets around Python and Django. Which means A) Finally creating that mythical Grand Unified Database that represents the entire organization and B) Rebuilding a whole lot of tools around that new database. There’s a lot more web to the J-School than meets the eye — we’ve got a pretty expansive hidden intranet and a whole lot of less-visible sites to support that most J-School site visitors never see. It’s going to be a ton of work, and there will be plenty of bumps in the road, but I’m excited — not just at the prospect of getting all of our stuff into one bucket for a change, but at the learning opportunities this decision represents.

Despite the many frustrations I’ve had with the early stages of the learning process, Django development is fun. There’s nothing like watching things “just work,” the feeling that “web development was meant to be this way.” I look forward to not having to struggle against the expectations and limitations of a monolithic content management system, or of the ghostly chains of bad historical decisions. The process is going to rock as much as the final result.

Wish us luck.