(View a live django-springsteen example here.)

(Also, I've greatly simplified the process of deploying django-springsteen on Google App Engine, as explained here, but you will still want to read this article to understand how to customize Springsteen.)

Vik Singh premiered running Yahoo! BOSS on Google App Engine quite some months ago, but django-springsteen is a somewhat different magician than the BOSS Mashup Framework, so hopefully you'll forgive a bit of repetition.

In this article I'll walk through the brief steps necessary to deploy a slightly interesting search engine by taking advantage of Springsteen's built-in support for Yahoo! BOSS, Twitter, and Amazon. It shouldn't take more than half an hour to get up and running.

First register a new Google App Engine application. I'm using djangosearch, because apparently I registered it the first time I did a lame Yahoo! BOSS on GAE tutorial. It's good to know I'm not stuck in a rut or anything.

Check out the django-springsteen source from GitHub.

    git clone git://github.com/lethain/django-springsteen.git django-springsteen

Now we're going to salvage some relevant pieces of django-springsteen's repository, adapt them to our new purposes, and throw away the rest.

    mv django-springsteen/example_project/ ./djangosearch
    mv django-springsteen/springsteen/ djangosearch/
    rm -rf django-springsteen

Next we want to grab a recent Django tarball from djangoproject.com/download/.

    tar -xvf Django-1.0.2-final.tar
    mv Django-1.0.2-final/django/ ./
    rm -rf Django-1.0.2-*
    rm -rf django/bin django/contrib/admin django/contrib/auth
    rm -rf django/contrib/databrowse django/test
    rm -rf django/contrib/admindocs django/contrib/gis

(We need to remove these files to get under the 1000 file limit for Google App Engine. You can also do some file zipping magic to get around it, but this approach is a bit simpler.)

Actually, django-springsteen pretty much works with Django 0.96 except for using the safe template filter in some of the templates. If you're willing to strip it out, then you could skip installing a more recent version of Django.
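If you're not sure whether you've trimmed enough, a few lines of Python can count what's left. This is just a rough sketch and not part of Springsteen or the App Engine SDK: the names count_files and headroom are made up for illustration, and the 1000 figure is simply the limit mentioned above.

```python
import os

# Limit quoted in this article; adjust if your platform's limit differs.
GAE_FILE_LIMIT = 1000


def count_files(root):
    """Count regular files under root, including all subdirectories."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def headroom(root, limit=GAE_FILE_LIMIT):
    """Return how many more files fit under the limit (negative if over)."""
    return limit - count_files(root)
```

Calling headroom('.') from the project directory before deploying gives a quick sanity check: a negative number means there is still more to delete.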

And next we need to scavenge several pieces from the django_example for Google App Engine. First create the djangosearch/main.py file with these contents.

    import logging, os, sys

    # Google App Engine imports.
    from google.appengine.ext.webapp import util

    # Remove the standard version of Django.
    for k in [k for k in sys.modules if k.startswith('django')]:
        del sys.modules[k]

    # Force sys.path to have our own directory first, in case we want to import
    # from it.
    sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))

    # Must set this env var *before* importing any part of Django
    os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'

    import django.core.handlers.wsgi

    def main():
        # Create a Django application for WSGI.
        application = django.core.handlers.wsgi.WSGIHandler()

        # Run the WSGI CGI handler with that application.
        util.run_wsgi_app(application)

    if __name__ == '__main__':
        main()

Next we need to create the djangosearch/app.yaml file. (Be sure to replace djangosearch with the name of the application you registered.)

    application: djangosearch
    version: 1
    runtime: python
    api_version: 1

    handlers:
    - url: /static
      static_dir: static

    - url: /.*
      script: main.py

And finally djangosearch/index.yaml.

    indexes:

    # AUTOGENERATED

    # This index.yaml is automatically updated whenever the dev_appserver
    # detects that a new type of query is run. If you want to manage the
    # index.yaml file manually, remove the above marker line (the line
    # saying "# AUTOGENERATED"). If you want to manage some indexes
    # manually, move them above the marker line. The index.yaml file is
    # automatically uploaded to the admin console when you next deploy
    # your application using appcfg.py.
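The loop in main.py that deletes entries from sys.modules is what lets our bundled Django replace the one App Engine has already loaded. Here's a small standalone sketch of that same trick, using a made-up module name (fakeframework) so it can run anywhere; purge_modules is a hypothetical helper, not something from the SDK.

```python
import sys
import types


def purge_modules(prefix):
    """Remove every cached module whose name starts with prefix.

    Returns the removed names so the caller can see what was dropped.
    """
    doomed = [name for name in list(sys.modules) if name.startswith(prefix)]
    for name in doomed:
        del sys.modules[name]
    return doomed


# Pretend a framework was preloaded by the hosting environment.
sys.modules["fakeframework"] = types.ModuleType("fakeframework")
sys.modules["fakeframework.core"] = types.ModuleType("fakeframework.core")

removed = purge_modules("fakeframework")
```

Once the cached entries are gone, the next import statement searches sys.path again, which is why main.py puts our own directory at the front of sys.path right after the purge.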

Next open up djangosearch/local_settings.py and add these at the bottom.

    ROOT_URLCONF = 'urls'
    MIDDLEWARE_CLASSES = (
        'django.middleware.common.CommonMiddleware',
        'django.middleware.doc.XViewMiddleware',
    )
    INSTALLED_APPS = ('springsteen',)
    DATABASE_ENGINE = None
    DATABASE_NAME = None
    CACHE_BACKEND = "dummy:///"

Create the djangosearch/boss_settings.py file, which contains only the BOSS_APP_ID parameter, and AMAZON_ACCESS_KEY if you have one. (You'll need to sign up here and here to get an AWS Affiliate ID if you want Amazon search results, which is the uninspired man's choice for monetizing a Springsteen service out of the box.)

    BOSS_APP_ID = "abcdefghijlknop"

Tweak the djangosearch/urls.py file to remove all references to example_project, as well as removing the extra url patterns.

    from django.conf.urls.defaults import *

    urlpatterns = patterns('',
        (r'^$', 'views.search'),
    )

Time out. Let's pick a topic for our new search engine. Hmm... hmm.... Okay, let's make it a search engine that specializes in Apple products. What could go wrong?

Next let's configure our search results. Go ahead and open up djangosearch/views.py, and start by removing everything. Start rebuilding by adding these imports:

    from springsteen.views import search as default_search
    from springsteen.services import Web, TwitterLinkSearchService, AmazonProductService
    from django.conf import settings

Next let's create our Amazon product service (if you have an Amazon Affiliates AWS key).

    class ComputerAmazonSearch(AmazonProductService):
        _access_key = settings.AMAZON_ACCESS_KEY
        _topic = 'apple'

Followed by creating an Apple flavored Twitter service.

    class AppleTwitterService(TwitterLinkSearchService):
        _qty = 3
        _topic = 'apple'

Finally we just need to mix in web results from Yahoo! BOSS and then expose our new search engine.

    def search(request, timeout=2500, max_count=10):
        services = (ComputerAmazonSearch, AppleTwitterService, Web)
        return default_search(request, timeout, max_count, services)

A Short Warning

Please note that Yahoo! BOSS search results won't be retrieved successfully when you are testing your springsteen application locally. However, they will be correctly retrieved once you push your app to production. I'll look into some kind of patch for this, but no need to panic.
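One wrinkle if you don't have an Amazon key: the ComputerAmazonSearch class reads settings.AMAZON_ACCESS_KEY when views.py is imported, and Django raises an AttributeError for a setting that isn't defined. One way around that is to build the services tuple conditionally. The sketch below shows the pattern with stand-in classes so it runs on its own; FakeSettings, FakeSettingsWithKey, and pick_services are invented for illustration, and the three service classes here are empty placeholders rather than the real Springsteen ones.

```python
class FakeSettings(object):
    """Hypothetical stand-in for django.conf.settings with no Amazon key."""
    BOSS_APP_ID = "abcdefghijlknop"


class FakeSettingsWithKey(FakeSettings):
    """Same stand-in, but with a (made-up) Amazon key configured."""
    AMAZON_ACCESS_KEY = "hypothetical-key"


# Empty placeholders for the real service classes defined in views.py.
class Web(object):
    pass


class AppleTwitterService(object):
    pass


class ComputerAmazonSearch(object):
    pass


def pick_services(settings):
    """Return the services tuple, adding Amazon only if a key is set."""
    services = (AppleTwitterService, Web)
    # getattr with a default avoids an AttributeError on a missing setting.
    if getattr(settings, "AMAZON_ACCESS_KEY", None):
        services = (ComputerAmazonSearch,) + services
    return services


without_key = pick_services(FakeSettings)
with_key = pick_services(FakeSettingsWithKey)
```

In a real views.py the definition of ComputerAmazonSearch would need to sit inside the same check, since it is the class body that touches the setting.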

At this point we have everything working correctly, but results are just stacked on top of each other. Sure, you might love having those Amazon affiliate links clustered at the top, but your users might not. Now's a nice time to dip our toes into relevancy.

You want the most relevant results to bubble to the top (feel free to make a bubble sort pun), but naively stacking results from different services doesn't permit that unless all results from source A are more relevant than those from source B, all results from B are more relevant than those from source C, and so on. Let's take a stab at a very simple relevancy algorithm to address these problems.

You can think of two kinds of relevancy approaches:

1. Scoring results on their individual merits. We might call this intrinsic relevance.
2. Scoring results in regard to each other. We might call this contextual relevance.

We're going to do a little bit of both here. First we're going to boost results which contain the query term in their title, and second we're going to punish the 2nd-Nth results from an already encountered domain. Place this code in views.py above the search function.

    def ranking(query, results):
        query = query.lower()

        def rank(result):
            # Intrinsic relevance: boost results whose title contains the query.
            score = 0.0
            title = result['title'].lower()
            if query in title:
                score += 1.0
            return score

        scored = [(rank(x), x) for x in results]
        scored2 = []
        domains = {}
        for score, result in scored:
            # Contextual relevance: penalize repeat results from one domain.
            domain = result['url'].replace('http://', '').split('/')[0]
            times_viewed = domains.get(domain, 0)
            new_score = score + times_viewed * -0.1
            scored2.append((new_score, result))
            domains[domain] = times_viewed + 1

        # Highest score first; sort on the score alone so that the result
        # dicts themselves are never compared.
        scored2.sort(key=lambda pair: pair[0], reverse=True)
        return [x[1] for x in scored2]

And then update the search function to use this ranking function.

    def search(request, timeout=2500, max_count=10):
        services = (ComputerAmazonSearch, AppleTwitterService, Web)
        return default_search(request, timeout, max_count, services, {}, ranking)

Now our results are ranked using the above ranking function. This is a pretty basic approach to relevancy, but hopefully it shows the basic concepts.
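To see how the title boost and the repeat-domain penalty interact, here is a self-contained variant of the same idea run against some made-up results. Treat it as a sketch rather than the code Springsteen ships: it is written for current Python 3 so it runs standalone (on the Python 2.5 runtime used in this tutorial, urlparse lives in the urlparse module instead), it uses urlparse rather than string replacement so https URLs are handled too, and it breaks ties by original position. The sample titles and example.* URLs are invented.

```python
from urllib.parse import urlparse


def ranking(query, results):
    """Order results by title match, penalizing repeated domains."""
    query = query.lower()
    seen = {}      # domain -> number of results already seen from it
    scored = []
    for position, result in enumerate(results):
        # Intrinsic relevance: +1.0 if the query appears in the title.
        score = 1.0 if query in result["title"].lower() else 0.0
        # Contextual relevance: -0.1 for each earlier result from the domain.
        domain = urlparse(result["url"]).netloc
        repeats = seen.get(domain, 0)
        score -= 0.1 * repeats
        seen[domain] = repeats + 1
        scored.append((score, position, result))
    # Highest score first; ties keep their original relative order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [item[2] for item in scored]


sample = [
    {"title": "Cheap laptops", "url": "http://example.com/a"},
    {"title": "Apple MacBook review", "url": "http://example.org/b"},
    {"title": "Apple iPod deals", "url": "http://example.org/c"},
    {"title": "Apple news", "url": "http://example.net/d"},
]

ordered = [r["url"] for r in ranking("apple", sample)]
```

With this input the two example.org results both match the query, but the second one picks up the 0.1 penalty and drops below the example.net result, while the laptop result with no title match lands last.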

Now we have our search engine up and running, which makes this a good time to customize your site's templates. First make a templates directory within djangosearch, as well as a templates/springsteen directory and a few empty files.

    cd djangosearch
    mkdir templates templates/springsteen
    touch templates/base.html
    touch templates/springsteen/base.html

Then let's edit the templates/base.html file.

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">
    <html>
      <head>
        <title>{% block title %}FruitySearch{% endblock %}</title>
        <link rel="stylesheet" type="text/css" href="/static/css/reset.css">
        <link rel="stylesheet" type="text/css" href="/static/css/search.css">
      </head>
      <body>
        <div id="body">
          <div id="hd"><h1><a href="/">FruitySearch</a></h1></div>
          {% block body %}{% endblock %}
          <div id="ft"><p>A <a href="">Your-Name-Here</a> production, 2009.</p></div>
        </div>
      </body>
    </html>

Next edit templates/springsteen/base.html (this one is pretty brief).

    {% extends "base.html" %}
    {% block body %}{% endblock %}

As you continue customizing the appearance of your results, you'll probably want to override templates/springsteen/results.html, but for the time being it should be a reasonable default.

Create some CSS to style the site.

    cd djangosearch
    mkdir static static/css

The current base.html assumes you'll have reset.css and search.css files. Recently I tend to use YUI's reset.css, and just mashed together some custom stylings for search.css.