Update 23.02.2016: I rewrote and improved this script in Python, dropping the database part and focusing on generating a nice movie page from a list of movies. The code is cleaner and performs better (< 1 minute).

In this post I will show you how to import IMDb data for your movies and convert the XML into SQL for database import. From there you can start building your movie site.

IMDb API

In this post I use a small DVD collection. The movie data (director, actors, year published, etc.) comes from IMDb, and thanks to the IMDb API (Brian Fritz) you can fetch it easily via curl. At the end of this post you can see a 5 min video demo of the whole process.
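To make the lookup step concrete, here is a minimal Python sketch of how a request URL for one title could be built. It assumes the omdbapi.com endpoint (the successor mentioned in the 2016 update), its `t` (title), `r` (response format) and `apikey` parameters, and a placeholder key; the function name `build_lookup_url` is hypothetical and not part of the original scripts.

```python
from urllib.parse import urlencode


def build_lookup_url(title, api_key="YOUR_KEY", base="http://www.omdbapi.com/"):
    """Build a lookup URL for one movie title (assumed parameters: t, r, apikey)."""
    # urlencode takes care of spaces and apostrophes in the title.
    query = urlencode({"t": title, "r": "xml", "apikey": api_key})
    return base + "?" + query


url = build_lookup_url("Ocean's Eleven")
```

The resulting URL can be passed to curl or any HTTP client; replace `YOUR_KEY` with your own key.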

How it works

First I create a database and add the movie collection table: table.sql - all code is here

getMovieData.pl fetches the movie data from the API with curl and converts the XML to SQL. The escapeSingleQuote function escapes single quotes, so values that contain them (a title with an apostrophe, for example) do not break the generated SQL.
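The XML-to-SQL conversion could look roughly like the Python sketch below. This is not the original Perl: the XML shape (a `<movie>` element carrying the data as attributes), the table name `movies` and the function names are assumptions for illustration, and quotes are escaped by doubling them, which may differ from what escapeSingleQuote does.

```python
import xml.etree.ElementTree as ET

# Assumed response shape: one <movie> element with the data as attributes.
SAMPLE_XML = """<root response="True"><movie title="Ocean's Eleven" year="2001" director="Steven Soderbergh"/></root>"""


def escape_single_quote(value):
    """Double single quotes so the value is safe inside an SQL string literal."""
    return value.replace("'", "''")


def movie_xml_to_insert(xml_text, table="movies"):
    """Turn one XML response into a single INSERT statement (hypothetical table name)."""
    movie = ET.fromstring(xml_text).find("movie")
    if movie is None:
        raise ValueError("no <movie> element in response")
    columns = sorted(movie.attrib)  # fixed column order for reproducible output
    values = ", ".join("'" + escape_single_quote(movie.attrib[c]) + "'" for c in columns)
    return "INSERT INTO {} ({}) VALUES ({});".format(table, ", ".join(columns), values)


statement = movie_xml_to_insert(SAMPLE_XML)
```

If you execute statements directly from a program rather than writing them to a file, parameterized queries are a safer choice than manual escaping.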

batch.pl is a little wrapper that runs getMovieData.pl on each title in the movie list you provide as an input file. For the movielist.txt example file you get 41 INSERT statements.
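The wrapper idea is simple enough to sketch in a few lines of Python. The names `read_titles`, `batch` and `fake_get_movie_sql` are hypothetical; the last one is only a stand-in for the real per-title lookup, so the example runs without network access.

```python
def read_titles(lines):
    """Return the stripped, non-empty titles from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def batch(lines, get_movie_sql):
    """Call get_movie_sql once per title and collect the resulting statements."""
    return [get_movie_sql(title) for title in read_titles(lines)]


def fake_get_movie_sql(title):
    # Stand-in for the real lookup: only the title is inserted here.
    return "INSERT INTO movies (title) VALUES ('{}');".format(title.replace("'", "''"))


# Blank lines in the input are skipped, so two statements come back.
statements = batch(["Heat\n", "\n", "Ocean's Eleven\n"], fake_get_movie_sql)
```

For a real list you would pass an open file handle (for example for movielist.txt) instead of the in-memory list, since a file iterates line by line in the same way.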

Conclusion

In 3 simple steps you get a complete set of data for each movie title, in a standardized format. Importing this data into a database allows for easy app development. You can quickly look up which movies you have by your favorite actor or director, which movies were released in 2009, which movies were rated highest or had the most votes, etc. Having the data in a database makes life easier :)
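As a rough illustration of such lookups, the sketch below runs three of them against an in-memory SQLite database from Python. The schema and the rows are made-up placeholders, not the table.sql from this post or real movie data.

```python
import sqlite3

# Hypothetical schema and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, director TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        ("Movie A", 2009, "Director X", 7.5),
        ("Movie B", 2009, "Director Y", 8.1),
        ("Movie C", 2004, "Director X", 6.9),
    ],
)


def titles(query, params=()):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(query, params)]


released_2009 = titles("SELECT title FROM movies WHERE year = ? ORDER BY title", (2009,))
by_director = titles("SELECT title FROM movies WHERE director = ? ORDER BY title", ("Director X",))
top_rated = titles("SELECT title FROM movies ORDER BY rating DESC LIMIT 2")
```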

An example in PHP



- code -

Video Demo

Feedback

Update 22.11.2011

Comments and suggestions regarding this post are on Hacker News. There are some improved versions of the Perl script, for example here. I appreciate your feedback, as it helps me improve my Perl skills. As far as the IMDb TOS is concerned: this is for personal use only.

Update 20.02.2016

This post still gets a lot of traction. If you need a Python solution, contact me; I can probably do it much better and more elegantly than the Perl above. Also, the imdbapi links no longer work; the service is now omdbapi. You probably want to look into as well, see this recent post for an example in Python. I also used that API to build Sharemovi.es.