

SQL Test Scores Database

Russell Degnan Apropos to David Barry generously offering his collection of statistical data, and having finally got to updating my parser for the changes to cricinfo's scorecards, I'll do the same. Download the zip file available here: testscores.zip. Instructions for use: To begin Unzip testscores.sql and import into your database. I use mysql, running on XAMPP (which includes PHPMyAdmin). That is sufficient to look at the data. It includes all test matches (bar the ICC XI travesty) up until 24th October 2010. If anyone wanted to rewrite the parse script for ODI/T20 games feel free to do so and I'll add it to the file. The format is relatively straight forward but you'll need to work out what is what for yourself (and know some basic sql. I'll leave comments open on this post for questions).

team - numbered 1 to 10, in the order they first played test cricket.

- numbered 1 to 10, in the order they first played test cricket. game - a test match, use g_id as index reference, g_cricinfoid to reference the cricinfo scorecard.

- a test match, use g_id as index reference, g_cricinfoid to reference the cricinfo scorecard. innings - a team innings, referenced by i_id, references game with i_testid, team by i_teamid and bowling team by i_oppid

- a team innings, referenced by i_id, references game with i_testid, team by i_teamid and bowling team by i_oppid player - a player, uses the name first encountered, p_id for reference, p_cricinfoid for the cricinfo player id, none of the details are filled (TODO).

- a player, uses the name first encountered, p_id for reference, p_cricinfoid for the cricinfo player id, none of the details are filled (TODO). bat - a batsman's innings, references i_id via b_inningsid and p_id via b_bat_pid.

- a batsman's innings, references i_id via b_inningsid and p_id via b_bat_pid. bowl - a bowler's innings, reference i_id via w_inningsid and p_id via w_bowl_pid.

- a bowler's innings, reference i_id via w_inningsid and p_id via w_bowl_pid. extras - extras in an innings, references i_id via e_inningsid.

- extras in an innings, references i_id via e_inningsid. fow - the fall of wickets in an innings, including injured partnerships (marked as unbroken, with two partnerships having the same wicket number. FOW needs careful coding, naive queries will be slightly off because of not outs and retirements. References i_id via f_inningsid, b_id via f_open_bid (batsman in middle, or 1st in order for openers), f_dis_bid (batsman dismissed if any), f_no_bid (batsman not out).

- the fall of wickets in an innings, including injured partnerships (marked as unbroken, with two partnerships having the same wicket number. FOW needs careful coding, naive queries will be slightly off because of not outs and retirements. References i_id via f_inningsid, b_id via f_open_bid (batsman in middle, or 1st in order for openers), f_dis_bid (batsman dismissed if any), f_no_bid (batsman not out). close - score at close of play and batsmen/overs (if available, scorecards are incomplete). References i_id through c_bat_iid and batsmen b_id through c_bat1_bid and c_bat2_bid. Needs parsing of available game notes (g_notes) for luncheon/drinks intervals (TODO).

- score at close of play and batsmen/overs (if available, scorecards are incomplete). References i_id through c_bat_iid and batsmen b_id through c_bat1_bid and c_bat2_bid. Needs parsing of available game notes (g_notes) for luncheon/drinks intervals (TODO). series - collated series of games. References game numbers via s_testid_list. To update Download the scorecard from cricinfo and save somewhere. I have a batch downloader for people running their own webserver (again, XAMPP) called score.php that gets the 20 most recent scorecards; add ?page=X to the URL to get older cards. Copy them to a download directory. You need perl, I use Strawberry perl for windows, as apparently does Larry Wall, but suit yourself. You also need to download (via CPAN) the following packages:

Text::CSV_XS;

HTML::TreeBuilder;

DBI; Strip the crud around the scorecard with the command:

perl strip.pl < download/XCRICINFOID.html > clean/XCRICINFOID.html

Run the parser to add to the database. Will delete all records for that gameid, but can cause unwanted records if it errors (which it might if cricinfo changes their format or for other reasons).

perl players.pl clean/XCRICINFOID.html Update series data (and do any other post-processing):

INSERT INTO Series ( s_season, s_home_tid, s_away_tid, s_length, s_date, s_home_win, s_away_win, s_drawn, s_tied ) SELECT g_season, g_home_tid, g_away_tid, max( g_seriesnum ), min( g_date ), sum( g_home_tid = g_result_tid ), sum( g_away_tid = g_result_tid ), sum( g_result_type like 'drawn%' ), sum( g_result_type like 'tied%' ) from Game where g_seriesid is null group by g_season, g_home_tid, g_away_tid order by g_testnum;

UPDATE game,series set g_seriesid = s_id where s_home_tid = g_home_tid and s_away_tid = g_away_tid and s_season = g_season;

UPDATE series set s_testid_list = (select group_concat(g_id) from game where g_seriesid = s_id group by g_seriesid);

UPDATE game, innings set i_oppid = g_away_tid where g_id = i_testid and g_home_tid = i_teamid;

UPDATE game, innings set i_oppid = g_home_tid where g_id = i_testid and g_away_tid = i_teamid;

UPDATE bat,innings set b_testid = i_testid where b_inningsid = i_id;

UPDATE bowl,innings set w_testid = i_testid where w_inningsid = i_id;

UPDATE extras,innings set e_testid = i_testid where e_inningsid = i_id;

UPDATE fow,bat set f_testid = b_testid, f_inningsid = b_inningsid where f_open_bid = b_id; Any problems or suggestions drop a comment here. Known Problems Players who change names get the first instance of their name (notably MoYo), best manually edited.

Grounds are stored by name, not id.(TODO). This gets around the above problem however.

Cricket 24th October, 2010 12:23:18 [#] Comments SQL Test Scores Database

Thanks a ton!

soulberry 24th October, 2010 19:41:59

SQL Test Scores Database

You're welcome!

Russ 25th October, 2010 22:36:51

SQL Test Scores Database

Hi,



I've been coming to ur site once in a while and have been finding ur work very interesting, specially the test ratings. I tried what has been explained above, but still its a bit confusing for me. Can you help me out with this if you have time? My mail id: vishall2402@gmail.com

vishal shah 21st March, 2011 07:51:51

SQL Test Scores Database

Hi Vishal, happy to help, but it is probably easier (and more helpful for others) if you ask questions here and I'll answer them as best I can.

Russ 21st March, 2011 20:42:41

SQL Test Scores Database

Do you have a downloadable copy of the script which we can use to parse and download cricinfo scorecards?

Sumit Gupta 2nd January, 2012 18:03:30

SQL Test Scores Database

Sumit, the zip file contains the script. You can modify it as you need. I will update it further in the next few days.

Russ 3rd January, 2012 09:23:25

SQL Test Scores Database

Fantastic and immense effort, I guess. I was starting to write one myself, but thought of searching for one in the net and stumbled upon this. Excellent database with almost perfect database design. Thank you very much.

Unni 27th January, 2012 01:36:46





