I recently found a large set of data on Taipei’s MRT ridership and decided to analyze it. I focused heavily on visualization during this project, as I am trying to learn data visualization techniques better. One reason I was looking for this data is that I had an idea for a research project: analyzing whether Taipei’s air quality influences the usage of public transportation. I’ve decided to delay that project for a while, as I don’t think it’s novel enough, but I already had the data, so I decided to do some data visualization using Taipei’s MRT ridership statistics.

To start off, you can find all the data for Taipei’s MRT ridership here. That page contains files for each month from the year 1996 (minus January & February) to 2019 (up to June). This is a huge amount of data, which means I was able to do some pretty cool stuff with it. The table below shows an excerpt of the data from January 2019. I removed most of the days just to save space, as the table with all of the days is quite big.

| Date | Day | Ridership Counts |
|---|---|---|
| Jan.1 | Tue | 1,628,886 |
| Jan.2 | Wed | 2,252,978 |
| Jan.3 | Thu | 2,269,048 |
| Jan.4 | Fri | 2,288,084 |
| Jan.30 | Wed | 2,321,252 |
| Jan.31 | Thu | 2,306,449 |
| Monthly Total Ridership | | 68,003,177 |
| Average Daily Transport Ridership in Jan. | | 2,193,651 |

Data Processing Stage

To speed up the process of getting all of these files (264 in total), I wrote a Python script that downloaded the data from each file and saved it all to one massive CSV file. This let me read the file in pandas and do all of my sorting and filtering in Python. I chose not to include the 1998-1999 values because they are much smaller than those of the following years: the number of riders for 1998 was 60,737,782, whereas the number for 2000 was 268,716,740. This is quite a big difference, so to make my graphs look a bit better, I decided not to show the data from these first two years. I also skipped 1997 and 2019, as they do not have complete data, and I had decided to only use years with complete data for this portion of the project.
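As a rough sketch of the filtering step, the snippet below reads a combined CSV with pandas and keeps only the years 2000 through 2018. The column names `date` and `ridership`, the helper `load_complete_years`, and the tiny inline sample are assumptions for illustration; the real combined file may be laid out differently. The sample values are taken from the tables in this post.

```python
import io

import pandas as pd

# Hypothetical layout of the combined CSV: one row per day.
# The column names "date" and "ridership" are assumptions, not the real headers.
SAMPLE_CSV = """date,ridership
1998-10-16,13471
1999-12-31,1154986
2000-02-04,206990
2018-12-14,2562165
2019-01-01,1628886
"""


def load_complete_years(csv_source, first_year=2000, last_year=2018):
    """Read the combined CSV and keep only rows within the given year range."""
    df = pd.read_csv(csv_source, parse_dates=["date"])
    years = df["date"].dt.year
    in_range = (years >= first_year) & (years <= last_year)
    return df[in_range].reset_index(drop=True)


# io.StringIO stands in for a file path so the example is self-contained.
filtered = load_complete_years(io.StringIO(SAMPLE_CSV))
print(filtered)
```

Passing a path to the real CSV instead of the `io.StringIO` object should work the same way, as long as the column names match.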

Analysis

The first thing I wanted to determine was which day of the week is the busiest on Taipei’s MRT over the years. It turns out that Friday is the busiest day in every year except 2000, when Saturday was actually the busiest. Sunday is the least busy day, and there is a significant drop on Sundays, which is what I found most interesting. I really have no idea why this is.

Taipei mean ridership per year per day
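For anyone who wants to try the same comparison, here is a minimal pandas sketch of averaging ridership by weekday within each year. It only uses the six January 2019 rows from the excerpt earlier in this post, and the column names `date` and `ridership` are assumptions, so treat it as an illustration of the grouping rather than the exact code behind the graph.

```python
import pandas as pd

# The six January 2019 rows from the excerpt table; column names are assumed.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2019-01-01",
                "2019-01-02",
                "2019-01-03",
                "2019-01-04",
                "2019-01-30",
                "2019-01-31",
            ]
        ),
        "ridership": [1628886, 2252978, 2269048, 2288084, 2321252, 2306449],
    }
)

df["year"] = df["date"].dt.year
df["weekday"] = df["date"].dt.day_name()

# One row per year, one column per weekday, each cell the mean daily ridership.
mean_by_day = df.groupby(["year", "weekday"])["ridership"].mean().unstack("weekday")

# For each year, the weekday whose mean ridership is highest.
busiest = mean_by_day.idxmax(axis=1)

print(mean_by_day)
print(busiest)
```

With the full dataset, `mean_by_day` would have one row per year, which is the shape needed to plot one line per weekday in matplotlib.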

Then I wanted to find some interesting days to take a look at, so I found the lowest and highest ridership days for each year from 1997 – 2018. I was wondering whether certain days, such as major holidays, would have unusual ridership numbers, and they definitely do. Below is a table which shows each year with its min and max ridership dates.

MRT Ridership min and max values per year

| Year | Min date | Min ridership | Max date | Max ridership |
|---|---|---|---|---|
| 1997 | 8/18/1997 | 4077 | 12/31/1997 | 238289 |
| 1998 | 10/16/1998 | 13471 | 12/26/1998 | 450075 |
| 1999 | 9/21/1999 | 34107 | 12/31/1999 | 1154986 |
| 2000 | 2/4/2000 | 206990 | 12/31/2000 | 1395391 |
| 2001 | 9/17/2001 | 14116 | 12/31/2001 | 1330968 |
| 2002 | 9/6/2002 | 238223 | 12/31/2002 | 1407202 |
| 2003 | 5/11/2003 | 297953 | 12/31/2003 | 1366690 |
| 2004 | 8/24/2004 | 165502 | 12/31/2004 | 1422818 |
| 2005 | 7/18/2005 | 128496 | 12/31/2005 | 1565118 |
| 2006 | 1/28/2006 | 317600 | 12/31/2006 | 1576361 |
| 2007 | 10/6/2007 | 150664 | 12/31/2007 | 1929780 |
| 2008 | 9/28/2008 | 227616 | 12/31/2008 | 1950822 |
| 2009 | 8/7/2009 | 267406 | 12/31/2009 | 2168321 |
| 2010 | 9/19/2010 | 372517 | 12/31/2010 | 2500877 |
| 2011 | 2/2/2011 | 547309 | 12/31/2011 | 2041690 |
| 2012 | 8/2/2012 | 499453 | 12/31/2012 | 2065432 |
| 2013 | 8/21/2013 | 511823 | 12/31/2013 | 2752203 |
| 2014 | 7/23/2014 | 627479 | 12/12/2014 | 2304930 |
| 2015 | 8/8/2015 | 418145 | 12/25/2015 | 2448812 |
| 2016 | 9/27/2016 | 150242 | 12/23/2016 | 2539837 |
| 2017 | 1/27/2017 | 791835 | 12/15/2017 | 2507419 |
| 2018 | 2/15/2018 | 829655 | 12/14/2018 | 2562165 |
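Finding the lowest and highest day per year is a natural fit for a pandas `groupby` with `idxmin` and `idxmax`. The sketch below assumes a daily table with `date` and `ridership` columns (names are my assumption), and the helper `extremes_per_year` is hypothetical. The sample only contains the extreme days for 2016 to 2018 from the table above, so it demonstrates the mechanics rather than a real search through every day.

```python
import pandas as pd

# Sample rows: only the min and max days for 2016-2018 from the table above.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2016-09-27",
                "2016-12-23",
                "2017-01-27",
                "2017-12-15",
                "2018-02-15",
                "2018-12-14",
            ]
        ),
        "ridership": [150242, 2539837, 791835, 2507419, 829655, 2562165],
    }
)


def extremes_per_year(daily):
    """Return one row per year with the lowest and highest ridership days."""
    daily = daily.assign(year=daily["date"].dt.year)
    grouped = daily.groupby("year")["ridership"]
    # idxmin/idxmax give the row label of the extreme value within each year.
    low = daily.loc[grouped.idxmin()].set_index("year")
    high = daily.loc[grouped.idxmax()].set_index("year")
    return pd.DataFrame(
        {
            "min_date": low["date"],
            "min_ridership": low["ridership"],
            "max_date": high["date"],
            "max_ridership": high["ridership"],
        }
    )


result = extremes_per_year(df)
print(result)
```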

One interesting thing is that the following lowest-ridership days are days where Taiwan had cancelled work due to an incoming typhoon: 8/7/2009, 9/19/2010, 8/2/2012, 8/21/2013, 7/23/2014, 8/8/2015, and 9/27/2016. One thing to note is that I only have cancellation data from 2000-2018, so I wasn’t able to find any information about cancellations prior to the year 2000. Several of the other days (2/4/2000, 1/28/2006, 2/2/2011, 1/27/2017, and 2/15/2018) fall during Chinese New Year, which makes sense, as most people will either be traveling to other cities in Taiwan, at home with family, or traveling abroad. I’ve taken the MRT on these days, and it is very sparsely populated.
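Checking which low-ridership days line up with typhoon cancellations can be done with a simple membership test. The sketch below uses the typhoon dates listed above and four of the yearly minimum dates from the table; the variable names are made up for illustration, and my actual cancellation data has more entries than this.

```python
import pandas as pd

# Yearly minimum-ridership days for 2009-2012, taken from the table above.
lowest_days = pd.to_datetime(
    ["2009-08-07", "2010-09-19", "2011-02-02", "2012-08-02"]
)

# Typhoon work-cancellation days mentioned in this post (not the full dataset).
typhoon_days = pd.to_datetime(
    [
        "2009-08-07",
        "2010-09-19",
        "2012-08-02",
        "2013-08-21",
        "2014-07-23",
        "2015-08-08",
        "2016-09-27",
    ]
)

# Boolean array: True where a lowest day is also a typhoon cancellation day.
is_typhoon = lowest_days.isin(typhoon_days)

for day, flag in zip(lowest_days, is_typhoon):
    print(day.date(), "typhoon day" if flag else "not in typhoon list")
```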

Taipei average MRT ridership increase per year.

Next, I decided I would analyze how much the ridership has grown over the years. I expected it to consistently grow every year, but I was curious by how much, and how quickly. I did this by creating a table which I have added below. The columns are the number of riders for the entire year, the year, the difference from the previous year, and the percentage change.

| num_riders | year | change_from_previous_year | percentage_change |
|---|---|---|---|
| 31081505 | 1997 | | |
| 60737782 | 1998 | 29656277 | 95.41455 |
| 126952122 | 1999 | 66214340 | 109.0167 |
| 268716740 | 2000 | 141764618 | 111.6678 |
| 289642714 | 2001 | 20925974 | 7.787373 |
| 324433557 | 2002 | 34790843 | 12.01164 |
| 316189128 | 2003 | -8244429 | -2.54118 |
| 350141956 | 2004 | 33952828 | 10.73814 |
| 360729803 | 2005 | 10587847 | 3.023873 |
| 384003220 | 2006 | 23273417 | 6.451759 |
| 413963685 | 2007 | 29960465 | 7.802139 |
| 450024415 | 2008 | 36060730 | 8.711085 |
| 462472351 | 2009 | 12447936 | 2.766058 |
| 504286734 | 2010 | 41814383 | 9.041488 |
| 566404489 | 2011 | 62117755 | 12.31794 |
| 602199342 | 2012 | 35794853 | 6.319663 |
| 634961083 | 2013 | 32761741 | 5.440348 |
| 629800568 | 2014 | -5160515 | -0.81273 |
| 659348369 | 2015 | 29547801 | 4.691612 |
| 677551828 | 2016 | 18203459 | 2.760826 |
| 680425483 | 2017 | 2873655 | 0.424123 |
| 678304570 | 2018 | -2120913 | -0.3117 |

The first thing I found is that the average growth in ridership per year is 30,820,145.95 (this is based on individual trips, so I personally would contribute ~700 trips a year on average). Another interesting thing to note is the years with a decrease in ridership: 2003, 2014, and 2018. I also find it interesting that the early years have massive percentage increases. The year 2000 had a 111% increase in MRT usage, whereas a year like 2017 had only a 0.42% increase.
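I did these calculations in Excel, but the same numbers can be reproduced in pandas. The sketch below is one possible way to do it, using the yearly totals from the table above: `diff()` gives the change from the previous year and `pct_change()` gives the percentage change. The variable names are only for illustration.

```python
import pandas as pd

# Yearly ridership totals, copied from the table above.
totals = pd.Series(
    {
        1997: 31081505,
        1998: 60737782,
        1999: 126952122,
        2000: 268716740,
        2001: 289642714,
        2002: 324433557,
        2003: 316189128,
        2004: 350141956,
        2005: 360729803,
        2006: 384003220,
        2007: 413963685,
        2008: 450024415,
        2009: 462472351,
        2010: 504286734,
        2011: 566404489,
        2012: 602199342,
        2013: 634961083,
        2014: 629800568,
        2015: 659348369,
        2016: 677551828,
        2017: 680425483,
        2018: 678304570,
    },
    name="num_riders",
)

table = pd.DataFrame(
    {
        "num_riders": totals,
        # Difference from the previous year (NaN for the first year).
        "change_from_previous_year": totals.diff(),
        # Same difference expressed as a percentage of the previous year.
        "percentage_change": totals.pct_change() * 100,
    }
)

# mean() skips the NaN in the first row, so this averages the 21 yearly changes.
average_growth = table["change_from_previous_year"].mean()
decrease_years = table.index[table["change_from_previous_year"] < 0].tolist()

print(table)
print(round(average_growth, 2))
print(decrease_years)
```

Since the yearly changes telescope, the average is just the 2018 total minus the 1997 total, divided by 21.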

Conclusion

I thought it was quite interesting to go through this data, and I was really surprised that my typhoon cancellation data was useful for this project as well. I am very curious what other correlations I could find if I dug into this data more. If I happen to find any other interesting things, I will create a second post for this. For all the plotting, I used matplotlib, which is a Python library that makes it really easy to create graphs. I also used Excel to make the tables and to make a few calculations, like the percentage change, easier.

If you’re interested in other data analysis or data science work I have done, you can take a look at this post: https://codingoverload.com/2019/07/historical-taiwan-weather-data-online/. If you are interested in other cool Python libraries, I recommend taking a look at my Snips NLU post and my Darkskylib post.