Methodology and data set

For this research we used a CSV file provided by a Reddit user who manages a private tracker. For obvious reasons, we are not publishing the name of our source, but the entire data set is available on GitHub for verification and reproduction of the results.

To produce these results we used SQL queries built from fairly standard features: CTEs, GROUP BY, ORDER BY and LEFT JOIN. The data set is organised into three tables:

feb18 – contains spawn info (spawnId, pokeId, spawnTime)

types – contains Pokemon species typing information

pokemon_types – contains generic Pokemon type descriptions

Spawns were aggregated by the hour in which they occurred (the hh portion of the DateTime value). All times in the database are recorded as local server time.
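To make the aggregation concrete, here is a minimal sketch of an hourly query, run through Python's built-in sqlite3 module so it can be tried without a database server. It is an illustration and not the exact query behind our results. Only the feb18 columns (spawnId, pokeId, spawnTime) come from the data set description above. The columns of types and pokemon_types (typeId, typeName), the timestamp format and the sample rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- feb18 columns are the ones listed in the article.
CREATE TABLE feb18 (spawnId INTEGER, pokeId INTEGER, spawnTime TEXT);
-- Hypothetical columns: the article does not list them for these two tables.
CREATE TABLE types (pokeId INTEGER, typeId INTEGER);
CREATE TABLE pokemon_types (typeId INTEGER, typeName TEXT);

-- Invented toy rows, not taken from the real data set.
INSERT INTO feb18 VALUES
  (1, 16,  '2018-02-01 08:15:00'),
  (2, 16,  '2018-02-01 08:45:00'),
  (3, 92,  '2018-02-01 23:05:00'),
  (4, 999, '2018-02-02 08:30:00');  -- pokeId with no typing row
INSERT INTO types VALUES (16, 1), (16, 2), (92, 3), (92, 4);
INSERT INTO pokemon_types VALUES
  (1, 'Normal'), (2, 'Flying'), (3, 'Ghost'), (4, 'Poison');
""")

query = """
WITH hourly AS (
    -- Keep only the hh portion of the timestamp.
    SELECT pokeId, CAST(strftime('%H', spawnTime) AS INTEGER) AS hh
    FROM feb18
)
SELECT h.hh,
       COALESCE(pt.typeName, 'unknown') AS type_name,
       COUNT(*) AS spawns
FROM hourly AS h
-- LEFT JOIN keeps spawns whose species has no typing information.
LEFT JOIN types AS t ON t.pokeId = h.pokeId
LEFT JOIN pokemon_types AS pt ON pt.typeId = t.typeId
GROUP BY h.hh, COALESCE(pt.typeName, 'unknown')
ORDER BY h.hh, type_name;
"""

rows = conn.execute(query).fetchall()
for hh, type_name, spawns in rows:
    print(hh, type_name, spawns)
```

Note that a species with two types is counted once per type in this sketch, so the per-type totals for an hour can add up to more than the number of spawns in that hour.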

IMPORTANT NOTE

The data in the data set is not normalized. This means that the recorded values per hour depend on the number of pings during that period. The provider of the data set did not specify whether the pings come organically from users or from automated bots. Thanks to j16sdiz for highlighting the problem on GitHub. We are investigating the issue and will update the results here as soon as possible.
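One possible way to reduce this dependence, shown here only as a sketch and not as a settled correction, is to report each species as a share of all spawns recorded in the same hour instead of as a raw count. This only helps under the assumption that the number of pings affects every species equally within an hour. The data set does not contain ping counts, so that assumption cannot be checked from the data itself. The sample rows and the timestamp format below are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feb18 (spawnId INTEGER, pokeId INTEGER, spawnTime TEXT);

-- Invented toy rows: a busy hour (08) and a quiet hour (23).
INSERT INTO feb18 VALUES
  (1, 16, '2018-02-01 08:05:00'),
  (2, 16, '2018-02-01 08:20:00'),
  (3, 16, '2018-02-01 08:40:00'),
  (4, 92, '2018-02-01 08:55:00'),
  (5, 16, '2018-02-01 23:10:00'),
  (6, 92, '2018-02-01 23:30:00');
""")

query = """
WITH per_hour AS (
    -- Raw spawn count per hour and species.
    SELECT CAST(strftime('%H', spawnTime) AS INTEGER) AS hh,
           pokeId,
           COUNT(*) AS spawns
    FROM feb18
    GROUP BY hh, pokeId
),
hour_totals AS (
    -- All spawns recorded in each hour, whatever the species.
    SELECT hh, SUM(spawns) AS total
    FROM per_hour
    GROUP BY hh
)
SELECT p.hh,
       p.pokeId,
       p.spawns,
       ROUND(1.0 * p.spawns / t.total, 3) AS share
FROM per_hour AS p
JOIN hour_totals AS t ON t.hh = p.hh
ORDER BY p.hh, p.pokeId;
"""

shares = conn.execute(query).fetchall()
for hh, poke_id, spawns, share in shares:
    print(hh, poke_id, spawns, share)
```

In the toy rows, pokeId 92 has the same raw count in both hours, but its share is higher in the quiet hour. Raw counts and shares can therefore tell different stories when the overall volume differs between hours.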