We all know what the best type is (It’s Psychic. Prove me wrong.), but what does the data say?

Sometimes, Data Science is about making hard choices, or earning a company tons of money. Other times, though, Data Science can just be a fun hobby for a rainy weekend.

This week I’ve been very busy with a college assignment: a few classmates and I have had to code the PageRank algorithm — the one Google used to use for searches, before AI and NLP ate everything else.

What is PageRank?

PageRank is an algorithm used to get a ranking for connected parts of a system — perfect for ranking websites, its original purpose.

In terms of its original task, it measures and ranks the importance or influence of many nodes (websites) connected to each other by links (edges).

The algorithm takes a directed graph as its input and returns a ranking of its nodes, along with a score between 0 and 1 for each one, based on a few criteria:

You rank higher if more nodes link to you.

A link to another node counts for less the more nodes you link to.

Being linked to by higher-ranked nodes is better.

This fits the intuition that a big site, like Medium, will be linked to by a lot of sources, and that being linked to by a big site (say, Facebook’s home page) means your site is pretty relevant too. You could also use it to model the relevance of scientific papers or publications (using citations as the links), or the probability that an animal gets eaten in a given ecosystem (with a food chain as the graph).
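To make those three criteria concrete, here is a minimal sketch of PageRank as a simple iteration in plain Python. This is not the implementation from my assignment: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the toy graph are all assumptions picked for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Return a dict of node -> score for a directed graph.

    `links` maps each node to the collection of nodes it links to.
    The damping value and iteration count are illustrative defaults.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with an equal "random jump" share.
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # Duplicate links and self-links are ignored.
            targets = [t for t in set(links.get(node, ())) if t != node]
            if targets:
                # Split this node's score evenly among the nodes it links to,
                # so each link is worth less the more links there are.
                share = damping * scores[node] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A node with no outgoing links spreads its score over everyone.
                share = damping * scores[node] / n
                for other in nodes:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical toy web: A, B and D all link to C, and C links back to A.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = pagerank(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph C ends up first (three nodes link to it), A comes second because its only incoming link is from the top-ranked node, and D comes last because nothing links to it. The scores add up to 1, which is why each one sits between 0 and 1.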

So I’d been meaning to write an article about something fun, and I had this PageRank implementation lying around… I couldn’t miss the chance.

Getting the data

First I got down to getting the data.

Since all I wanted to model was the type relationships, I was about to use Bulbapedia (Pokémon’s Wiki), but then I figured someone else had probably already coded that bit. Sure enough, a few seconds of searching got me to this awesome link, from which I took the Python matrix of type advantages in the game.

This uses the types and relationships from the 6th generation. I haven’t gotten around to playing the 7th one yet, so I didn’t mind.

With the raw matrix already loaded in Python, I had to give it the correct format: the links in the kind of graph PageRank takes as input carry no weight. They’re binary: either you link to something or you don’t (if there are implementations of PageRank that address this, please let me know in the comments!).

In case you haven’t played Pokémon or you’re fuzzy on the details, each type can either be neutral towards another type (most types are mutually neutral), have an advantage (does 2x the damage), have a disadvantage (does 1/2 the damage) or run into an immunity (does 0 damage).

The way I moved that format (4 different relationships) to a binary domain was simply to build two different graphs: one for attackers, and one for defenders. I also bunched immunity and resistance into the same category (defensive advantage, if you will). I hope that won’t offend any hardcore fans.
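Here is a rough sketch of that split, using a three-type slice of the chart instead of the full matrix. The dictionary layout, the helper name `binary_graph` and the direction of the links in the defender graph are assumptions made for this example; the matrix I actually downloaded may be organized differently.

```python
# Illustrative slice of the type chart: chart[attacker][defender] is the
# damage multiplier. Only three types are shown, not the full matrix.
chart = {
    "Fire":  {"Fire": 0.5, "Water": 0.5, "Grass": 2.0},
    "Water": {"Fire": 2.0, "Water": 0.5, "Grass": 0.5},
    "Grass": {"Fire": 0.5, "Water": 2.0, "Grass": 0.5},
}


def binary_graph(chart, keep):
    """Link attacker -> defender wherever keep(multiplier) is true."""
    return {
        attacker: [defender for defender, mult in row.items() if keep(mult)]
        for attacker, row in chart.items()
    }


# Attacker graph: A links to B if A does double damage to B.
attack_graph = binary_graph(chart, lambda mult: mult > 1)

# Defender graph (assumed direction): A links to B if B resists or is immune
# to A, so half damage (0.5) and no damage (0) land in the same bucket.
defense_graph = binary_graph(chart, lambda mult: mult < 1)
```

Neutral matchups (a multiplier of exactly 1) produce no link in either graph. Note that the defender graph in this slice still contains links from each type to itself, since Fire, Water and Grass all resist their own type.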

There’s also another issue: PageRank doesn’t take into account sites linking to themselves, so we can’t use data about a Pokémon type being weak to itself or effective against itself. (Full disclosure: I didn’t realize this in the first draft of this article; a reader pointed out my mistake. Results have changed.)
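Stripping those self-links out is a one-liner per node. A minimal sketch, assuming the graph is stored as a dictionary from each type to the list of types it links to (the helper name `drop_self_links` is made up for this example):

```python
def drop_self_links(graph):
    """Return a copy of the graph without edges from a node to itself."""
    return {
        node: [target for target in targets if target != node]
        for node, targets in graph.items()
    }


# Two rows of a naive attacker graph: Dragon is strong against Dragon,
# and Ghost is strong against Ghost and Psychic.
naive = {"Dragon": ["Dragon"], "Ghost": ["Ghost", "Psychic"]}
cleaned = drop_self_links(naive)
```

After cleaning, Dragon has no outgoing links left and Ghost only links to Psychic, so neither type can feed its own score.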

PageRank Results

Here are the results I got from modeling attacking types. I only linked type A to type B if type A was very effective against type B, without linking any types to themselves (since that goes against PageRank’s preconditions).