On that note: EU election week! Time to map our complex individual preferences to one of a few packages of lumped together stances.

Luckily, with the advent of political matching websites, this is as easy as answering a couple of questions on a one-dimensional scale. The algorithm then takes care of the optimal mapping, so you can tick the right box on the ballot and get on with your life, knowing you’ve been a good citizen and all will be well!

As much as I love one-dimensional reduction, though, I have an odd preference for the two-dimensional reduction that Kieskompas (Election Compass) has offered for years now in the Netherlands. Yesterday morning, I took the test.

The balls are the major parties (we got a lot of ’em in the Netherlands), the marker is me. Links = left, rechts = right, progressive, conservative, you get the idea.

Arrgh, right in the middle of the black hole of despair!

I’m definitely not going to vote for 50 Plus, the (seemingly) one-issue pensionado party. All the other parties are miles away, so what good is this advice? How could it even be that 50 Plus is so close to me? Is there some kind of weird averaging going on? Are my (or 50 Plus’) answers canceling each other out in such a way that we just happen to end up in the exact same place, even though we’re diametrically opposed on every issue, which I must assume we are?

The only logical explanation.

Spoiler alert: it turns out this party actually isn’t such a bad fit for me.

80s meme, suck on that, millennials!

But I sure as hell wasn’t going to accept that without a fight. I was sure something was wrong with the algorithms. I was going to get to the bottom of this and find a way to get a better match. Get me my data sciencing tools!

Me, sciencing them datas! This must have been something like a couple years ago… wait, 2006…?!

Data gathering

The Kieskompas contains 30 questions that can be answered on a 5-point scale: “strongly agree” (2), “agree” (1), “neutral” (0), “disagree” (-1) and “strongly disagree” (-2). You can also skip questions. The questions are clustered into 7 topics, but I won’t use that here.
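To make the coding concrete, here is a minimal sketch of that scale as a Python mapping. The names ANSWER_SCALE and encode are made up for illustration, and representing a skipped question as None is my assumption, not something the website prescribes.

```python
# Numeric coding of the five answer options, as listed above.
ANSWER_SCALE = {
    "strongly agree": 2,
    "agree": 1,
    "neutral": 0,
    "disagree": -1,
    "strongly disagree": -2,
}


def encode(answers):
    """Map answer labels to numbers; skipped questions (None) stay None."""
    return [None if a is None else ANSWER_SCALE[a] for a in answers]


example = encode(["agree", None, "strongly disagree"])
```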

So there are two basic pieces of data I need here: my answers and those of the parties.

My own answers

I already filled out all the questions on the website, so the data should be stored somewhere in my browser. Using Firefox’s developer tools, I was able to find it in the Storage Inspector (Shift+F9) under Local Storage.

There’s my data.

This JSON data can be easily copy-pasted into a Jupyter notebook, where I was going to Python-fu this thing into submission. Using an advanced JSON to Python conversion technique (first cell below) and some Pandas magic, we end up with a neat dataframe:
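For those following along, the conversion could look roughly like the sketch below. The JSON string is a hypothetical stand-in: the real structure and key names in Local Storage may well differ, and I only kept the nested answer/value part because that is what produces an answer.value column.

```python
import json

import pandas as pd

# Hypothetical stand-in for the string copied from Local Storage;
# the real structure and key names may differ.
raw = """
[
  {"question": 1, "answer": {"value": 2}},
  {"question": 2, "answer": {"value": -1}},
  {"question": 3, "answer": {"value": 0}}
]
"""

records = json.loads(raw)

# json_normalize flattens nested objects into dotted column names,
# so {"answer": {"value": 2}} becomes a column called "answer.value".
my_answers = pd.json_normalize(records).set_index("question")
```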

The important column is answer.value, which I can compare to the answers given by the political parties.

Party answers: enter ipysheet

That party answer data, however, was not as readily scrapeable. Possibly, there is some copyright involved there as well (which is also the reason I’m not sharing all this stuff publicly), which forced them to hide the data a bit. Possibly, I’m just not looking in the right places. Whatever the case, I couldn’t find it.

Luckily, the website offers a page to compare all your answers to all the parties’ answers, so we can just manually copy all 30 of them for all 12 available parties!

That’s only 360 numbers.

“Manually copy?!” I hear you object. Normally, I would agree with you. Even if it’s just 360 numbers, manual labor goes against everything I stand for as a lazy programmer (or rather: sit for). But I was getting desperate for data and I had hit a wall. Also, this gave me a good excuse to try out ipysheet!

The ipysheet Jupyter widget gives you an editable spreadsheet table right in your Jupyter notebooks. We can generate the table we need to fill in all the answers like so:
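A minimal sketch of how such a table could be set up: build a zero-filled dataframe with one row per question and one column per party, then hand it to ipysheet.from_dataframe. The party labels are placeholders and show_as_sheet is a hypothetical helper; the import sits inside the function so the dataframe part also runs where the widget is not installed.

```python
import pandas as pd

N_QUESTIONS = 30
# Placeholder labels; replace with the 12 real party names.
PARTIES = [f"party_{i:02d}" for i in range(1, 13)]

# One row per question, one column per party, all zeros to start with.
blank = pd.DataFrame(0, index=range(1, N_QUESTIONS + 1), columns=PARTIES)
blank.index.name = "question"


def show_as_sheet(df):
    """Wrap the dataframe in an editable ipysheet widget (notebook only)."""
    import ipysheet  # imported lazily so the rest runs without the widget

    return ipysheet.from_dataframe(df)
```

In a notebook cell, something like sheet = show_as_sheet(blank) followed by sheet on the last line should display the editable table.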

That fills it up with zeros that we can then change to their proper values based on the answers on the Kieskompas website. After doing that, you can export them to a dataframe, which you can then easily save to your favorite file format:
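The export step might look like the sketch below. In the notebook, the dataframe would come from ipysheet.to_dataframe(sheet); here a tiny hand-made frame with made-up values stands in for it, and the file name is just an example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# In the notebook this would be: party_answers = ipysheet.to_dataframe(sheet)
# A tiny hand-made frame (made-up values) stands in for it here.
party_answers = pd.DataFrame(
    {"party_a": [2, -1, 0], "party_b": [-2, 1, 1]},
    index=pd.Index([1, 2, 3], name="question"),
)

# Hypothetical file name, written to the system's temp directory.
csv_path = Path(tempfile.gettempdir()) / "party_answers.csv"
party_answers.to_csv(csv_path)

# Reading it back with the question number as index recovers the table.
reloaded = pd.read_csv(csv_path, index_col="question")
```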

Obviously, csv is the only file format we’ll ever need. None of that newfangled mumbo jumbo for me, no sirree!

Science that data!

Now for the fun part: comparing the numbers in such a way that makes the results more palatable.

T-SNE

My first thought was to try an alternative method of dimensional reduction. And what do we all think of first as true hipster data scientists? T-SNE!

If you’re interested in learning about this method, check out this article on T-SNE that looks pretty good. Since I was impatient, I didn’t read it at all, but just went straight for scikit-learn, copy pasted the first thing I found, and tried to reduce the differences between my answers and those of all the parties to two dimensions:

Reload and mangle the data.

Transform the 30-dimensional answer-difference space with T-SNE to a two-dimensional space.
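Those two steps could be sketched as follows. This is not the exact notebook code: the answers are random stand-ins for the real CSV data, the party labels are placeholders, and perplexity=5 is my own choice (recent scikit-learn versions require the perplexity to be smaller than the number of samples, which is only 12 here).

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
parties = [f"party_{i:02d}" for i in range(1, 13)]  # placeholder labels

# Synthetic answers in {-2, ..., 2}; the real ones come from the CSV.
party_answers = pd.DataFrame(rng.integers(-2, 3, size=(30, 12)), columns=parties)
my_answers = pd.Series(rng.integers(-2, 3, size=30))

# One row per party, one column per question: party answer minus mine.
diffs = party_answers.sub(my_answers, axis=0).T

# Reduce the 30-dimensional difference vectors to two dimensions.
tsne = TSNE(n_components=2, perplexity=5, random_state=0)
embedding = pd.DataFrame(
    tsne.fit_transform(diffs.to_numpy(dtype=float)),
    index=parties,
    columns=["x", "y"],
)
```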

Being an optimist, I completely expected the T-SNE dimensions to be better than the Kieskompas dimensions (which must have been meticulously designed and perfected over the past decade or even longer).

Perhaps not surprisingly, I was disappointed. Naive application of T-SNE just gives random results. Each run gives a totally different result:
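The run-to-run variation is easy to reproduce with a small sketch. The data below is synthetic and init="random" is set explicitly (my choice) so that the dependence on the seed is visible; the embed helper is hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in for the 12 x 30 matrix of answer differences.
diffs = rng.integers(-4, 5, size=(12, 30)).astype(float)


def embed(seed):
    """Run T-SNE once with a random starting layout determined by `seed`."""
    model = TSNE(n_components=2, perplexity=5, init="random", random_state=seed)
    return model.fit_transform(diffs)


run_a = embed(1)
run_b = embed(2)

# Largest coordinate difference between the two layouts.
max_shift = float(np.abs(run_a - run_b).max())
```

Passing a fixed integer as random_state makes a single run repeatable, but that only freezes one arbitrary layout; it does not make the layout any more meaningful.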