A small code block later (as everything is usually more interesting than just studying), I had a list of all articles related to economics on Polish Wikipedia (for example, articles about economic policy). A moment later, I had daily views for each article! If you dig deeper and play with the database dump files a bit, you can even get hourly numbers. Cool! But what is the purpose?
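Here is a minimal Python sketch of how such data could be collected. It is not the code behind this post. It assumes two public Wikimedia services: the MediaWiki Action API (`list=categorymembers`) for listing the articles in a category, and the Wikimedia Pageviews REST API for daily views. That API only covers July 2015 onward, so older years such as 2014 would need the hourly pagecounts dump files instead. The parser for those files assumes their space-separated `project title views bytes` line layout. All function names, the example category and the User-Agent string are illustrative.

```python
import json
import urllib.request
from urllib.parse import quote, unquote, urlencode

# Assumed endpoints: Polish Wikipedia's Action API and the Pageviews REST API.
MW_API = "https://pl.wikipedia.org/w/api.php"
PV_API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def category_url(category: str, cont: str = "") -> str:
    """URL listing up to 500 members of a category (Polish namespace 'Kategoria')."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Kategoria:{category}",
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        params["cmcontinue"] = cont  # continuation token returned by the previous page
    return f"{MW_API}?{urlencode(params)}"


def pageviews_url(title: str, start: str, end: str) -> str:
    """URL for daily human (agent=user) views of one article, dates as YYYYMMDD."""
    article = quote(title.replace(" ", "_"), safe="")
    return f"{PV_API}/pl.wikipedia/all-access/user/{article}/daily/{start}/{end}"


def parse_daily_views(payload: str) -> dict[str, int]:
    """Turn a Pageviews API JSON response into {YYYYMMDD: views}."""
    return {item["timestamp"][:8]: item["views"] for item in json.loads(payload)["items"]}


def parse_hourly_dump(lines, project: str = "pl") -> dict[str, int]:
    """Sum views per title from the lines of one hourly pagecounts dump file.

    Assumes the 'project title views bytes' layout; titles may be percent-encoded.
    """
    counts: dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != project:
            continue  # skip other wikis and malformed lines
        title = unquote(parts[1])
        counts[title] = counts.get(title, 0) + int(parts[2])
    return counts


def fetch(url: str) -> str:
    """Download a URL; Wikimedia asks clients to send a descriptive User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": "exam-trends-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch(category_url("Polityka gospodarcza"))` would return one JSON page of article titles; if the response contains a `continue.cmcontinue` token, pass it back as `cont` to get the next page.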

Daily views for an article about Nash equilibrium. One small inconvenience: exams are not held on the same date each year, so each year's numbers have to be lined up against that year's exam date.
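One way to handle shifting exam dates is to re-index each year's daily views by days relative to that year's exam, and then average the aligned profiles. The sketch below assumes daily views stored as a `{YYYYMMDD: views}` dictionary. The exam dates and view counts in the usage lines are made up, and the helper names are hypothetical.

```python
from datetime import date, timedelta


def align_to_exam(daily: dict[str, int], exam_day: date, window: int = 7) -> list[int]:
    """Views for offsets -window..+window around exam_day; index `window` is exam day.

    Days missing from `daily` are treated as 0 views.
    """
    profile = []
    for offset in range(-window, window + 1):
        key = (exam_day + timedelta(days=offset)).strftime("%Y%m%d")
        profile.append(daily.get(key, 0))
    return profile


def mean_profile(profiles: list[list[int]]) -> list[float]:
    """Average several equally long exam-aligned profiles, day by day."""
    return [sum(day) / len(day) for day in zip(*profiles)]


# Usage with made-up numbers and hypothetical exam dates:
year_a = align_to_exam({"20140609": 10, "20140610": 50, "20140611": 30}, date(2014, 6, 10), window=1)
year_b = align_to_exam({"20150614": 20, "20150615": 70}, date(2015, 6, 15), window=1)
print(mean_profile([year_a, year_b]))  # [15.0, 60.0, 15.0]
```

Once aligned, index `window` always means "exam day", so profiles from different years can be compared or averaged directly.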

Results

Optimistic version: we know which topics people study before the exam (previous years’ notes, etc.), what they check right after it (most likely it was on the test itself), and what they look up in the next couple of days (checking correct answers, talking with friends, etc.). If we observe a sudden spike in the popularity of articles that are going to be on the exam just before it, the exam was most likely compromised and leaked somewhere online. In my case, I focused on articles that suddenly gained popularity after the exam but not before: potentially unexpected topics.
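The before/after reasoning can be written as a simple rule on an exam-aligned series of daily views. This is only a sketch with assumed parameters. The 3-day windows, the factor of 3 over a baseline of earlier days, the floor of 1 view on the baseline and the labels are my own illustrative choices, not the method behind the 2014 results.

```python
from statistics import mean


def classify(profile: list[int], window: int, ratio: float = 3.0) -> str:
    """Label an article's exam-aligned daily views.

    `profile` covers offsets -window..+window, so profile[window] is exam day.
    Assumed heuristic: compare the 3 days before the exam, and the exam day
    plus 2 days after, against the mean of the earlier days.
    """
    if window < 4:
        raise ValueError("need at least 4 days before the exam")
    baseline = max(mean(profile[: window - 3]), 1.0)  # offsets -window..-4, floored at 1
    before = mean(profile[window - 3 : window])        # offsets -3..-1
    after = mean(profile[window : window + 3])         # offsets 0..+2
    if before >= ratio * baseline:
        return "spike before exam (possible leak)"
    if after >= ratio * baseline:
        return "spike after exam only (unexpected topic?)"
    return "no anomaly"
```

Checking "before" first mirrors the text: a jump in the run-up to the exam is the suspicious case, while a jump that appears only afterwards points to a topic few people expected.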

Realistic version: as you might expect, Wikipedia is not only for SGH students. For example, in 2014, just before the qualifying test itself, there was a nation-wide knowledge competition about economics. As you can guess, the results in such a case were useless, so any anomaly like this is a serious problem. Also, the further back in time you go, the less significant the results become (less internet access, fewer smartphones and smaller Facebook groups).

Results from 2014, sorted by a home-made measure of deviation from expected views. The last title is the name of the nation-wide competition.
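The exact home-made formula isn't given here, so as one plausible stand-in, a z-score measures how many standard deviations an article's views in the exam window sit above its own baseline, and sorting by it produces a ranking of this kind. The baseline/observed split, the floor of 1 on the standard deviation and the function names are assumptions for illustration.

```python
from statistics import mean, stdev


def deviation_score(baseline: list[int], observed: float) -> float:
    """How many standard deviations `observed` sits above the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline) if len(baseline) > 1 else 0.0
    # Floor sigma at 1 so that a perfectly flat baseline doesn't blow up the score.
    return (observed - mu) / max(sigma, 1.0)


def rank_articles(series: dict[str, tuple[list[int], float]]) -> list[tuple[str, float]]:
    """Sort titles by deviation score, most anomalous first.

    `series` maps title -> (baseline daily views, observed views in the exam window).
    """
    scores = {title: deviation_score(base, obs) for title, (base, obs) in series.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Like any score of this kind, it cannot tell an exam-driven spike from any other, so an article boosted by something like the nation-wide competition would rank just as high.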

So, can I stop learning?

For now… no. The above work can be treated as a curiosity with some potential uses: detecting leaked exams, or tracking trends in topics over the years for any kind of ‘mass’ exam (the SGH qualifying exam is taken by 2,000+ people). In my case, I don’t recall any topics or articles on my exam that I learned thanks to this way of studying. But mixing passion with learning was a perfect solution for me: I was learning economics while doing interesting stuff! Also, you can’t expect the same questions each year, so overall trends might be more useful than single Wikipedia articles.

Last words

I’m very curious whether, with more and more smartphones (googling questions right after the exam) and Facebook groups for sharing who-remembers-what from the test, this method will become more and more accurate over time. Maybe even to the point that universities will start to ‘mask’ important exams with other tests or end-of-term examinations to generate noise in the analysis? It would also be very interesting to see trends over the years in what people learn before exams (like Brand24, but for education), or simply to monitor for a suspicious number of searches on the right topics in the hours before a test, as a warning sign of potential leaks. The future of Big Data is certainly still very interesting as we uncover more and more patterns, and although I didn’t go for a Big Data specialization myself, I’m very happy at E-Business.

And now I’m going back to old-school learning for my exams tomorrow. Let me know in the comments below if you have any other ideas for applying such a Wikipedia-visit-monitoring strategy!