About this classification:

The classification works on the corpus of all the parliamentary questions (oral and written) presented during the VIII term. The corpus is projected into a vector space whose dimensions are keywords selected with a technique based on Markov chains.

In this space, every text is represented by its TF-IDF (term frequency–inverse document frequency) vector.
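
To make the representation concrete, here is a minimal sketch of how a TF-IDF vector over a fixed keyword list could be computed. It is not our production code: the keyword list, the example texts and the function name tfidf_vectors are hypothetical, the tokenizer is a plain whitespace split, and the weighting is the textbook tf * log(N / df) form, which may differ from the exact variant we use. The Markov chain keyword selection is not shown; the keywords are simply given.

```python
import math
from collections import Counter


def tfidf_vectors(documents, keywords):
    """Return one TF-IDF vector per document, with one dimension per keyword."""
    # Assumption: lowercase + whitespace split is enough for this illustration.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each keyword.
    df = {kw: sum(1 for tokens in tokenized if kw in tokens) for kw in keywords}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty texts
        vector = []
        for kw in keywords:
            tf = counts[kw] / total
            # Textbook idf = log(N / df); a keyword found nowhere gets weight 0.
            idf = math.log(n_docs / df[kw]) if df[kw] else 0.0
            vector.append(tf * idf)
        vectors.append(vector)
    return vectors


# Hypothetical keywords and texts, for illustration only.
keywords = ["budget", "fisheries", "migration"]
documents = [
    "question on the fisheries budget",
    "question on migration policy",
    "written question on the budget",
]
vectors = tfidf_vectors(documents, keywords)
```

Each text becomes a list of three numbers, one per keyword. A keyword that appears in fewer texts (here "fisheries") weighs more than one that appears in many (here "budget").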

On this vector space we trained two different classifiers, an SVM (support vector machine) and a random forest. Combining the two classifiers, we reach a precision of 81% on our test set.
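
One common way to combine two classifiers is to average the per-topic probabilities they assign and keep the topic with the highest average. The sketch below illustrates that idea only; it is an assumption, not necessarily the combination rule we use. The function name combine_predictions, the svm_weight parameter, the topic names and the probability values are all hypothetical.

```python
def combine_predictions(svm_probs, forest_probs, svm_weight=0.5):
    """Average two classifiers' per-topic probabilities and return the best topic.

    Both inputs map topic name -> probability. svm_weight is the share given
    to the SVM; the random forest gets the remaining 1 - svm_weight.
    """
    topics = set(svm_probs) | set(forest_probs)
    combined = {
        topic: svm_weight * svm_probs.get(topic, 0.0)
        + (1.0 - svm_weight) * forest_probs.get(topic, 0.0)
        for topic in topics
    }
    best = max(combined, key=combined.get)
    return best, combined[best]


# Hypothetical outputs of the two classifiers for a single question.
svm_probs = {"agriculture": 0.6, "economy": 0.3, "migration": 0.1}
forest_probs = {"agriculture": 0.5, "economy": 0.4, "migration": 0.1}

topic, score = combine_predictions(svm_probs, forest_probs)
```

With equal weights, the two classifiers above agree on "agriculture". In a semi-automatic setting, a low combined score could be used to send a text to manual review rather than accepting the label.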

Classifying parliamentary texts requires knowledge of the domain, care when combining the classifiers, and high-quality training data. Even when all these elements are in place, this semi-automatic classification can hardly be perfect, but it is worth continuously trying to improve it.

Any feedback or help is therefore more than welcome!