During the early part of its development, Wikidata used a hierarchical taxonomy to organize its data entries. The system was called GND, a German initialism for Gemeinsame Normdatei, which translates to “Integrated Authority File.” GND was originally meant to organize bibliographic information across library systems, though Internet technologists recently expanded it to work for non-library systems, too.

Which sounds pretty good, right? If you have to schematize the set of known information, you might start with a system originally built for librarians. After all, libraries are institutions tasked with the preservation and categorization of knowledge. They’re the historical experts.

Which all sounds good... until you encounter certain problems. Silly, wonderful, ridiculous problems.

Gerard Meijssen, an employee of the Wikimedia Foundation, talked about them to Emw, a Wikidata editor, in a recent blog post. Here are the kinds of problems GND entailed for Wikidata:

GND groups everything into huge taxonomical categories. Those categories are:

Person.

Organization.

Place.

Event.

Work.

Term.

... and that’s it. Everything known in the world must fit into one of those containers. Everything ever knowable, to some degree, must fit into one of those containers. Information, that system proposes, comes in six essential types.

(Which makes the “person” macro-classification a little poignant: At the universal base layer, we give a starring role to ourselves. A person is unlike an organization or place or work; a person is so unlike anything else as to deserve its own piece of cosmic tupperware. It’s an anthropocentric view, and, given our current understanding of what kinds of things live in the universe, maybe a somewhat correct one.)

So what are the problems of this system, with its six terms?

First of all, it doesn't differentiate, at this level, between the physical and the abstract, and that includes people. Turns out giving “person” a starring role is its downfall, as Barack Obama and Jay Gatsby then have to share a macro-taxonomical level. You can’t separate out the fictional from the non-fictional at this level.
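
To make that first problem concrete, here is a minimal Python sketch. It is not GND's or Wikidata's actual data model: the `GndType` enum and the `Item` class are hypothetical names invented for illustration. The only thing taken from the description above is that every item carries exactly one of six top-level types.

```python
from dataclasses import dataclass
from enum import Enum


class GndType(Enum):
    """The six top-level categories listed above (hypothetical encoding)."""
    PERSON = "person"
    ORGANIZATION = "organization"
    PLACE = "place"
    EVENT = "event"
    WORK = "work"
    TERM = "term"


@dataclass(frozen=True)
class Item:
    """An entry that carries one top-level type and nothing else."""
    label: str
    main_type: GndType


obama = Item("Barack Obama", GndType.PERSON)
gatsby = Item("Jay Gatsby", GndType.PERSON)

# At this level the real president and the fictional character look the same:
# the type alone cannot say which of the two ever existed.
same_type = obama.main_type == gatsby.main_type
print(same_type)
```

Telling the two apart would take a second piece of information, such as a "fictional" flag or a richer type, and a scheme with only six slots has no room for one.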

Second, in the words of Emw:

Any item that is not a person, place, event, organization or work is classified as a “term,” which contains virtually no information. We need to be able to classify things like gravity, carbon, DNA, cancer, clarinet, Twelver Shia Islam, fashion boot, dog and potato as more than simply “terms.”

Sousaphone, selenium, sex: They’re so similar, said GND, that they should be sorted as the same species of stuff. But that’s, well, silly, and when Wikidata ditches GND it will lose that organizational fluke.
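
The catch-all effect Emw describes can be sketched in a few lines of Python. This is an illustration only: the `classify` function and its `KNOWN` lookup table are made up for the example and are not part of GND or Wikidata. It assumes a classifier that recognizes the specific categories and falls back to “term” for everything else.

```python
from collections import defaultdict

# Hypothetical lookup table: the items in this example that fit a
# specific category. Anything missing from it becomes a "term".
KNOWN = {
    "Barack Obama": "person",
    "Wikimedia Foundation": "organization",
}


def classify(label: str) -> str:
    """Return the top-level type for a label, defaulting to 'term'."""
    return KNOWN.get(label, "term")


items = [
    "gravity", "carbon", "DNA", "cancer", "clarinet",
    "Twelver Shia Islam", "fashion boot", "dog", "potato",
    "Barack Obama", "Wikimedia Foundation",
]

# Group the labels by the type the classifier assigns them.
groups = defaultdict(list)
for label in items:
    groups[classify(label)].append(label)

for main_type, labels in groups.items():
    print(f"{main_type}: {len(labels)} item(s)")
```

All nine of Emw's examples, from a force of nature to a vegetable, land in the same bucket, so knowing that something is a “term” tells you almost nothing about it.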

Overall, writes Emw, “these main types are fine as a way to classify items of general interest in a large library, but they're much too small to form a sound basis for a classification system for all human knowledge.”