White Space

White space is hard to see and can cause confusion

Perhaps the most common source of annotator disagreement is inconsistent labeling of leading and trailing whitespace and punctuation. That is, one annotator might label “Tal Perry” while another labels “Tal Perry ” or “ Tal Perry” or “ Tal Perry ”. The same issue appears with trailing punctuation, such as “Tal Perry.”

When measuring annotator agreement or deciding on a golden source of annotation, these conflicts lead to lower agreement scores and ambiguity in the golden set. These errors are particularly frustrating because the annotation is conceptually correct, and a human wouldn’t really notice or care about the difference.
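To make the effect on agreement concrete, here is a minimal sketch, assuming annotations are stored as (start, end, label) character offsets and that agreement is scored by exact span match. The sentence, the offsets, and the `pairwise_exact_agreement` helper are all hypothetical and only for illustration; your own agreement metric may be more lenient.

```python
from itertools import combinations

text = "My name is Tal Perry."

# Hypothetical annotations of the same entity by three annotators,
# stored as (start, end, label) character offsets into `text`.
annotator_a = (11, 20, "PERSON")  # "Tal Perry"
annotator_b = (11, 21, "PERSON")  # "Tal Perry."  (trailing period)
annotator_c = (10, 20, "PERSON")  # " Tal Perry"  (leading space)


def pairwise_exact_agreement(spans):
    """Fraction of annotator pairs whose spans match exactly.

    Assumes at least two spans are passed in.
    """
    pairs = list(combinations(spans, 2))
    matches = sum(1 for first, second in pairs if first == second)
    return matches / len(pairs)


score = pairwise_exact_agreement([annotator_a, annotator_b, annotator_c])
print(score)
```

All three annotators picked the same person, yet no pair of spans is identical, so an exact-match score treats every pair as a disagreement.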

In fact, that subtlety is the root cause of these kinds of errors. Typically, your annotators are not concerned with how your algorithms calculate agreement and won’t notice or care about the difference between “Tal Perry” and “Tal Perry ” unless explicitly told to do so.

In that regard, the solution is simple: your annotation tool should visually indicate to annotators when they have captured leading or trailing white space, and let them decide whether that is correct according to the guidelines you have set.
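The check behind such an indicator can be quite small. The following is a rough sketch, not a description of any particular tool: it assumes spans are character offsets into the text, and the names `trim_span` and `has_boundary_noise` are made up for this example. It flags a span whose edges contain white space or ASCII punctuation and computes the trimmed offsets a tool could suggest.

```python
import string

# Characters treated as boundary noise. This is an assumption: it covers
# ASCII white space and punctuation only, so adjust it to your guidelines.
STRIP_CHARS = string.whitespace + string.punctuation


def trim_span(text, start, end, strip_chars=STRIP_CHARS):
    """Return (start, end) with leading/trailing strip_chars removed."""
    while start < end and text[start] in strip_chars:
        start += 1
    while end > start and text[end - 1] in strip_chars:
        end -= 1
    return start, end


def has_boundary_noise(text, start, end):
    """True if the span starts or ends with white space or punctuation."""
    return trim_span(text, start, end) != (start, end)


text = "My name is Tal Perry."

# A span covering " Tal Perry." (leading space and trailing period).
noisy = (10, 21)
print(has_boundary_noise(text, *noisy))  # flag this span to the annotator
print(trim_span(text, *noisy))           # offsets the tool could suggest
```

Flagging rather than silently trimming keeps the decision with the annotator, which matters when the punctuation is part of the entity, as in an abbreviation like “Inc.”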