How does the cognitive system encode the location of objects in a visual scene? In the past decade, this question has attracted much attention in the field of visual-word recognition (e.g., “jugde” is perceptually very close to “judge”). Letter transposition effects have been explained in terms of perceptual uncertainty or shared “open bigrams”. In the present study, we focus on note position coding in music reading (i.e., a 2D scenario). The usual way to display music is the staff (i.e., a set of 5 horizontal lines and their resultant 4 spaces). When reading musical notation, it is critical to identify not only each note (temporal duration), but also its pitch (y-axis) and its temporal sequence (x-axis). To examine note position coding, we employed a same–different task in which two briefly and consecutively presented staves contained four notes. The experiment was conducted with experts (musicians) and non-experts (non-musicians). For the “different” trials, the critical conditions involved staves in which two internal notes that were switched vertically, horizontally, or fully transposed — as well as the appropriate control conditions. Results revealed that note position coding was only approximate at the early stages of processing and that this encoding process was modulated by expertise. We examine the implications of these findings for models of object position encoding.