For theoretical neurobiologist and author Mark Changizi, “why” has always been more interesting than “how.” While many scientists focus on the mechanics of how we do what we do, his research aims to grasp the ultimate foundations underlying why we think, feel and see as we do. Guided by this philosophy, he has made important discoveries on why we see in color, why we see illusions, why we have forward-facing eyes, why letters are shaped as they are, why the brain is organized as it is, why animals have as many limbs and fingers as they do, and why the dictionary is organized as it is.

His latest book, The Vision Revolution, is a trenchant and insightful investigation into why humans see and interact with the world as we do. His findings are challenging and often surprising, and his witty, engaging style is accessible to a broad range of readers. He was generous enough to spend a few minutes with me recently to discuss his book and other topics.

NN: What originally led you to write a book about human vision in particular, instead of any of the other human evolutionary adaptive traits?

MC: Indeed, I don’t consider myself solely a vision scientist. I call myself a theoretical neurobiologist, more generally, and I have had a number of non-vision research directions, including, for example, the shape and evolution of the brain, and why animals have as many limbs and digits as they do. Some of these research directions were central parts of my first book, The Brain from 25,000 Feet.

I was led to a book on vision because that’s where my research led me, and so the question is, Why did I end up with quite a few research directions in vision?

As a theoretical neurobiologist, I try to find interesting phenomena that I can wrap my head around, with the hope of putting forth and testing rigorous and general explanatory hypotheses. That’s not easy, but there are a number of reasons why it’s easier for vision.

First, relative to other senses and/or behaviors, the amount of data we possess for vision is huge. There’s a century-sized pile of data, much of it not well explained, much less in a unifying manner.

Second, vision is theoretically approachable. You have a visual field, you see objects, and so on. We know how to at least begin thinking about the phenomenology. It’s more difficult for audition, and practically impossible for olfaction, where we have little idea how to even describe our perceptions. Forget about explaining anything!

And, third, for vision we have the best understanding of the underlying mechanisms.

My point is that, as a theorist struggles for phenomena he or she can crack, vision appears as a large attractive target compared with many other aspects of brain and behavior. One may end up attacking vision problems even if one isn’t excited by vision, merely because it’s juicy. (I am excited by it, though, especially to the extent that I can find exciting hypotheses.)

NN: I was intrigued by the “mind reading” aspects of vision. In a nutshell, how does this work, and how do humans benefit from this ability?

MC: Our color vision fundamentally relies upon the cones in our retina, and I argue in my research that color vision evolved in us primates for the purpose of sensing the emotions and states of those around us. We primates have an unusual kind of color vision – our cones sample the visible spectrum in a peculiar fashion – and I have shown that one needs that kind of peculiar color sense in order to pick up the color modulations that occur on our skin when we blush, blanch, redden with anger, and so on. Our funny primate variety of color vision turns out to be optimized for seeing the physiological modulations in the blood in the skin that underlie our primate color signals.

So, we evolved special mechanisms designed for sensing the emotions and states of others around us. That sounds a lot like the evolution of a “mind-reading” mechanism, which is why I (only half in jest) describe it that way.

NN: You mention in the book that reading and writing are relatively recent advances in human development, and yet we take for granted that we “see” and understand words, as if our brains were simply meant to see and understand them. What’s really going on that allows us to make sense of symbols on a page, and why can we do this at all?

MC: In talks I often show a drawing of a child reading a book titled “How to Somersault.” The “joke” is that most kids are able to read very early, often even before they can do stereotypical ape behaviors like somersaults and monkey bars. Sure, they comprehend speech much earlier, but they’re getting orders of magnitude more speech thrown at them than writing. Kids learn to read very early, and very well; and as adults we are ridiculously capable readers, and spend nearly all our day reading.

Aliens might be excused for thinking we evolved to read.

But the invention of writing is only thousands of years old. In addition, for most of us, our grandparents, great-grandparents or great-great-grandparents didn’t read at all. Writing is much too recent for our brains to have evolved reading mechanisms.

How does our brain do it?

Is it because our visual system can become good at reading whatever we present to it? No. Kids would surely not be capable readers by around six if they were tasked to read bar codes or fractal patterns.

The solution is that culture made writing easy on the eye, by shaping letters to be what the eye likes. The idea that culture shapes our artifacts to be good for us is not new. What’s new here is a specific hypothesis for what writing should look like in order to be good for us.

To be easy on the eye, writing needs to “look like nature,” just what our illiterate visual systems are fantastically competent at processing. The trick of that research direction was making this “writing looks like nature” idea rigorous, and coming up with ways of testing it. I show that there are certain signature visual patterns found in nearly any natural environment with opaque objects strewn about, and that these signature patterns are found in human writing. In short, writing has evolved so that written words look like visual objects.

NN: You say that there are several visual “tricks” that make us anticipate the next moment, while ensuring that the next moment never comes (one of these tricks you call “representational momentum”). Can you explain a little about how this works, and how this misperception affects us in our day-to-day lives?

MC: When light hits our retina, what our brains would like to do is instantaneously generate a perception of what the world looks like. Alas, our brain can’t do this instantaneously. Our brains are slow. It takes around a tenth of a second for your perception to be built, and that’s a long time when you’re moving about. If you perceived the world the way it was when light hit your eye, you’d be having a tenth-of-a-second-old view of the world.

Because of this, visual systems have evolved mechanisms that try to generate a perception not of the way the world was when light hit the eye, but of the way the world will be by the time the perception occurs a tenth of a second later. By the time the perception is elicited, the anticipated future will have arisen, and the perception will be of the present. That is, in order to perceive the present (have perceptions at time t that are of the world at time t), our visual systems must anticipate the near future.

These mechanisms are, I argue, up and running at all times, looking for all sorts of cues in the stimulus in an attempt to guess the way the world will change in the next moment.

And this is where “tricks” come in. If we can cotton on to the cues your visual system is looking for in its attempt to guess the near future, then we can concoct artificial visual stimuli having these cues, but make sure they do not change as they “should” in the next moment. That way, when you look at them, your brain will generate a perception of what “should” happen next, but it will now be wrong due to the mad, evil psychologist.
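As a rough illustration of the latency argument above (a toy sketch, not a model taken from the interview or the book), the snippet below treats perception as a reading of an object's position that arrives about a tenth of a second late. The function names, the constant-velocity world, and the linear extrapolation are all assumptions made for illustration. It shows that a passive reading lags the world, that extrapolating by the latency removes the lag, and that the same extrapolation produces an error when a stimulus carries a motion cue but does not actually move.

```python
LATENCY_S = 0.1  # "around a tenth of a second", the figure quoted above


def position(t, velocity, start=0.0):
    """True position of an object moving at constant velocity (assumed world)."""
    return start + velocity * t


def percept(t_now, velocity, anticipate, latency=LATENCY_S):
    """Position perceived at t_now, built from light captured `latency` earlier.

    If `anticipate` is True, the captured position is extrapolated forward
    by velocity * latency (a hypothetical, purely linear guess).
    """
    seen = position(t_now - latency, velocity)
    return seen + velocity * latency if anticipate else seen


def lag(t_now, velocity, anticipate):
    """How far the percept trails the true position at t_now."""
    return position(t_now, velocity) - percept(t_now, velocity, anticipate)


def illusory_shift(cued_velocity, latency=LATENCY_S):
    """Error when a static stimulus carries a motion cue: the extrapolation
    is still applied, but the world does not change as predicted."""
    return cued_velocity * latency


if __name__ == "__main__":
    walking_speed = 1.4  # metres per second, an assumed typical walking pace
    print(round(lag(1.0, walking_speed, anticipate=False), 3))  # passive reading
    print(round(lag(1.0, walking_speed, anticipate=True), 3))   # with extrapolation
    print(round(illusory_shift(walking_speed), 3))              # static stimulus
```

Under these assumptions, a passive reading at walking pace trails the world by roughly 14 centimetres, which is the same size as the misperception produced when the motion cue is present but the expected change never happens.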

The classical geometrical illusions are one of many classes of illusion that can be explained in this way. An example can be seen below, a variant of the Orbison illusion, where all the squares are actually the same, but appear quite different depending on where they lie within the radial display. Here is a link where I have written up a very short explanation of illusions like this:

NN: If you were in charge of designing a new human vision system, what improvements would you make to the one we have now? What would you fix – and do you think there’s still evolutionary “room” left for human vision to evolve?

MC: If we were still living out in nature as we Homo sapiens were for nearly all of our evolutionary history, then I’d be skeptical about any attempt to improve us. The most brilliant engineering masterpieces in the universe are found in biology, and my guess would typically be that, unless one has a very strong argument otherwise, we’re not smart enough to make any improvement.

But, of course, we’re no longer living in the kinds of environments where we evolved, and the biology may no longer be the “right” kind. In particular, I argue in the book that our forward-facing eyes are like a fish out of water. Forward-facing eyes are, I argue, the optimal eye design for large animals living in forested habitats; in those circumstances that eye design allows the animal to see the most. But we’re no longer living in cluttered habitats. Our world is now filled not with leaves, but with trash cans, pillars, and cars. In such circumstances sideways-facing eyes see the most, and animals in such environments accordingly have sideways-facing eyes. In a million years I bet we’ll be fish-faced.

NN: One of the perennial questions in philosophy is: do we see things as they really are, or perceive them as we think they are? Given your work on human vision, what’s your take on this?

MC: Let me give you three examples from the book that help us ponder these philosophical issues.

First, consider illusions like those I discussed above. One often feels as if what one sees is due to some kind of direct “reading” of the real physical world. But our brain can’t just passively react to the incoming stream of visual information, lest it have an old perception of the world. Instead, it must actively generate a guess about the near future, which helps drive home that our perception is always an internal concoction of our brain. In fact, most of the input to our visual system is feedback from that very visual system.

Second, consider forward-facing eyes and binocular vision. When we see with two eyes in the same direction, we have one unified visual perception. We have what feels like a single viewpoint, one that is emanating roughly from a point between our two eyes. Furthermore, our single viewpoint is always filled with two copies of the world that you hardly ever notice. When you fixate on something out in front of you, then objects nearer and farther split into two perceptual copies, each rendered as transparent in your perception.

This allows you to see objects, and to see beyond them. For example, you can see your own nose from opposite sides at all times, but it is rendered as partially transparent and so does not block your view of the world beyond. The more one analyzes the phenomenology of binocular vision, the stranger it seems. But it doesn’t feel strange, because these are perceptual facets that our brain knows how to interpret. They are needed as part of your unified view of the world in order to incorporate the fact that it is really built out of two views of the world. Although, in a sense, you are perceiving fictions, they are fictions that allow you to more veridically see the world.

And, lastly, consider color vision. This is a case that helps us better understand that it is not so much whether you see the world as it is, but how much of the world’s reality you are privy to seeing. Colors are primarily about the underlying emotions and states of those around us, as seen through the window of skin, and the physiological changes in the blood. The spectrum of skin is complicated, but it varies over two dimensions that matter most for sensing the states of others: the concentration and the oxygenation of blood in the skin.

The question is, what do the concentration and oxygenation of blood in the skin of others “truly” look like? Or, what do the emotions those blood variables signify “truly” look like? The interesting thing here is that these blood dimensions and these emotions are “really there,” but there is little sense to what their “real look” might be. Colors serve the role of what they look like, but does red really look like oxygenated blood or really look like anger? I’m not sure this is a sensible question. What matters is that the qualitative perceptual state is given a meaning or association for us, and so serves its purpose.

NN: What’s your next project? Any new books in the works that we should be looking out for soon?

MC: I just recently finished my new book, tentatively titled Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man. Remember the child reading the book titled “How to Somersault”? If that kid can read so early because writing has culturally evolved to look like nature, couldn’t it be that speech has culturally evolved (perhaps over hundreds of thousands of years, rather than just several thousand years) to sound like nature? Could it be that speech has shaped itself to sound like the natural events that our auditory systems evolved via natural selection to be fantastic at processing? And, similarly, could it be that music has culturally evolved over time to sound like some other auditory aspect of nature that taps into ancient auditory mechanisms of ours, evocative ones? The short story of the book is, Yes. And, in particular, I argue that speech sounds like solid-object physical events, whereas music sounds like people moving about.

Check out Mark Changizi’s website here, and his blog here.