Ten minutes after your roommate leaves for the store, you realize you’re out of toilet paper. Do you call her? Send a text message? Why not just send her a picture of the empty toilet-paper holder—or a four-second video of you standing next to it, despondent?

Last week Nick Bilton of The New York Times heralded the development of a new language, one composed entirely of social-media-friendly images. Photos and videos shared using mobile applications such as Instagram, Vine, and Snapchat, Bilton argues, are on their way to becoming a preferred medium for conveying real-time information to others: “The cutting-edge crowd is learning that communicating with a simple image, be it a picture of what’s for dinner or a street sign that slyly indicates to a friend, ‘Hey, I’m waiting for you,’ is easier than bothering with words, even in a world of hyper-abbreviated Twitter posts and texts.”

Sending a photo of a loaded baked potato to dinner guests certainly could be informative. But it doesn’t reach the status of language. Instead, it’s more like a digital gesture, a long-distance hand wave toward the kitchen table. And like most gestures, it relies heavily on context to be understood. Sending an image of a potato to someone you haven’t invited to dinner would be, at best, a non sequitur. And how would you communicate a more nuanced or unexpected message, such as “I made loaded baked potatoes yesterday; tonight we’re having something else!”?

Still, the Times’ rather loose use of “language” got me thinking: What would a new, full-fledged pictorial language look like? Not the hieroglyphics of old, but a language built to suit the social-media mentality of “right here, right now, this is who I am, who I’m with, and what I’m up to.”

Considering that snapshots and videos currently function much like gestures, the place to look for clues about forming a more sophisticated system might be sign language. We know that as sign languages evolve, signers shift away from using holistic gestures—gestures that “act out” a message in its entirety—and toward using a series of discrete gestures, each of which represents a single component of the message. Discreteness is what gives a language its flexibility, making it possible for our finite brains to tackle an infinite number of messages. Instead of learning a new holistic gesture for each and every message, after all, we can mix and match a much smaller number of components to form any message we please.

How might discreteness manifest itself in our new language? We’d have to come up with an agreed-upon set of associations between individual concepts and things in our environment we can realistically capture with a snap. It seems to me the latter would have to be, at least sometimes, symbolic—even arbitrary, composed largely of things we carry with us wherever we go. After all, we have to be able to communicate about abstract, absent, and imaginary subject matter; what good is a language that cannot opine on, say, unicorns?

We’d also need a grammar to tell us how concepts should be combined. In English, much of this work is done with word order and with affixes like -ed. Word order, at least, could be approximated rather simply with, say, a nifty app that ensures our pictures are viewed in the right order.

But sign languages tend to take a more exhaustive approach: sign order, sign direction, the number of times a sign is produced, even body position and facial expressions—all contain important grammatical information. I wonder: might we concoct a grammar of camera movements? The angle, speed, and path of a pan—manipulated independently of the video’s content—could show us, for instance, that the event described is about to begin, or that it already happened but to somebody else. Another potential grammatical tool? Colored or distressed lenses to indicate whether a message should be interpreted earnestly or ironically, haltingly or hypothetically.

If this all sounds complicated—like something that would take more than a few lunch hours to master—it’s because language is complicated. Creating a language based entirely on real-time images is at least theoretically possible. (And I’d love to hear others’ suggestions on just how the mechanics might work.) Indeed, generating such a language might be an awesome thing to do. Think of the potential for photoplay (quipography?)!

But it wouldn’t make communication “easier than bothering with words.” Nor would it dumb us down, or ruin language as we know it. That’s because it would have to become language as we know it, in all the ways that matter, to have the flexibility necessary to replace—let alone compete with—our primary mode of communication.