Descriptive Camera™, 2012

The Descriptive Camera works a lot like a regular camera—point it at subject and press the shutter button to capture the scene. However, instead of producing an image, this prototype uses crowd sourcing to output a text description of the scene. Modern digital cameras capture gobs of "parsable" metadata about photos such as the camera's settings, the location of the photo, the date, and time, but they don't output any information about the content of the photo. The Descriptive Camera only outputs the metadata about the content.

As we amass an incredible amount of photos, it becomes increasingly difficult to manage our collections. Imagine if descriptive metadata about each photo could be appended to the image on the fly—information about who is in each photo, what they're doing, and their environment could become incredibly useful in being able to search, filter, and cross-reference our photo collections. Of course, we don't yet have the technology that makes this a practical proposition, but the Descriptive Camera uses crowd sourcing to explore these possibilities.

Technology

The technology at the core of the Descriptive Camera is Amazon's Mechanical Turk API. It allows a developer to submit Human Intelligence Tasks (HITs) for workers on the internet to complete. The developer sets the guidelines for each task and designs the interface for the worker to submit their results. The developer also sets the price they're willing to pay for the successful completion of each task. An approval and reputation system ensures that workers are incented to deliver acceptable results. For faster and cheaper results, the camera can also be put into "accomplice mode," where it will send an instant message to any other person. That IM will contain a link to the picture and a form where they can input the description of the image.

The camera itself is powered by the BeagleBone, an embedded Linux platform from Texas Instruments. Attached to the BeagleBone is a USB webcam, a thermal printer from Adafruit, a trio of status LEDs and a shutter button. A series of Python scripts define the interface and bring together all the different parts from capture, processing, error handling, and the printed output. My mrBBIO module is used for GPIO control (the LEDs and the shutter button), and I used open-source command line utilities to communicate with Mechanical Turk. The device connects to the internet via Ethernet and gets power from an external 5 volt source, but I would love to make a another version that's battery operated and uses wireless data. Ideally, The Descriptive Camera would look and feel like a typical digital camera.

Presentation Video

Results

After the shutter button is pressed, the photo is sent to Mechanical Turk for processing and the camera waits for the results. A yellow LED indicates that the results are still "developing" in a nod to film-based photo technology. With a HIT price of $1.25, results are returned typically within 6 minutes and sometimes as fast as 3 minutes. The thermal printer outputs the resulting text in the style of a Polaroid print. Below are a few samples from the Descriptive Camera:

This is a faded picture of a dilapidated building. It seems to be run down and in the need of repirs.

Looks like a cupboard which is ugly and old having name plates on it with a study lamp attached to it.

Corner of a wood floored room with a tool chest, bike, stack of books, box leaning against the wall, an open door with a bag hanging off the doorknob, and a pair of closed double doors with cables hanging on the handles.

Acknowledgements