You can explore this embedding above: click and drag to pan, scroll to zoom. Note that the visualization may take several seconds to load.

The resulting embedding is shown below. The images typically fall into discrete clusters that capture both content and media: Cluster 1 shows watercolor images; Cluster 2 contains oil paintings; Cluster 3 holds vector art; Cluster 4 contains gloomy photographs of abandoned buildings and lonely landscapes; Cluster 5 shows various pen and pencil sketches.

We consider a subset of 1,000 images that score highly on each attribute. We then remove the final classification layer from a pre-trained ResNet and use the network to embed these images into a 512-dimensional feature space. Finally, we use t-SNE to project these features down to two dimensions.

Here is a sample of the top-scoring images for your chosen attributes, illustrating the kind of results you can expect. At our quality threshold, these results should have roughly 90% precision.


Computer vision systems are designed to work well within the context of everyday photography. However, artists often render the world around them in ways that do not resemble photographs. Artwork produced by people is not constrained to mimic the physical world, making it more challenging for machines to recognize.

This work is a step toward teaching machines how to categorize images in ways that are valuable to humans. We collect a large-scale dataset of contemporary artwork from Behance, a website containing millions of portfolios from professional and commercial artists. We annotate Behance imagery with rich attribute labels for content, emotions, and artistic media. We believe our Behance Artistic Media dataset will be a good starting point for researchers wishing to study artistic imagery and relevant problems.

Our dataset is built from Behance, a portfolio website for professional and commercial artists. Behance contains over ten million projects and 65 million images.

Artwork on Behance spans many fields, such as sculpture, painting, photography, graphic design, graffiti, illustration, and advertising. Graphic design and advertising make up roughly one third of Behance. Photography, drawings, and illustrations make up roughly another third. This artwork is posted by professional artists to show off samples of their best work.

Our dataset requires some level of human expertise to label, but collecting human labels for all 65 million images would be too costly. To address this, we use a hybrid human-in-the-loop strategy to incrementally learn a binary classifier for each attribute. Our hybrid annotation strategy is based on the LSUN dataset annotation pipeline.

At each step, humans label the most informative samples in the dataset with a single binary attribute label. The resulting labels are added to each classifier's training set to improve its discrimination. The classifier then ranks more images, and the most informative images are sent to the crowd for the next iteration. After four iterations, the final classifier re-scores the entire dataset and images that surpass a certain score threshold are assumed to be positive. This final threshold is chosen to meet certain precision and recall targets on a held-out validation set. This entire process is repeated for each attribute we wish to collect.
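The loop above can be sketched as follows. This is a minimal sketch assuming scikit-learn: uncertainty sampling stands in for the "most informative" criterion, and the `crowd_label` callable and `annotate_attribute` name are hypothetical stand-ins for the human annotation step, not components of the actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def annotate_attribute(features, crowd_label, seed_idx, n_rounds=4, batch=100):
    """Incrementally learn one binary attribute classifier.

    features:    (N, D) feature matrix for the whole dataset
    crowd_label: callable index -> 0/1 (stands in for the human step)
    seed_idx:    indices of a small initial labeled set (both classes)
    Returns the final classifier's score for every image.
    """
    labeled = list(seed_idx)
    labels = [crowd_label(i) for i in labeled]
    clf = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        clf.fit(features[labeled], labels)
        scores = clf.predict_proba(features)[:, 1]
        # Treat images nearest the decision boundary as most informative
        # (uncertainty sampling) and send them to the crowd.
        seen = set(labeled)
        candidates = [i for i in np.argsort(np.abs(scores - 0.5))
                      if i not in seen]
        for i in candidates[:batch]:
            labeled.append(i)
            labels.append(crowd_label(i))
    # Final pass: re-score the whole dataset. Images whose scores surpass
    # a threshold (tuned on a held-out validation set to meet precision
    # and recall targets) would then be taken as positive.
    clf.fit(features[labeled], labels)
    return clf.predict_proba(features)[:, 1]
```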

Quality guarantees

As a quality check, we tested whether the final labeling set meets our desired quality target of 90% precision. For each attribute, we show annotators 100 images from the final automatically-labeled positive set and 100 images from the final negative set, using the same interface used to collect the dataset. The mean precision across all attributes is 90.4%, where an automatically-labeled positive image counts as correct if at least one annotator indicates it should be positive.
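Concretely, the per-attribute precision under this definition can be computed as follows. This is a minimal sketch; the function name and vote data are illustrative, not part of the actual evaluation code.

```python
def attribute_precision(annotator_votes):
    """Precision of one attribute's automatically-labeled positive sample.

    annotator_votes: one vote list per sampled positive image,
    where 1 means an annotator marked the image positive.
    An image counts as a true positive if at least one annotator agrees.
    """
    hits = sum(1 for votes in annotator_votes if any(v == 1 for v in votes))
    return hits / len(annotator_votes)

# Mean precision across attributes (per_attribute_votes is hypothetical):
# mean_p = sum(map(attribute_precision, per_attribute_votes)) / len(per_attribute_votes)
```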

These checks are in addition to our MTurk quality controls: we only use human labels on which two workers agree, and we only accept work from workers with a high reputation who have completed at least 10,000 tasks at a 95% acceptance rate.
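The two-worker agreement rule amounts to a simple filter, sketched below; the function name and data shapes are illustrative assumptions, not the actual pipeline's.

```python
def consensus_labels(responses):
    """Keep only the labels on which both workers agree.

    responses: dict mapping image id -> (worker_1_answer, worker_2_answer)
    Returns:   dict mapping image id -> agreed answer
    """
    return {img: a for img, (a, b) in responses.items() if a == b}
```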