If you’re a famous dead artist, nothing welcomes you to the canon quite like someone sitting down and meticulously recording every piece of art you ever made. These records become a compendium, often several volumes long, called a catalogue raisonné. Such a catalog can itself represent the life’s work of the scholar who compiles it. It took Jacob-Baart de la Faille 11 years to complete van Gogh’s catalog. Monet’s catalog was published over a span of 18 years by a French billionaire. And it took 46 years for all of Picasso’s catalog to be released, while its publisher sold his car and apartment to finance the project.

Tucked away in Massachusetts, one man is making his life’s work out of those other life’s works. For the past three years, Jason Bailey has been hunting these catalogs down. He’s baffled librarians with his voluminous requests. He’s searched for libraries with liberal lending policies, so he can spend time with these pricey rare books. He’s scoured eBay and Amazon. He’s sought out a friend with a Ph.D. in Italian to decipher one rare catalog.

His mission: to turn them all into a proper digital database.

A catalogue raisonné typically lists each piece’s title, dimensions, date, medium, location, provenance, exhibition history, condition and occasionally even more. Together, these represent a comprehensive and remarkably rich set of data on the most beautiful, seminal and expensive works of modern art. But this data is scattered, unsearchable and unanalyzable, locked away in countless books high on dusty library shelves, or shrink-wrapped and bearing stupefying price tags at boutique bookshops. “It’s locked up in these old books: hard to find, out of print, not very dynamic,” Bailey said. “For all this talk about ‘big data,’ I’ve seen over and over again: you need the data.”

On a coffee table in his living room in Ashland, Massachusetts, amid the clutter of everyday life (books, power cables, beer bottles), sits a boxy, homemade frame of PVC pipes. In the center of the frame lies an angular cradle, holding open a large volume of a catalogue raisonné, its facing pages held flat by two clear plastic panes; a light dangles above. Flanking this rig are two high-end digital cameras that record the artwork data within. Each volume takes about two hours to scan. Bailey then posts the resulting PDF files online, where he takes bids from freelancers who do the painstaking transcription. The data comes back in a spreadsheet, which Bailey cleans. Finally, he loads it into his central database: ordered, searchable and analyzable.
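For readers curious what the last two steps might look like in practice, here is a minimal sketch in Python. It is not Bailey’s actual code or schema: the column names, the `works` table, the helper functions and the placeholder rows are all assumptions made for illustration, loosely based on the fields a catalogue raisonné typically lists.

```python
import csv
import io
import sqlite3

# Hypothetical column names; a real catalog would also carry provenance,
# exhibition history, condition and more.
FIELDS = ["artist", "title", "year", "medium", "height_cm", "width_cm", "location"]

# Placeholder rows standing in for a freelancer's transcription (not real data).
SAMPLE_CSV = "\n".join([
    "artist,title,year,medium,height_cm,width_cm,location",
    "Artist A,  Untitled 1 ,1950,oil on canvas,200.5,150,Museum X",
    "Artist A,Untitled 2,c. 1951,oil on canvas,,120,",
])


def clean_row(row):
    """Trim whitespace, turn blanks into None and convert numeric fields."""
    cleaned = {k: ((row.get(k) or "").strip() or None) for k in FIELDS}
    for key, cast in (("year", int), ("height_cm", float), ("width_cm", float)):
        value = cleaned[key]
        try:
            cleaned[key] = cast(value) if value is not None else None
        except ValueError:
            # Values like "c. 1951" are left empty here for manual review.
            cleaned[key] = None
    return cleaned


def load_catalog(conn, csv_text):
    """Clean the transcribed rows and insert them into a 'works' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS works ("
        "artist TEXT, title TEXT, year INTEGER, medium TEXT, "
        "height_cm REAL, width_cm REAL, location TEXT)"
    )
    rows = [clean_row(r) for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany(
        "INSERT INTO works VALUES "
        "(:artist, :title, :year, :medium, :height_cm, :width_cm, :location)",
        rows,
    )
    conn.commit()
    return len(rows)


# Usage: load the sample into an in-memory database, then query it.
conn = sqlite3.connect(":memory:")
load_catalog(conn, SAMPLE_CSV)
dated_works = conn.execute(
    "SELECT title, year FROM works WHERE year IS NOT NULL ORDER BY year"
).fetchall()
```

Once rows from many catalogs share one table, questions such as how many works an artist made in a given year become a single query, which is the sense in which the data ends up searchable and analyzable.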

This moonshot project is called Artnome (as in “genome”). So far, Bailey says he has completely extracted, liberated and reassembled the data from the print catalogs of 35 major artists (including Cézanne, Dalí, Monet, O’Keeffe, Pollock and Rothko), and at least 10 more are currently in progress.

Other art databases exist, of course. Artnet maintains a massive database of auction sales, and The Metropolitan Museum of Art and MoMA recently made their own databases public. The Wildenstein Plattner Institute, an arts nonprofit in New York, is currently digitizing and publishing online a century’s worth of Impressionist archives. But “a database of complete known works across the most important art and artists of the 20th century,” as Bailey describes his goal, does not exist.

“Everybody thought I was crazy to do it, which made me want to do it more,” he said. “A single raisonné across all artists is the Holy Grail.”

Even a simple chart of Bailey’s data so far, a sampling of which he provided to FiveThirtyEight, reveals the artistic depth running beneath: the human-size color fields of Mark Rothko, the delicate intricacies of van Gogh, the panoramic abstractions of Lee Krasner. “Just playing in the virgin snow, I’m able to discover some interesting things,” he said.

Coming from a family full of engineers, “I was sort of the black sheep,” Bailey said. When he and his friends skipped school, he would eschew more typical adolescent hijinks and read art history books in the woods instead. But he got his start in data collection at age 11, when his father taught him to use Excel so he could catalog his comic book and baseball card collections. He went to school for studio art and design, but his current day job is at a company called Tamr that unifies data for large companies. His art project was sparked after he listened to a book on tape about art forgeries on his commute to work.

The project is very much a work in progress. The list of interesting artists is endless, and the catalogs of Picasso and Francis Bacon are Bailey’s white whales at the moment. Picasso’s catalog, often called “the Zervos,” after its original publisher, is 33 volumes long and retails for $25,000. It contains information on over 16,000 Picasso pieces. Bacon’s catalog was published just last year, and retails for a relatively meager $1,300. Bailey is still on the hunt for the data locked on paper inside them.

Bailey said he’s received a warm reception from art scholars, and one I spoke with agreed that the project had potential. “To the extent that projects are collaborative across institutions or between scholars, independent researchers, and institutions to make those works available worldwide, that’s all to the public good,” Carole Ann Fabian, the director of the Avery Architectural & Fine Arts Library at Columbia University, told me.

But Bailey also said many were skeptical it could be done. “These are very complicated projects he’s talking about,” Fabian said. “He’s an enthusiast, and these are areas of extreme expertise that the field dedicates tremendous effort toward.”

Ultimately, Bailey hopes Artnome can do for the art market what Zillow does for real estate. More than $45 billion worth of art was sold in 2016 — a lot of demand despite no comprehensive ledger of art’s supply side. Just how many Rothkos are there, anyway? And where are they? Who owns them? What are they titled? These are surprisingly difficult questions to answer.

While Bailey is hoping to spark a “Moneyball” era for art, the traditional auction houses are still acting as scouts. His database could allow them to do all sorts of new things, though. He’s beginning to weave data on auction prices into his universal raisonné, adding market valuations to the art-historic descriptions. That data helped him argue that Christie’s pre-sale estimate for a Wassily Kandinsky painting it plans to sell this fall was millions of dollars too high. (Christie’s did not respond to a request for comment.) And the auction houses have shown interest in data. Last year, Sotheby’s acquired the Mei Moses Art Indices, “a constantly updated database of 45,000 repeat sales of objects.”

The Artnome project thus far has relied on a long list of technologies unthinkable to de la Faille or the other assemblers of early catalogs: the internet, the gig economy, data science and interlinked databases. Bailey plans to leverage much more tech. For his next phase, he’s drawing on artificial intelligence, machine learning, image matching and a Slack community where art historians, art-loving programmers and deep learning specialists from around the world have flocked in recent days. In concert, these technologies could help gather, verify, match, unify and enrich this budding Holy Grail.

“I don’t see it as a project that scales, long term, as one person’s crazy hobby,” Bailey said.

Beyond the mountainous engineering challenge, another potential hurdle remains: These catalogs, no matter how hard won, don’t belong to Bailey. He didn’t spend the grueling years their authors took assembling and editing them, or bear the expenses their publishers incurred publishing them, and he certainly doesn’t hold their copyrights. But Bailey was optimistic about the copyright issues, citing the scanning done by the Google Books project and reading he’d done on the matter. One copyright lawyer I spoke with was optimistic, too, citing the example of the periodic table of elements: It was really hard to come up with, but it isn’t copyrightable. Copyright protects expression, not facts.

Still, Bailey offered a bit of gallows humor. “I don’t know if I’ll get blacklisted by libraries and have to wear a disguise,” he said. Bewigged or not, Bailey continues to pursue his “crazy hobby.” And as you read this, the database continues to grow.