It is a heartwarming story: In spite of the endless onslaught of digital content, American readers have collectively put down their screens and decided to embrace once more that beloved tactile rectangular prism that reminds us, with its weight at the bottom of our bags, of its immeasurable heft. Since 2015, major news outlets, including this one, have reported the triumphant return of print: that “real” books are back, and ebooks have lost their gleam.

Of course, it’s not entirely true. Yes, ebooks are doing just fine: Americans consume hundreds of millions of them a year. But many of their authors are writing and publishing books, and finding massive audiences, without being actively tracked by the publishing industry. In fact, the company through which they publish and distribute their books, a tech behemoth disguised as a benevolent, content-agnostic retailer, is the only entity with any real idea of what’s going on in publishing as a whole.

Amazon’s power over self-publishing, a shadow industry running outside the traditional publishing houses and imprints, is insidiously invisible. As a result, the publishing industry has a data problem, and it doesn’t look like Amazon will be loosening its grip any time soon.

A wave rises

They don’t often get nominated for huge book prizes, noticed by the New York Times Book Review, or endorsed by the president. But over the past seven years, self-published books, predominantly sold as ebooks, have offered a rare avenue through which writers can make a living just from writing, as opposed to speaking, teaching, or consulting. By cutting out publishers, writers sidestep print and distribution costs, increase their revenue, and are beholden to readers and algorithms, not critics, editors, marketers, or salespeople. A decent writer with a flair for self-promotion, or a decent entrepreneur with writing chops, can earn serious cash.

Amazon launched its Kindle Direct Publishing platform in 2007, and by 2012, it had its first breakouts in the mainstream. Hugh Howey, author of Wool, and Andy Weir, author of The Martian, were early success stories. But so were dozens of people you may never have heard of: H.M. Ward, Rachel Abbott, Bella Andre, all getting paychecks that left authors in the rest of the industry salivating.

Self-publishing has since exploded, particularly in romance, fantasy, and science fiction. Though an average is impossible to estimate, top-selling authors can sell hundreds of thousands of self-published books on Amazon, which, at roughly $2 of revenue per book, can generate hundreds of thousands of dollars, and in some cases more than a million. For the past few years, mega-selling romance writer H.M. Ward has been making a seven-figure income across self-publishing platforms, more than half of which came through Amazon. At one point, she cracked double-digit millions in sales. According to one estimate, last year 2,500 self-published authors made at least $50,000 in book sales across self-publishing platforms, before the platforms’ cuts.

Self-published authors price their books lower than traditionally published ebooks, but authors can make up to 70% in royalties from Amazon; that’s double, even triple, the royalties they could make with a publisher. Even though an author could get a big advance from a traditional publisher, advance amounts vary widely—and this assumes she can get a book deal at all.
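To make the royalty comparison concrete, here is a rough back-of-the-envelope sketch. The 70% rate comes from the reporting above; the two list prices are hypothetical, and the calculation assumes, as a simplification, that royalties apply to the full list price.

```python
# Illustrative arithmetic only; the prices below are hypothetical.
AMAZON_ROYALTY = 0.70  # "up to 70%", as reported above

# "Double, even triple" implies a publisher royalty between 70%/3 and 70%/2.
implied_publisher_rates = {
    "if Amazon pays double": AMAZON_ROYALTY / 2,
    "if Amazon pays triple": AMAZON_ROYALTY / 3,
}

SELF_PUBLISHED_PRICE = 2.99  # hypothetical list price
PUBLISHER_PRICE = 9.99       # hypothetical list price


def per_copy_earnings(list_price: float, royalty_rate: float) -> float:
    """Author earnings per copy, assuming the royalty applies to list price."""
    return list_price * royalty_rate


self_published = per_copy_earnings(SELF_PUBLISHED_PRICE, AMAZON_ROYALTY)
print(f"Self-published: ${self_published:.2f} per copy")

for label, rate in implied_publisher_rates.items():
    traditional = per_copy_earnings(PUBLISHER_PRICE, rate)
    print(f"Traditional ({label}, {rate:.0%} royalty): ${traditional:.2f} per copy")
```

At these made-up prices, the self-published author earns about $2.09 a copy, and the traditionally published author earns somewhat more per copy at the higher price. The difference in total income comes down to how many copies each route can sell, which is the number nobody outside Amazon can see.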

More data, more problems

We’re not just talking about a few women reading erotica on their phones during their lunch breaks; by most accounts, self-publishing is massive. But only Amazon knows its true scale.

The information asymmetry between Amazon and the rest of the book industry—publishers, brick-and-mortar stores, industry analysts, aspiring writers—means that only the Seattle company has deeply detailed information, down to the page, on what people want to read. So an industry that’s never been particularly data-savvy increasingly works in the dark: Authors lose negotiating power, and publishers lose the ability to compete on pricing or even, on a basic level, to understand what’s selling.

When it comes to print books, NPD BookScan is the industry standard. The group collects point-of-sale data covering an estimated 85% of the US print trade market, from retailers including Amazon, Barnes & Noble, Walmart, Costco, and independent bookstores.

But ebook sales are anybody’s guess. Amazon doesn’t report its ebook sales to any of the major industry data sources, and it doesn’t give authors more than their own personal slice of data. A spokesperson from Amazon writes by email that “hundreds of thousands of authors self-publish their books today with Kindle Direct Publishing,” but declines to provide a number, or any sales data.

NPD tracks digital books in a way that’s crucially different from print—via publishers, not retailers. But since hundreds of thousands of authors are behaving as individual publishers on Amazon without being tracked, any picture painted by the group has a gaping hole in it. “NPD PubTrack Digital tracks ebook sales but because it is a publisher data-share model, the data does not include self-published ebooks,” writes NPD’s Allison Risbridger by email. “Therefore we cannot comment on the size of the self-published ebooks market.”

Bowker, which issues ISBNs, the unique number you see above the barcode on a book, says 786,935 self-published titles came out in 2016. But there’s no way to know how close that is to the actual number of self-published books, because ISBNs are both optional and expensive, so individual authors often forgo them. “The total size of the market is unclear,” writes Nicola Bacon, public relations manager for ProQuest, which owns Bowker. “Our data is meant to be directional: one of the few sources that can be compared year over year.”

“Honestly, Bowker’s numbers are completely useless,” says David Gaughran, a self-published author of historical fiction who blogs about the business of getting published on your own. “They’re worse than useless, because they’re taken as reliable, and they’re not.”

Nobody—industry experts, authors, publishers—can gauge the true size of the self-publishing market. So no one can say for sure what’s going on in the larger book industry.

Message-board mobilization

Lacking any good source of data, authors have self-organized to fill the gap and better understand the market. They band together on message boards to share their sales data and try to extrapolate a clearer picture of how many sales are needed to hit a certain ranking on Amazon.

In 2014, a self-published author started the blog Author Earnings anonymously to scrape Amazon’s bestseller page. Until recently, it was the best resource for sales data from self-publishing. In January, the team behind Author Earnings soft-launched Bookstat, a paid service that tracks online book retail in real time. Bookstat extrapolates sales data from book rankings and sales history, provided by authors, and estimates sales per author and book throughout the day, with a self-reported margin of error of 5%.
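How a ranking might be turned into a sales estimate can be sketched in a few lines. This is not Bookstat’s method, which the article does not describe. It is one common approach, assumed here for illustration: fit a straight line through author-reported (rank, daily sales) pairs on a log-log scale. The data points and function names below are hypothetical.

```python
import math


def fit_rank_curve(points):
    """Least-squares fit of log(sales) = a + b * log(rank)."""
    xs = [math.log(rank) for rank, _ in points]
    ys = [math.log(sales) for _, sales in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    b = covariance / variance
    a = mean_y - b * mean_x
    return a, b


def estimate_daily_sales(rank, a, b):
    """Estimate daily unit sales for a given bestseller rank."""
    return math.exp(a + b * math.log(rank))


# Hypothetical (rank, daily sales) pairs of the kind authors might share.
reported = [(1, 1000.0), (100, 100.0), (10_000, 10.0)]
a, b = fit_rank_curve(reported)

print(f"Estimated daily sales at rank 400: {estimate_daily_sales(400, a, b):.0f}")
```

The more authors who contribute real data points, the better such a curve can be pinned down, which is why pooling sales figures on message boards is useful at all.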

Bookstat estimates that in 2017, there were half a million self-published authors who sold at least one book, and a total of 240 million self-published ebook units sold. Both figures went undetected by the traditional reporting organizations. But as the founder, who still asks to remain anonymous, notes, “There’s really no way to wrap your arms around how many authors there are, including the ones who are not selling, including the ones who are out of print on the traditional publishing side.” By his estimate, self-published books in the US were worth $875 million last year, about $700 million of which was ebooks.
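Those estimates imply a few other numbers worth spelling out. The sketch below uses only the Bookstat figures quoted above. It assumes the dollar figures and the unit count describe the same market and the same year, which the estimates do not explicitly confirm.

```python
# Figures as quoted from Bookstat; treating them as the same market and year
# is an assumption.
EBOOK_REVENUE_USD = 700_000_000   # self-published ebooks
TOTAL_SELF_PUB_USD = 875_000_000  # all self-published books
EBOOK_UNITS = 240_000_000         # self-published ebook units sold

avg_revenue_per_ebook = EBOOK_REVENUE_USD / EBOOK_UNITS
ebook_share = EBOOK_REVENUE_USD / TOTAL_SELF_PUB_USD
non_ebook_usd = TOTAL_SELF_PUB_USD - EBOOK_REVENUE_USD

print(f"Average revenue per ebook sold: ${avg_revenue_per_ebook:.2f}")
print(f"Ebook share of self-published sales: {ebook_share:.0%}")
print(f"Non-ebook self-published sales: ${non_ebook_usd / 1e6:.0f} million")
```

That works out to just under $3 per ebook sold, with ebooks accounting for 80% of the self-published total.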

Combine last year’s NPD BookScan numbers (that is, 85% of US trade print sales) and what Bookstat strings together of self-published book sales, and you have a very rough picture of the difference between what is generally reported in sales figures and what’s missing (not including a grab-bag of uncategorizable sales or books from Amazon’s own imprints).

A new landscape with a bad map

Without good data, there’s no complete picture of the industry. News stories say digital fatigue is sounding the death knell of ebooks, even as readers across the country devour $700 million of untracked digital files. Publishers are less able to see what’s selling in certain commercial genres, and less able to take risks on debut authors. Bookstore attendance becomes lopsided, and a large swath of American readers get algorithm-driven book creation. As authors move to self-publishing, the creativity pool becomes bifurcated.

“I think it hurts everyone,” says publishing consultant Jane Friedman. “Because everyone gets to put forward the narrative they would personally like to believe in.” Publishers can believe ebooks were a failed experiment, bookstore owners can cheer the triumph of their raison d’être, and print lovers get to gloat that screens will never kill the old-school ways. Self-published authors can keep making money while trying to light lamps to cut through the data darkness.

And Amazon can keep doing what it does best, without any transparency to the public, readers, or the rest of the industry. Using its highly attuned proprietary data, it builds a bigger, more pervasive product with every turn of the page: the machine that knows readers.