“Scraping” is not a word with a lot of positive connotations. In “A Christmas Carol,” Charles Dickens describes Ebenezer Scrooge as “a squeezing, wrenching, grasping, scraping, clutching, covetous old sinner.” We feel the impropriety of Scrooge’s scraping all the more keenly for his wealth and power, since scraping is most often an act of the weak or desperate; we talk of “scraping by” or “scraping the bottom of the barrel,” and of “bowing and scraping” to someone more powerful. It’s also recently become news: Aaron Swartz, Chelsea Manning, and Edward Snowden have all found themselves in trouble during the past few years for something called “scraping.”

Among computer professionals, scraping means using software to continuously download and save small parts of a large body of data in order to slowly construct a copy; it’s typically done when there isn’t an easy way to download the information in bulk. A large database grants whoever possesses it considerable power. Facebook’s influence, for instance, rests largely on the social graph, its voluminous catalog of the social connections among everyone on the service. While coders have long built and refined ways to help information flow freely from a data source to a user—the alphabet soup of technologies like RSS and XML—powerful organizations do not always find it in their interest to offer easy access. As a last resort, programmers often turn to scraping.
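The pattern is simple enough to sketch in a few lines of Python. Everything here is hypothetical: the `fetch` function stands in for whatever code would actually retrieve one record over the network.

```python
import time

def scrape(fetch, record_ids, delay=0.0):
    """Build a local copy of a remote database, one small piece at a time.

    `fetch` stands in for whatever call retrieves a single record over
    the network; it is a parameter so this sketch stays self-contained.
    """
    archive = {}
    for record_id in record_ids:
        archive[record_id] = fetch(record_id)  # copy one small piece...
        time.sleep(delay)                      # ...pause, then repeat
    return archive

# A stand-in for a remote service (imagine one court filing per request).
remote = {1: "filing one", 2: "filing two", 3: "filing three"}
local_copy = scrape(remote.get, [1, 2, 3])
print(len(local_copy))  # → 3: the whole "database" is now local
```

Done piece by piece like this, with a polite pause between requests, the process is slow, which is why assembling a copy of a large collection can take days or weeks.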

In 2008, the activist Aaron Swartz scraped the database of Public Access to Court Electronic Records (PACER), a service that charges a small per-page fee for electronic access to the United States federal court system’s public records. Swartz, who saw himself as a liberator of public data working on behalf of the people, downloaded and released 2.7 million of PACER’s approximately five hundred million documents to the organization Public.Resource.Org, prompting an F.B.I. investigation. The agency ultimately determined that the documents were in the public domain, and Swartz was not charged with any crime. A couple of years later, Swartz attempted a similar feat when he connected a laptop to M.I.T.’s computer network to download approximately four hundred and fifty thousand documents from the online academic-article repository JSTOR. According to some, including Swartz, there was nothing wrong with attempting to download the entire JSTOR corpus: he had an account that provided him access to the documents, and anyone on M.I.T.’s “open” campus was granted access to it through the network. Swartz was arrested by M.I.T. police and the U.S. Secret Service, and was eventually indicted on federal and state charges, including wire fraud, computer fraud, and grand larceny. As Larissa MacFarquhar documented in detail, when Swartz committed suicide, in January, 2013, he was still tied up in legal problems stemming from his scraping of JSTOR. He faced up to fifty years in prison and a million dollars in fines.

Swartz worked to scrape data that was, to a certain degree, intended to be public. Chelsea Manning presents a more ethically complex case. In 2009 and 2010, Manning, a United States Army private known at the time as Bradley, used a similar technique to download copies of the Significant Activities (SigAct) portion of Army Intelligence’s secure databases. “This process began in late December 2009 and continued through early January 2010,” Manning said in a statement to a court-martial last year. “I could quickly export one month of the SigAct data at a time and download in the background as I did other tasks. The process took approximately a week for each table.” While the initial scraping was performed as part of Manning’s work as an intelligence analyst, it was easy for her to copy the entire archive of classified documents to a thumb drive. In early 2010, disillusioned with the United States’ mission in Iraq, she gave the thumb drive to Wikileaks.

Like Manning, Edward Snowden was never supposed to collect, let alone release, any of the information he had access to as a Booz Allen Hamilton contractor working for the N.S.A. on the Hawaiian island of Oahu. Snowden was a systems administrator, which meant that his job gave him low-level access to the hardware on which the N.S.A.’s data was stored. He was also skilled at copying and managing large quantities of information. At some point before June of last year, Snowden amassed an archive of classified documents—possibly as many as 1.7 million files, according to intelligence officials who testified before the House Intelligence Committee last week.

Swartz and Manning both used a program called Wget, which has been freely available since the mid-nineteen-nineties, to obtain their vast troves of data. If you point it at a network address and select the right options, Wget will download any file it finds, and will follow links to more documents, downloading as it goes. It can be configured to crawl and download an entire Web site—or a trove of classified SigAct reports, if you can give it the credentials to access them. It’s a familiar and frequently used tool for anyone who works with computers, in part because it is specifically engineered to work over slow or unreliable networks. In the prosecution’s opening statement at Manning’s trial, Captain Joe Morrow called it “a case about a soldier who systematically harvested hundreds of thousands of documents from classified databases.” But here, as is so often the case when we talk about technology, metaphor can obscure the truth as easily as it can illuminate it: Wget cannot “harvest” anything. Like any program that retrieves documents over a network, Wget operates by copying files, leaving the originals in place. Likewise, the word “scraping” connotes a harsh removal of information, but data scraping only involves copying.
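Wget’s link-following behavior can be sketched in Python, too. This is not Wget’s actual implementation, just the idea behind it; the “site” below is a stand-in dictionary mapping each page to its content and outgoing links, rather than a real network.

```python
from collections import deque

# A hypothetical site: each page maps to (content, links to other pages).
site = {
    "/":  ("home",  ["/a", "/b"]),
    "/a": ("doc A", ["/b"]),
    "/b": ("doc B", ["/c"]),
    "/c": ("doc C", []),
}

def crawl(start):
    """Download a page, follow its links, and keep going: roughly what
    recursive retrieval does, minus the networking."""
    copied, queue, seen = {}, deque([start]), {start}
    while queue:
        page = queue.popleft()
        content, links = site[page]
        copied[page] = content        # a copy is made; the original stays put
        for link in links:
            if link not in seen:      # never fetch the same page twice
                seen.add(link)
                queue.append(link)
    return copied

mirror = crawl("/")
print(sorted(mirror))  # → ['/', '/a', '/b', '/c']
```

Starting from one page, the loop eventually reaches everything linked from it, which is how a single command can end up mirroring an entire site.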

Unnamed intelligence officials, describing how Edward Snowden collected his archive of N.S.A. files, told the Times, “We do not believe this was an individual sitting at a machine and downloading this much material in sequence,” and that the process was “quite automated.” The officials didn’t specify what software Snowden used, but said it “functioned like Googlebot, a widely used web crawler that Google developed to find and index new pages on the web.” The difference between a “crawler” and a “scraper” is subtle, but typically a crawler is smarter about the links it follows, what it downloads, and what it leaves uncopied. For the most part, though, “crawling” is just scraping with a fancier name, and Google built one of the world’s most valuable companies in part by being better at scraping than anyone else. Google was incorporated in 1998, and by 2002 its Web-scraping “Googlebots” were so ubiquitous and voracious that, in a short story titled “Robot Exclusion Protocol,” the programmer and writer Paul Ford imagined one trying to index his bathroom. Some have suggested that Google’s recent acquisition of the smart-device maker Nest Labs is effectively an effort to scrape real-world data about our homes and lives, to add to the company’s trove of information about us, which now includes information about the Web pages we visit, our e-mails, the books we read, our shopping habits, and more.

With enough persistence, scraping can produce enormous rewards, whether financial or political. Unlike panning for gold, where the objective is to extract the few valuable bits from a large worthless mass, scraping is an accretive process: value is created by the quantity of information. The N.S.A. scrapes and amasses enormous databases of global communications data, while Google constantly crawls the Internet, copying and indexing everything it can reach. With a large enough database of links between Web sites, Google can help searchers find exactly what they are looking for online. With a large enough collection of communications metadata, the N.S.A. can analyze networks and associations between people, identify patterns of interaction and behavior, and possibly spot threats before they happen. The N.S.A. believes it can do this so well that communications metadata alone has reportedly been used to target drone strikes.

Scrooge is haunted by three ghosts, Past, Present, and Future, and by the end of “A Christmas Carol” he has a change of heart, deciding to share his pile instead of hoarding it. The scraping Swartz, Manning, and Snowden did still haunts us. Their work is still bringing unsettling revelations about the secret structures of our world to light. Whether it will bring us a lasting change of heart has not yet been decided.

Rusty Foster is a computer programmer and writer who lives in Maine. He summarizes the Internet at Today in Tabs.

Photograph: Mandel Ngan/AFP/Getty