LinkedIn is suing a gang of hackers who used Amazon's cloud computing service to circumvent security measures and copy data from hundreds of thousands of member profiles each day.

"Since May 2013, unknown persons and/or entities employing various automated software programs (often referred to as 'bots') have registered thousands of fake LinkedIn member accounts and have extracted and copied data from many member profile pages," company attorneys alleged in a complaint filed this week in US District Court in Northern California. "This practice, known as 'scraping,' is explicitly barred by LinkedIn's User Agreement, which prohibits access to LinkedIn 'through scraping, spidering, crawling, or other technology or software used to access data without the express written consent of LinkedIn or its Members.'"

With more than 259 million members—many who are highly paid professionals in technology, finance, and medical industries—LinkedIn holds a wealth of personal data that can prove highly valuable to people conducting phishing attacks, identity theft, and similar scams. The allegations in the lawsuit highlight the unending tug-of-war between hackers who work to obtain that data and the defenders who use technical measures to prevent the data from falling into the wrong hands.

The unnamed "Doe" hackers employed a raft of techniques designed to bypass anti-scraping measures built in to the business network. Chief among them was the creation of huge numbers of fake accounts. That made it possible to circumvent restrictions dubbed FUSE, which limit the activity any single account can perform.

"In May and June 2013, the Doe defendants circumvented FUSE—which limits the volume of activity for each individual account—by creating thousands of different new member accounts through the use of various automated technologies," the complaint stated. "Registering so many unique new accounts allowed the Doe defendants to view hundreds of thousands of member profiles per day."

The hackers also circumvented a separate security measure that is supposed to require end users to complete bot-defeating CAPTCHA dialogues when potentially abusive activities are detected. They also managed to bypass restrictions that LinkedIn intended to impose through a robots.txt file, which websites use to make clear which content may be indexed by automated Web crawling programs employed by Google and other sites.

LinkedIn engineers have disabled the fake member profiles and implemented additional technological safeguards to prevent further scraping. They also conducted an extensive investigation into the bot-powered methods employed by the hackers.

"As a result of this investigation, LinkedIn determined that the Doe defendants accessed LinkedIn using a cloud computing platform offered by Amazon Web Services ('AWS')," the complaint alleged. "This platform—called Amazon Elastic Compute Cloud or Amazon EC2—allows users like the Doe defendants to rent virtual computers on which to run their own computer programs and applications. Amazon EC2 provides resizable computing capacity. This feature allows users to quickly scale capacity, both up and down. Amazon EC2 users may temporarily run hundreds or thousands of virtual computing machines. The Doe defendants used Amazon EC2 to create virtual machines to run automated bots to scrape data from LinkedIn's website."

It's not the first time hackers have used EC2 to conduct nefarious deeds. In 2011, the Amazon service was used to control a nasty bank fraud trojan. (EC2 has also been a valuable tool to whitehat password crackers.) Plenty of other popular Web services have been abused by online crooks as well. In 2009, for instance, researchers uncovered a Twitter account that had been transformed into a command and control channel for infected computers.

The goal of LinkedIn's lawsuit is to give lawyers the legal means to carry out "expedited discovery to learn the identity of the Doe defendants." The success will depend, among other things, on whether the people who subscribed to the Amazon service used payment methods or IP addresses that can be traced.