Editor’s Note: Today’s post is by Andrew Pitts, Co-Founder and CEO of PSI, and is based on the talk he gave at the 2018 Society for Scholarly Publishing (SSP) Annual Meeting.

Back in March, Scholarly Kitchen Chef and NISO Executive Director Todd Carpenter wrote a post titled, “FBI Indicts Nine Iranians in a Massive Scheme to Target Academic Credentials and Steal Content.” It was about the U.S. Federal Bureau of Investigation’s unsealed indictments of nine Iranian citizens for the bulk theft of intellectual property from academic institutions in a brazen scheme to gather and redistribute scholarly content. His post followed a number of headline articles from the preceding months, including “University secrets are stolen by cybergangs” on the front page of The Times in September 2017.

I recall thinking that the issue of international IP theft was hitting closer to home for universities, and hoping more of them would start taking the threats posed by Sci-Hub – a similarly “brazen scheme to gather and redistribute scholarly content” – more seriously. So when I was offered an opportunity at the SSP Annual Meeting to present actual evidence of how Sci-Hub is doing so much more than enabling the download of publishers’ PDFs, I accepted.

Why me? I’ve been working on IP intrusions for many years, limiting the damage caused by intrusions to publishers and libraries. In recent years that’s meant spending a lot of time looking at Sci-Hub. I continue to work with legal authorities in the US and UK, as well as with publishers, individual libraries and consortia. Our aim is to make all of these stakeholders aware of the threats, to expose cybercrime, and to share information across the academic research community to help protect publishers, authors, researchers, and library patrons.

Let me be clear: Sci-Hub is not just stealing PDFs. They’re phishing, they’re spamming, they’re hacking, they’re password-cracking, and basically doing anything to find personal credentials to get into academic institutions. While illegal access to published content is the most obvious target, this is just the tip of an iceberg concealing underlying efforts to steal multiple streams of personal and research data from the world’s academic institutions.

As my SSP co-panellist Joe DeMarco, IP theft and cybercrime lawyer and former federal cybercrime prosecutor in New York, said during our panel:

I will tell you that in 21 years now of doing work both in the public sector and the private sector, I have never once met a cybercriminal who set out to gain access to a database for one purpose and only confined themselves to that one purpose once they got inside the database. It’s not how criminals think. The criminal doesn’t break into your home just to steal your silver; they break in to steal the silver, the jewels, the electronics, the cash, and anything else worth stealing.

How are they doing this? Stolen individual credentials are published on websites. So far, we know of 29 sites that openly boast about having passwords to access library content. We’ve monitored these and asked libraries to take corrective action.

What can they do with passwords? We tested this ourselves. For one user of a library in Michigan, whose credentials were published on an Iranian website, we were able to access not only content through her account but also her profile and her personal information. We were able to track her library usage, including titles requested and when they were available for collection.

In the hands of a criminal, this information could be dangerous. Imagine what additional information an experienced hacker would be able to get hold of, and what they might do with it. Maybe they would change the usernames and passwords for all the other accounts that person has. How many students use a password like “GoState! #1” for more than just their library information system? The hackers can buy an email, a password, and maybe even a phone number. Then they have the golden ticket.

We know that, at one UK university, Sci-Hub managed to get six passwords through a 48-hour dictionary attack on its system. Then, over a weekend (when spikes in usage are less likely to come to the attention of publishers or library technical departments), they accessed 350 publisher websites and made 45,092 PDF requests. In another attack, the hackers not only broke into the institution’s database but also changed the names and passwords of profiles. Another institution told us an intruder changed the cell phone numbers linked to user accounts and also planted malware, meaning that all its computers needed to be completely wiped. In addition, we have evidence that Sci-Hub is bombarding university IT systems, often for days on end, without the knowledge of compromised users.

Libraries that allow access to Sci-Hub from their secure networks are inviting trouble. Internet traffic flows both ways, and use from within a secure network effectively opens a hole right through the secure firewall. If key loggers are placed on computers, how many other credentials will be stolen and used?

Further evidence shows that credentials that get into Sci-Hub’s hands are subsequently shared widely. How do we know? We caught them. When a particular set of credentials had been stolen and used first by Sci-Hub, the password was reset. For a short period afterward, the stolen credentials were monitored. The log file analysis revealed that there were 302 further attempts to access the site using the stolen credentials. The attempts came from 12 countries, including the United States, China, Thailand and Hong Kong.

Only 17 of the attempts were from Sci-Hub itself, demonstrating the credentials stolen by Sci-Hub had gone viral. Scarily, those same stolen credentials were even being used by university users at 34 recognized universities! This tells us that the credentials had been passed around the web.
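
For readers who want to run a similar check on their own logs, the kind of tally described above can be sketched in a few lines of Python. This is only an illustration: the record layout (`source`, `country`), the function name `summarize_attempts`, and the sample rows are all invented for the example and are not the data or tooling used in our investigation.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Attempt(NamedTuple):
    """One login attempt made with a credential after its password was reset.

    Hypothetical record layout; real proxy logs would need parsing first.
    """
    source: str   # label for where the request came from (assumed field)
    country: str  # country resolved from the source IP (assumed field)


def summarize_attempts(attempts: Iterable[Attempt]) -> dict:
    """Count attempts overall, per source label, and per country."""
    attempts = list(attempts)
    by_source = Counter(a.source for a in attempts)
    by_country = Counter(a.country for a in attempts)
    return {
        "total": len(attempts),
        "by_source": by_source,
        "countries": len(by_country),
        "by_country": by_country,
    }


# Invented sample rows, for illustration only.
sample = [
    Attempt("original-intruder", "HK"),
    Attempt("university-network", "US"),
    Attempt("university-network", "CN"),
    Attempt("unknown", "TH"),
    Attempt("university-network", "US"),
]

summary = summarize_attempts(sample)
print(summary["total"], "attempts from", summary["countries"], "countries")
print(summary["by_source"].most_common())
```

If most attempts with a reset credential come from sources other than the original intruder, that is a sign the credential has been passed on.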

So, if you have donated your credentials and you think they are only used to access scholarly content, they’re not. The image below is taken from a website called Passfans; they trade in stolen credentials for credits. Sometimes they’re paid for, and sometimes they’re free.

The next visual is just three pages of 60 I’ve compiled, taken from the Passfans website, showing a wide variety of organizations whose stolen credentials are shared online. This is rife, and it’s a huge problem.

A small number of Sci-Hub’s credentials are donated, but the vast majority are clearly phished or taken by dictionary attacks. This is a massive-scale operation, as they’ve taken over 65 million articles. They’re not doing that with just a small number of credentials; they need thousands, and it is unreasonable to think that so many have been donated.

Intrusions result in thousands of articles being stolen in a single attack. But it’s not just that – it’s personal research and it’s social security numbers, names, addresses, and other personal information. This information is truly valuable, and when it’s traded on the dark web, a set of credentials can typically fetch $75. Sci-Hub must have thousands of these – a potentially valuable source of revenue.

The real cost for libraries

The time and effort taken by universities to protect themselves and repair the damage caused by cybercriminals like Sci-Hub is just one example of the real harm to individual libraries. One university took days to go through and wipe all the viruses, clean compromised machines, look at what had happened, and change things to protect itself in the future. These activities increase staffing and hardware costs for both the library and the institution. The damage also affects the daily operation of IT systems; for example, dictionary attacks will slow a university’s IT systems. Access to paid content can be lost when a publisher becomes aware of an intrusion. Personal data can get into the wrong hands.

So what can you do? Here are some basic actions to take:

Report intrusions to the authorities and to each other. Both libraries and publishers need to do this.

Ensure proxy servers are secure, especially against dictionary attacks.

Educate users about phishing attempts.

Educate students and staff on protecting personal data, particularly with different and more secure passwords.

Understand the possible effects if your students use Sci-Hub, LibGen, aaaarg.fail, etc. Libraries need to block access to Sci-Hub from within their firewalls. Libraries typically have firewalls that protect from outside intrusions, but if a user within the firewall brings in PDFs, this can open a door in the firewall that is open to two-way traffic.

Register library IP addresses and contact details with The IP Registry – a clear communication channel between libraries and publishers. This helps when intrusions are identified because the details can be shared across the academic research community.

There are technical solutions as well. Sari Frances from IEEE mentions some in her April 2018 post in The Scholarly Kitchen: “Guest Post: Technology, Law, and Education: A Three-Pronged Approach to Fight Digital Piracy.”
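
As one illustration of what securing a proxy server against dictionary attacks can involve, here is a minimal Python sketch that flags source addresses with too many failed logins in a short window. It is a sketch under stated assumptions, not a description of any particular proxy product: the function name `flag_guessing_sources`, the input format, the thresholds, and the example addresses (taken from ranges reserved for documentation) are all hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def flag_guessing_sources(failed_logins, max_failures=20,
                          window=timedelta(minutes=10)):
    """Return source IPs with more than max_failures failed logins in any window.

    failed_logins: iterable of (timestamp, source_ip) tuples, sorted by time.
    The default thresholds are placeholders, not recommended values.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in failed_logins:
        q = recent[ip]
        q.append(ts)
        # Drop failures that have aged out of the window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) > max_failures:
            flagged.add(ip)
    return flagged


# Invented example: one address fails 30 times in 30 seconds, another fails twice.
start = datetime(2018, 1, 1, 2, 0, 0)
events = [(start + timedelta(seconds=i), "203.0.113.7") for i in range(30)]
events += [(start + timedelta(minutes=m), "198.51.100.9") for m in (1, 50)]
events.sort()

print(flag_guessing_sources(events))
```

A simple rate threshold like this will not catch a slow attack that tests a list of username and password pairs gathered elsewhere, so it is a starting point rather than a complete defence.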

A lot of these actions are for librarians. I don’t mean to pick on librarians, but they are very much on the front lines of this battle, representing both their students and institutions, and they need to be more engaged. As my SSP co-panellist Rick Anderson, Collections Librarian for the University of Utah (and a Scholarly Kitchen Chef who has written further about the risks Sci-Hub poses to universities), pointed out: “Sci-Hub doesn’t represent an attack on the publishing companies; it’s also an attack on the integrity of academic information systems and on the privacy of students.”

Can you trust Sci-Hub?

That’s clearly a decision each individual has to make. Personally, I wouldn’t. This is a large organization that has passwords to obtain both PDFs and personal information from some of the world’s top institutions.

Maybe you think Sci-Hub’s founder, Alexandra Elbakyan, is a Robin Hood type character. Okay, but she’s also a cybercriminal. I thought Joe DeMarco put something else well:

Whatever you think about open access or the pricing of articles, books, and journals – of the ability of publishers to be innovative, or not, across the digital economy we all live in – whatever you think about any of that, always bear in mind that cybercriminals are thinking about it in a very different way.

If Sci-Hub is entirely reliant on using donated passwords, then why are they running attacks to break into university systems? Given her history of erroneous statements and erratic actions, what credibility should we grant her unsupported claims? Should we just take her word for it?

My aim is to make all people aware of the threats, to expose cybercrime and to share information across the academic research community to help protect publishers, authors, researchers, and library patrons. Please feel free to reach out to either me or Sari Frances at IEEE to learn more about what the IP Intrusion Database (IPID) committee, along with a number of publishers and librarians, is doing to confront the dangers posed by Sci-Hub and other cyber attackers.

Addendum

Thank you to everyone for your comments. When it comes to evidence I have to be careful; I cannot provide exact details without giving away our sources. I can say that we have been able to monitor activity patterns through log files and trace the source of the activity to Sci-Hub. We have then gone on to monitor subsequent activity from the same source, and we have seen all kinds of criminal activity coming from it.

Below you will find some further explanation of some of the evidence already included in the article as well as extracts from some of the emails I’ve received while investigating intrusions targeting universities. I hope that these will provide an illustration of the kind of evidence we are working with.

1) The table below was shown at SSP 2018. It explains how we know stolen credentials are shared around the globe:

2) PHISHING: it is easy to tell from the log files that the phishing and password attacks come from the same place as the eventual hack where all the IP of the University is stolen.

Dear XXXXX,

Thank you for alerting us to this infringement which we take seriously.

We’re able to provide some additional information regarding this incident.

The attack took place via EZproxy and began at 02/Sep/2017:05:37:05 +0100 and lasted until 04/Sep/2017:15:45:59 +0100

Our systems staff have identified the account concerned and the user’s password that was compromised. This has now been scrambled to prevent any further misuse. … the user concerned fell victim to a phishing scam and it was not their intention to download content inappropriately.

XXXXXX XXXXXX

Serials & E-Resources Librarian

University of XXXXXX
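
As a small worked example, the two timestamps quoted in the email above can be turned into a duration and a day of the week with Python’s standard library. The format string is an assumption about the log layout, based only on how the timestamps appear in the email.

```python
from datetime import datetime

# Timestamps copied from the email above; the format string assumes the
# day/month/year:time offset layout in which they were quoted.
LOG_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

start = datetime.strptime("02/Sep/2017:05:37:05 +0100", LOG_FORMAT)
end = datetime.strptime("04/Sep/2017:15:45:59 +0100", LOG_FORMAT)

duration = end - start
print("Duration:", duration)                # length of the intrusion
print("Started on:", start.strftime("%A"))  # day of the week it began
print("Ended on:", end.strftime("%A"))      # day of the week it ended
```

Run as written, this shows an attack lasting a little over 58 hours, beginning on a Saturday morning and ending on a Monday afternoon.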

3) Another Phishing Attack:

Hi XXXXXX,

We do believe it was part of a phishing attack.

IP addresses: XXX.XXX.158.32

The accounts are compromised through Innovative’s WAM proxy

We are unable to tell whether any other activity occurred, outside of usage of our e-resources via our proxy server.

We have seen a phishing letter, so we believe these are the result of phishing. The frequency of these seems to be very high at the moment.

In almost all instances more than one publisher is affected.

Thanks,

XXXXXX

Library Resource Officer

University of XXXXXX

Ext: XX26

4) Password Attack Evidence

Dear XXXXXX,

The process is quite alarming – we’re seeing a sustained password guessing attack on our password management system. It’s coming from a Tencent cloud server (XXX.51.XXX.220) which we’ve now locked. The guessing rate was quite low but still successful, so I suspect it’s testing a list of users/passwords gathered from elsewhere.

The attacker is even registering a new cellphone number for accounts captured to maintain access to the account. We’ve had to start auditing phone numbers to prevent this. It’s quite a lot of effort so I’m wondering why this is worthwhile to the attacker.

…

Regards

XXXXXX

XXXXXX XXXXXX

Senior Information Security Officer

Phone: (+44) XXX XX08 987X

Int phone: XXXXX

Information Systems

University XXXXXX XXXXXX

5) The image below shows data from just one of the attacks discussed in the above email extracts: