Updated An Amazon Web Services engineer accidentally published correspondence with customers and "system credentials including passwords, AWS key pairs, and private keys" to a public GitHub repository.

On 13 January, infosec biz UpGuard discovered a 954MB repository containing AWS resource templates – used to create cloud services – plus hostnames, and log files generated in the second half of 2019. There were also internal Amazon training resources marked "confidential."

"Several documents contained access keys for various cloud services," UpGuard reported today. "There were multiple AWS key pairs including one named 'rootkey.csv,' suggesting it provided root access to the user's AWS account. Other files contained collections of auth tokens and API keys for third party providers. One such file for an insurance company included keys for messaging and email providers."

UpGuard continued:

In addition to data related to computer systems like credentials, logs, and code, the repo also contained assorted documents that established the identity of the owner and their relationship to AWS. These documents included bank statements, correspondence with AWS customers, and identity documents including a driver's license. Multiple documents included the owner’s full name. A LinkedIn profile matching the exact full name identified one person who listed AWS as their employer in a role that matched the kinds of data found in the repository. Other documents in the repository included training for AWS personnel and documents marked as “Amazon Confidential.” Based on this evidence, UpGuard is confident the data originated from an AWS engineer.

A couple of hours after the discovery, UpGuard notified AWS security, and the repo was taken offline. The repository was public for less than five hours. However, as UpGuard noted, citing a paper from North Carolina State University, mishaps like this can be discovered quickly via GitHub's search features.

"One is able to discover 99 per cent of newly committed files containing secrets in real time," the paper said. The researchers believe that "thousands of new, unique secrets are leaked every day". Even five hours of exposure, then, is plenty of time for criminals to pick up confidential information.


Why do so many secrets end up in GitHub repositories? A common reason is that developers trying out some new ideas hard-code credentials into applications, and then publish the code without thinking through the implications – or forget they are pushing to a public repo.
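One common way to avoid the hard-coding habit described above is to read secrets from the environment at run time, so the source file never contains them. The following is a minimal sketch in Python, not anything drawn from the leaked repository; the variable name EXAMPLE_API_TOKEN and the helper load_secret are hypothetical and chosen purely for illustration.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def load_secret(name: str) -> str:
    """Return the secret stored in the environment variable `name`.

    Failing loudly, rather than falling back to a default, removes the
    temptation to paste a real value into the source as a placeholder.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingCredentialError(f"environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # EXAMPLE_API_TOKEN is a hypothetical name used only for this sketch.
    try:
        token = load_secret("EXAMPLE_API_TOKEN")
        print(f"loaded a token of length {len(token)}")
    except MissingCredentialError as err:
        print(f"refusing to start: {err}")
```

This only keeps the secret out of the code; it does nothing for a developer who commits a file holding the environment values themselves, so such files still need to be excluded from version control.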

The problem is so common that GitHub has a token scanning service that scours "public repositories for known token formats to prevent fraudulent use of credentials that were committed accidentally."

GitHub also recommends "considering any tokens that GitHub sends you messages about as public and compromised".
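To give a rough idea of what scanning for "known token formats" can look like, here is a small Python sketch that searches text for strings shaped like AWS access key IDs. It is not GitHub's implementation. It assumes the widely documented shape of a long-term AWS access key ID, the prefix AKIA followed by 16 uppercase letters or digits, and the function name find_aws_key_ids is hypothetical.

```python
from __future__ import annotations

import re

# Assumption: long-term AWS access key IDs are "AKIA" plus 16 characters
# from A-Z and 0-9. The lookarounds stop a match from being carved out of
# a longer run of the same characters.
AWS_ACCESS_KEY_ID = re.compile(r"(?<![0-9A-Z])AKIA[0-9A-Z]{16}(?![0-9A-Z])")


def find_aws_key_ids(text: str) -> list[str]:
    """Return every substring of `text` shaped like an AWS access key ID."""
    return AWS_ACCESS_KEY_ID.findall(text)


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\nregion = us-east-1\n"
    for key_id in find_aws_key_ids(sample):
        print(f"possible AWS access key ID: {key_id}")
```

A pattern match like this only flags candidates. It cannot tell whether a key is live, and it will miss secrets that have no fixed format, such as passwords in log files.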

In this case, however, the repository was "structured as general storage rather than application code, with many files in the top-level directory and no clear convention for the subdirectories," noted UpGuard. Why was this in a GitHub repository at all? This is not known; it could be anything from an errant script to a misguided attempt to use GitHub like Dropbox, for exchanging or backing up files.

UpGuard noted: "There is no evidence that the user acted maliciously or that any personal data for end users was affected, in part because it was detected by UpGuard and remediated by AWS so quickly." It is an oddly complacent conclusion bearing in mind the statements that precede it, but AWS will be hoping it is correct.

Does GitHub make it too easy to search its repositories for passwords and access tokens? Should GitHub scan for tokens before rather than after they are in public repositories? Should such data be redacted from internal logs and support data just in case – as Microsoft appears to have done?

We have asked AWS for comment and will report back with any statements. ®

Updated to add

A spokesperson for Amazon has told us the code repository was used by the engineer in a personal capacity, and claimed no customer data or company systems were exposed.