It’s no secret that GitHub has become one of the main information resources for red-team reconnaissance. I mean, why bother with complex attack vectors when you can just grab AD credentials that were hardcoded in some source code and pushed to a public repository? Often, such creds will grant you a remote desktop connection to the org (usually via Citrix), and, if you’re lucky, you might even land an admin account.





How frequently does this actually happen?





We wanted a better idea, so we ran some quick-and-dirty research against several F500 companies to find out. We didn’t want to spend too much time reading through code or perfecting search queries, and we were not too worried about coverage; we just wanted some rough data. Nonetheless, within a few hours of work, we had obtained critical data belonging to 10 top enterprises. So, what did we find?





Enterprise admin creds, domain admin creds, many more ‘regular’ AD creds, multiple database credentials (there is something about these connection strings that just keeps popping up in repositories), SAP creds, mainframe passwords, SMTP, FTP, SSH – you name it, we had it.





Separating the wheat from the chaff





OK, so how do we actually find creds on GitHub? Considering that we were targeting huge enterprises here, it’s rather obvious that just plugging the org’s name into GitHub’s search console would return thousands, sometimes millions, of completely irrelevant results.









Trivial queries (such as adding the word ‘password’ to the search) don’t really help either, as you’re just going to hit numerous HTML login forms that are of little to no use. It also doesn’t help that GitHub’s search logic is… complicated: search results for the query ‘pass’ will often not include documents containing the word ‘password’.





So, how to go about password hunting?





Search methodology

The easiest anchor we could think of for finding sensitive (read: non-public) data was the Active Directory domain name: there is very little reason for an internal domain name to pop up in public code.





So, we came up with the following methodology for grabbing creds:

a. Choose a (F500) target.

b. Find its main domain names (usually company_name.com).

c. Find an NTLM login for one of these domains and extract the internal AD domain name from it.

d. Put the result in GitHub search (perhaps narrow it down by attaching the word ‘password’ to the query).

e. Profit!
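The search step can be sketched in a few lines of Python. This is only an illustration, not the script we used: the function name `build_search_url` and the example domain `corp.example.local` are made up, and the sketch assumes GitHub's REST code-search endpoint, which typically requires an authenticated request. It only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# GitHub's REST endpoint for code search (assumed here; needs an auth token to call).
GITHUB_CODE_SEARCH = "https://api.github.com/search/code"


def build_search_url(ad_domain: str, extra_terms=("password",)) -> str:
    """Build a code-search URL for an internal AD domain name.

    The domain is wrapped in double quotes so it is searched as one phrase;
    extra_terms are appended as plain keywords to narrow the results.
    """
    query = " ".join([f'"{ad_domain}"', *extra_terms])
    return f"{GITHUB_CODE_SEARCH}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # Hypothetical internal domain, for illustration only.
    print(build_search_url("corp.example.local"))
```

Keeping the internal domain name as the quoted anchor and treating ‘password’ as an optional extra term mirrors the steps above: the domain does the filtering, the keyword only trims the noise.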





NTLM’ing

Step ‘c’ of the methodology above deserves a bit of elaboration. We can break it down into two different questions:

1. How do you find NTLM logins for a given company?

2. How do you extract an internal AD domain name from an NTLM login?

Starting with question 1 – while there are many potential endpoint URLs that might return NTLM logins, we found that Microsoft’s Autodiscover service is quite commonly used (at least by big corporations), so we chose it as our main NTLM resource. Just plug the organization’s main domain into this URL: https://autodiscover.company.com/rpc.













Which brings us to question 2 – how do we get an AD name from this login?

Simple enough: enter something in the user/password fields (anything, really) and grab the NTLM header from the server’s response. It is the WWW-Authenticate: NTLM header, which carries a base64 blob answering the client’s ‘negotiate’ message.
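The same exchange can be scripted instead of going through a browser prompt. The sketch below is a rough illustration under a few assumptions: the helper names (`build_negotiate_header`, `fetch_challenge`) are ours, the negotiate flags value is one commonly seen in tooling rather than anything mandated, and the server is assumed to answer a bare NTLM negotiate message with a 401 carrying the challenge.

```python
import base64
import struct
import urllib.error
import urllib.request

NTLMSSP_SIGNATURE = b"NTLMSSP\x00"


def build_negotiate_header() -> str:
    """Return an Authorization header value holding an NTLM Type 1 message."""
    flags = 0x00088207  # assumed flag set; includes "request target"
    # Signature, message type 1, flags, then empty domain/workstation fields.
    message = NTLMSSP_SIGNATURE + struct.pack("<II", 1, flags) + b"\x00" * 16
    return "NTLM " + base64.b64encode(message).decode("ascii")


def fetch_challenge(url: str, timeout: float = 10.0):
    """Send the negotiate message and return the base64 challenge, or None."""
    request = urllib.request.Request(
        url, headers={"Authorization": build_negotiate_header()}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            headers = response.headers
    except urllib.error.HTTPError as err:
        headers = err.headers  # the expected 401 lands here
    for value in headers.get_all("WWW-Authenticate") or []:
        if value.startswith("NTLM ") and len(value) > 5:
            return value[5:].strip()
    return None
```

Calling `fetch_challenge("https://autodiscover.company.com/rpc")` against a host you are engaged to test would return the base64 blob, or `None` if the endpoint does not offer NTLM.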









Now, just base64-decode it and you have the internal host and domain names (including the FQDN) of the authenticating server. The decoded blob is binary, and the names inside it are stored as UTF-16 strings.
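For readers who prefer not to eyeball the decoded bytes, here is a minimal parsing sketch. It assumes the blob is a standard NTLM challenge (Type 2) message whose TargetInfo block holds the names as ID/length/value pairs; the function name `parse_ntlm_challenge` and the dictionary keys are our own labels, not part of any library.

```python
import base64
import struct

# TargetInfo pair IDs that carry server names (labels are our own).
AV_NAMES = {
    1: "netbios_computer",
    2: "netbios_domain",
    3: "dns_computer",
    4: "dns_domain",
    5: "dns_tree",
}


def parse_ntlm_challenge(blob_b64: str) -> dict:
    """Extract the server names from a base64 NTLM Type 2 message."""
    raw = base64.b64decode(blob_b64)
    if len(raw) < 48 or raw[:8] != b"NTLMSSP\x00":
        raise ValueError("not an NTLMSSP message")
    (msg_type,) = struct.unpack_from("<I", raw, 8)
    if msg_type != 2:
        raise ValueError("not a challenge (Type 2) message")
    # TargetInfo descriptor at offset 40: length, max length, payload offset.
    info_len, _max_len, info_off = struct.unpack_from("<HHI", raw, 40)
    end = min(info_off + info_len, len(raw))
    names = {}
    pos = info_off
    while pos + 4 <= end:
        av_id, av_len = struct.unpack_from("<HH", raw, pos)
        pos += 4
        if av_id == 0:  # end-of-list marker
            break
        value = raw[pos:pos + av_len]
        pos += av_len
        if av_id in AV_NAMES:
            names[AV_NAMES[av_id]] = value.decode("utf-16-le", errors="replace")
    return names
```

The `dns_domain` entry is the one to feed into the GitHub search; the NetBIOS domain is also worth trying, since it tends to show up in `DOMAIN\user` style logins.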





Disclosure & other thoughts

It was not easy to get through to the companies that had their data exposed. Frustrating as it is, few of these giants have a clear address for security disclosures. However, even when we did manage to get through, we came across a problem. Sure, removing the detected repositories (or at the very least making them private) is pretty straightforward. But then again, the repositories we detected are just a tiny subset of the repositories these organizations have online, and by no means do we claim to have found the ‘most severe’ data exposure. We just launched a very basic preliminary script that returned pretty scary results.





It is quite obvious that a larger, more robust solution is required to properly handle GitHub data leaks, one that deals with developer awareness as well as automated detection (and indeed some start-ups are working in this direction). Security assessments and red-team engagements can be used to understand and reduce the risk for each organization. For now, though, and until it gets the attention it deserves, GitHub remains one of the attacker’s best friends.