For the past three years, the Bishop Fox engineering team has been tackling a number of emerging challenges in the offensive space and developing technology that amplifies their security capabilities. Most people, however, don’t even know that Bishop Fox has an engineering team.



In an effort to share our learnings with the security community and invite meaningful discussions, we wanted to introduce a new blog series, Inside Engineering, where we will talk about the problems we’re solving, the obstacles we’re facing, and the new technology we’re building.



To kick things off, the first three posts will walk you through our experiences developing a new methodology to identify the modern attack surface.

Why Develop a Whole New Methodology to Gain Visibility Into the Modern Attack Surface?

While there are innumerable open-source intelligence (OSINT) tools and data sources for discovering domains and subdomains, scanning ports for exposed services, and fingerprinting tech stacks, we weren’t able to find a suitable model that could synthesize all those results and recognize which digital assets should be prioritized as targets. So we decided to build a new framework.

Our approach now unites existing technologies into a codified, modern representation of the attack surface. But this first post of three will begin at the beginning with the first asset and target models we developed and the lessons we learned along the way.

Principles

As with designing any model, we first developed a set of principles to guide how we map an effective, representative attack surface that could be continuously discovered and updated. We decided that the model must be the following:

Reliable and Scalable. We intend to perform discovery at scale, so the framework must support scalable ingestion of large amounts of data, typically in a short, daily window. By our estimates, the system would need to ingest approximately 10 million data elements in 6 hours to be considered useful.

Consistent. As we intend to incorporate insights from different perspectives on the modern attack surface, we need to ensure that the data tells a consistent story. In order to scale our efforts, we perform discovery against subdomains and scanning against targets independently. The resulting data sets can be greatly affected by constantly changing DNS records. We need to ensure that our understanding of a subdomain’s IP addresses reflects scans against the same IP addresses for associated targets.



Precise. We intend to provide a well-defined representation of the modern attack surface. Because security tools and data sources are not inherently accurate, up-to-date, or definitive, our framework must be flexible enough to layer the benefits of many scanning and collection techniques. For example, we discover and aggregate subdomains from several data sources to precisely identify the scope of the attack surface.
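To illustrate the aggregation described under "Precise," here is a minimal sketch of merging subdomain lists from several sources while keeping track of which source reported each name. The source names, the sample data, and the `normalize` and `aggregate` helpers are all hypothetical and stand in for whatever tools feed the framework.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase a hostname and strip whitespace and any trailing root dot."""
    return name.strip().lower().rstrip(".")


def aggregate(sources: dict[str, list[str]], domain: str) -> dict[str, set[str]]:
    """Map each in-scope subdomain to the set of sources that reported it."""
    seen: dict[str, set[str]] = defaultdict(set)
    suffix = "." + domain
    for source, names in sources.items():
        for raw in names:
            name = normalize(raw)
            # Keep only the domain itself and names underneath it.
            if name == domain or name.endswith(suffix):
                seen[name].add(source)
    return dict(seen)


# Hypothetical output from two discovery tools.
results = aggregate(
    {
        "source_a": ["www.example.com", "API.example.com."],
        "source_b": ["api.example.com", "mail.example.com", "example.org"],
    },
    "example.com",
)
for name in sorted(results):
    print(name, sorted(results[name]))
```

Recording the reporting sources alongside each name makes it possible to see where tools agree and where only a single source vouches for a subdomain.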

Challenges

While our principles defined our destination, outlining our challenges helped us choose which routes to take in our preliminary activities.

From past experiences developing security tools (notably Search Diggity), we anticipated certain obstacles that we would need to account for as we developed our model, namely the following:

Tool limitations. OSINT tools are often unreliable and poorly maintained. To account for these limitations, we needed to pull insights from several tools rather than rely on a single source. We also could not push heavy loads through a single execution, because most tools either require immense resources to scale or crash under pressure.





The threat of DoS. Network detection services typically focus on the origin of source traffic and request signatures to throttle, block, and report traffic. If we are too aggressive, the results of repeated scans can be inconsistent and imprecise, and we risk taking our clients' systems down.





The need to provide data from many angles. Penetration testers and customers want to pivot between interesting pieces of the attack surface, analyzing the data for organizational patterns and avenues of attack. We need to be able to present a multi-faceted look at the modern attack surface.
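One generic way to keep scan traffic from becoming too aggressive is a token bucket, which caps the sustained request rate while allowing short bursts. The sketch below is an illustration of that general technique, not a description of the framework; the `TokenBucket` class, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A fake clock keeps the example deterministic.
fake_time = [0.0]
bucket = TokenBucket(rate=2, capacity=2, clock=lambda: fake_time[0])
burst = [bucket.try_acquire() for _ in range(3)]
print(burst)  # [True, True, False]
```

A scanner would call `try_acquire()` before each request and back off when it returns `False`, with a separate bucket per client or per target range.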

We kept these challenges in mind as we developed our technology and continually tested our framework to ensure it would give us a reliable, scalable, and comprehensive view of the modern attack surface.

Iteration One: Identifying Targets

In our first model, our goal was to identify the entire attack surface. We chose to start by testing the following hypothesis:



If we include subdomains in the representation of an organization’s attack surface, we can discover major gaps in the modern attack surface.

Many companies have subdomains open to the internet that they are not aware of. And because they don’t have visibility into those subdomains, they are unable to secure them. Our professional services group has often found security risks during finite assessments that other testers have missed because they didn’t review the entire attack surface, including subdomains.



As a first step for mapping our external attack surface, we focused on finding all subdomains and CIDRs – including those served by serverless, cloud, and rapidly changing or fraudulent hosts. In other words, we wanted to ensure that our tool saw every potential attack surface or subdomain that a threat actor could put in place.



In order to test our first hypothesis, we needed our framework to represent targets originating from both subdomains and network ranges (CIDRs). An example HTTP target for a subdomain is http://www.bishopfox.com:80. An example RDP target for a CIDR is rdp://123.45.67.89:3389.
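The two example targets share a scheme://host:port shape, which can be sketched as a small data structure. The `Target` class, the `origin` field, the `parse_target` and `in_range` helpers, and the 123.45.67.0/24 range are hypothetical; treating any IP-literal host as a CIDR-derived target is a simplifying assumption for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Target:
    scheme: str
    host: str
    port: int
    origin: str  # "subdomain" or "cidr" (assumed labels)


def parse_target(url: str) -> Target:
    """Split a scheme://host:port string into a Target."""
    parts = urlsplit(url)
    if not parts.scheme or parts.hostname is None or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {url!r}")
    try:
        ip_address(parts.hostname)
        origin = "cidr"  # host is an IP literal
    except ValueError:
        origin = "subdomain"  # host is a DNS name
    return Target(parts.scheme, parts.hostname, parts.port, origin)


def in_range(target: Target, cidr: str) -> bool:
    """Check whether an IP-based target falls inside a network range."""
    return ip_address(target.host) in ip_network(cidr)


web = parse_target("http://www.bishopfox.com:80")
rdp = parse_target("rdp://123.45.67.89:3389")
print(web)
print(rdp, in_range(rdp, "123.45.67.0/24"))
```

Because the dataclass is frozen, targets are hashable and can be deduplicated in a set when results from several scans are combined.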

As illustrated in the diagram below, we developed a pipeline for automating the collection and aggregation of our targets:

FIGURE 1 - The first attack surface pipeline for continuously discovering targets

We were pleased with the results of this first iteration. In a wide variety of testing environments, we were able to see all targets (both subdomains and CIDRs) associated with the organization within a matter of hours. We cross-referenced these targets through intensive manual effort and confirmed not only that we had identified all targets, but also that we could replicate the testing with consistent and reliable results.

Lessons Learned

While this first iteration showed enormous potential to efficiently collect a comprehensive picture of an organization’s digital attack surface, we knew we could do better. The lessons we learned translated into three areas of focus for future improvements:

We found that the use of dynamic DNS records (i.e., ones that use wildcards) was quite common. A wildcard record matches a whole pattern of subdomains under a domain rather than a single name. As a result, a near-infinite number of subdomains could enter the system.



We had difficulty automating DNS resolutions in a reliable, consistent, and precise manner. We initially tried to pool a list of authoritative nameservers when performing DNS lookups. If we noticed anomalies, we would cycle to different nameservers. This entire process hinged on manually detecting anomalies across nameservers, which proved cumbersome, unscalable, and unreliable.



We also found that our ability to query for targets was not performant enough to scale for enterprise organizations with a very broad attack surface (e.g., those with 50K+ subdomains). As shown in the above diagram, we combined targets from both domains and CIDRs and then delivered them to the user. While this was feasible on our initial smaller data sets, this process did not scale.
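To make the wildcard lesson concrete: a common way to spot a wildcard record is to resolve a few random labels that should not exist and check whether every one of them returns an answer. The sketch below assumes that general technique and is an illustration rather than a description of the pipeline itself; `wildcard_ips`, `system_resolve`, and the stub resolver are hypothetical names.

```python
import secrets
import socket
from typing import Callable

Resolver = Callable[[str], set[str]]


def system_resolve(name: str) -> set[str]:
    """Resolve a name with the OS resolver; return an empty set on failure."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(name, None)}
    except socket.gaierror:
        return set()


def wildcard_ips(domain: str, resolve: Resolver = system_resolve, probes: int = 3) -> set[str]:
    """Return the IPs answering for random labels, or an empty set if any probe fails."""
    answers = [resolve(f"{secrets.token_hex(8)}.{domain}") for _ in range(probes)]
    if not all(answers):
        return set()  # at least one random label did not resolve
    return set().union(*answers)


# Stub resolver that behaves like a wildcard record, so no network is needed.
def stub(name: str) -> set[str]:
    return {"203.0.113.10"}


print(wildcard_ips("example.com", resolve=stub))  # {'203.0.113.10'}
```

Discovered subdomains that resolve only to the returned addresses could then be collapsed into a single wildcard entry instead of being ingested one by one.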

What's Next?

Through this iteration process, the engineering team has arrived at an accurate and holistic view of our clients’ businesses and provided a framework for them to operationalize that information.



But since this first iteration left us room to improve, the next Inside Engineering blog post will walk through how we used the lessons from the first approach to refine our framework with new applied techniques and perspectives.