A benefit of delivering a service through a platform is that the centralization makes it easier to aggregate and analyze data. This data can give us insight into vulnerability trends, feedback on how well processes are working, and metrics on how well something is (or is not!) performing. In other words, we should approach this data with an eye towards using it to inform decisions or drive actionable outcomes.

One broad topic to investigate is the difference in overhead of managing bug bounty programs vs. conducting pen tests. This isn’t meant to position one as better than the other; they both address security needs. Instead, this discussion aims to illuminate how to prepare and what to expect for each effort.

Security testing searches for flaws in an app. Ideally, it will produce only a few results, implying the app is secure. However, when there are vulns, it's important for security testing to provide a strong signal that identifies them. It's equally important for the testing to avoid generating noise and distraction from issues that aren't relevant or pose no risk to the app.
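To make that distinction concrete, here's a minimal sketch (the finding states and field names are illustrative, not any platform's actual schema) of bucketing findings into signal and noise and computing the share of results that are actually worth fixing:

```python
from collections import Counter

# Hypothetical finding states: "valid" is signal; everything else is
# noise that someone still has to spend time reviewing.
SIGNAL = {"valid"}

def signal_ratio(findings):
    """Return the share of findings that represent real, fixable bugs."""
    counts = Counter(f["state"] for f in findings)
    signal = sum(n for state, n in counts.items() if state in SIGNAL)
    total = sum(counts.values())
    return signal / total if total else 0.0

findings = [
    {"state": "valid"}, {"state": "duplicate"},
    {"state": "invalid"}, {"state": "valid"},
]
print(f"signal ratio: {signal_ratio(findings):.0%}")  # -> signal ratio: 50%
```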

The following graph highlights this signal and noise relationship in the findings reported via the Bug Bounty and Pen Test programs on the Cobalt platform.

Seeking signal on the right-hand side

Bug bounties have weak filter effects. They produce noisy findings that require manual overhead to review. Regardless of whether this task falls on the organization sponsoring the bounty program or the platform operating the program, someone will be responsible for identifying which findings require follow-ups and fixes, and which findings can be forgotten.

Pen tests tend to function as a band-pass filter — they overwhelmingly produce valid findings (i.e. bugs to be fixed) and attenuate noise that would be distracting to developers.

With data in hand, we can design strategies to reduce noise from crowdsourced security testing.

Public bug bounties don’t have an initial barrier to who submits a finding or what quality a finding must meet. One drastic solution is to set up a barrier: make the bounty program private or otherwise restrict it to a vetted group of researchers. Another solution is to invest in better triage (and to recognize that triage is an investment of time and people in the first place); this overhead is one of the hidden costs of running a public bounty program. And, of course, there can be strategies to filter out bad reporters and bad findings over time. Reputation systems are a step in this direction.
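As a rough illustration of the reputation idea, the sketch below (a hypothetical scoring scheme, not how any particular platform actually weights reporters) discounts researchers whose submission history is mostly noise, so triage effort goes to the likelier signal first:

```python
def reporter_score(valid, duplicates, invalid, prior=2.0):
    """Hypothetical reputation score: the fraction of a reporter's past
    findings that turned out valid, smoothed with a small prior so
    brand-new reporters aren't judged on one or two submissions."""
    total = valid + duplicates + invalid
    return (valid + prior * 0.5) / (total + prior)

def triage_order(reports):
    """Review findings from higher-reputation reporters first."""
    return sorted(
        reports,
        key=lambda r: reporter_score(**r["history"]),
        reverse=True,
    )

reports = [
    {"id": 101, "history": {"valid": 1, "duplicates": 9, "invalid": 15}},
    {"id": 102, "history": {"valid": 12, "duplicates": 3, "invalid": 1}},
]
for r in triage_order(reports):
    print(r["id"], round(reporter_score(**r["history"]), 2))
```

The same score could also feed softer policies, such as requiring more evidence from low-reputation reporters rather than excluding them outright.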

Even pen tests may produce findings that fall outside the valid category and into the noise. This presents a chance to explore how pen tests are managed and to start some hypothesis testing. For example, do duplicate findings appear due to a lack of communication and collaboration among the pen test team? Are invalid findings due to a pen tester’s lack of skill, or a lack of insight into evaluating risk for complex environments? With data, we not only have a chance to test these questions, but we can also watch for trends to see whether corrective actions are having the desired effect.
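As one example of what that hypothesis testing might look like, the sketch below (with made-up field names and data) measures how often a pen test's duplicates come from within the same test, which would point at a coordination gap rather than independent rediscovery of older issues; grouping the same counts by month or quarter would give the trend line for watching corrective actions:

```python
from collections import defaultdict

def duplicate_rate_within_tests(findings):
    """For each pen test, compute the share of findings marked as
    duplicates of another finding from the *same* test."""
    per_test = defaultdict(lambda: {"total": 0, "dupes": 0})
    for f in findings:
        stats = per_test[f["test_id"]]
        stats["total"] += 1
        if f.get("duplicate_of_test") == f["test_id"]:
            stats["dupes"] += 1
    return {t: s["dupes"] / s["total"] for t, s in per_test.items()}

findings = [
    {"test_id": "pt-1", "duplicate_of_test": None},
    {"test_id": "pt-1", "duplicate_of_test": "pt-1"},  # dupe within the same test
    {"test_id": "pt-2", "duplicate_of_test": "pt-1"},  # dupe of an older test's finding
    {"test_id": "pt-2", "duplicate_of_test": None},
]
print(duplicate_rate_within_tests(findings))  # {'pt-1': 0.5, 'pt-2': 0.0}
```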

When talking about findings and vulns and quality, it’s easy to lose sight of the fact that people are integral to these processes: both the researchers who identify vulns and the developers who fix them. This is even more applicable to crowdsourced security testing. Data that measures how well these processes are working helps us not only build more secure apps, but also manage programs so they encourage the positive behaviors that produce the signals we need to improve that security.