Finding More Vulnerabilities Won’t Fix AppSec’s Biggest Challenge if AI Can’t Explain What’s at Risk
Question: A report found that different AI security scanners analyzing the same codebase often produced inconsistent findings. How does that kind of alert instability influence developer behavior, remediation, release cycles, and the day-to-day workload of AppSec teams?
Jeff Williams, CTO & Founder of Contrast Security
The first thing to understand is that AI doesn’t solve alert fatigue. Left alone, it makes it worse. We now have tools that can generate findings faster than any human team can triage them, so the bottleneck was never discovery. It was trust. The real question for the next few years is not how to find more; it’s how to know which of the thousands of findings actually matter in your environment.
That confidence has to come from your own context, not from generic ratings. Most teams still prioritize off a base score like CVSS, but those systems assume a reasonable worst case for almost every factor, which systematically inflates severity.
The result is a wall of criticals, and when everything is critical, nothing is. A vulnerability score calculated in the abstract is really just a guess with a decimal point on it. The same flaw can be a five-alarm fire in one application and completely unreachable in another.
To score accurately, you need sensors across both development and production, gathering four kinds of context at once:
- Vulnerability context: what the flaw is.
- Architectural context: where it lives and whether the vulnerable code is actually reachable when the app runs.
- Threat context: whether anyone is targeting that path.
- Business context: whether it touches sensitive data or is under attack.
The most powerful of these is reachability. If you can confirm from runtime evidence that an attacker’s input never reaches the vulnerable code, you can confidently set most of your backlog aside instead of drowning in it.
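As a rough illustration of how the four kinds of context could combine, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Finding` fields, the `triage` function, and the bucket names are hypothetical and do not describe any particular product's scoring method. The point it shows is that two findings with the same base score can land in very different places once reachability is known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical record combining the four kinds of context."""
    finding_id: str
    base_score: float             # vulnerability context, e.g. a CVSS base score
    reachable: Optional[bool]     # architectural context; None means no runtime evidence yet
    actively_targeted: bool       # threat context
    touches_sensitive_data: bool  # business context


def triage(finding: Finding) -> str:
    """Assign an illustrative bucket; the rules and names are assumptions."""
    if finding.reachable is False:
        # Runtime evidence says attacker input never reaches the code.
        return "set-aside"
    if finding.reachable is None:
        # No evidence either way, so the base score is still just a guess.
        return "needs-evidence"
    if finding.actively_targeted or finding.touches_sensitive_data:
        return "fix-now"
    return "scheduled"


# The same flaw (same base score) in three different applications.
findings = [
    Finding("app-a", 9.8, reachable=True, actively_targeted=True, touches_sensitive_data=True),
    Finding("app-b", 9.8, reachable=False, actively_targeted=False, touches_sensitive_data=True),
    Finding("app-c", 9.8, reachable=None, actively_targeted=False, touches_sensitive_data=False),
]

buckets = {f.finding_id: triage(f) for f in findings}
```

In this sketch, reachability is checked first on purpose: it is the one signal that can remove a finding from the urgent queue regardless of how high its base score is.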
When you connect all of that telemetry, what you’re really building is a digital twin of your environment, a living model of your applications, their data, their connections, and their exposure.
Every piece of telemetry adds fidelity to that model. This is the part most organizations skip, and it’s the part that changes everything, because a model is something AI can actually reason over.
Combined with a model like that, AI becomes genuinely strategic. It can:
- Explain your security posture in plain terms,
- Spot the systemic weaknesses behind individual findings, and
- Help you build initiatives that reduce whole categories of risk rather than chasing tickets.
Without that model, you’re stuck in penetrate-and-patch, reacting to one finding at a time. Even with AI, you will never keep up that way. AI applied to a context-free pile of alerts just produces a bigger, faster pile.
So the validation that builds confidence in automated findings is really about evidence and feedback. Findings have to show their work so a human can trust them:
- The actual request
- The reachable path
- The data at risk
Every confirmed true or false positive should feed back into the model and make the next judgment sharper. Get that loop running and you reach the point where high-confidence findings can drive automated remediation safely. That’s the destination. Context is what gets you there.
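One way to picture that feedback loop is the minimal Python sketch below. It is an illustration under stated assumptions, not a description of any real tool: the `VerdictLog` class, the per-rule grouping, the smoothing formula, and the `threshold` and `min_verdicts` defaults are all hypothetical. It records each human-confirmed verdict and only allows automated remediation for a rule once enough evidence has accumulated.

```python
from collections import defaultdict


class VerdictLog:
    """Hypothetical store of confirmed verdicts, grouped by detection rule."""

    def __init__(self) -> None:
        # rule_id -> [confirmed true positives, confirmed false positives]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, rule_id: str, is_true_positive: bool) -> None:
        """Feed one human-confirmed verdict back into the log."""
        self._counts[rule_id][0 if is_true_positive else 1] += 1

    def confidence(self, rule_id: str) -> float:
        """Smoothed share of true positives; 0.5 when nothing is known yet."""
        tp, fp = self._counts[rule_id]
        return (tp + 1) / (tp + fp + 2)

    def can_auto_remediate(self, rule_id: str, threshold: float = 0.9,
                           min_verdicts: int = 20) -> bool:
        """Allow automation only with enough verdicts and high confidence.

        Both default values are illustrative assumptions.
        """
        tp, fp = self._counts[rule_id]
        return (tp + fp) >= min_verdicts and self.confidence(rule_id) >= threshold


log = VerdictLog()
for _ in range(30):
    log.record("sql-injection", True)
log.record("sql-injection", False)
for _ in range(3):
    log.record("xss", True)

sqli_ok = log.can_auto_remediate("sql-injection")  # many verdicts, mostly confirmed
xss_ok = log.can_auto_remediate("xss")             # too few verdicts so far
```

The minimum-verdict gate reflects the idea that trust comes from accumulated evidence: a rule with three confirmations and no misses still has too little history to act on automatically.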
Still, using AI to triage vulnerabilities and chase alerts is really just throwing new technology at yesterday’s problem. The real value comes when we point AI at the things we know how to do in theory but have never been able to do at scale, like
- threat modeling,
- security architecture, and
- runtime protection.
AI can help us reach secure-by-design, push standardization, and head off whole classes of problems before a single line of vulnerable code ever ships. Triage is treating the symptom. The real win is using AI to prevent the disease in the first place.




