What does effective-risk rescoring replace in our existing stack?

It doesn't replace your scanner — Tenable, Rapid7, Qualys, GitHub Dependabot, Snyk all still feed in the CVEs they find. It replaces the spreadsheet (or the analyst hour) where someone re-reads the CVSS vector against your asset's actual exposure, criticality, and compensating controls and writes 'not exploitable here, suppress' in the ticket. Every suppression decision becomes a deterministic, cited rule-firing instead of a tribal-knowledge note that leaves with the person who wrote it.

How is this different from an LLM that just summarises CVEs?

An LLM can produce a different answer each time you ask. An effective-risk engine produces the same answer every time, traceable to a specific rule firing with a citation. For audit, insurance, and regulator-facing work, that determinism is the product: a customer can run the same CVE × asset × controls input through the engine a year later and reproduce the score. LLMs are how the customer's agent asks the question; the engine is what answers.

What input do we need to provide?

Per CVE, four required slots: asset name, asset exposure (internet / internal / isolated), asset criticality (low / medium / high / critical), plus a list of compensating controls (with whether each is attested-by-engineer or evidenced-by-tool). Components, data classes, and data sensitivity are optional but improve the score. Most workflows fill these from a system description in the workflow frame — no separate asset registry needed for one-shot scoring. For repeated scoring against the same asset, register the context once.
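As a rough illustration of that input shape, the slots could be modelled as follows. The class names, field names, and the `missing_required` helper are assumptions made for this sketch, not the gateway's actual schema; only the slot values (internet / internal / isolated, low / medium / high / critical, attested vs evidenced) come from the answer above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Exposure(str, Enum):
    INTERNET = "internet"
    INTERNAL = "internal"
    ISOLATED = "isolated"


class Criticality(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class Assurance(str, Enum):
    ATTESTED_BY_ENGINEER = "attested-by-engineer"
    EVIDENCED_BY_TOOL = "evidenced-by-tool"


@dataclass(frozen=True)
class Control:
    name: str
    assurance: Assurance


@dataclass(frozen=True)
class AssetContext:
    # The four required slots.
    asset_name: str
    exposure: Exposure
    criticality: Criticality
    controls: tuple[Control, ...]
    # Optional slots that improve the score when present.
    components: tuple[str, ...] = ()
    data_classes: tuple[str, ...] = ()
    data_sensitivity: Optional[str] = None


def missing_required(raw: dict) -> list[str]:
    """Return the required slot names absent from a raw input mapping."""
    required = ("asset_name", "exposure", "criticality", "controls")
    return [slot for slot in required if slot not in raw]
```

Checking for missing slots before scoring mirrors the engine's stated behaviour of asking for input rather than guessing.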

Effective risk: turning the LLM-era CVE firehose into a triage queue you can actually work

Vulnerability scanners used to emit a few dozen CVEs a week. Then the LLM-assisted reporters arrived — agentic pentesters, AI-assisted CVE triagers, supply-chain scanners that re-evaluate every dependency on every commit — and the weekly count went up an order of magnitude. The 2024 NVD backlog made it worse: thousands of CVEs sat unscored for months, then landed in one batch.

The bottleneck is not finding vulnerabilities anymore. The bottleneck is deciding which of the ones you've already found actually matter for your systems.

That is what effective-risk rescoring does.

The one-paragraph mechanism

Take a CVE. Take a structured description of the asset it might affect — exposure, criticality, data classes, the controls actually in place. Run both through a deterministic rule engine that knows how to adjust the CVSS vector: a remote-code-execution CVE against an isolated, MFA-gated, WAF-fronted internal service is not the same risk as the same CVE against an internet-exposed unpatched edge box. The engine returns an effective score, every altered metric annotated with the rule that changed it, and a citation chain back to the framework or threat-intel source the rule encodes.

Same input today and same input next quarter produce the same output. That is the property that makes it audit-defensible.
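A minimal sketch of the idea, assuming rules are pure functions applied in a fixed order. The two rules, their IDs, and the citation strings below are invented for illustration and are not taken from the engine's rule library; the sketch also stops at the altered vector and does not compute a numeric score.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Asset:
    exposure: str                # "internet" | "internal" | "isolated"
    controls: frozenset[str]     # names of controls in place


@dataclass(frozen=True)
class AlteredMetric:
    metric: str
    old: str
    new: str
    rule_id: str                 # hypothetical identifier
    citation: str                # placeholder, not a real reference


Rule = Callable[[dict[str, str], Asset], Optional[AlteredMetric]]


def rule_isolated_network(vector: dict[str, str], asset: Asset) -> Optional[AlteredMetric]:
    # Hypothetical rule: a network-reachable flaw on an isolated asset
    # is treated as reachable from the adjacent network only.
    if asset.exposure == "isolated" and vector.get("AV") == "N":
        return AlteredMetric("AV", "N", "A", "R-ISOLATED-001", "placeholder citation")
    return None


def rule_mfa_gate(vector: dict[str, str], asset: Asset) -> Optional[AlteredMetric]:
    # Hypothetical rule: an MFA gate means the attacker needs some privileges.
    if "mfa" in asset.controls and vector.get("PR") == "N":
        return AlteredMetric("PR", "N", "L", "R-MFA-001", "placeholder citation")
    return None


RULES: tuple[Rule, ...] = (rule_isolated_network, rule_mfa_gate)


def rescore(vector: dict[str, str], asset: Asset) -> tuple[dict[str, str], list[AlteredMetric]]:
    """Apply every rule in a fixed order and record what each one changed."""
    effective = dict(vector)     # never mutate the caller's base vector
    altered: list[AlteredMetric] = []
    for rule in RULES:
        change = rule(effective, asset)
        if change is not None:
            effective[change.metric] = change.new
            altered.append(change)
    return effective, altered
```

Because the rules have no hidden state and run in a fixed order, calling `rescore` twice with the same vector and asset returns equal results, which is the reproducibility property described above.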

Data flow

mermaid

flowchart LR
  A["Vulnerability scanners
+ AI reporters
+ NVD feed"] -->|CVE ID + base CVSS| B["Customer's MCP agent"]
  C["Asset description
exposure / criticality
data classes"] --> B
  D["Control evidence
attested vs evidenced"] --> B
  B -->|effective_risk_inline_batch| E["Effective-risk engine"]
  E --> F{"Rule library evaluates
CVE × asset × controls"}
  F -->|rule fires| G["Altered metric
+ citation"]
  F -->|no rule, gap detected| H["Context revision proposal
ask analyst for X"]
  G --> I["Effective score
+ provenance chain"]
  H --> I
  I --> J["Ticket: suppress / escalate /
request more info"]

Every arrow is a contract, not a guess. The engine refuses to silently degrade — if a control claim cannot be evaluated, it says so and proposes the missing input.

Why CVSS base scores alone do not work anymore

Four well-known problems, sharpened by the LLM-era volume:

CVSS base is asset-agnostic. A 9.8 against an isolated lab box and a 9.8 against your customer-facing payments API are the same number. Your analyst has to write the difference in a comment, every time.
CVSS temporal and environmental metrics are rarely populated. The vectors exist; the discipline of filling them in across thousands of vulnerabilities does not survive a real backlog.
KEV listing changes the priority but not the base score. A medium listed in CISA's Known Exploited Vulnerabilities (KEV) catalog, meaning it is actively exploited in the wild, should outrank a non-KEV high. CVSS does not encode that ordering for you.
Compensating controls are invisible. The fact that the affected service sits behind an authenticated reverse proxy with rate limits and request-body inspection is not in the CVE record. It is in your head, or your runbook, or nowhere.

The engine encodes all four as rules. Each one fires deterministically against the inputs you give it, and each firing carries a citation to the framework or threat-intel record the rule derives from.
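The KEV point in particular can be pictured as a sort key. This is only a sketch under the assumption that promotion is expressed as queue ordering; the engine may instead adjust the score itself. The `Finding` type and the CVE identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve_id: str
    effective_score: float
    kev_listed: bool


def triage_order(findings: list[Finding]) -> list[Finding]:
    """KEV-listed findings first, then by descending score.

    The CVE ID is the final tie-break so the order is fully deterministic.
    """
    return sorted(
        findings,
        key=lambda f: (not f.kev_listed, -f.effective_score, f.cve_id),
    )


queue = triage_order([
    Finding("CVE-EXAMPLE-1", 8.1, kev_listed=False),
    Finding("CVE-EXAMPLE-2", 5.4, kev_listed=True),
    Finding("CVE-EXAMPLE-3", 9.0, kev_listed=False),
])
```

Here the KEV-listed medium (`CVE-EXAMPLE-2`) lands ahead of both non-KEV highs, which is the ordering CVSS alone does not give you.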

What the numbers actually look like

A typical Patch-Tuesday dump lands ~120 new CVEs on a mid-sized engineering org. Run them through effective-risk rescoring against a realistic asset inventory — most services are not internet-exposed, most have at least two of the standard compensating controls in place, a chunk are not affected because the vulnerable component is not in the call path — and the queue collapses to ~12-15 that move the needle. Of those, two or three usually carry KEV-listed promotions that bump them above what their base CVSS would suggest.

The ratio is the point. An analyst can work 12-15 well-scoped tickets with cited rationale in an afternoon. They cannot work 120 in a week.

Why deterministic, not "just ask an LLM"

The temptation in 2026 is to feed the CVE plus a system description into a model and ask "is this exploitable here." That works for exploration. It fails for everything downstream:

Audit. A regulator enforcing DORA or NIS2, or the insurer underwriting your cyber policy, will ask why you suppressed CVE-X against system-Y. "Our LLM said so on a Tuesday" is not a defensible answer.
Reproducibility. The same input next quarter should produce the same score. LLMs do not guarantee that.
Drift detection. When a rule changes — because a threat-intel source updated, because a control mapping moved — every score that depended on it can be re-run and the deltas reported. LLM outputs cannot be diffed cleanly.
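Drift detection reduces to a plain diff once scores are deterministic. A minimal sketch, assuming scores are kept as a mapping from CVE ID to effective score for each rule-library version (that storage format is an assumption of this example):

```python
def score_deltas(
    before: dict[str, float],
    after: dict[str, float],
) -> dict[str, tuple[float, float]]:
    """Map each CVE whose score changed between two runs to (old, new).

    Only CVEs present in both runs are compared; IDs are visited in
    sorted order so the report itself is reproducible.
    """
    shared = sorted(before.keys() & after.keys())
    return {
        cve: (before[cve], after[cve])
        for cve in shared
        if before[cve] != after[cve]
    }


# Hypothetical scores from two rule-library versions.
run_v1 = {"CVE-EXAMPLE-1": 7.5, "CVE-EXAMPLE-2": 4.0, "CVE-EXAMPLE-3": 9.1}
run_v2 = {"CVE-EXAMPLE-1": 7.5, "CVE-EXAMPLE-2": 6.2, "CVE-EXAMPLE-3": 9.1}
deltas = score_deltas(run_v1, run_v2)
```

An exact equality check is only meaningful because both runs are deterministic; with sampled LLM output, every re-run would show spurious deltas.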

The customer-facing agent is the right place for the LLM. The scoring decision is the wrong place. Effective risk splits those concerns: your agent (Claude Desktop, Copilot Studio, Cursor, anything that speaks MCP) calls a deterministic engine, presents the cited result to the analyst, and helps them write the ticket.

Two ways to invoke it

The engine is exposed through the Ansvar gateway as two tools, both gated to Team and Company tiers:

effective_risk_inline — one CVE × one inline asset description × one set of controls. Use during a workflow when you already have the asset in the workflow frame.
effective_risk_inline_batch — same shape, but up to 100 CVE IDs against the same asset. The Patch-Tuesday triage call.

Both return the same result shape: effective_score, altered_metrics with rule provenance, applicable_controls showing which fired versus which did not, and — when the engine cannot decide without more input — a context_revision_proposed result class carrying specific proposals for what to ask the analyst next. No silent fallbacks. No fabricated reductions.
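To make the two result classes concrete, here is a hedged sketch of how a client might route a result into the three ticket outcomes. The field names `effective_score`, `altered_metrics`, and `context_revision_proposed` come from the description above; the `result_class` and `rule_id` keys, the dict layout, and the suppression threshold are assumptions of this example.

```python
SUPPRESS_BELOW = 4.0  # hypothetical threshold, set by your own policy


def route(result: dict) -> str:
    """Turn one engine result into a ticket action."""
    # The engine could not decide: surface its proposals, do not guess.
    if result.get("result_class") == "context_revision_proposed":
        return "request more info"
    if result["effective_score"] < SUPPRESS_BELOW:
        return "suppress"
    return "escalate"


def cited_rules(result: dict) -> list[str]:
    """Collect the rule IDs behind every altered metric, for the ticket body."""
    return [m["rule_id"] for m in result.get("altered_metrics", [])]


scored = {
    "result_class": "scored",
    "effective_score": 3.1,
    "altered_metrics": [{"metric": "AV", "rule_id": "R-ISOLATED-001"}],
}
needs_input = {"result_class": "context_revision_proposed", "proposals": ["confirm WAF coverage"]}
```

Handling the revision class before reading the score keeps the client from treating a missing score as a low one.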

For repeated scoring against the same asset over months (the audit-ledger use case), register the asset once via the persistent effective_risk tool. Each call then carries a stable scoring_context_id that anchors the audit trail.

Where this lives in the gateway

Effective-risk rescoring is one of the gateway's first-class capabilities, alongside the workflow engine (threat modelling, gap analysis, DPIA), the architecture knowledge tools, and the document-citation surface. The same MCP agent that runs your STRIDE threat model can call the engine mid-walk to score the residual risk of each identified threat. The same agent that drafts your DORA gap analysis can re-score every open vulnerability against the article-level requirement it maps to.

The product is not "another scoring tool." The product is deterministic risk reasoning, callable from the agent you already use, with citations every regulator already accepts.

Try it

Tier requirement. The effective_risk_inline and effective_risk_inline_batch tools are gated to Team and Company subscriptions. Free and Premium tiers do not include them. See /pricing for the tier matrix.

If your AI client speaks MCP and you are on Team or Company:

Connect it to the gateway — https://gateway.ansvar.eu/mcp, OAuth 2.1, two-minute setup.
Hand your agent a CVE list and an asset description and ask: "Use the Ansvar gateway to compute effective risk for each of these CVEs against this asset, and tell me which ones I can suppress."
The agent calls effective_risk_inline_batch, returns the rescored list with rule citations, and you triage the short tail.
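Since the batch tool takes up to 100 CVE IDs per call, a ~120-CVE Patch-Tuesday dump needs at least two calls. A rough sketch of what an agent might do under the hood follows; `call_tool` stands in for whatever MCP client function your agent uses, and the argument names `cve_ids` and `asset` plus the list-shaped return value are assumptions, not the documented schema.

```python
from typing import Any, Callable, Iterator, Sequence

MAX_BATCH = 100  # per-call limit stated for effective_risk_inline_batch


def chunks(cve_ids: Sequence[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` CVE IDs."""
    for start in range(0, len(cve_ids), size):
        yield list(cve_ids[start:start + size])


def rescore_all(
    cve_ids: Sequence[str],
    asset: dict[str, Any],
    call_tool: Callable[[str, dict[str, Any]], list[dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Score every CVE against one asset, batching to respect the limit."""
    results: list[dict[str, Any]] = []
    for batch in chunks(cve_ids):
        results.extend(
            call_tool("effective_risk_inline_batch", {"cve_ids": batch, "asset": asset})
        )
    return results
```

Passing the client in as a callable keeps the batching logic testable without a live gateway connection.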

The mechanism is deterministic by design: the rules are versioned, every score carries the rule citations it fired on, and the same inputs always produce the same score — so a reviewer can trace any score back to the exact rules and asset facts behind it.

The volume asymmetry runs against you: automated reporters and attackers can surface more CVEs against your systems than you can read. A deterministic engine that suppresses the inapplicable ones with cited reasoning is how you keep the queue workable without hiring a second triage team.