AI, Threat Modeling, Compliance, Best Practices

Why AI-Assisted ≠ AI-Generated (And Why Auditors Care)

Jeffrey von Rotz, Founder
January 17, 2026
5 min read

The difference between AI-generated and AI-assisted threat modeling matters more than you think. Learn why auditors reject raw LLM output and what actually passes scrutiny.


"We'll just use ChatGPT for threat modeling."

I hear this more than I'd like. And I get it — the promise is appealing. Paste in your architecture docs, ask for a threat model, get something back in seconds.

But here's what that output actually is: a plausible-sounding document that might be completely wrong.

The Hallucination Problem

Large language models don't understand your system. They predict what a threat model should look like based on patterns in their training data. Sometimes they're right. Often they're not.

I've seen ChatGPT-generated threat models that:

Identified threats for components that didn't exist in the architecture
Missed obvious attack paths because the model didn't understand the actual data flow
Produced STRIDE categories that sounded correct but mapped to the wrong assets
Referenced CVEs that were either irrelevant or entirely fabricated

The output looks professional. It has the right headings, the right terminology, the right structure. But it's theater — it performs expertise without having it.

What Auditors Actually Want

When an auditor reviews your threat model under DORA, NIS2, or ISO 27001, they're not just checking that a document exists. They're looking for:

Traceability. Can you show how each identified threat connects to your actual architecture? Can you explain why certain risks were prioritized over others?

Methodology. Did you follow a recognized approach (STRIDE, PASTA, Attack Trees)? Can you demonstrate consistent application?

Evidence of reasoning. Why did you scope it this way? What assumptions did you make? What did you explicitly exclude and why?

A raw LLM output fails all three. There's no reasoning to trace — just pattern matching. When the auditor asks "why did you identify this as a high-severity threat?", the honest answer is "because GPT said so." That's not going to fly.
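
To make that concrete, here's a minimal sketch of what a single traceable finding could capture. The schema and the example values are illustrative assumptions of mine, not a standard or any tool's actual output; the point is simply that every entry ties a threat to a real architectural element, a named methodology category, and a rationale an auditor can interrogate.

```python
from dataclasses import dataclass, field

@dataclass
class ThreatFinding:
    """One traceable threat-model entry. Illustrative schema, not a standard."""
    threat: str                 # what could go wrong
    component: str              # the architectural element it actually applies to
    trust_boundary: str         # where the threat crosses a boundary
    stride_category: str        # methodology mapping (STRIDE in this sketch)
    attack_pattern: str         # e.g. a MITRE ATT&CK technique reference
    severity: str
    rationale: str              # why this severity, in this business context
    assumptions: list[str] = field(default_factory=list)

# Hypothetical entry: every claim points back to something an auditor can verify.
finding = ThreatFinding(
    threat="Long-lived session tokens replayed against the payments API",
    component="payments-api-gateway",
    trust_boundary="Internet -> DMZ",
    stride_category="Spoofing",
    attack_pattern="MITRE ATT&CK T1550 (Use Alternate Authentication Material)",
    severity="High",
    rationale="Tokens live 24h with no client binding; payment flows are in "
              "scope for the organisation's DORA operational resilience work.",
    assumptions=["Token lifetime per current gateway config",
                 "No mTLS to external clients"],
)
```

When the auditor asks why this is high severity, the answer is sitting in the record, not in a chat history.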

AI-Assisted Is Different

Here's what AI-assisted threat modeling actually looks like:

Human defines scope. I decide what we're analyzing, what's in bounds, what's out. The AI doesn't choose this.

AI accelerates analysis. Given proper context about the architecture, AI can help identify potential threat categories, suggest attack patterns, and map findings to frameworks like MITRE ATT&CK. This is the tedious part that used to take days.

Human validates everything. Every threat the AI suggests gets reviewed. Does this actually apply to this system? Is the severity assessment reasonable given our context? Does this match what I know about real-world attack patterns?

Human documents reasoning. The final deliverable includes rationale. Not "AI said so" but "this threat was identified because of X trust boundary, validated against Y attack pattern, and prioritized based on Z business context."

The AI is a tool that makes me faster. It's not a replacement for judgment.
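
The validation step itself can be boringly mechanical. Here's a rough sketch under assumed inputs (the finding dicts and the component inventory are made up for illustration): anything referencing a component that doesn't exist in the real architecture is rejected outright, and nothing is accepted without a human-written rationale.

```python
# Rough sketch of the human-in-the-loop validation pass. The finding shape and
# "architecture_components" are illustrative; substitute whatever your real
# source of truth is (a C4 model, Terraform state, a CMDB export).
def validate_findings(suggested: list[dict], architecture_components: set[str]) -> list[dict]:
    accepted = []
    for f in suggested:
        # Catch hallucinated components: the AI proposed a threat against
        # something that does not exist in the actual architecture.
        if f["component"] not in architecture_components:
            print(f"REJECT (unknown component): {f['threat']}")
            continue
        # An empty rationale means no human has actually reasoned about it yet.
        if not f.get("rationale", "").strip():
            print(f"HOLD (needs human rationale): {f['threat']}")
            continue
        accepted.append(f)
    return accepted

suggestions = [
    {"threat": "SQL injection in legacy reporting path", "component": "reporting-service",
     "rationale": "Confirmed raw string queries in the legacy module; internet-reachable."},
    {"threat": "Tampering with broker messages", "component": "kafka-cluster", "rationale": ""},
    {"threat": "Spoofed admin console", "component": "admin-portal", "rationale": "n/a"},
]

validated = validate_findings(suggestions, architecture_components={"reporting-service", "kafka-cluster"})
# Accepts the first, holds the second for review, rejects the third ("admin-portal" isn't real here).
```

The check is deliberately dumb. The judgment still comes from the person writing the rationale and reviewing whatever gets held.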

The Speed vs. Quality False Dichotomy

The old way: a security consultant spends 4-6 weeks producing a threat model. Most of that time is spent on documentation formatting, framework mapping, and cross-referencing — work that doesn't require deep expertise.

The "just use ChatGPT" way: get output in 5 minutes that won't survive scrutiny.

The AI-assisted way: get a thorough, expert-validated threat model in days. The AI handles the tedious work. The human handles the judgment calls.

This isn't about choosing between fast and good. It's about recognizing that most of the time spent on traditional threat modeling wasn't the valuable part.

Questions to Ask Your Vendor

If someone offers you AI-powered threat modeling, ask:

1. Who validates the output? If the answer is "the AI is very accurate," walk away.

2. Can you show me the reasoning chain? You should be able to trace any finding back to a specific architectural element and a specific threat pattern.

3. What happens when the AI is wrong? There should be a human review process that catches hallucinations before they reach your deliverable.

4. Will this hold up to an auditor? Ask for sample documentation. Look for methodology, traceability, and explicit reasoning — not just a list of threats.

The Bottom Line

AI is transforming how we do threat modeling. But "AI-generated" and "AI-assisted, expert-validated" are not the same thing.

One produces documents that look right.

The other produces documents that are right — and can prove it.

If you're doing threat modeling for compliance, your auditor will know the difference. Make sure you do too.

Key Takeaways

LLMs hallucinate — they produce plausible-sounding threats that may be completely wrong for your system
Auditors want reasoning — traceability, methodology, and documented assumptions, not just a list of threats
AI-assisted ≠ AI-generated — one uses AI to accelerate expert work, the other replaces expert judgment entirely
Ask vendors tough questions — who validates output, can you trace findings, will it pass audit scrutiny
Speed and quality aren't mutually exclusive — AI handles tedious work, humans handle judgment calls

Next Steps

Want to see what AI-assisted, expert-validated threat modeling looks like? Our sample report shows the methodology, traceability, and reasoning that auditors expect.

Book a call to discuss your threat modeling needs.


Written by Jeffrey von Rotz, Founder

Building tools to make threat modeling accessible to every development team.
