
AI Security Audit With Reproducible Findings

An AI security audit should produce reproducible findings: file references, an exploit path, command output, severity reasoning, and a concrete fix.

Drew Stone
Tags: code-auditor, security-audit, ai-security
[Image: Security audit report showing validated findings, severity proof, and reproduction commands]

An AI security audit should be judged by reproducibility. If a finding cannot point to code, show the exploit path, explain severity, and give a command or test that supports the claim, it is not ready for a security decision. Tangle Code Auditor is being shaped around that standard: agent-assisted review with sandboxed execution and proof-backed reports.

The upcoming public surface is audit.tangle.tools. Until it is live, do not treat the URL as a working product page.

Finding Bar

Field | Required content
title | specific vulnerability, not a category label
affected code | file and function or contract path
exploit path | how an attacker reaches the issue
impact | what breaks and who loses value
reproduction | command, test, or proof notes
severity | why the level is justified
fix | practical mitigation

This is the difference between “possible reentrancy” and “this function can be reentered before balance update, here is a failing test.”
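One way to make the finding bar checkable is to treat each finding as a record with required fields. The following is a minimal Python sketch, not Tangle Code Auditor's actual schema: the `Finding` class, its field names, and `missing_fields` are hypothetical and simply mirror the table above.

```python
from dataclasses import dataclass, fields


@dataclass
class Finding:
    """One audit finding. Field names are hypothetical and mirror the finding bar."""
    title: str = ""
    affected_code: str = ""
    exploit_path: str = ""
    impact: str = ""
    reproduction: str = ""
    severity: str = ""
    fix: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are empty or only whitespace."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# A category label on its own leaves every other field unfilled.
vague = Finding(title="possible reentrancy")
print(vague.missing_fields())
# ['affected_code', 'exploit_path', 'impact', 'reproduction', 'severity', 'fix']
```

A non-empty check only shows that a field was filled in. Whether the content is specific enough still needs a reviewer.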

Tools Are Inputs, Not The Report

Static analyzers help, but their output needs triage. OWASP WSTG is useful for web testing structure. CodeQL and Semgrep are useful for code search and static analysis. The audit agent should use those tools and then explain what is real in the target repository.

Tool output | Agent responsibility
warning | inspect reachability
dataflow path | check exploitability
failing test | explain root cause
build failure | separate setup issue from vulnerability
duplicate finding | merge or discard
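The "merge or discard" step can be sketched as grouping raw warnings by a shared key. This is an illustration only: the warning dictionaries, the rule label, the file paths, and the choice of `(rule, file, function)` as a stand-in for "same root cause" are all assumptions, not the output format of any real analyzer.

```python
from collections import defaultdict


def merge_duplicates(warnings):
    """Group warnings sharing (rule, file, function) and list the tools that reported each."""
    grouped = defaultdict(list)
    for warning in warnings:
        key = (warning["rule"], warning["file"], warning["function"])
        grouped[key].append(warning["tool"])
    # Keep one entry per key, with the reporting tools deduplicated and sorted.
    return {key: sorted(set(tools)) for key, tools in grouped.items()}


# Hypothetical warnings; real tool output would need to be normalized into this shape first.
sample_warnings = [
    {"tool": "semgrep", "rule": "sql-injection", "file": "app/db.py", "function": "find_user"},
    {"tool": "codeql", "rule": "sql-injection", "file": "app/db.py", "function": "find_user"},
    {"tool": "semgrep", "rule": "sql-injection", "file": "app/db.py", "function": "find_order"},
]

for key, tools in merge_duplicates(sample_warnings).items():
    print(key, tools)
```

Two tools agreeing on the same location produces one candidate rather than two. Each merged candidate still needs the reachability and exploitability checks from the table.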

Severity Discipline

High and critical findings need proof. A useful audit runtime should downgrade severe claims when no exploit or loss path is shown.

claim
-> inspect reachable code path
-> create or run reproduction
-> estimate impact
-> assign severity
-> downgrade if proof is missing
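The last two steps of that flow can be expressed as a small rule. The sketch below assumes a five-level severity scale and a cap of `medium` for unproven severe claims. Both are illustrative choices, and `assign_severity` is a hypothetical helper rather than part of any existing tool.

```python
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def assign_severity(claimed: str, reachable: bool, proof_shown: bool, cap: str = "medium") -> str:
    """Return the claimed severity, capped when a severe claim lacks proof.

    `cap` is an assumed policy value: the level an unproven high or
    critical claim is reduced to.
    """
    if claimed not in SEVERITY_ORDER:
        raise ValueError(f"unknown severity: {claimed}")
    is_severe = SEVERITY_ORDER.index(claimed) >= SEVERITY_ORDER.index("high")
    if is_severe and not (reachable and proof_shown):
        return cap
    return claimed


print(assign_severity("critical", reachable=True, proof_shown=False))  # medium
print(assign_severity("high", reachable=True, proof_shown=True))       # high
```

Lower-severity claims pass through unchanged here, since the text only sets a proof requirement for high and critical findings.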

For smart-contract-specific validation, read Automated Smart Contract Audit With PoC Validation. For the difference between scanners and agent review, read AI Vulnerability Scanner Vs Agent Audit.

Reproduction Packet

Every accepted finding should include a packet a reviewer can run or inspect.

Packet item | Purpose
repo ref | fixes the exact code version under review
setup command | separates environment failure from security signal
proof command | shows the finding can be triggered or reasoned about
expected result | tells the reviewer what should happen
observed result | shows the vulnerable behavior
proposed patch | gives engineering a concrete next step
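A packet like this can be run mechanically. The sketch below is one possible shape, assuming the setup and proof steps are plain commands. `ReproductionPacket`, `run_packet`, and the placeholder values are hypothetical. It runs setup first so an environment failure is reported separately, then records the observed output next to the expected result and leaves interpretation to the reviewer.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ReproductionPacket:
    """Hypothetical container for the packet items in the table above."""
    repo_ref: str
    setup_command: list[str]
    proof_command: list[str]
    expected_result: str
    proposed_patch: str


def run_packet(packet: ReproductionPacket) -> dict:
    """Run setup, then the proof command, and report what was observed."""
    setup = subprocess.run(packet.setup_command, capture_output=True, text=True)
    if setup.returncode != 0:
        # Environment problem: report it as such, not as a security signal.
        return {"status": "setup_failed", "observed_result": setup.stderr.strip()}
    proof = subprocess.run(packet.proof_command, capture_output=True, text=True)
    observed = proof.stdout.strip()
    return {
        "status": "ran",
        "observed_result": observed,
        "matches_expected": observed == packet.expected_result,
    }


# Placeholder commands stand in for a real test runner invocation.
packet = ReproductionPacket(
    repo_ref="<commit-sha>",
    setup_command=[sys.executable, "-c", "pass"],
    proof_command=[sys.executable, "-c", "print('balance=-5')"],
    expected_result="balance=0",
    proposed_patch="<patch summary>",
)
print(run_packet(packet))
```

In a real audit these commands would run inside a sandbox against a checkout pinned to `repo_ref`. That part is omitted here.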

For web application issues, OWASP WSTG gives a useful testing structure. For code search, CodeQL code scanning can surface paths worth reviewing. The audit report should turn those inputs into repo-specific evidence.

What To Downgrade

The auditor should downgrade:

Claim | Downgrade reason
high severity without reachable path | no demonstrated attacker route
critical issue without asset loss | impact not proven
scanner warning with safe wrapper | context reduces risk
duplicate path | same root cause already reported
setup failure | environment problem, not vulnerability

This keeps the report short enough for engineers to act on.

Fix Verification

The audit should not end at “recommendation written.” For important findings, the agent should rerun the reproduction against the patched code and record the result. A good fix note says what changed, which proof no longer works, and whether any residual risk remains. That turns the audit from a report generator into a release gate.

When the reproduction cannot be rerun, the report should say why. A dependency issue, missing fixture, or unavailable chain state is still useful context for the reviewer.
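The fix note and the release-gate idea can be sketched together. The names below (`FixNote`, `verify_fix`, `gate_passes`) are hypothetical, and the reproduction is modeled as a callable that returns `True` when the proof still works against the patched code. Treating an unrunnable reproduction as a failed gate is an assumption of this sketch, not a rule stated in the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FixNote:
    """Hypothetical record of a fix verification attempt."""
    what_changed: str
    proof_still_works: Optional[bool]  # None means the proof could not be rerun
    residual_risk: str = ""
    not_rerun_reason: str = ""


def verify_fix(what_changed: str, rerun_proof: Callable[[], bool], residual_risk: str = "") -> FixNote:
    """Rerun the reproduction against patched code and record the outcome."""
    try:
        still_works = rerun_proof()
    except Exception as exc:
        # Keep the reason: a missing fixture or dependency is useful reviewer context.
        reason = f"{type(exc).__name__}: {exc}"
        return FixNote(what_changed, None, residual_risk, not_rerun_reason=reason)
    return FixNote(what_changed, still_works, residual_risk)


def gate_passes(note: FixNote) -> bool:
    """Pass only when the proof was rerun and no longer works."""
    return note.proof_still_works is False


def missing_fixture() -> bool:
    raise FileNotFoundError("fixture missing")


fixed = verify_fix("balance updated before external call", lambda: False)
blocked = verify_fix("balance updated before external call", missing_fixture)
print(gate_passes(fixed), gate_passes(blocked))  # True False
print(blocked.not_rerun_reason)                  # FileNotFoundError: fixture missing
```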

What This Does Not Prove

An AI security audit does not guarantee absence of vulnerabilities. It produces findings under a scope and evidence bar. Use it to speed triage, catch obvious and non-obvious issues, and prepare for human review.

Decision Rule

Accept an AI security audit finding only when it includes location, path, impact, reproduction, and fix guidance. Treat unsupported severe claims as hypotheses.

FAQ

What is an AI security audit?

It is a security review assisted by agents that inspect code, run tools, validate findings, and produce a report with evidence.

What makes a finding reproducible?

The reviewer can follow the file references, commands, tests, or proof notes and see why the issue is real.

Does this replace human auditors?

No. It can speed review and catch issues earlier, but humans should review high-risk systems and final release decisions.

Where does Tangle Code Auditor fit?

Tangle Code Auditor is the upcoming audit product for sandboxed, agent-assisted security review.