
AI Code Audit With Sandboxed Agents

AI code audit should run in sandboxes, spawn specialist agents, validate findings with commands, and downgrade claims that lack proof.

Drew Stone
code-auditor · security · ai-audit
[Image: Code auditor agent workspace showing repository audit, sandboxed tools, subagents, and finding validation]

AI code audit is useful only when the agent can inspect the repository, run tools, test exploit paths, and show why each finding is real. A model that reads code and writes confident prose is not an auditor. Tangle Code Auditor is the upcoming product surface at audit.tangle.tools; until that domain is live, this post describes the audit runtime itself rather than linking to a product URL.

The product shape is simple: a coordinator agent, sandboxed specialist agents, stack-specific tools, and findings that must survive validation.

What The Audit Runtime Must Do

| Requirement | Why it matters |
| --- | --- |
| clone target repo | auditor sees the same code reviewers see |
| isolate tools | untrusted build scripts run away from production systems |
| spawn specialists | EVM, Solana, Move, zk, and app security require different tools |
| run commands | claims need compile, test, static analysis, or proof output |
| validate severity | high and critical findings need evidence |
| emit report | humans need a concise issue list and reproduction steps |

This is closer to an audit workflow than a scanner workflow. Scanners such as CodeQL and Semgrep are useful inputs, but the agent’s job is to use tools like them, inspect the surrounding context, and prove or discard each hypothesis.

Example Audit Command

pnpm redteam audit --repo https://github.com/org/protocol
pnpm redteam audit --repo https://github.com/org/protocol --ref main --focus "reentrancy,flash-loans" --json

The command is only valuable if the report connects each issue to the affected files, an exploit path, the command output, and a fix direction.
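To make that requirement concrete, here is a minimal TypeScript sketch of a completeness check a report generator could run before publishing an issue. The `ReportIssue` shape and the `missingEvidence` helper are hypothetical; they are not the output schema of the `--json` flag shown above, which this post does not document.

```typescript
// Hypothetical shape of one issue in an audit report. The field names are
// illustrative assumptions, not a documented output schema.
interface ReportIssue {
  title: string;
  files: string[]; // affected source files
  exploitPath: string; // how an attacker or user reaches the bug
  commandOutput: string; // compile, test, trace, or analysis output
  fixDirection: string; // suggested mitigation
}

// Lists the required fields that are empty, so the report generator can
// hold back an issue until its evidence is complete.
function missingEvidence(issue: ReportIssue): string[] {
  const missing: string[] = [];
  if (issue.files.length === 0) missing.push("files");
  if (issue.exploitPath.trim() === "") missing.push("exploitPath");
  if (issue.commandOutput.trim() === "") missing.push("commandOutput");
  if (issue.fixDirection.trim() === "") missing.push("fixDirection");
  return missing;
}

// Example: a plausible-looking issue with no command output attached.
const draft: ReportIssue = {
  title: "Reentrancy in withdraw()",
  files: ["src/Vault.sol"],
  exploitPath: "attacker contract re-enters withdraw() from its receive() hook",
  commandOutput: "",
  fixDirection: "update balances before the external call, or add a reentrancy guard",
};

console.log(missingEvidence(draft).join(", ")); // commandOutput
```

An issue that comes back with a non-empty list stays out of the report until the missing evidence is produced.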

Sandbox-First Design

Security audits execute unknown code. That should happen in a managed environment, not on a developer laptop with production credentials.

pnpm redteam sandbox create --capability evm-foundry
pnpm redteam sandbox exec <sandbox-id> "forge test"
pnpm redteam sandbox destroy <sandbox-id>
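The create, exec, destroy sequence above has one property worth enforcing in code: the sandbox must be destroyed even when the command fails. A minimal TypeScript sketch, assuming a hypothetical `SandboxClient` interface whose methods mirror those three commands (it is not a real SDK), with an in-memory fake standing in for the sandbox service:

```typescript
// Hypothetical sandbox client. Method names mirror the create / exec /
// destroy commands but are assumptions, not a published API.
interface SandboxClient {
  create(capability: string): Promise<string>; // returns a sandbox id
  exec(id: string, command: string): Promise<{ exitCode: number; output: string }>;
  destroy(id: string): Promise<void>;
}

// Runs one command in a fresh sandbox and always tears the sandbox down,
// even when the command throws, so untrusted code never outlives the run.
async function withSandbox(
  client: SandboxClient,
  capability: string,
  command: string,
): Promise<{ exitCode: number; output: string }> {
  const id = await client.create(capability);
  try {
    return await client.exec(id, command);
  } finally {
    await client.destroy(id);
  }
}

// In-memory fake used only to demonstrate the lifecycle.
const live = new Set<string>();
let nextId = 0;
const fake: SandboxClient = {
  async create() {
    const id = `sbx-${nextId++}`;
    live.add(id);
    return id;
  },
  async exec(_id, command) {
    if (command === "crash") throw new Error("toolchain crashed");
    return { exitCode: 0, output: `ran: ${command}` };
  },
  async destroy(id) {
    live.delete(id);
  },
};

// e.g. await withSandbox(fake, "evm-foundry", "forge test")
```

The `try/finally` carries the design: a crashed toolchain still releases its sandbox, so untrusted build scripts never linger after the run.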

For the runtime layer behind this pattern, read LLM Sandbox Environment For Agent Runs. For scanner triage, read AI Vulnerability Scanner Vs Agent Audit.

Finding Lifecycle

An audit agent should move findings through stages instead of dumping every suspicion into the report.

| Stage | Required evidence |
| --- | --- |
| candidate | why the pattern or code path looked risky |
| reachable | how an attacker or user can reach the path |
| reproducible | command, test, trace, or proof notes |
| severity assigned | concrete impact and affected assets |
| fix proposed | code-level mitigation or design change |
| residual risk | what the fix does not cover |
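One way to enforce this lifecycle is to make every stage transition require evidence. The sketch below assumes a hypothetical `Finding` record that stores one evidence string per stage; `advance` refuses to move a finding forward without it. It is an illustration of the idea, not Tangle's implementation.

```typescript
// The six stages from the lifecycle table, in order.
const STAGES = [
  "candidate",
  "reachable",
  "reproducible",
  "severity assigned",
  "fix proposed",
  "residual risk",
] as const;
type Stage = (typeof STAGES)[number];

// Hypothetical finding record: the current stage plus the evidence
// recorded for each stage reached so far.
interface Finding {
  id: string;
  stage: Stage;
  evidence: Partial<Record<Stage, string>>;
}

// Moves a finding to the next stage only when evidence for that stage is
// supplied. Returns a new record and leaves the input untouched.
function advance(finding: Finding, evidence: string): Finding {
  const index = STAGES.indexOf(finding.stage);
  if (index === STAGES.length - 1) {
    throw new Error(`${finding.id} is already at the final stage`);
  }
  const next: Stage = STAGES[index + 1];
  if (evidence.trim() === "") {
    throw new Error(`${finding.id}: no evidence for stage "${next}"`);
  }
  const nextEvidence: Partial<Record<Stage, string>> = { ...finding.evidence };
  nextEvidence[next] = evidence;
  return { ...finding, stage: next, evidence: nextEvidence };
}

let finding: Finding = {
  id: "F-1",
  stage: "candidate",
  evidence: { candidate: "external call happens before the balance update" },
};
finding = advance(finding, "withdraw() is external and callable by any account");
console.log(finding.stage); // reachable
```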

That lifecycle protects the report from two common AI failures: overclaiming and duplicate noise. A candidate finding can be useful in notes, but it should not become a high-severity issue until the audit runtime has validated reachability and impact.
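The downgrade rule in that paragraph can be written as a small gate. This sketch assumes hypothetical boolean flags for validated reachability and impact, and it assumes unproven high or critical claims fall to `informational` so they survive in notes; a team might reasonably choose a different floor.

```typescript
type Severity = "informational" | "low" | "medium" | "high" | "critical";

// Hypothetical view of a finding for severity gating.
interface ClaimedFinding {
  claimed: Severity; // severity the agent proposed
  reachabilityValidated: boolean; // attacker or user path confirmed
  impactValidated: boolean; // concrete impact on named assets shown
}

// High and critical claims keep their severity only when both reachability
// and impact were validated. Otherwise they fall to "informational" so they
// stay in notes (the floor is an assumption). Lower claims pass through.
function gatedSeverity(f: ClaimedFinding): Severity {
  const needsProof = f.claimed === "high" || f.claimed === "critical";
  if (needsProof && !(f.reachabilityValidated && f.impactValidated)) {
    return "informational";
  }
  return f.claimed;
}

console.log(
  gatedSeverity({ claimed: "critical", reachabilityValidated: true, impactValidated: false }),
); // informational
```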

Coordinator And Specialists

The coordinator should keep a small tool surface and hand stack-specific work to specialists.

| Specialist | Useful work |
| --- | --- |
| EVM | Foundry tests, traces, invariants, access control paths |
| Solana | Anchor constraints, account validation, program tests |
| Move | resource and ability checks, package tests |
| zk | circuit constraints, proof generation paths, witness handling |
| app security | auth, input validation, dependency and dataflow review |
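A coordinator needs some way to decide which specialists to spawn. One simple heuristic, offered as an assumption rather than a description of how Tangle's auditor routes work, is to look for well-known manifest files: `foundry.toml` or a Hardhat config for EVM, `Anchor.toml` for Solana programs, `Move.toml` for Move packages, `.circom` files or `Nargo.toml` for zk circuits, and `package.json` for application code.

```typescript
type Specialist = "evm" | "solana" | "move" | "zk" | "app-security";

// Heuristic routing rules: a well-known manifest or source file suggests
// which specialist to spawn. These rules are illustrative assumptions.
const ROUTING_RULES: Array<[RegExp, Specialist]> = [
  [/(^|\/)foundry\.toml$/, "evm"],
  [/(^|\/)hardhat\.config\.(js|ts)$/, "evm"],
  [/(^|\/)Anchor\.toml$/, "solana"],
  [/(^|\/)Move\.toml$/, "move"],
  [/\.circom$/, "zk"],
  [/(^|\/)Nargo\.toml$/, "zk"],
  [/(^|\/)package\.json$/, "app-security"],
];

// Returns each specialist the repository's file paths call for, once,
// in alphabetical order so the result is stable across runs.
function routeSpecialists(paths: string[]): Specialist[] {
  const chosen = new Set<Specialist>();
  for (const path of paths) {
    for (const [pattern, specialist] of ROUTING_RULES) {
      if (pattern.test(path)) chosen.add(specialist);
    }
  }
  return Array.from(chosen).sort();
}

console.log(
  routeSpecialists(["contracts/foundry.toml", "programs/vault/Anchor.toml", "README.md"]).join(", "),
); // evm, solana
```

Routing on files keeps the coordinator's own tool surface small: it only lists paths, and each specialist brings its toolchain into its own sandbox.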

This is where sandboxes matter. Each specialist can run the toolchain it needs without polluting the coordinator or exposing local credentials. The output should come back as evidence, not as raw model opinion.

The coordinator should also track negative evidence. “Slither flagged this, but the path is unreachable because only the owner can call it after initialization” is useful audit work. Good reports record what was checked and rejected whenever that context would prevent the same finding from being raised again in a future audit.

That negative evidence should stay out of the executive summary but remain available in an appendix or trace. Engineers need the decision trail when the same warning returns in a later scan.
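Negative evidence is most useful when it can be looked up mechanically. A minimal sketch, assuming each rejected warning is keyed by tool, rule id, and location; the `RejectionLedger` class and its fields are hypothetical, and `arbitrary-send-eth` is used only as an example Slither detector id.

```typescript
// Hypothetical record of a warning that was checked and rejected.
interface Rejection {
  tool: string; // scanner that raised the warning
  rule: string; // the scanner's rule or detector id
  location: string; // e.g. "src/Vault.sol:sweep"
  reason: string; // why the warning is not a real issue
}

// Stores negative evidence so a warning that returns in a later scan can be
// matched to the earlier decision instead of being triaged from scratch.
class RejectionLedger {
  private readonly entries = new Map<string, Rejection>();

  private static key(tool: string, rule: string, location: string): string {
    return `${tool}|${rule}|${location}`;
  }

  record(r: Rejection): void {
    this.entries.set(RejectionLedger.key(r.tool, r.rule, r.location), r);
  }

  // Returns the earlier rejection for this exact warning, if one exists.
  lookup(tool: string, rule: string, location: string): Rejection | undefined {
    return this.entries.get(RejectionLedger.key(tool, rule, location));
  }
}

const ledger = new RejectionLedger();
ledger.record({
  tool: "slither",
  rule: "arbitrary-send-eth",
  location: "src/Vault.sol:sweep",
  reason: "only the owner can call it after initialization",
});

const prior = ledger.lookup("slither", "arbitrary-send-eth", "src/Vault.sol:sweep");
console.log(prior?.reason); // only the owner can call it after initialization
```

Keying on the location string is conservative: if the flagged function moves or is renamed, the warning is treated as new and gets triaged again.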

What This Does Not Prove

An AI code audit does not prove the system is secure. It finds and validates issues under defined scope. Human security review, formal audits, monitoring, and defense-in-depth still matter.

Decision Rule

Use AI code audit when you need fast, evidence-backed review before a deeper audit or release gate. Reject findings that do not include a reproduction path or validation artifact.

FAQ

What is AI code audit?

It is repository review by an AI-assisted audit system that inspects code, runs tools, validates findings, and produces a report.

How is this different from a vulnerability scanner?

A scanner flags patterns. An agent audit can inspect context, run commands, test hypotheses, and downgrade unproven claims.

Is audit.tangle.tools live?

Not yet. It is the planned Code Auditor surface and should be treated as launching soon until the domain resolves.

What stacks should a code auditor support?

At minimum, every stack it claims to audit. Tangle’s auditor is aimed at EVM, Solana, Move, zk, Cosmos, and general application code paths.