Natural Language E2E Testing for Wallet Apps

Natural-language E2E testing for wallet apps lets agents drive browser flows while stopping before destructive signing and preserving evidence.

Drew Stone
Tags: agents, browser, testing
[Figure: Wallet application test run showing browser state, wallet prompt, screenshot evidence, and safe stop]

Natural-language E2E testing for wallet apps lets a browser agent execute user-facing flows from a goal, capture DOM and screenshot evidence, and stop before irreversible signing or value transfer. The useful target is not “the agent clicked buttons.” The target is a reproducible trace: page state, wallet prompt state, network state, screenshot, final assertion, and stop reason. Tangle Browser Agent is built for that evidence loop, and Tangle Sandbox can host the surrounding test workspace.

Wallet apps are harder than ordinary forms because the dangerous moment is often outside the dapp: a wallet confirmation, signature request, transaction preview, or network switch.

A Safe Test Shape

bad run \
  --url https://example-defi-app.test \
  --goal "Open the swap flow, enter a small quote, verify the wallet confirmation appears, then stop before signing."

The stop condition is part of the test. For wallet flows, “do not sign” should be explicit unless the environment uses a test wallet, test chain, and non-value funds.
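One way to keep the stop condition from being forgotten is to attach it when the command is assembled. The sketch below is a minimal illustration under stated assumptions: `WalletEnv`, `STOP_CLAUSE`, `with_stop_condition`, and `build_command` are hypothetical names, the substring check on the goal is deliberately naive, and only the `--url` and `--goal` flags from the command above are used.

```python
import shlex
from dataclasses import dataclass
from typing import List

# Hypothetical stop clause; the wording is an assumption, not a product default.
STOP_CLAUSE = "Stop before signing or sending any transaction."


@dataclass(frozen=True)
class WalletEnv:
    """Hypothetical description of the environment a run targets."""
    test_wallet: bool = False
    test_chain: bool = False
    non_value_funds: bool = False

    def is_disposable(self) -> bool:
        # All three conditions must hold before "do not sign" can be left implicit.
        return self.test_wallet and self.test_chain and self.non_value_funds


def with_stop_condition(goal: str, env: WalletEnv) -> str:
    """Append an explicit stop clause unless the environment is fully disposable
    or the goal already states one (naive substring check)."""
    if env.is_disposable() or "stop before signing" in goal.lower():
        return goal
    return f"{goal.rstrip()} {STOP_CLAUSE}"


def build_command(url: str, goal: str, env: WalletEnv) -> List[str]:
    # Only the flags shown in the command above are used here.
    return ["bad", "run", "--url", url, "--goal", with_stop_condition(goal, env)]


if __name__ == "__main__":
    argv = build_command(
        "https://example-defi-app.test",
        "Open the swap flow and enter a small quote.",
        WalletEnv(),  # defaults to a non-disposable environment
    )
    print(shlex.join(argv))
```

Defaulting every `WalletEnv` field to `False` means a run has to opt in to signing rather than opt out.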

What To Capture

| Evidence | Why it matters |
| --- | --- |
| screenshot | shows the rendered state a user would see |
| DOM state | records selectors, text, disabled states, and hidden errors |
| wallet prompt | proves the signing boundary appeared |
| network state | catches failed RPCs or wrong chain |
| stop reason | proves the agent stopped before destructive action |
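The evidence list above can be treated as a record that a run either fills in or fails. The following is a rough sketch, not the Browser Agent's actual output format: the `Evidence` class and its field names are assumptions chosen to mirror the five evidence items.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Evidence:
    """Hypothetical per-run evidence record; field names are illustrative."""
    screenshot_path: Optional[str] = None   # rendered state a user would see
    dom_snapshot: Optional[str] = None      # selectors, text, disabled states
    wallet_prompt: Optional[str] = None     # proof the signing boundary appeared
    network_state: Optional[dict] = None    # chain id, failed RPC calls
    stop_reason: Optional[str] = None       # why the agent stopped

    def missing(self) -> List[str]:
        # A field counts as missing when it is None or empty.
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        # Serialize the record so it can be stored next to the screenshot.
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    run = Evidence(screenshot_path="shot.png", dom_snapshot="<html></html>")
    # A run with gaps should fail the test rather than pass quietly.
    print("missing evidence:", run.missing())
```

Failing the run when `missing()` is non-empty makes the evidence a hard requirement of the test.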

Playwright's documentation covers the browser automation layer; wallet testing adds provider state and signing safety on top of it. The MetaMask developer docs, WalletConnect docs, and Ethereum JSON-RPC docs define the wallet and chain surfaces the browser task may encounter.
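To make those surfaces concrete, a test harness can sort the JSON-RPC methods a dapp sends to the wallet provider into risk categories. The sketch below is illustrative only. The method names are standard Ethereum and wallet RPC methods, but the category names, the `classify` and `on_expected_chain` helpers, and the choice to treat anything unlisted as `unknown` are assumptions.

```python
# Methods that produce a signature or broadcast a transaction.
SIGNING_METHODS = frozenset({
    "eth_sendTransaction",
    "eth_signTransaction",
    "eth_sendRawTransaction",
    "eth_sign",
    "personal_sign",
    "eth_signTypedData_v4",
})

# Methods that change or add the active chain.
NETWORK_METHODS = frozenset({
    "wallet_switchEthereumChain",
    "wallet_addEthereumChain",
})

# Methods that ask the user to expose accounts.
ACCOUNT_METHODS = frozenset({"eth_requestAccounts"})

# A deliberately short allowlist of read-only calls.
READ_METHODS = frozenset({
    "eth_chainId",
    "eth_accounts",
    "eth_call",
    "eth_getBalance",
    "eth_blockNumber",
    "eth_estimateGas",
})


def classify(method: str) -> str:
    """Return a risk category; anything not listed is 'unknown', not 'read'."""
    if method in SIGNING_METHODS:
        return "signing"
    if method in NETWORK_METHODS:
        return "network"
    if method in ACCOUNT_METHODS:
        return "account_access"
    if method in READ_METHODS:
        return "read"
    return "unknown"


def on_expected_chain(chain_id_hex: str, expected_chain_id: int) -> bool:
    """eth_chainId returns a hex string such as '0xaa36a7'."""
    return int(chain_id_hex, 16) == expected_chain_id


if __name__ == "__main__":
    print(classify("personal_sign"))                 # signing
    print(on_expected_chain("0xaa36a7", 11155111))   # True (Sepolia)
```

Using an allowlist for reads means a method the harness has never seen is handled as a stop-and-review case by default.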

Stop Conditions

Wallet flows need explicit stop rules because the last click can become a transaction:

| Stop at | Unless |
| --- | --- |
| signature request | test wallet, test chain, and explicit signing permission |
| network switch | the test goal includes chain switching |
| approval transaction | allowance is on a disposable test token |
| unknown wallet modal | the run can capture evidence and request review |
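The stop rules above translate naturally into a lookup from wallet event to the exception that allows the run to continue. This is a minimal sketch under assumptions: `WalletEvent`, `RunContext`, `Decision`, and `decide` are hypothetical names, and how a real run detects each event is out of scope here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class WalletEvent(Enum):
    SIGNATURE_REQUEST = "signature request"
    NETWORK_SWITCH = "network switch"
    APPROVAL_TRANSACTION = "approval transaction"
    UNKNOWN_MODAL = "unknown wallet modal"


@dataclass(frozen=True)
class RunContext:
    """Hypothetical flags describing what this run is allowed to do."""
    test_wallet: bool = False
    test_chain: bool = False
    signing_permitted: bool = False
    goal_includes_chain_switch: bool = False
    disposable_test_token: bool = False
    can_capture_and_request_review: bool = False


@dataclass(frozen=True)
class Decision:
    stop: bool
    reason: str


# One "unless" predicate per row of the stop-rule table.
EXCEPTIONS: Dict[WalletEvent, Callable[[RunContext], bool]] = {
    WalletEvent.SIGNATURE_REQUEST: lambda c: (
        c.test_wallet and c.test_chain and c.signing_permitted
    ),
    WalletEvent.NETWORK_SWITCH: lambda c: c.goal_includes_chain_switch,
    WalletEvent.APPROVAL_TRANSACTION: lambda c: c.disposable_test_token,
    WalletEvent.UNKNOWN_MODAL: lambda c: c.can_capture_and_request_review,
}


def decide(event: WalletEvent, ctx: RunContext) -> Decision:
    """Stop by default; continue only when the row's exception applies."""
    if EXCEPTIONS[event](ctx):
        return Decision(False, f"{event.value}: exception applies")
    return Decision(True, f"stopped at {event.value}")


if __name__ == "__main__":
    # With default flags every event results in a stop.
    for event in WalletEvent:
        print(decide(event, RunContext()))
```

The returned `reason` string can be recorded as the run's stop reason.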

This keeps natural-language testing useful without turning every smoke test into a production-risk exercise.

Where It Belongs In The Test Suite

Natural-language E2E should sit above deterministic checks:

| Layer | Job |
| --- | --- |
| unit and contract tests | prove core logic and invariants |
| transaction simulation | catch revert paths and allowance mistakes |
| Playwright flows | lock down deterministic UI paths |
| browser-agent smoke | catch real user regressions and copy/layout drift |
| manual review | approve destructive or high-value flows |

The browser agent is best at the messy boundary where copy, layout, wallets, RPCs, and user intent meet. Keep it there; do not ask it to replace lower-level invariants.

Where Tangle Fits

Browser Agent supplies the agentic browser run and evidence capture. Sandbox supplies the isolated workspace for app code, test dependencies, and artifacts. For broader browser automation, read “browser automation for AI agents.”

What This Does Not Prove

A natural-language test is not a substitute for deterministic contract tests. It catches product and integration regressions near the user surface. Keep smart contract invariants, transaction simulation, wallet mocks, and RPC-level tests in the suite.

Start

Run one non-mutating browser goal against a staging wallet flow. Require screenshots, DOM evidence, and a final stop reason before letting the agent near signed transactions.

FAQ

Can AI agents test wallet apps safely?

Yes, if the run uses test environments, explicit stop conditions, evidence capture, and non-mutating defaults.

Should a browser agent sign wallet transactions?

Only in controlled test environments with test wallets and test funds. Otherwise it should stop at the signing boundary.

What is the minimum evidence for wallet E2E tests?

Capture the browser screenshot, DOM state, wallet prompt state, network or chain context, and stop reason.