Natural-language E2E testing for wallet apps lets a browser agent execute user-facing flows from a goal, capture DOM and screenshot evidence, and stop before irreversible signing or value transfer. The useful target is not “the agent clicked buttons” but a reproducible trace: page state, wallet prompt state, network state, screenshot, final assertion, and stop reason. Tangle Browser Agent is built for that evidence loop, and Tangle Sandbox can host the surrounding test workspace.
Wallet apps are harder than ordinary forms because the dangerous moment is often outside the dapp: a wallet confirmation, signature request, transaction preview, or network switch.
A Safe Test Shape
```bash
bad run \
  --url https://example-defi-app.test \
  --goal "Open the swap flow, enter a small quote, verify the wallet confirmation appears, then stop before signing."
```
The stop condition is part of the test. For wallet flows, “do not sign” should be explicit unless the environment uses a test wallet, a test chain, and funds with no real value.
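One way to enforce an explicit stop clause is a pre-flight check on the goal text before the run starts. The sketch below is a rough heuristic, not part of any Tangle tooling: the function name `has_explicit_stop` and the phrase list are assumptions chosen for illustration, and a real suite would likely want a structured stop field rather than text matching.

```python
import re

# Hypothetical phrase list: wording that signals an explicit stop clause.
# Extend or replace it to match how your team writes goals.
STOP_CLAUSE = re.compile(
    r"\b(stop before|do not sign|don't sign|without signing)\b",
    re.IGNORECASE,
)


def has_explicit_stop(goal: str) -> bool:
    """Return True if the goal text contains a recognizable stop clause."""
    return bool(STOP_CLAUSE.search(goal))


safe_goal = (
    "Open the swap flow, enter a small quote, verify the wallet "
    "confirmation appears, then stop before signing."
)
risky_goal = "Open the swap flow and complete the swap."

print(has_explicit_stop(safe_goal))   # True
print(has_explicit_stop(risky_goal))  # False
```

A check like this only rejects goals that forget the stop clause; it does not prove the agent honored it. That proof comes from the recorded stop reason.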
What To Capture
| Evidence | Why it matters |
|---|---|
| screenshot | shows the rendered state a user would see |
| DOM state | records selectors, text, disabled states, and hidden errors |
| wallet prompt | proves the signing boundary appeared |
| network state | catches failed RPCs or wrong chain |
| stop reason | proves the agent stopped before destructive action |
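The evidence rows above can be treated as required fields on a per-run record. The following is a minimal sketch under that assumption; the class `WalletRunEvidence`, its field names, and the sample values are hypothetical and do not describe a Browser Agent output schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class WalletRunEvidence:
    """One record per run. Field names are illustrative, not a Tangle schema."""

    screenshot_path: Optional[str] = None  # rendered state a user would see
    dom_snapshot: Optional[str] = None     # selectors, text, disabled states
    wallet_prompt: Optional[str] = None    # proof the signing boundary appeared
    network_state: Optional[str] = None    # chain ID, failed RPCs
    stop_reason: Optional[str] = None      # why the agent halted


def missing_evidence(evidence: WalletRunEvidence) -> List[str]:
    """List the evidence fields that are empty or absent for this run."""
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name)]


run = WalletRunEvidence(
    screenshot_path="artifacts/swap-confirm.png",
    dom_snapshot="<button disabled>Swap</button>",
    wallet_prompt="confirmation visible",
    network_state="chainId=0xaa36a7",
)
print(missing_evidence(run))  # ['stop_reason']
```

Failing the run when this list is non-empty keeps "the agent clicked buttons" from passing as a result.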
Playwright's documentation covers the browser automation layer. Wallet testing adds two concerns on top of it: provider state and signing safety. The MetaMask developer docs, WalletConnect docs, and Ethereum JSON-RPC docs define the wallet and chain surfaces a browser task may encounter.
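To make the provider surface concrete, a run can classify the JSON-RPC methods it observes and check the reported chain ID against an allowlist. The method names below are standard Ethereum and wallet RPC methods, and `eth_chainId` returns a hex string. Everything else is an assumption for illustration: the grouping into sets, the function names, and the choice of Sepolia (chain ID 11155111) as the only test chain.

```python
# Methods that ask the wallet to sign or send something.
SIGNING_METHODS = {
    "eth_sendTransaction",
    "eth_signTransaction",
    "eth_sign",
    "personal_sign",
    "eth_signTypedData_v4",
}

# Methods that ask the wallet to change or add a network.
CHAIN_METHODS = {"wallet_switchEthereumChain", "wallet_addEthereumChain"}

# Assumed allowlist: Sepolia only. Adjust to the chains your staging uses.
TEST_CHAIN_IDS = {11155111}


def classify_request(method: str) -> str:
    """Label an observed RPC method as 'signing', 'chain_change', or 'other'."""
    if method in SIGNING_METHODS:
        return "signing"
    if method in CHAIN_METHODS:
        return "chain_change"
    # 'other' does not mean safe; it only means not in the sets above.
    return "other"


def is_test_chain(chain_id_hex: str) -> bool:
    """Check an eth_chainId result (hex string) against the allowlist."""
    return int(chain_id_hex, 16) in TEST_CHAIN_IDS


print(classify_request("personal_sign"))  # signing
print(is_test_chain("0xaa36a7"))          # True
print(is_test_chain("0x1"))               # False
```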
Stop Conditions
Wallet flows need explicit stop rules because the last click can become a transaction:
| Stop at | Unless |
|---|---|
| signature request | test wallet, test chain, and explicit signing permission |
| network switch | the test goal includes chain switching |
| approval transaction | allowance is on a disposable test token |
| unknown wallet modal | no exception: capture evidence and request review |
This keeps natural-language testing useful without turning every smoke test into a production-risk exercise.
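The stop rules can be written as a small decision function that defaults to stopping. This is a sketch of one reading of the table, not an implementation from Browser Agent: `Boundary`, `RunContext`, and `should_stop` are hypothetical names, and treating an unknown wallet modal as an unconditional stop is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Boundary(Enum):
    SIGNATURE_REQUEST = "signature request"
    NETWORK_SWITCH = "network switch"
    APPROVAL_TRANSACTION = "approval transaction"
    UNKNOWN_WALLET_MODAL = "unknown wallet modal"


@dataclass(frozen=True)
class RunContext:
    """Hypothetical run flags. Every default is the non-mutating choice."""

    test_wallet: bool = False
    test_chain: bool = False
    signing_permitted: bool = False
    goal_includes_chain_switch: bool = False
    disposable_test_token: bool = False


def should_stop(boundary: Boundary, ctx: RunContext) -> bool:
    """Return True unless the table's exception for this boundary is met."""
    if boundary is Boundary.SIGNATURE_REQUEST:
        return not (ctx.test_wallet and ctx.test_chain and ctx.signing_permitted)
    if boundary is Boundary.NETWORK_SWITCH:
        return not ctx.goal_includes_chain_switch
    if boundary is Boundary.APPROVAL_TRANSACTION:
        return not ctx.disposable_test_token
    # Assumption: an unknown modal always stops the run for human review.
    return True


default_ctx = RunContext()
signing_ctx = RunContext(test_wallet=True, test_chain=True, signing_permitted=True)

print(should_stop(Boundary.SIGNATURE_REQUEST, default_ctx))     # True
print(should_stop(Boundary.SIGNATURE_REQUEST, signing_ctx))     # False
print(should_stop(Boundary.UNKNOWN_WALLET_MODAL, signing_ctx))  # True
```

Defaulting every flag to `False` means a run that forgets to configure its context stops at every boundary, which is the safe failure mode for a smoke test.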
Where It Belongs In The Test Suite
Natural-language E2E should sit above deterministic checks:
| Layer | Job |
|---|---|
| unit and contract tests | prove core logic and invariants |
| transaction simulation | catch revert paths and allowance mistakes |
| Playwright flows | lock down deterministic UI paths |
| browser-agent smoke | catch real user regressions and copy/layout drift |
| manual review | approve destructive or high-value flows |
The browser agent is best at the messy boundary where copy, layout, wallets, RPCs, and user intent meet. Keep it there; do not ask it to replace lower-level invariants.
Where Tangle Fits
Browser Agent supplies the agentic browser run and evidence capture. Sandbox supplies the isolated workspace for app code, test dependencies, and artifacts. For the broader picture, see “browser automation for AI agents.”
What This Does Not Prove
A natural-language test is not a substitute for deterministic contract tests. It catches product and integration regressions near the user surface. Keep smart contract invariants, transaction simulation, wallet mocks, and RPC-level tests in the suite.
Start
Run one non-mutating browser goal against a staging wallet flow. Require screenshots, DOM evidence, and a final stop reason before letting the agent near signed transactions.
FAQ
Can AI agents test wallet apps safely?
Yes, if the run uses test environments, explicit stop conditions, evidence capture, and non-mutating defaults.
Should a browser agent sign wallet transactions?
Only in controlled test environments with test wallets and test funds. Otherwise it should stop at the signing boundary.
What is the minimum evidence for wallet E2E tests?
Capture the browser screenshot, DOM state, wallet prompt state, network or chain context, and stop reason.