Research
Arguments about agent improvement that can be tested against traces, gates, and runtime behavior.
Traces Are The Training Data
Why self-improving agents need full trajectories, tool spans, analyst findings, provenance, and replay instead of final scores alone.
Self-improving agents
Each note should make a claim about how agent systems improve, fail, or prove progress.
The Self-Improving Stack
A series map for self-improving agent systems, from optimization theory and prompt search to runtime topology, traces, memory, and governance.
Prompt Optimization Is Not The Whole Game
Where GEPA, DSPy, MIPRO, AxLLM, and related prompt optimizers fit inside a larger self-improving agent stack.
Skills Are Trainable State
How SkillOpt, Voyager-style skill libraries, and agent skills turn durable procedure into an optimization surface.
Optimization Theory For Agent Builders
A compact map from hill climbing and Bayesian search to GEPA, SkillOpt, Frontier Tuning, agent runtimes, and noisy promotion gates.
When The Model Itself Is Mutable
How SFT, RLHF, process supervision, tool-use RL, and Microsoft Frontier Tuning differ from public prompt, skill, and harness loops.
Memory Is Not Automatically Learning
How episodic memory, knowledge gates, retrieval evals, negative knowledge, and production trace mining fit into self-improving agent systems.
The Gate Is The Optimizer
Why held-out promotion, judge reliability, failure taxonomies, cost ceilings, and confidence intervals decide whether self-improvement is real.
Beat Random At Equal Compute First
Why best-of-N, self-consistency, verifier reranking, and compute-matched controls are the baseline for agent topology claims.
When The Harness Has To Evolve
Why meta-harness, AlphaEvolve-style code search, worktree isolation, and architecture frontiers matter after prompt and skill tuning plateau.
Topology Is The Missing Action Space
Why multi-agent self-improvement needs explicit runtime primitives for fanout, refine, select, parallelism, supervision, budgets, and replay.
Personas Are Content, Coordination Is Structure
How driver, worker, selector, reviewer, analyst, and coordinator roles add up to a reliable multi-agent system instead of roleplay.
Self-Improvement Needs A Safety Case
Why prompt injection, sandbox boundaries, eval poisoning, provenance, compliance, and release gates are core to any real self-improving agent stack.