Research

Arguments about agent improvement that can be tested against traces, gates, and runtime behavior.

Traces Are The Training Data

Why self-improving agents need full trajectories, tool spans, analyst findings, provenance, and replay instead of final scores alone.

Self-improving agents

Each note should make a claim about how agent systems improve, fail, or prove progress.

The Self-Improving Stack

A series map for self-improving agent systems, from optimization theory and prompt search to runtime topology, traces, memory, and governance.

Prompt Optimization Is Not The Whole Game

Where GEPA, DSPy, MIPRO, AxLLM, and related prompt optimizers fit inside a larger self-improving agent stack.

Skills Are Trainable State

How SkillOpt, Voyager-style skill libraries, and agent skills turn durable procedure into an optimization surface.

Optimization Theory For Agent Builders

A compact map from hill climbing and Bayesian search to GEPA, SkillOpt, Frontier Tuning, agent runtimes, and noisy promotion gates.

When The Model Itself Is Mutable

How SFT, RLHF, process supervision, tool-use RL, and Microsoft Frontier Tuning differ from public prompt, skill, and harness loops.

Memory Is Not Automatically Learning

How episodic memory, knowledge gates, retrieval evals, negative knowledge, and production trace mining fit into self-improving agent systems.

The Gate Is The Optimizer

Why held-out promotion, judge reliability, failure taxonomies, cost ceilings, and confidence intervals decide whether self-improvement is real.

Beat Random At Equal Compute First

Why best-of-N, self-consistency, verifier reranking, and compute-matched controls are the baseline for agent topology claims.

When The Harness Has To Evolve

Why meta-harness, AlphaEvolve-style code search, worktree isolation, and architecture frontiers matter after prompt and skill tuning plateau.

Topology Is The Missing Action Space

Why multi-agent self-improvement needs explicit runtime primitives for fanout, refine, select, parallelism, supervision, budgets, and replay.

Personas Are Content, Coordination Is Structure

How driver, worker, selector, reviewer, analyst, and coordinator roles add up to a reliable multi-agent system instead of roleplay.

Self-Improvement Needs A Safety Case

Why prompt injection, sandbox boundaries, eval poisoning, provenance, compliance, and release gates are core to any real self-improving agent stack.