SHEPHERD: Programmable Meta-Agents via Reversible Agentic Execution Traces

Your harness is just another agent.

Simon Yu*¹, Derek Chong*², Ananjan Nandi*², Dilara Soylu², Jiuding Sun²,
Christopher D. Manning², Weiyan Shi¹

¹Northeastern University ²Stanford University · *equal contribution

Figure 1. A meta-agent creates, observes, intercepts, forks, and reverts a worker's execution trace, written as ordinary @agent code. The same substrate powers three meta-agents: runtime intervention, counterfactual optimization, and tree-search RL.

Abstract

As LLM agent systems take on more complex tasks, they increasingly rely on meta-agents: higher-order agents that create, operate on, and manage other agents. Meta-agent operations such as coordinating agents, halting risky actions before execution, or repairing failed runs require runtime manipulation of agentic execution. Yet existing agentic substrates make this difficult: they expose only transcripts and environment snapshots, forcing meta-agents to build ad hoc tooling to reconstruct and operate over full execution state.

We therefore introduce SHEPHERD, a Python substrate grounded in functional programming principles, in which an agent's execution is itself a first-class object that a meta-agent can inspect and transform. Every model action, tool call, and environment change becomes a structured event in a reversible, Git-like execution trace, where any past state can be reverted 5× faster than docker commit and fork. Three example use cases show SHEPHERD's versatility: (1) a supervisor meta-agent prevents conflicts among parallel coding agents, lifting pair-coding pass rate from 28.8% to 54.7% on CooperBench; (2) a counterfactual optimization meta-agent repairs agent workflows by proposing edits and replaying runs from the point of changed behavior, outperforming MetaHarness on Terminal-Bench 2.0 by 12.8% with 58% lower wall-clock time; (3) a training meta-agent picks fork points during rollouts to improve credit assignment in long-horizon agentic RL, doubling GRPO's uplift on Terminal-Bench 2.0. We open-source SHEPHERD to enable principled and efficient operations over agentic execution for both users and meta-agents.

The idea: execution becomes data

A meta-agent creates a worker, observes its execution without perturbing it, intercepts a bad action before it lands, forks a patched branch, and reverts the buggy one. Because the whole run is a reversible, Git-like trace, all of this is ordinary @agent code.

from shepherd import agent, observe, revert, fork

# agent
@agent(LLM="haiku")
def implement(repo, feature):
    "Implement feature in repo"

# meta-agent
@agent(LLM="opus", tool=[observe, revert, fork])
def oversee(agent_run):
    "If tests break, revert the agent and retry"

# meta-agent manages the agent
agent_run = implement(repo, "login")
implemented = oversee(agent_run)

Create · Observe · Edit · Intercept · Revert · Fork
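
To make "execution becomes data" concrete, here is a minimal toy sketch of a reversible, Git-like trace in plain Python. It is not Shepherd's API or implementation: the Event and Trace classes, their method names, and the event kinds are all hypothetical, chosen only to illustrate how recording, forking, and reverting could work over an append-only store with parent pointers.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable step: a model action, tool call, or environment change."""
    kind: str
    payload: dict
    parent: str | None  # id of the previous event, like a Git parent commit

    @property
    def id(self) -> str:
        # Content-addressed id derived from the event and its parent.
        body = json.dumps([self.kind, self.payload, self.parent], sort_keys=True)
        return hashlib.sha1(body.encode()).hexdigest()[:8]


class Trace:
    """Hypothetical toy trace: an append-only event store plus a movable head."""

    def __init__(self, store=None, head=None):
        self.store = {} if store is None else store  # shared between forks
        self.head = head

    def record(self, kind, **payload):
        """Append a new event on top of the current head."""
        event = Event(kind, payload, self.head)
        self.store[event.id] = event
        self.head = event.id
        return event.id

    def log(self):
        """Return events from the first step up to the current head."""
        out, cursor = [], self.head
        while cursor is not None:
            event = self.store[cursor]
            out.append(event)
            cursor = event.parent
        return out[::-1]

    def revert(self, event_id):
        """Move the head back to an earlier event; nothing is deleted."""
        if event_id not in self.store:
            raise KeyError(event_id)
        self.head = event_id

    def fork(self):
        """Create a branch that shares history and diverges from here on."""
        return Trace(self.store, self.head)


run = Trace()
start = run.record("model_action", text="plan the login feature")
run.record("tool_call", cmd="edit auth.py")
run.record("env_change", tests="failing")

# A supervisor could fork the run, rewind the fork, and try a patched step.
retry = run.fork()
retry.revert(start)
retry.record("tool_call", cmd="edit auth.py with a patch")

print([e.kind for e in run.log()])    # ['model_action', 'tool_call', 'env_change']
print([e.kind for e in retry.log()])  # ['model_action', 'tool_call']
```

Because events are immutable and only the head moves, the original run stays intact after the fork is rewound, which is the property that lets a meta-agent compare branches or discard one without losing the other.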

Try it yourself: fork and rewind your agent

Install Shepherd, drop in the plugin for your coding agent, and every step it takes becomes a commit on a reversible, Git-like trace.

From there the trace is yours to drive: shepherd log shows the steps, and when a run goes sideways, shepherd revert 4 restores the agent and its files to step 4, byte-for-byte, the way git checkout rewinds a repo.

See the GitHub repo for more, and the blog for the full walkthrough.

$ pip install shepherd-ai
$ shepherd init

# add the Claude Code (or Codex) plugin
$ shepherd plugin install claude-code

# run it: every step is a commit
$ shepherd run claude "fix the login bug"

# went wrong? roll back like git
$ shepherd log
$ shepherd revert 4
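
As a rough illustration of what restoring files "byte-for-byte" to an earlier step involves, the sketch below snapshots a working directory after each step and restores an earlier snapshot. It is a naive full-copy approach written only for this example; the page does not describe how shepherd revert is implemented, and the snapshot and restore helpers are hypothetical names. It also covers files only, not the agent's own state.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, bytes]:
    """Capture every file under root as relative path -> raw bytes."""
    return {
        str(p.relative_to(root)): p.read_bytes()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore(root: Path, snap: dict[str, bytes]) -> None:
    """Make the files under root match snap: delete extras, rewrite the rest."""
    for p in sorted(root.rglob("*"), reverse=True):
        if p.is_file() and str(p.relative_to(root)) not in snap:
            p.unlink()
    for rel, data in snap.items():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    steps = []  # steps[i] holds the snapshot taken after step i + 1

    (root / "login.py").write_text("def login(): return True\n")
    steps.append(snapshot(root))  # step 1: working state

    (root / "login.py").write_text("def login(): raise RuntimeError\n")
    (root / "scratch.txt").write_text("debug notes\n")
    steps.append(snapshot(root))  # step 2: the run went sideways

    restore(root, steps[0])       # analogous to reverting to step 1
    restored = snapshot(root)

print(restored == steps[0])  # True
print(sorted(restored))      # ['login.py']
```

Copying every file at every step is simple but expensive; a real system would more plausibly store only what changed between steps.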

BibTeX

@misc{yu2026shepherdruntimesubstrateempowering,
  title={Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace},
  author={Simon Yu and Derek Chong and Ananjan Nandi and Dilara Soylu and Jiuding Sun and Christopher D Manning and Weiyan Shi},
  year={2026},
  eprint={2605.10913},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.10913}
}