SHEPHERD: Programmable Meta-Agents via Reversible Agentic Execution Traces

Your harness is just another agent.

Simon Yu*¹, Derek Chong*², Ananjan Nandi*², Dilara Soylu², Jiuding Sun²,
Christopher D. Manning², Weiyan Shi¹

¹Northeastern University   ²Stanford University  ·  *equal contribution

CHATS-Lab, Northeastern University · Stanford NLP Group
Figure 1. A meta-agent creates, observes, intercepts, forks, and reverts a worker's execution trace, written as ordinary @agent code. The same substrate powers three meta-agents: runtime intervention, counterfactual optimization, and tree-search RL.

Abstract

As LLM agent systems take on more complex tasks, they increasingly rely on meta-agents: higher-order agents that create, operate on, and manage other agents. Meta-agent operations such as coordinating agents, halting risky actions before execution, or repairing failed runs require runtime manipulation of agentic execution. Yet existing agentic substrates make this difficult: they expose only transcripts and environment snapshots, forcing meta-agents to build ad hoc tooling to reconstruct and operate over the full execution state.

We introduce SHEPHERD, a Python substrate grounded in functional programming principles, in which an agent's execution is itself a first-class object that a meta-agent can easily inspect and transform. Every model action, tool call, and environment change becomes a structured event in a reversible, Git-like execution trace, and reverting to any past state is 5× faster than with docker commit and fork. Three use cases demonstrate SHEPHERD's versatility: (1) a supervisor meta-agent prevents conflicts among parallel coding agents, lifting the pair-coding pass rate on CooperBench from 28.8% to 54.7%; (2) a counterfactual optimization meta-agent repairs agent workflows by proposing edits and replaying runs from the point where behavior changes, outperforming MetaHarness on Terminal-Bench 2.0 by 12.8% with 58% lower wall-clock time; (3) a training meta-agent picks fork points during rollouts to improve credit assignment in long-horizon agentic RL, doubling GRPO's uplift on Terminal-Bench 2.0. We open-source SHEPHERD to enable principled and efficient operations over agentic execution for both users and meta-agents.
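The counterfactual optimization in (2) hinges on replaying a run only from the point where an edit changes behavior. Below is a minimal sketch of that idea, under the assumption that a recorded run can be reduced to a list of (request, response) model calls: the edited workflow is re-executed, recorded responses are reused while its requests still match the original run, and live model calls take over at the first divergence. `replay_with_cache`, `recorded`, and `live_model` are hypothetical names, not SHEPHERD's API.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch, not SHEPHERD's API.
Call = Tuple[str, str]  # (request sent to the model, response it returned)


def replay_with_cache(
    workflow: Callable[[Callable[[str], str]], str],
    recorded: List[Call],
    live_model: Callable[[str], str],
) -> Tuple[str, int]:
    """Re-run an edited `workflow`, reusing the recorded prefix of model calls.

    Returns the workflow's result and how many calls were served from the recording.
    """
    reused = 0
    diverged = False

    def model(request: str) -> str:
        nonlocal reused, diverged
        # Serve from the recording only while every request so far matches it.
        if not diverged and reused < len(recorded) and recorded[reused][0] == request:
            response = recorded[reused][1]
            reused += 1
            return response
        # First mismatch: behavior has changed, so execute live from here on.
        diverged = True
        return live_model(request)

    return workflow(model), reused


# Usage: the edited workflow changes only its second step.
recorded = [("plan: login", "edit auth.py"), ("run tests", "2 failed")]


def edited_workflow(model: Callable[[str], str]) -> str:
    plan = model("plan: login")           # unchanged step: served from the recording
    check = model("run tests --verbose")  # edited step: executed live
    return plan + " | " + check


result, reused = replay_with_cache(edited_workflow, recorded, lambda r: "live:" + r)
print(result, reused)  # edit auth.py | live:run tests --verbose 1
```

This sketch covers only model calls. A real replay would also need the environment restored to its state at the divergence point, which is the role a reversible trace plays; that part is assumed here, not shown.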

The idea: execution becomes data

A meta-agent creates a worker, observes its execution without perturbing it, and intercepts a bad action before it lands. It then forks a patched branch and reverts the buggy one. Because the whole run is a reversible, Git-like trace, all of this is ordinary @agent code.

[Figure: the meta-agent creates a worker and observes its LLM and tool calls; when a buggy edit fails, the meta-agent intercepts it, reverts, and forks a patched edit, which passes.]
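from shepherd import agent, observe, revert, fork

# Worker agent: a smaller model implements the feature.
# The docstring describes the worker's task.
@agent(LLM="haiku")
def implement(repo, feature):
    "Implement feature in repo"

# Meta-agent: a stronger model supervises, with observe, revert, and fork as its tools.
@agent(LLM="opus", tool=[observe, revert, fork])
def oversee(agent_run):
    "If tests break, revert the agent and retry"

# The meta-agent manages the worker by operating on its run.
repo = "./my-app"  # placeholder: path to the repository being edited
agent_run = implement(repo, "login")
implemented = oversee(agent_run)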
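The create, observe, intercept, revert, and fork cycle can be mimicked in plain Python to show what "execution becomes data" means. This is a toy sketch, not SHEPHERD's implementation: `Event`, `Trace`, the `guard` hook, the `no_test_edits` policy, and the path-to-contents state are hypothetical stand-ins, and every commit stores a full copy of the state with no attempt at efficiency.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Toy sketch, not SHEPHERD's implementation: state is a hypothetical path -> contents map.
State = Dict[str, str]


@dataclass
class Event:
    kind: str                # e.g. "llm_call", "tool_call", "edit"
    payload: Dict[str, str]


@dataclass
class Trace:
    # states[i] is the state after the first i events; states[0] is the empty start.
    states: List[State] = field(default_factory=lambda: [{}])
    events: List[Event] = field(default_factory=list)
    # Hypothetical meta-agent hook: return False to intercept an event.
    guard: Optional[Callable[[Event, State], bool]] = None

    @property
    def state(self) -> State:
        return self.states[-1]

    def apply(self, event: Event) -> bool:
        """Commit `event` unless the guard intercepts it first."""
        if self.guard is not None and not self.guard(event, self.state):
            return False  # intercepted: the action never lands
        new_state = copy.deepcopy(self.state)
        if event.kind == "edit":
            new_state[event.payload["path"]] = event.payload["content"]
        self.events.append(event)
        self.states.append(new_state)
        return True

    def revert(self, step: int) -> None:
        """Discard every event after `step`, restoring the state at that point."""
        del self.events[step:]
        del self.states[step + 1:]

    def fork(self, step: int) -> "Trace":
        """Start an independent branch that shares history up to `step`."""
        return Trace(
            states=copy.deepcopy(self.states[: step + 1]),
            events=list(self.events[:step]),
            guard=self.guard,
        )


def no_test_edits(event: Event, state: State) -> bool:
    # Hypothetical oversight policy: the worker may not modify the test file.
    return not (event.kind == "edit" and event.payload["path"] == "test_auth.py")


run = Trace(guard=no_test_edits)
run.apply(Event("llm_call", {"prompt": "implement login"}))
run.apply(Event("edit", {"path": "auth.py", "content": "def login(): return None"}))
blocked = run.apply(Event("edit", {"path": "test_auth.py", "content": ""}))  # intercepted

# Tests fail on the buggy edit: revert to step 1, then fork a patched branch from there.
run.revert(1)
patched = run.fork(1)
patched.apply(Event("edit", {"path": "auth.py", "content": "def login(): return True"}))
print(blocked, run.state, patched.state)
```

Keeping one state per event is what makes revert and fork trivial in this toy: both reduce to slicing the history. A practical substrate would share unchanged data between states instead of deep-copying it.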

Try it yourself: fork and rewind your agent

Install Shepherd, drop in the plugin for your coding agent, and every step it takes becomes a commit on a reversible, Git-like trace.

From there the trace is yours to drive: shepherd log shows the steps, and when a run goes sideways, shepherd revert 4 restores the agent and its files to step 4, byte-for-byte, the way git checkout rewinds a repo.

See the GitHub repo for more, and the blog for the full walkthrough.

$ pip install shepherd-ai
$ shepherd init

# add the Claude Code (or Codex) plugin
$ shepherd plugin install claude-code

# run it: every step is a commit
$ shepherd run claude "fix the login bug"

# went wrong? roll back like git
$ shepherd log
$ shepherd revert 4
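Restoring files byte-for-byte to an earlier step, as `shepherd revert 4` is described as doing, can be approximated with content-addressed snapshots, the same idea Git uses for blobs. The sketch below is a hypothetical illustration, not how the `shepherd` CLI works: `snapshot` hashes every file under a directory into an in-memory blob store, and `restore` rewrites the directory to match a chosen step.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical sketch, not the shepherd CLI's implementation.
Blobs = Dict[str, bytes]   # sha256 hex digest -> file bytes, each distinct content stored once
Snapshot = Dict[str, str]  # relative path -> digest of that file's bytes


def snapshot(root: Path, blobs: Blobs) -> Snapshot:
    """Record every file under `root`, adding new contents to the blob store."""
    snap: Snapshot = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            digest = hashlib.sha256(data).hexdigest()
            blobs.setdefault(digest, data)
            snap[path.relative_to(root).as_posix()] = digest
    return snap


def restore(root: Path, snap: Snapshot, blobs: Blobs) -> None:
    """Make the files under `root` match `snap` byte-for-byte."""
    for path in list(root.rglob("*")):
        if path.is_file() and path.relative_to(root).as_posix() not in snap:
            path.unlink()  # file did not exist at that step
    for rel, digest in snap.items():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(blobs[digest])


# Usage: take one snapshot per agent step, then roll back to an earlier one.
with tempfile.TemporaryDirectory() as tmp:
    root, blobs = Path(tmp), {}
    steps: List[Snapshot] = []

    (root / "auth.py").write_text("def login(): ...\n")
    steps.append(snapshot(root, blobs))  # step 0
    (root / "auth.py").write_text("def login(): return None\n")
    (root / "notes.txt").write_text("scratch")
    steps.append(snapshot(root, blobs))  # step 1

    restore(root, steps[0], blobs)       # roll back to step 0
    print((root / "auth.py").read_text(), (root / "notes.txt").exists())
```

Like Git, this sketch tracks files only, so empty directories are left in place after a rollback.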

BibTeX

@misc{yu2026shepherdruntimesubstrateempowering,
  title={Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace},
  author={Simon Yu and Derek Chong and Ananjan Nandi and Dilara Soylu and Jiuding Sun and Christopher D. Manning and Weiyan Shi},
  year={2026},
  eprint={2605.10913},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.10913}
}