Anatomy of a run
The overview tells you who does what — Brain reasons, Hands act, Session records. This page tells you what flows through: the artifacts the per-task loop reads to do its work, and the ones it leaves behind.
A Tilth run is, in the end, a function over files. It reads a fixed set of inputs, turns the loop once per task, and writes a fixed set of outputs — and three of those outputs feed straight back in as the loop's working memory. Getting that shape in your head first makes the deep dives easier to place.
A run as a function over files: the per-task loop reads its inputs (left), turns once per task, and writes its outputs (right). Three artifacts — prd.json, progress.txt, and the evaluator ledger/ — are written out and read back in as the loop's working memory (the lower arc).
The split below mirrors that diagram: pure inputs, the loop, pure outputs, and the three artifacts that are both.
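As a rough mental model, that split can be written down as three sets of artifact names. This is a sketch only: the names are taken from this page, but the `RunShape` class and its methods are hypothetical and not part of Tilth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunShape:
    """Hypothetical model of a run as a function over files (not Tilth's API)."""

    inputs: frozenset          # read-only from the loop's point of view
    outputs: frozenset         # written for humans and tooling after the fact
    working_memory: frozenset  # written by one turn, read back by the next

    def read_by_loop(self) -> frozenset:
        return self.inputs | self.working_memory

    def written_by_loop(self) -> frozenset:
        return self.outputs | self.working_memory


RUN = RunShape(
    inputs=frozenset({"system.md", "AGENTS.md", "seed-meta.json",
                      "test_t<NNN>_*.py", "source repo"}),
    outputs=frozenset({"commits", "events.jsonl", "summary.json",
                       "checkpoint.json", "proposed-learnings.md"}),
    working_memory=frozenset({"prd.json", "progress.txt", "ledger/"}),
)

# Anything both read and written by the loop is, by construction, working memory.
both = RUN.read_by_loop() & RUN.written_by_loop()
print(sorted(both))  # ['ledger/', 'prd.json', 'progress.txt']
```

Because the pure inputs and pure outputs do not overlap, the intersection is exactly the three working-memory artifacts.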
Inputs: what the loop reads
Read-only, from the loop's point of view. Most are seeded before the run — by you (AGENTS.md) or by tilth prep-feature (seed-meta.json, the acceptance tests). The loop consumes them; it doesn't edit them.
| Artifact | Lives in | Read by | Carries |
|---|---|---|---|
| system.md | tilth/prompts/ (harness) | worker (its system prompt) | the worker's role, tool guidance, "done" criteria |
| AGENTS.md | the workspace (user-owned) | worker, evaluator, self-improve step | your project's conventions |
| seed-meta.json (a slice) | sessions/<id>/ | worker (curated slice only) | the interview's TL;DR, scope notes, blockers, open questions |
| test_t<NNN>_*.py | the worktree (workspace/tests/) | pytest (the floor); evaluator (inlined); worker (on disk) | the per-task acceptance bar |
| the source repo | the worktree | worker, via Hands | the code the run changes |
The instructions (system.md, AGENTS.md) and the plan-derived context (the seed-meta.json slice, the acceptance criteria) are assembled into each fresh task prompt. What the worker actually sees, versus what stays harness-only, is its own subject — see Agent visibility. For the input channels in depth, see Memory channels.
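One way to picture that assembly is the sketch below. The function name, the section headings, the ordering, and the example test filename are all assumptions made for illustration; this page does not specify Tilth's actual prompt format.

```python
def assemble_task_prompt(system_md: str, agents_md: str,
                         seed_slice: str, acceptance_criteria: str) -> str:
    """Join the four input channels into one prompt, skipping empty ones.

    Hypothetical helper: headings and ordering are illustrative only.
    """
    sections = [
        ("Role and tool guidance (system.md)", system_md),
        ("Project conventions (AGENTS.md)", agents_md),
        ("Plan context (seed-meta.json slice)", seed_slice),
        ("Acceptance criteria", acceptance_criteria),
    ]
    parts = [f"## {title}\n{body.strip()}"
             for title, body in sections if body.strip()]
    return "\n\n".join(parts)


prompt = assemble_task_prompt(
    system_md="You are the worker.",
    agents_md="Use type hints.",
    seed_slice="",  # nothing curated for this task, so the section is dropped
    acceptance_criteria="tests/test_t001_example.py must pass",  # placeholder name
)
print(prompt)
```

The point of the sketch is that the prompt is rebuilt from files for every task, so nothing has to survive in the model's context between tasks.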
The loop: what turns
Between read and write sits the loop you came here to run. Its mechanics live elsewhere (see The two loops for the Ralph (outer) / tool-use (inner) split), but the per-task shape is:
One task's lifecycle. The worker sees the prompt and the tool loop; the validators, evaluator, self-improve step, and commit are harness-side.
This page stays out of those internals. What matters here is that each turn reads the inputs above and produces the outputs below.
Outputs: what the loop writes
Written by the harness as the run proceeds. The worker writes none of them directly — it writes code, which the harness commits. They exist for the human reading the run afterwards, and for tilth resume / tilth visualize.
| Artifact | Lives in | Written | Read afterwards by |
|---|---|---|---|
| commits on session/<id> | the source repo's .git | one per accepted task | humans (review + merge); evaluator (via the diff) |
| events.jsonl | sessions/<id>/ | append-only, every step | humans; tilth visualize; tilth resume |
| summary.json | sessions/<id>/ | rebuilt at each task boundary | humans; the visualizer; external consumers |
| checkpoint.json | sessions/<id>/ | as the run proceeds (it carries the running token total, so it rewrites after each model call) | tilth resume |
| proposed-learnings.md | sessions/<id>/ | by the self-improve step, when it has something | you, at end of run |
Where each lives on disk — and the full events.jsonl event taxonomy — is in Session layout. The branch is never auto-merged; you review and merge it like any feature branch.
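The table describes two different write patterns: events.jsonl only ever grows, while checkpoint.json is replaced wholesale. The sketch below contrasts them under stated assumptions. The event fields (`type`, `tokens`), the `total_tokens` key, and the write-then-rename step are illustrative choices, not Tilth's documented schema or implementation.

```python
import json
import os
import tempfile
from pathlib import Path


def append_event(log_path: Path, event: dict) -> None:
    """Append one JSON object as one line; earlier lines are never touched."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def rewrite_checkpoint(path: Path, state: dict) -> None:
    """Replace the whole file; write-then-rename avoids a half-written file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    os.replace(tmp, path)


def read_events(log_path: Path) -> list:
    """Parse the log back into a list of dicts, one per non-empty line."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as session_dir:
    session = Path(session_dir)
    total = 0
    for tokens in (120, 80):  # pretend two model calls happened
        total += tokens
        append_event(session / "events.jsonl",
                     {"type": "model_call", "tokens": tokens})
        rewrite_checkpoint(session / "checkpoint.json",
                           {"total_tokens": total})

    events = read_events(session / "events.jsonl")
    checkpoint = json.loads(
        (session / "checkpoint.json").read_text(encoding="utf-8"))

print(len(events), checkpoint["total_tokens"])  # 2 200
```

The log keeps the full history (two entries), while the checkpoint holds only the latest state (the running total).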
One artifact the loop doesn't write is chat.html: tilth visualize renders it on demand from events.jsonl after a run. It's an out-of-band step, not part of the loop — which is why it sits outside both this table and the diagram.
Working memory: the artifacts that are both
Three artifacts are written by the loop and read back by the next turn. They are how a run keeps continuity when each task starts from a fresh context, and across tilth resume in a brand-new process: the state lives on disk, not in the model's head.
| Artifact | Lives in | Written | Read back by |
|---|---|---|---|
| prd.json | sessions/<id>/ | tilth prep-feature seeds it; the harness flips status per task | the harness (next-task selection); the worker sees the plan as prose context |
| progress.txt | sessions/<id>/ | one line per task outcome | the worker (last ~30 lines, next task) |
| ledger/<task_id>.jsonl | sessions/<id>/ | one entry per evaluator call | the evaluator (its prior verdicts, next iteration); the worker (on a retry) |
This is the loop's durable working memory — the sage-green arc in the diagram. It is also why a run survives interruption: stop it at any point and the next process reads these three back and picks up where it left off. See Resume mechanics.
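To make "reads these back and picks up where it left off" concrete, here is a minimal sketch of a fresh process recovering its place from disk. The prd.json layout (a `tasks` list with `id` and `status` fields, and a `"pending"` status value) is an assumption for illustration, as is the exact cutoff of 30 lines; neither is taken from Tilth's source.

```python
import json
import tempfile
from pathlib import Path

PROGRESS_TAIL = 30  # the page says "last ~30 lines"; the exact cutoff is assumed


def next_pending_task(prd_path: Path):
    """Return the first task whose status is "pending", or None if none is."""
    prd = json.loads(prd_path.read_text(encoding="utf-8"))
    for task in prd["tasks"]:  # assumed schema, not Tilth's documented one
        if task["status"] == "pending":
            return task
    return None


def progress_tail(progress_path: Path, n: int = PROGRESS_TAIL) -> list:
    """Return the last n lines of progress.txt, or [] if it does not exist."""
    if not progress_path.exists():
        return []
    return progress_path.read_text(encoding="utf-8").splitlines()[-n:]


with tempfile.TemporaryDirectory() as session_dir:
    session = Path(session_dir)
    # Simulate what an interrupted run left behind.
    (session / "prd.json").write_text(json.dumps({"tasks": [
        {"id": "t001", "status": "done"},
        {"id": "t002", "status": "pending"},
    ]}), encoding="utf-8")
    (session / "progress.txt").write_text(
        "".join(f"outcome {i}\n" for i in range(1, 41)), encoding="utf-8")

    # A brand-new process needs only the files, not any in-memory state.
    task = next_pending_task(session / "prd.json")
    tail = progress_tail(session / "progress.txt")

print(task["id"], len(tail), tail[0])  # t002 30 outcome 11
```

Nothing in the sketch depends on state held by the previous process, which is the property that lets a run survive interruption.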
The worker never reads events.jsonl, summary.json, checkpoint.json, or its own token counts. Those are outputs about the run, for the human, not inputs to it. Keeping them out of the loop is invariant 2; the honest scope of that boundary (a determined worker with bash can still reach them via relative paths) is in Agent visibility.
See also
- Architecture overview — who does what (Brain / Hands / Session).
- Memory channels — the input channels in detail.
- Session layout — where the outputs live on disk, plus the event taxonomy.
- The two loops — what the loop in the middle actually does.

