Memory channels

Five memory channels live outside the agent. Some are project files the user owns; some are session-local artifacts the harness manages. Together they give a long-running run continuity by treating durable state as files on disk rather than messages the model has to carry between calls. (The fifth — the per-task evaluator ledger — arrived with the structured worker↔evaluator dialogue.)

| Channel | Lives in | Written by | Read by |
| --- | --- | --- | --- |
| AGENTS.md / CLAUDE.md | the workspace (user-owned) | the user | worker, evaluator, self-improve step (injected when present) |
| Git history | the worktree | the harness (one commit per task) | humans, evaluator (via diff) |
| progress.txt | sessions/<id>/ (harness-owned) | the harness (one line per task outcome) | worker (last ~30 lines injected) |
| prd.json | sessions/<id>/ (harness-owned) | tilth prep-feature (seed) and the harness (status flips) | the harness (task selection); worker (the plan as injected prose context) |
| Evaluator ledger | sessions/<id>/ledger/<task_id>.jsonl | the harness (one entry per evaluator call) | evaluator (its prior verdicts on this task); worker (the same, on a retry) |

The reviewing role is the evaluator.

The worker writes none of these. It writes code in the worktree, which the harness commits. The split is clean: memory channels are inputs to the agents; session artifacts under sessions/<id>/ (events.jsonl, summary.json, proposed-learnings.md) are outputs the harness produces during a run. seed-meta.json is the interview audit trail; a curated slice of it (TL;DR, scope notes, blockers, open questions) is now injected into the worker prompt as context, while the interview bookkeeping stays an output. The full read-it-once picture — every input, every output, and the three artifacts that are both — is laid out in Anatomy of a run; this page zooms in on the input channels.

prd.json and progress.txt used to live in the workspace itself, which leaked harness state into every PR. Phase 1 of the prep-feature work moved them under sessions/<id>/. Your workspace now only ships the things that genuinely belong in the PR — source changes and tests.

Diagram suggestion: five labelled "channels" feeding into a worker bubble at the centre, with arrows annotated with the cadence of each: AGENTS.md ("at task start, one-way from user"), progress.txt ("last 30 lines, at task start"), git history ("via evaluator diff"), prd.json ("plan visible to worker as context; mutable status harness-only"), and the evaluator ledger ("the evaluator's prior verdicts on this task"). Reinforces the asymmetry of what the agent sees and the one-way directionality of AGENTS.md.

AGENTS.md — your project conventions

Short markdown. User-owned, user-maintained. Tilth reads it into the worker's user-prompt on every task, into the evaluator's prompt on every evaluator call, and into the self-improvement step's prompt — but never writes to it. Use whatever section headings make sense for your project; we suggest the ones below as a starting template:

# AGENTS.md

## Project
One paragraph describing what this codebase is.

## Language and tooling
Python version, frameworks, test runner, linter, etc.

## Layout
Where things live.

## Style
- Standard library first.
- Type hints on public functions.
- ...

## Patterns
- (Add as you learn what works for this codebase.)

## Gotchas
- (Add as you trip over them.)

AGENTS.md should stay project-focused. It's for project conventions, not harness mechanics:

  • Belongs in AGENTS.md: language version, test framework, file layout, style rules, project-specific gotchas, accumulated learnings.
  • Does not belong in AGENTS.md: "record token counts in events.jsonl" (agent doesn't write that file), "update prd.json status when done" (agent doesn't manage prd), "stop after 32 iterations" (handled by max_iterations_per_task), "don't run dangerous commands" (handled by pre_tool hook), "the evaluator will evaluate your work" (see Agent visibility).

The cleanest test: if you removed a rule from AGENTS.md and the harness still enforced the underlying behaviour, the rule shouldn't be there.

Which file(s) Tilth reads

The channel isn't tied to a single filename. By default Tilth reads AGENTS.md and CLAUDE.md from the workspace root — in that order, concatenated — so a repo that keeps its conventions in CLAUDE.md (Claude Code's convention) is picked up out of the box, not left invisible. Override the list with TILTH_CONTEXT_FILES (comma-separated, first-listed highest priority); only files that exist are injected, and the combined text is capped so the prompt stays legible. Tilth never writes any of them.
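The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not Tilth's actual code: the function names, the `[filename]` label placed before each file, and the 20,000-character cap are all assumptions made for the example. Only the default file order, the TILTH_CONTEXT_FILES variable, and the "existing files only, capped" behaviour come from this page.

```python
import os
from pathlib import Path

DEFAULT_CONTEXT_FILES = ("AGENTS.md", "CLAUDE.md")
MAX_CONTEXT_CHARS = 20_000  # hypothetical cap; the real limit is not stated on this page


def resolve_context_names(env=None):
    """Return candidate filenames, honouring TILTH_CONTEXT_FILES when it is set."""
    env = os.environ if env is None else env
    raw = env.get("TILTH_CONTEXT_FILES", "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return names or list(DEFAULT_CONTEXT_FILES)


def load_context(workspace, env=None, max_chars=MAX_CONTEXT_CHARS):
    """Concatenate the context files that exist, in priority order, then cap the text."""
    root = Path(workspace)
    parts = []
    for name in resolve_context_names(env):
        path = root / name
        if path.is_file():  # missing files are skipped; nothing is ever written
            text = path.read_text(encoding="utf-8").strip()
            parts.append(f"[{name}]\n{text}")  # the label format is an assumption
    return "\n\n".join(parts)[:max_chars]
```

With this sketch, a repo that only has CLAUDE.md yields just that file's text, and setting TILTH_CONTEXT_FILES replaces the default list rather than extending it (also an assumption about the real behaviour).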

Where do learnings go?

After each task, Tilth runs a self-improvement step that asks the worker model whether the task surfaced anything durable worth capturing for later. The output of that step does not land in your AGENTS.md. It lands in sessions/<id>/proposed-learnings.md — a session-local file outside the worktree, never in the PR diff.

The user (and eventually an end-of-session findings hook) is the integrator: read the proposals at session end, decide which (if any) are worth promoting into your AGENTS.md, and merge them by hand. AGENTS.md stays in your voice, growing only when you decide it should.

Git history — atomic commits per task

The worktree branch (session/<id>) gets one commit per completed task. The evaluator sees the cumulative diff against main for each finished task; humans see the same diff at review time.

A failed task lands a FAILED (...) placeholder commit so the partial work is preserved; tilth resume soft-unwinds that placeholder and the retry sees its own previous edits as uncommitted changes (so the evaluator gets a single cumulative diff, not just the new edits).

The branch is never auto-merged. Open a PR and review like any other branch.

progress.txt — the chronological journal

Lives at sessions/<id>/progress.txt. Starts empty when the session is created; the harness appends one line per task outcome. The most recent ~30 lines are injected into each fresh task's prompt so the agent has rolling context — what was just done, what failed, what the cumulative shape of the run looks like.

The agent does not write to progress.txt directly; the harness appends a line after each task finishes, whether it succeeded or failed.
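As a rough sketch of the append-then-tail pattern, assuming a plain "task id, outcome, note" line format (the real line format is not documented here, and both function names are hypothetical):

```python
from collections import deque
from pathlib import Path


def append_outcome(progress_path, task_id, outcome, note=""):
    """Harness side: append one line per task outcome (line format is assumed)."""
    line = f"{task_id} {outcome} {note}".rstrip()
    with Path(progress_path).open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


def recent_progress(progress_path, limit=30):
    """Return the last `limit` lines, ready to inject into the next task's prompt."""
    path = Path(progress_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        # deque with maxlen keeps only the tail without holding the whole file
        return [line.rstrip("\n") for line in deque(fh, maxlen=limit)]
```

Reading only the tail keeps the injected context at a fixed size no matter how long the run gets.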

prd.json — the task list

This is the work. Lives at sessions/<id>/prd.json. The harness does not plan; the seed (interview output) plans, and tilth prep-feature runs that interview against your codebase to produce the file. See Seeding a session for the full story.

[
  {
    "id": "T-001",
    "title": "Short imperative title",
    "description": "What needs to be done. Be specific. Reference files if useful.",
    "acceptance_criteria": [
      "Concrete, checkable statement.",
      "Another concrete, checkable statement."
    ],
    "status": "pending"
  }
]

The agent never sees the file itself, its status fields, or the queue-management machinery. The harness reads it to pick the next pending task and writes it to flip status (pending → done / failed). Since the visibility expansion, the agent does see the whole task list as prose context (every task collapsed, the current one marked), framed as "context, not work to do" so it understands the shape of the feature without pre-empting later tasks. What stays hidden is the mutable JSON state, not the plan.
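One way to picture the prose view is a renderer that keeps each task's id and title, marks the current one, and drops the status field entirely. The function name, the header wording, and the marker style below are illustrative assumptions, not the harness's actual output format.

```python
def render_plan(tasks, current_id):
    """Collapse the task list to one line per task, marking the current one.

    Status fields are deliberately left out: the agent sees the plan,
    not the mutable queue state.
    """
    lines = ["Feature plan (context, not work to do):"]
    for task in tasks:
        is_current = task["id"] == current_id
        marker = "->" if is_current else "  "
        suffix = "  (current task)" if is_current else ""
        lines.append(f"{marker} {task['id']}: {task['title']}{suffix}")
    return "\n".join(lines)
```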

Hiding the mutable prd.json state (status fields, the queue machinery) from the agent prevents three real failure modes seen in earlier hand-built loops:

  1. The agent marks its own task done.
  2. The agent skips ahead to a "more interesting" task.
  3. The agent rewrites the queue.

State management belongs in code; the agent works on one task at a time and stops.
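To make "state management belongs in code" concrete, here is a minimal sketch of harness-side selection and status flips over the prd.json shape shown earlier. The function names are hypothetical, and the write-to-temp-then-rename step is a design choice of this example rather than something this page confirms about Tilth.

```python
import json
import os
from pathlib import Path

VALID_STATUSES = {"pending", "done", "failed"}


def load_tasks(prd_path):
    """Read the task list from prd.json."""
    return json.loads(Path(prd_path).read_text(encoding="utf-8"))


def next_pending(tasks):
    """Return the first task still pending, in file order, or None when finished."""
    return next((task for task in tasks if task["status"] == "pending"), None)


def set_status(prd_path, task_id, status):
    """Flip one task's status and rewrite the file via a temp file and rename."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    path = Path(prd_path)
    tasks = load_tasks(path)
    for task in tasks:
        if task["id"] == task_id:
            task["status"] = status
            break
    else:
        raise KeyError(task_id)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(tasks, indent=2) + "\n", encoding="utf-8")
    os.replace(tmp, path)  # rename over the original so readers never see a partial file
```

Because only these functions touch the file, none of the three failure modes above is reachable from the agent's side: it has no handle on the status field or the ordering.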

Evaluator ledger — the evaluator's per-task memory

Lives at sessions/<id>/ledger/<task_id>.jsonl. One append-only entry per evaluator call ({ts, iter, diff_summary, case, verdict}). It gives the evaluator memory across iterations of a single task — so it can confirm a prior concern was resolved instead of re-litigating it, and escalate when the same rejection category recurs. The last 5 entries are injected into each evaluator call, and (since the visibility expansion) into the worker's prompt as the reviewer's prior verdicts on its current task. Task-scoped and session-local; never crosses sessions; read straight off disk on resume. See The worker↔evaluator dialogue for the full mechanism.
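The ledger mechanics can be sketched as an append-only JSON Lines file with a tail read. The entry keys match the ones listed above; everything else is an assumption for illustration, including the function names, the ISO-8601 timestamp, and the `same_case_streak` helper, which is one hypothetical way to detect a recurring rejection category.

```python
import json
from collections import deque
from datetime import datetime, timezone
from pathlib import Path


def append_entry(ledger_dir, task_id, iteration, diff_summary, case, verdict):
    """Append one entry per evaluator call; existing entries are never rewritten."""
    path = Path(ledger_dir) / f"{task_id}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),  # timestamp format is assumed
        "iter": iteration,
        "diff_summary": diff_summary,
        "case": case,
        "verdict": verdict,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def recent_entries(ledger_dir, task_id, limit=5):
    """Read the last `limit` entries for a task straight off disk."""
    path = Path(ledger_dir) / f"{task_id}.jsonl"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in deque(fh, maxlen=limit) if line.strip()]


def same_case_streak(entries):
    """Count how many trailing entries share the most recent entry's case."""
    if not entries:
        return 0
    last_case = entries[-1]["case"]
    streak = 0
    for entry in reversed(entries):
        if entry["case"] != last_case:
            break
        streak += 1
    return streak
```

Keeping one file per task is what makes the memory task-scoped: a resume only needs to reopen the file named after the task it is retrying.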

Why the channels live outside the agent

You could imagine baking all of this into the system prompt and letting the agent juggle it. Why not?

  • Context budget. Re-injecting the whole task list every turn gets expensive fast and crowds out the model's working memory for the actual code.
  • Resumability. State outside the agent survives across sessions and provider switches. State inside the agent is gone the moment the conversation resets.
  • Auditability. The channels are flat files in the workspace or session directory. You can git log them, diff them, hand them to teammates, version them. Anything inside the model is opaque.
  • Independence of the evaluator. The evaluator runs in a fresh context across tasks (it carries per-task memory only via the ledger); without external memory channels, it would have nothing to look at except what the worker chose to expose.

See Agent visibility for the full story of which artefacts the worker can and can't see.