
Running the demo

The demo workspace is deliberately almost empty: just an AGENTS.md (the project's conventions) and a .gitignore. Its main job is to be a git repo, because that is all Tilth needs to do what it always does, namely branch off a worktree and build inside it. The walkthrough follows the same steps a real first-time user takes: seed a task list with tilth prep-feature, then run it. Nothing is pre-baked; the todo CLI gets built from scratch during the run.

Clone the demo workspace

Path used on this page. Commands below use ~/projects/tilth-demo as an illustrative location. Tilth doesn't care where the workspace lives — the path is just a CLI argument — so substitute any directory that matches your setup. Treat the demo repo as a stand-in for your own.

git clone git@github.com:AlteredCraft/tilth-demo-todo-cli.git ~/projects/tilth-demo

Seed a task list

Tilth's task list (prd.json) and the matching acceptance tests come from an interview the harness runs against your codebase. Kick it off:

uv run tilth prep-feature ~/projects/tilth-demo
The interview prompts you for a one-line brief (try: "build a minimal todo CLI with add, list, and done subcommands, on-disk format - [ ] item in TODOS.md"), then asks a few targeted questions to slice the work and lock acceptance criteria. The output is harness-owned and lands on the Tilth side, not in the demo repo: the task list at <tilth-clone>/sessions/<id>/prd.json, plus one test_t<NNN>_*.py per task under that session's worktree at <tilth-clone>/sessions/<id>/workspace/tests/ (on branch session/<id>). Your demo checkout stays as empty as it started. See Seeding a session for the full interview-engine story.

Figure: filesystem trees for one Tilth run. Harness side, under ~/projects/tilth/sessions/<id>/: workspace/, events.jsonl, summary.json, checkpoint.json, chat.html. Target repo side, under ~/projects/tilth-demo/.git/: refs/heads/session/<id> and worktrees/<id>/. An arrow labeled "git worktree binds these" connects workspace/ to the worktrees/<id>/ admin entry.

Where a session's state lives. Everything the harness writes — prd.json, the seeded tests under workspace/, the event log — sits under your Tilth clone (sessions/<id>/); only the session/<id> branch and its worktree admin entry live in the demo repo's .git. Full breakdown in Session layout.
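The split is easy to lose track of, so here is a small pathlib sketch that derives each location from a Tilth clone, the target repo, and a session id. SessionPaths and session_paths are hypothetical names, not Tilth's API, and the layout is copied from the description on this page rather than from Tilth's source. The branch is kept as a ref name because git may store it as a packed ref rather than as a file under refs/heads/.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SessionPaths:
    session_dir: Path         # <tilth-clone>/sessions/<id>/ (harness side)
    workspace: Path           # the worktree checkout, on branch session/<id>
    prd: Path                 # task list written by prep-feature
    seeded_tests: Path        # test_t<NNN>_*.py files, inside the worktree
    events: Path              # events.jsonl: full audit trail
    summary: Path             # summary.json: rolled-up snapshot
    checkpoint: Path          # checkpoint.json: resume footing
    proposed_learnings: Path  # proposed-learnings.md: review at end of run
    branch_ref: str           # branch registered in the target repo
    worktree_admin: Path      # <target-repo>/.git/worktrees/<id>/ (per the layout figure)


def session_paths(tilth_clone: Path, target_repo: Path, session_id: str) -> SessionPaths:
    session_dir = tilth_clone / "sessions" / session_id
    workspace = session_dir / "workspace"
    return SessionPaths(
        session_dir=session_dir,
        workspace=workspace,
        prd=session_dir / "prd.json",
        seeded_tests=workspace / "tests",
        events=session_dir / "events.jsonl",
        summary=session_dir / "summary.json",
        checkpoint=session_dir / "checkpoint.json",
        proposed_learnings=session_dir / "proposed-learnings.md",
        branch_ref=f"refs/heads/session/{session_id}",
        worktree_admin=target_repo / ".git" / "worktrees" / session_id,
    )
```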

You can preview what a finished seed for this codebase looks like by reading examples/seed-reference/todo-cli/ in the Tilth repo — same project, a hand-crafted reference.

Run a session against the demo

uv run tilth run ~/projects/tilth-demo
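Before launching, you can approximate the precondition Tilth checks first (a git repo on a clean main) with plain git. This is a hedged sketch, not Tilth's implementation: preflight and is_clean_main are hypothetical helpers, and the exact rules Tilth applies (for instance, whether untracked files count as dirty) may differ. Here any output from git status --porcelain, including untracked files, counts as dirty.

```python
import subprocess
from pathlib import Path


def is_clean_main(branch: str, porcelain_status: str) -> bool:
    """True when HEAD is on main and `git status --porcelain` printed nothing."""
    return branch.strip() == "main" and porcelain_status.strip() == ""


def _git(repo: Path, *args: str) -> str:
    # check=True raises CalledProcessError if repo is not a git repository.
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def preflight(repo: Path) -> bool:
    branch = _git(repo, "rev-parse", "--abbrev-ref", "HEAD")
    status = _git(repo, "status", "--porcelain")
    return is_clean_main(branch, status)
```

For the demo, call preflight(Path("~/projects/tilth-demo").expanduser()); splitting the pure check from the git calls keeps the decision logic testable without a repository.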

What happens, end-to-end:

  1. Tilth verifies the path is a git repo on a clean main.
  2. Creates a worktree of the demo repo. The working tree lives at <tilth-clone>/sessions/<id>/workspace/ (inside Tilth, gitignored); the new branch session/<id> is registered in the demo repo's .git. The two halves live in different places by design — see Session layout for the why.
  3. Loops through pending tasks in prd.json. For each task:
    • Reset context. Prompt = system prompt + the feature plan (as context) + AGENTS.md + recent progress + this task (plus, on a retry, the evaluator's prior verdicts on it).
    • Tool-loop with the worker model (bash, file ops, search) until it calls submit_case to present its finished work.
    • Run ruff + pytest in the worktree. Failures get fed back into the loop.
    • Evaluator model reviews the case + diff in a fresh context (it also sees this task's prior verdicts). Rejections get fed back.
    • Self-improvement prompt — the worker considers whether the task surfaced a durable observation worth proposing. Any proposal lands in sessions/<id>/proposed-learnings.md (not in your repo) for end-of-run review.
    • Commit on the worktree branch. Append to progress.txt. Mark the task done in prd.json.
  4. Stops on: all tasks done, iteration cap, wall-clock cap, token cap, evaluator-call cap, or a terminal failure (e.g. a provider returning empty responses, or the worker never presenting a case).
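Condensed into code, the per-task part of that loop looks roughly like the sketch below. Every callable is a hypothetical stand-in (the real harness drives model calls, ruff, pytest, and git), and the stop conditions are reduced to a single per-task iteration cap; treat it as a reading aid for the steps above, not Tilth's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    accepted: bool
    feedback: str = ""


@dataclass
class TaskResult:
    done: bool
    iterations: int
    transcript: List[str] = field(default_factory=list)
    proposal: Optional[str] = None


def run_task(
    task: str,
    build_prompt: Callable[[str], str],                 # system prompt + plan + AGENTS.md + progress + task
    tool_loop: Callable[[List[str]], str],              # worker works until it submits a case
    validate: Callable[[], Tuple[bool, str]],           # ruff + pytest in the worktree
    evaluate: Callable[[str, List[Verdict]], Verdict],  # fresh-context review, sees prior verdicts
    reflect: Callable[[str], Optional[str]],            # optional learning proposal
    commit: Callable[[str], None],                      # commit, append progress, mark done
    max_iterations: int,
) -> TaskResult:
    transcript = [build_prompt(task)]  # fresh context: nothing carried over from other tasks
    verdicts: List[Verdict] = []
    for iteration in range(1, max_iterations + 1):
        case = tool_loop(transcript)
        ok, output = validate()
        if not ok:
            # Objective gate failed: feed the validator output back to the worker.
            transcript.append(f"validator_failed: {output}")
            continue
        verdict = evaluate(case, verdicts)  # sees only the verdicts recorded so far
        verdicts.append(verdict)
        if not verdict.accepted:
            # Subjective gate failed: feed the rejection back as well.
            transcript.append(f"evaluator_rejected: {verdict.feedback}")
            continue
        proposal = reflect(task)
        commit(task)
        return TaskResult(True, iteration, transcript, proposal)
    # Iteration cap reached without an accepted case.
    return TaskResult(False, max_iterations, transcript)


result = run_task(
    "T-001",
    build_prompt=lambda t: f"prompt for {t}",
    tool_loop=lambda transcript: "case",
    validate=lambda: (True, ""),
    evaluate=lambda case, prior: Verdict(accepted=True),
    reflect=lambda t: None,
    commit=lambda t: None,
    max_iterations=3,
)
print(result.done, result.iterations)  # prints: True 1
```

Both gates append to the same transcript, which is how the feedback curves in the lifecycle figure map onto a single worker conversation in this sketch.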

You can interrupt at any point with Ctrl-C. Ctrl-C and cap hits (iteration, wall-clock, token) all leave the run in a resumable state; see Resuming a session to pick it back up. Of the three caps, only the token cap needs attention before you resume. The cumulative token total carries across resumes, so if TILTH_MAX_TOKENS is what stopped the run, raise it in .env first; otherwise tilth resume trips it again on the first check. The wall-clock budget resets on each resume, and the iteration cap is per-task (a retried task starts counting from one), so neither blocks a resume unless the work genuinely needs a bigger budget. See What resume does.
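The asymmetry between the three caps can be modeled in a few lines. This is a minimal sketch under stated assumptions: only TILTH_MAX_TOKENS comes from this page, the other field names are hypothetical, and using >= for the token and wall-clock comparisons is a guess at Tilth's boundary behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caps:
    max_tokens: int          # TILTH_MAX_TOKENS from .env
    max_wall_seconds: float  # hypothetical name for the wall-clock budget
    max_iterations: int      # hypothetical name for the per-task iteration cap


@dataclass
class RunState:
    cumulative_tokens: int   # carried across resumes
    started_at: float        # reset on every resume
    task_iteration: int      # restarts at 1 for each task, including retries


def resume_state(checkpointed_tokens: int, now: float) -> RunState:
    # Only the token total survives a resume; clock and iteration count start fresh.
    return RunState(cumulative_tokens=checkpointed_tokens, started_at=now, task_iteration=1)


def cap_hit(state: RunState, caps: Caps, now: float) -> Optional[str]:
    """Return the name of the first cap that is hit, or None."""
    if state.cumulative_tokens >= caps.max_tokens:
        return "token"
    if now - state.started_at >= caps.max_wall_seconds:
        return "wall-clock"
    if state.task_iteration > caps.max_iterations:
        return "iteration"
    return None


caps = Caps(max_tokens=500_000, max_wall_seconds=1800.0, max_iterations=5)
state = resume_state(checkpointed_tokens=500_000, now=0.0)
print(cap_hit(state, caps, now=0.0))  # prints: token (the budget was never raised)
caps.max_tokens = 750_000              # the equivalent of raising TILTH_MAX_TOKENS in .env
print(cap_hit(state, caps, now=0.0))  # prints: None
```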

What you should expect to see

The console streams every tool call as it happens. The per-task loop has the shape below:

Figure: one task's lifecycle, six stages left to right. PROMPT (AGENTS.md, progress.txt, and the task; "fresh context built from disk") → TOOL LOOP (bash, read_file, edit_file, grep; "worker iterates until it stops") → VALIDATORS (ruff, pytest; "objective gate") → JUDGE ("subjective gate, fresh context") → SELF-IMPROVE ("propose a learning (optional)") → COMMIT ("one task = one commit"). "WORKER SEES" spans PROMPT and TOOL LOOP; "HARNESS ONLY" spans the other four. Feedback arrows return to TOOL LOOP from VALIDATORS (validator_failed) and from JUDGE (evaluator_rejected).

One task's lifecycle inside the harness. The worker sees the Prompt and the Tool Loop; the Self-Improve step and the cross-task evaluation machinery stay harness-side. Failed validators or a rejected evaluator verdict feed back into the Tool Loop for another iteration.

A clean run ends with every task in prd.json marked done and one commit per task on the session/<id> branch. When a run drifts from that, watch for these patterns:

  • A spinning task shows up as the same files being read and rewritten across iterations. If you see this, kill the run and rewrite the task description before retrying.
  • Validator feedback loops show as repeated validator_failed → next iteration patterns. A handful is normal; a long string usually means the test suite or the lint config is misaligned with the agent's idea of "done."
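Both patterns can be spotted mechanically if you collect, per iteration, the set of files the worker wrote and how the iteration ended. The sketch below is a hypothetical helper, not a Tilth feature: the three-iteration window is an arbitrary threshold, and any outcome label other than validator_failed and evaluator_rejected is invented for illustration.

```python
from typing import List, Set


def is_spinning(written_per_iteration: List[Set[str]], window: int = 3) -> bool:
    """True if the last `window` iterations all rewrote the same non-empty set of files."""
    if len(written_per_iteration) < window:
        return False
    recent = written_per_iteration[-window:]
    return bool(recent[0]) and all(files == recent[0] for files in recent)


def longest_streak(outcomes: List[str], kind: str = "validator_failed") -> int:
    """Length of the longest run of consecutive iterations that ended in `kind`."""
    best = current = 0
    for outcome in outcomes:
        current = current + 1 if outcome == kind else 0
        best = max(best, current)
    return best
```

A streak of two or three validator failures is the "handful" the text calls normal; where to draw the line for "a long string" is a judgment call for your project.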

After the run

Once every task in prd.json is done, the harness closes out the final task and prints "all tasks complete" followed by a run summary:

Figure: end-of-session state in three regions. On the session branch: tasks T-001 through T-005, each done with one commit. Run summary: session 20260525-103149-3800ea, duration 6m10s, tokens 412,800, tasks total=5 done=5 failed=0 pending=0. Written under sessions/<id>/: events.jsonl (full audit trail), summary.json (rolled-up snapshot), checkpoint.json (resume footing), proposed-learnings.md (review when ready).

A clean ending. Every task is committed on the session branch (left); the run summary tallies what happened (centre); and the artifacts the run wrote under sessions/<id>/ — the event log, the rolled-up summary, the resume checkpoint, and any proposed learnings — sit on the right, outside the worktree for you to read, resume from, or review by hand. AGENTS.md is never touched by the run.

To inspect what just got committed:

cd ~/projects/tilth-demo
git log session/<id> --oneline
git diff main..session/<id>

Each task is one commit. If you like the work, merge the branch into main like any other; if not, delete it. The harness never auto-merges. (Resetting a session covers throwing away the worktree, the branch, and the harness's session directory in one shot.)

The session log lives at <tilth-clone>/sessions/<id>/events.jsonl — every model call, tool call, validator run, evaluator verdict, and proposed-learning verdict is recorded (see Session layout → Event types for the full taxonomy). Alongside it, sessions/<id>/summary.json carries a rolled-up snapshot (token totals, per-task iteration counts, tool histogram, hook outcomes, evaluator accepts/rejects with rejection categories) refreshed at every task boundary — read that when you want a quick stat without jq-ing the full log.
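For a middle ground between summary.json and reading the whole log, a few lines of Python can tally events.jsonl by event type. This is a sketch that assumes each line is a JSON object with a "type" field; that field name is a guess, so check Session layout → Event types for the real schema and pass a different key if needed.

```python
import json
import sys
from collections import Counter
from typing import Iterable


def tally_events(lines: Iterable[str], key: str = "type") -> Counter:
    """Count JSONL records by the value of `key` (an assumed field name)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        counts[record.get(key, "<missing>")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_events.py <tilth-clone>/sessions/<id>/events.jsonl
    with open(sys.argv[1], encoding="utf-8") as log:
        for kind, count in tally_events(log).most_common():
            print(f"{count:6d}  {kind}")
```

Records without the key are counted under "<missing>", which is a quick way to notice that you guessed the field name wrong.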

For a more readable view of a finished run, see Visualizing a session.