Resume mechanics
tilth resume wakes a session and re-enters the outer loop (the legacy --resume flag still works for one minor version). Three things happen on wake:
- Session.wake() reads checkpoint.json and reconstructs tokens_used, workspace, branch. started_at is reset to time.time() (wall-clock budget is per-resume).
- _prepare_resume() reads the trailing stop event from events.jsonl to learn how the previous run ended, then:
  - If last_stop == "all_done", no-op (besides logging).
  - Otherwise, any task in prd.json with status == "failed" is flipped back to "pending" and ws.unwind_failed_commit() soft-resets the FAILED (...) placeholder commit so the partial work returns to the index. Without that soft-reset, the evaluator's task_diff (HEAD vs working tree) would only see new edits on the retry, not the cumulative work, which would make the evaluation incorrect.
- A session_resume event is logged with the structured plan: last_stop, retried, pending, unwound_commit, and a one-line summary. This is the parallel of session_start for resumes; both transitions are auditable from events.jsonl alone.
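The decision made in _prepare_resume() can be illustrated with a minimal sketch. This is not tilth's actual code: the function names trailing_stop_reason and plan_resume are hypothetical, and the event schema ({"event": "stop", "reason": ...}) and the task "id" field are assumptions made for illustration. The git soft-reset performed by ws.unwind_failed_commit() is deliberately left out.

```python
import json
from typing import Iterable, List, Optional


def trailing_stop_reason(event_lines: Iterable[str]) -> Optional[str]:
    """Return the reason of the last stop event, or None if none was logged."""
    reason = None
    for line in event_lines:
        if not line.strip():
            continue  # tolerate blank lines in the JSONL stream
        record = json.loads(line)
        # Assumed schema: {"event": "stop", "reason": "..."}
        if record.get("event") == "stop":
            reason = record.get("reason")
    return reason


def plan_resume(last_stop: Optional[str], tasks: List[dict]) -> dict:
    """Flip failed tasks back to pending unless the previous run finished cleanly.

    Mutates `tasks` in place and returns a plan shaped loosely like the
    session_resume event (last_stop, retried, pending).
    """
    retried = []
    if last_stop != "all_done":
        for task in tasks:
            if task.get("status") == "failed":
                task["status"] = "pending"
                retried.append(task["id"])  # "id" is an assumed field name
    pending = [t["id"] for t in tasks if t.get("status") == "pending"]
    return {"last_stop": last_stop, "retried": retried, "pending": pending}
```

In this sketch a missing stop event (last_stop is None) is treated like any other non-all_done outcome, so failed tasks are still retried.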
Per-task ledgers survive resume. The ledger files at sessions/<id>/ledger/<task_id>.jsonl are plain append-only files under the session root, not in the worktree or the live conversation. Session.wake() re-roots and read_ledger reads off disk, so a resume picks up the prior run's evaluator verdicts, which are re-injected into both the evaluator prompt and the worker prompt under "## Prior iterations on this task". The first run after a resume therefore sees what was rejected before, even though the conversation is gone. tilth reset drops sessions/<id>/, so it discards ledgers too. See The worker↔evaluator dialogue.
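To make the ledger round trip concrete, here is a rough sketch of parsing a JSONL ledger and rendering it under the prompt heading quoted above. The helper names parse_ledger and render_prior_iterations are hypothetical, and the "verdict" and "feedback" fields are assumed for illustration; the real ledger schema and read_ledger signature may differ.

```python
import json
from typing import Iterable, List


def parse_ledger(lines: Iterable[str]) -> List[dict]:
    """Parse append-only JSONL ledger lines, skipping blank ones."""
    return [json.loads(line) for line in lines if line.strip()]


def render_prior_iterations(entries: List[dict]) -> str:
    """Render ledger entries as a prompt section; empty string if no history."""
    if not entries:
        return ""
    out = ["## Prior iterations on this task"]
    for n, entry in enumerate(entries, start=1):
        # "verdict" and "feedback" are assumed field names, not tilth's schema.
        verdict = entry.get("verdict", "unknown")
        feedback = entry.get("feedback", "")
        out.append(f"{n}. {verdict}: {feedback}")
    return "\n".join(out)
```

Because the section is rebuilt from disk on every call, nothing about it depends on the conversation that produced the verdicts still being alive.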
Bare tilth resume (no session ID) selects the most recent session in sessions/ by directory name (the timestamp prefix sorts chronologically). Explicit tilth resume <session_id> is unchanged.
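Selecting the newest session by directory name could look roughly like the following. The function name latest_session_id is hypothetical; the sketch assumes only what the text states, namely that session directory names start with a timestamp prefix that sorts chronologically as a plain string.

```python
from pathlib import Path
from typing import Optional


def latest_session_id(sessions_dir: Path) -> Optional[str]:
    """Return the lexicographically greatest session directory name, if any."""
    if not sessions_dir.is_dir():
        return None
    # Only directories count as sessions; stray files are ignored.
    names = sorted(p.name for p in sessions_dir.iterdir() if p.is_dir())
    return names[-1] if names else None
```

Sorting by name rather than by modification time keeps the choice stable even when an older session's files have been touched more recently.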
Diagram suggestion — sequence diagram:
tilth resume invocation → Session.wake() reads checkpoint → _prepare_resume() reads trailing stop event → unwinds FAILED placeholder if any → flips failed task back to pending → logs session_resume event → outer loop starts. Lifeline lanes for checkpoint.json, events.jsonl, prd.json, and the worktree git database.
Resume does not loop endlessly. If a retried task hits a terminal-failure stop again — iter_cap, evaluator_cap, empty_responses, or no_case — the outer loop halts with that stop {reason} just like the original run; the next tilth resume would retry once more. The retries are recursive in invocation, not in mechanism — each one is just a fresh ride through the same loop.
Resumable-session detection
When you run uv run tilth run <workspace> and there's no prepared session to pick up, _find_resumable_session() scans sessions/ newest-first and looks for a directory whose session_start.source matches <workspace>, whose last stop.reason is anything other than all_done (or has no stop event at all — covers crashes that died before logging), and whose checkpoint status is not prepared (prepared sessions are picked up directly by tilth run without a warning). If a resumable session exists, the harness prints a heads-up listing the tilth resume / tilth reset recovery commands and pauses 5 seconds before calling Session.new(). Ctrl-C during the pause returns 130 cleanly.
The detection is read-only — no files modified, no state mutated. It exists purely to surface that a fresh run will silently abandon resumable progress, which is the failure mode the iteration loop ("halt → tweak → continue") inadvertently optimises for.
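The three detection conditions can be sketched as a pure predicate over already-loaded session summaries. This is an illustration under assumptions, not the body of _find_resumable_session(): the SessionSummary dataclass and the find_resumable name are hypothetical, and reading the summaries from disk is left out so that the read-only nature of the check is obvious.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SessionSummary:
    """Hypothetical digest of one sessions/<id>/ directory."""
    session_id: str             # directory name, timestamp-prefixed
    source: str                 # session_start.source
    last_stop: Optional[str]    # last stop.reason, None if no stop event
    checkpoint_status: str      # e.g. "prepared" for a prepared session


def is_resumable(s: SessionSummary, workspace: str) -> bool:
    """Apply the three conditions: same workspace, not all_done, not prepared."""
    return (
        s.source == workspace
        and s.last_stop != "all_done"   # None (crash before logging) also passes
        and s.checkpoint_status != "prepared"
    )


def find_resumable(
    summaries: Iterable[SessionSummary], workspace: str
) -> Optional[SessionSummary]:
    """Scan newest-first by session_id and return the first resumable session."""
    for s in sorted(summaries, key=lambda s: s.session_id, reverse=True):
        if is_resumable(s, workspace):
            return s
    return None
```

The function only inspects its inputs and returns a value, which mirrors the guarantee that detection modifies no files and mutates no state.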