Claude Code /goal: Set the Finish Line, Walk Away

The /goal command sets a completion condition and Claude works until a separate evaluator says it's done. How to write goals that actually finish.

By Mor Dabastany • May 28, 2026 • 1636 words • 8 min

TL;DR

/goal is the Claude Code command that says “keep going until this condition is true.” You type a measurable end state - pytest exits 0, every call site migrated, queue empty - and after every turn a small fast model (Haiku) reads the transcript and decides whether to stop or push Claude into another turn. You don’t prompt between steps. The goal clears itself when the condition holds.

It’s the opposite of micromanaging an agent. You write the finish line once, you walk away, you come back to either a done task or a /goal panel telling you what’s still missing.

What /goal actually is

Three approaches keep a Claude Code session running between prompts. They differ by what starts the next turn:

Approach	Next turn starts when	Stops when
`/goal`	The previous turn finishes	A separate Haiku call confirms the condition is met (or marks it impossible)
`/loop`	A time interval elapses	You stop it, or Claude declares done
Stop hook	The previous turn finishes	Your own script or prompt decides

/goal is a session-scoped shortcut for the Stop-hook pattern. You don’t edit settings, you don’t write a script. You type one line against a project laid out like this:

myapp/
├── pyproject.toml
├── src/
│   └── myapp/
│       ├── __init__.py
│       └── auth/
│           ├── __init__.py
│           ├── tokens.py
│           └── sessions.py
└── tests/
    └── auth/
        ├── conftest.py
        ├── test_tokens.py
        └── test_sessions.py

Then:

/goal all tests under tests/auth/ pass with `pytest tests/auth -q` exiting 0,
and `ruff check src/myapp/auth tests/auth` reports no findings.
Stop after 20 turns.

Setting a goal starts a turn immediately, using the condition itself as the directive. No second prompt. While active, a Goal active panel shows elapsed time, turn count, token spend since the goal began, and the evaluator’s most recent reason for not stopping yet. Run /goal with no argument any time to see it:

Goal active
running 6m 41s · 4 turns · 38.2k tokens

Goal: all tests under tests/auth/ pass with `pytest tests/auth -q`
      exiting 0, and `ruff check src/myapp/auth tests/auth` reports
      no findings. Stop after 20 turns.

Last check: pytest tests/auth -q still reports 1 failure in
            test_tokens.py::test_refresh_token_extends_by_ttl
            (refresh_token returns ttl - 1, off-by-one in
            tokens.py:2). ruff has not been re-run since the last
            edit.

/goal clear to stop early

To stop early: /goal clear (aliases: stop, off, reset, none, cancel).

How evaluation actually works

The mechanics are simpler than the framing suggests:

After each turn, the condition plus the conversation so far are sent to a small fast model (Haiku).
The evaluator returns a JSON verdict: {"ok": true, "reason": "..."} if the condition is met, {"ok": false, "reason": "..."} if not, or {"ok": false, "impossible": true, "reason": "..."} if the condition can never be satisfied in this session.
ok: false feeds the reason back into Claude as guidance and forces another turn.
ok: true clears the goal and records an “achieved” entry in the transcript.
impossible: true clears the goal too - it’s the evaluator’s way of saying “stop wasting tokens, this can’t be reached.” Use it as a signal in the panel, not a guarantee.

The critical constraint: the evaluator does not call tools. Its system prompt says verbatim “You have NO tools available - you cannot read files, run commands, search, or take any actions.” It judges only what Claude has already surfaced. If you want it to know tests pass, Claude has to actually run the tests and let the output land in the transcript. A claim that “tests pass” without the run output is not evidence.

That’s the whole trick. The “autonomy” is one main-model turn plus one cheap evaluator call, looped.

Evaluator tokens are billed on the small fast model, so the per-turn overhead is negligible compared to main-turn spend.

What a good condition looks like

The shape that consistently works: one measurable end state, a stated check, constraints that must hold. In practice:

/goal Every call site of get_user_sync is migrated to fetch_user.
Proof: `grep -rn get_user_sync src/` returns zero matches and `pytest` exits 0.
Constraint: no test file is modified.
Or stop after 25 turns.

Four things in there:

End state - all call sites migrated.
How Claude proves it - the grep and pytest invocations whose output will land in the transcript.
What must not change - no test file edits.
A budget clause - bound the runtime so you don’t come back to 200 turns of thrash.

The condition field accepts up to 4,000 characters. Use them. The vaguer the condition, the more the evaluator has to guess.

Tokens add up fast - the budget clause is a circuit breaker

There is a built-in cap, but it’s lower than people expect. /goal works by blocking the Stop hook; Claude Code refuses to keep blocking forever, and the limit is governed by the CLAUDE_CODE_STOP_HOOK_BLOCK_CAP environment variable (default: 8). After that many forced continuations, the loop ends regardless of whether the evaluator is satisfied. So out of the box, a /goal run is bounded to roughly 8-9 turns. Raise the variable and you raise the ceiling.

That default sounds safe until you do the math per turn. Every turn is a full main-model call. With Opus, a single turn against a large repo can cost a couple of dollars. Even the default 8-turn ceiling can run real money on a big codebase; raise the cap to 100 and a thrashing goal becomes a bill you find Monday morning. The evaluator itself is cheap (Haiku, cents per check). The expensive part is the main-model turns the evaluator keeps authorising. (If unattended token spend is a worry in general, Claude Code on a token diet is the companion piece on keeping the bill down.)

So write the budget clause as if you’ve raised the cap, because most serious uses do:

Turn cap inside the condition: Stop after 20 turns.
Wall-clock cap: Stop if more than 30 minutes have elapsed.
Stop-on-no-progress: If two consecutive checks return the same failing reason, stop and report.
Headless runs (claude -p): also pass --max-turns N. The CLI flag hard-kills the process regardless of what the evaluator decides, so it’s a second layer of defense if the condition’s own clause gets ignored or misread.

Rule of thumb: if you can’t say out loud the worst-case cost of this goal running unattended overnight, don’t start it.

The trap: mechanical pass vs actually correct

The uncomfortable lesson with any goal-driven agent: it optimises exactly what the condition measures, and nothing else. “Build exits 0” tells you the compiler is happy, not that the thing is good. “Tests pass” tells you the assertions you wrote still hold, not that you wrote the right assertions. If your condition is purely mechanical, you’ll get the cheapest possible artifact that satisfies it - a stub function whose only job is to make the test green, a renderer that draws three pixels because the headless smoke test only checked that the window opened.

Two ways out:

Anchor the condition to a spec, not adjectives. “Matches the acceptance criteria in docs/PRD.md sections 2-4” gives the evaluator something concrete to grade against.
Pair /goal with a cross-model reviewer. Some users wire in Codex (or another vendor’s model) as the adversarial reviewer step inside the goal. Different training, different blind spots. Anthropic’s evaluator already runs on a separate (Haiku) model; cross-vendor review is just scaling that idea.

If you only remember one thing from this post: a condition that’s only mechanical is a condition you’ve under-specified.

When /goal earns its keep

The sweet spot is substantial work with a verifiable end state:

Migrating a module to a new API until every call site compiles and tests pass.
Implementing a design doc until each acceptance criterion holds.
Splitting a large file into focused modules until each is under a size budget.
Working a labeled-issue backlog until the queue is empty.

Pair it with acceptEdits permission mode (Shift+Tab in the TUI, or --permission-mode acceptEdits on the CLI) and you remove two layers of per-step prompting at once: the permission mode approves the tool calls inside a turn, /goal decides whether to start another turn. Two unrelated controls, complementary effects.

/goal also works in headless mode, which is the part that turns it into infrastructure (the same claude -p headless trick that turns a subscription into a scriptable agent):

claude -p "/goal CHANGELOG.md has an entry for every PR merged this week"

Wire that into a Friday cron. Come back Monday to either a populated changelog or an interrupted run with a clear “why not” in the transcript.

When not to use /goal

Exploratory or research tasks. “Figure out why X is slow” has no measurable end state. Use a normal session.
Conditions that need tool-side evidence the model won’t surface. If the proof lives in an external dashboard the evaluator can’t see, the evaluator can’t grade it.
Anything where you’d be uncomfortable with an unsupervised 20-turn run. The 8-block default cap is your only safety net out of the box, and most non-trivial goals push you to raise it. Once you do, the condition is the only governor. If you can’t write one tight enough, don’t set one.
Workspaces where you haven’t accepted the trust dialog, or where hooks are disabled at any settings level. The evaluator is a hook; no hooks, no /goal. The command will tell you which check is blocking it.

The short version

/goal is not a magic “do my work” button. It’s a way to move the conversation from “what should I do next?” to “is the condition met yet?”, and to let a cheap, separate model answer that question. Write the condition well and you get autonomous progress on real work. Write it lazily and you get cheap, hollow compliance.

The command is free. The condition is the whole product.