TL;DR
Claude Code’s bill snowballs because every turn replays the entire session as input tokens, and output tokens cost roughly 5× more than input. Three habits cut my usage by ~60%: a lean CLAUDE.md, a strict .claudeignore, and disciplined use of /clear, /compact, and per-task model routing. Everything else is gravy.
The two facts nobody says out loud
Claude Code is not a chatbot - it’s an agent. Every message it sends carries the whole session as input: every file it read, every command output, every previous turn.
Two consequences:
- The snowball. Message 1 is cheap. Message 50 is expensive even when you type five words - because those five words ride on top of 50 turns of accumulated context.
- Output is 5× input. Verbose answers aren’t just noisy - they’re literally more expensive than the question that produced them.
Once those click, the rest is mechanics.
The three files that actually move the bill
1. CLAUDE.md - the silent tax
CLAUDE.md loads before every single message. All of it. Every turn.
3,000 tokens of project lore × 100 messages = 300,000 tokens of overhead before you’ve typed real work.
Keep it short and structural - not narrative.
Belongs:
- How to run tests (
pytest -v,npm test, etc.) - Which package manager
- 1-2 genuinely non-obvious conventions
- Folders Claude must avoid
Doesn’t:
- Pasted Slack threads or chat transcripts kept “just in case”
- Historical “why we did it this way” essays
- Anything Claude already knows how to do
Target: under 150 lines, bullets only. Mine went from ~800 lines to 85 - and Claude behaved better, not worse. Less noise, clearer signal.
2. .claudeignore - set once, save forever
By default Claude can read everything in the repo. That includes virtualenvs, build artifacts, .pytest_cache/, generated code - useless context, all billed.
Drop a .claudeignore in the root, .gitignore syntax. For a typical Python project:
__pycache__/
*.pyc
*.pyo
.venv/
venv/
env/
*.egg-info/
build/
dist/
.pytest_cache/
.mypy_cache/
.ruff_cache/
.tox/
htmlcov/
.coverage
coverage.xml
*.log
poetry.lock
uv.lock
*.generated.*
For monorepos or multi-package layouts, ignore at both workspace and package level. One-time setup. Every session afterward is cheaper by default.
Bonus: also ignore IDE and editor folders. They’re full of settings files, workspace caches, and machine-specific noise Claude never needs:
.vscode/
.idea/
*.iml
.vs/
.nova/
*.swp
*.swo
.DS_Store
Thumbs.db
3. Use skills, not a fatter CLAUDE.md
Skills are CLAUDE.md-like files that load only when Claude pattern-matches the task to the skill’s description. Ten skills cost ~100 tokens each as metadata on a normal message - the heavy content (~2k tokens) loads only when relevant.
The mechanism: only the name + description sit in Claude’s context every turn (~100 tokens per skill). The body of SKILL.md (often 1-3k tokens) is fetched only when Claude reads your message and decides the description matches the task. Ten skills idle in the background for ~1k tokens total; only the one or two that match actually load.
That makes the description the single most important line in the whole file. Vague descriptions never trigger - the body never gets loaded - the skill is dead weight:
- Bad:
description: Helps with code(matches everything, so it matches nothing reliably) - Good:
description: Reviews Python code for security, performance, and team standards. Use when asked to review code, audit a PR, or check a diff for risks.
Specific verbs and concrete triggers (“review”, “audit a PR”, “check a diff”) give Claude clear signals to match against. Write the description like you’re writing a search query the user would type.
Mid-session: /clear, /compact, and don’t switch models
/clear drops the session history. Treat it as the default move whenever the topic changes. The auth bug you just fixed adds nothing to your next React tweak - but it rides along on every prompt until you clear it. Dead context is weight you’re paying to carry.
/compact summarizes history and replaces it with the summary. Use mid-task when the window fills up. Trick: tell it what to preserve before you run it (“keep the decision to use optimistic locking”), or you’ll lose conclusions that took twenty messages to reach.
Don’t switch models mid-session. Prompt caches are per-model, so when you switch from Sonnet to Opus, the new model has no cache for your history and reads the full transcript as fresh, uncached input - you effectively pay for it twice. If you need Opus for one subtask, spawn a subagent instead. It runs in its own context window with its own cache; only the summary returns, and your Sonnet cache stays warm.
Two environment variables worth setting
# Cap extended thinking - output tokens cost 5x
export MAX_THINKING_TOKENS=10000
# Disable background suggestions/tips
export DISABLE_NON_ESSENTIAL_MODEL_CALLS=1
The first is the big lever. Default thinking budgets can blow tens of thousands of output tokens on Opus. 10k is plenty for normal coding. Use /effort low|medium for routine work and save /effort xhigh for the actually-hard problems.
Model routing in one line
- Haiku - triage, classify, syntax lookups.
- Sonnet - 80-85% of real coding work.
- Opus - genuine architecture decisions, hard multi-file refactors, bugs Sonnet already failed on.
Sonnet on a clean session beats Opus on a bloated one. Every time.
The cheatsheet: a session loop that works
- One task per session.
CLAUDE.mdshort, bullets only..claudeignorecovers build artifacts and lockfiles.MAX_THINKING_TOKENS=10000in shell profile./compactwhen a single task runs long.git commit, then/clearon task switch.- End of session: ask Claude to summarize decisions and next steps. Save the note. Load it tomorrow.
That last step is the underrated one. A 200-token handoff note tomorrow is infinitely cheaper than a 50,000-token zombie context.
The clean way to automate it is a dedicated skill. Drop the file below at .claude/skills/end-of-session/SKILL.md and it’ll fire whenever you wrap up:
---
name: end-of-session
description: Summarizes the current session's decisions, open threads, and next steps into a short handoff note saved to disk. Use when the user is wrapping up a working session, ending a coding day, or asking for a "handoff" or "carry-over" note.
---
# End-of-session handoff
When triggered, produce a short handoff note for the next session.
## What to capture
1. **Decisions made** - one bullet per concrete choice (chose X over Y because Z).
2. **Open threads** - questions or sub-tasks that did NOT get resolved.
3. **Next steps** - 2-5 prioritized items, each one action verb + concrete target.
4. **Touched files** - paths only, no diffs.
## Output format
Write the note to `.claude/handoffs/YYYY-MM-DD-HHmm.md` (UTC). Use the structure:
```markdown
# Handoff - <ISO date>
## Decisions
- ...
## Open threads
- ...
## Next steps
1. ...
## Touched files
- path/to/file
```
## Rules
- Be short and precise. Don't pad, don't restate the obvious - the next session pays input tokens for every line.
- Write for a human reader. The note should be scannable in 30 seconds. No JSON dumps, no copy-pasted logs, no diagnostic noise no human would actually read. Laser focus.
- Keep the whole note under 200 lines.
- No prose recap of the conversation - bullets only.
- If a decision is uncertain, mark it `(tentative)` rather than burying it.
- Do not paste code. Reference files and line numbers instead.
Why this works: the description gives Claude clean verbal triggers (“wrap up”, “end of session”, “handoff”) so it activates exactly when you want it. The body carries the actual prompt - what to capture, where to save, in what format - so you shape the handoff once and never re-explain it. Tomorrow, opening the latest handoff file as your first message loads ~200 tokens instead of replaying a 50,000-token session.
