Start using Claude Code and the API bill comes in way higher than expected — it's one of the most common complaints on Reddit. Once you analyze where the tokens are going and make targeted changes, you can cut consumption dramatically. Here's the full playbook.
By the end of this article, you'll know how to identify where your tokens are going, and you'll have concrete commands for the fixes that actually worked: .claudeignore, Plan mode, prompt discipline, MCP server management, and session hygiene.
What Are the 5 Biggest Token Drains in Claude Code?
Before optimizing anything, you need to know what you're fighting. Token waste tends to cluster around five sources.
1. Bloated context reads
Claude Code can attempt to read files you don't need it to touch — node_modules, .git, build artifacts. This is often the single largest waste.
2. Vague prompts causing back-and-forth
"Make it look nicer" forces Claude Code to ask clarifying questions. A task that should take one round-trip ends up taking four.
3. Always-on MCP servers
Every connected MCP server adds its tool list to your context on every message. Five servers running constantly can add hundreds to thousands of tokens per turn before you've said anything.
4. Bloated CLAUDE.md
If your project instructions file contains every decision, every background note, and every piece of context from the last six months, Claude Code loads all of it at the start of every session.
5. Long sessions left running
Conversation history accumulates. The longer a session runs without a reset, the more tokens each new message costs.
Start with /context to see a visual breakdown of your context usage, and /cost to check your session spend. Between the two, you'll quickly identify which of these is your biggest offender.
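To see why the last three drains grow with use, it helps to model how input tokens accumulate. The sketch below is a simplification under stated assumptions: each turn re-sends a fixed overhead (system prompt, CLAUDE.md, MCP tool definitions) plus the whole conversation so far. It ignores prompt caching, output tokens, and automatic compaction, and the token figures are hypothetical placeholders, not measurements.

```typescript
// Simplified cost model: every turn re-sends the fixed overhead plus the
// full conversation history. Ignores prompt caching, output tokens, and
// automatic compaction. All token figures are hypothetical placeholders.
interface SessionModel {
  fixedOverhead: number;     // system prompt + CLAUDE.md + MCP tool definitions
  tokensPerExchange: number; // prompt + reply appended to history each turn
}

// Total input tokens sent over `turns` exchanges.
function cumulativeInputTokens(m: SessionModel, turns: number): number {
  let total = 0;
  for (let t = 0; t < turns; t++) {
    // Turn t carries the overhead plus the t exchanges that preceded it.
    total += m.fixedOverhead + t * m.tokensPerExchange;
  }
  return total;
}

const lean: SessionModel = { fixedOverhead: 5_000, tokensPerExchange: 2_000 };
const bloated: SessionModel = { fixedOverhead: 20_000, tokensPerExchange: 2_000 };

for (const turns of [10, 40]) {
  console.log(turns, cumulativeInputTokens(lean, turns), cumulativeInputTokens(bloated, turns));
}
// 10 140000 290000
// 40 1760000 2360000
```

In this model the overhead term grows linearly with the number of turns while the history term grows with its square, which is why both a lean setup and periodic resets matter.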
How Do You Stop Unnecessary File Reads with .claudeignore?
The highest-ROI fix is adding a .claudeignore file. It works exactly like .gitignore and tells Claude Code which paths to skip entirely.
Create .claudeignore at your project root:
# .claudeignore
# Build artifacts
.next/
dist/
build/
out/
# Dependencies
node_modules/
.pnp/
.pnp.js
# Caches
.cache/
.turbo/
*.tsbuildinfo
# Logs
*.log
npm-debug.log*
# Test output
coverage/
.nyc_output/
# Environment files (security too)
.env
.env.local
.env.*.local
# Database files
*.db
*.sqlite
prisma/migrations/
# Media and binaries
public/images/
*.png
*.jpg
*.gif
*.mp4
Restart claude after saving. In a Next.js project, excluding .next/ alone typically cuts context size by 30–40%. The mindset shift is to exclude everything Claude Code doesn't need to read, not just the obvious stuff. Generated type definitions, test fixtures, documentation that's already in your CLAUDE.md — all fair game to exclude.
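If you want to sanity-check which paths a pattern list excludes, and roughly how much text sits behind them, you can approximate it offline. This is a sketch, not how Claude Code itself resolves .claudeignore: it implements only a subset of .gitignore semantics (trailing-slash directory patterns, `*` wildcards, root-anchored patterns with an inner slash) and uses the rough rule of thumb of about 4 characters per token. The file list and sizes are hypothetical.

```typescript
// Simplified .gitignore-style matcher, for estimation only.
// Supports trailing-slash directory patterns, "*" wildcards, and patterns
// with an inner slash (anchored at the project root). It does NOT handle
// negation (!), "**", "?", character classes, or escaping.
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters, then let "*" match any run of non-slash chars.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, "[^/]*");
  return new RegExp(`^${escaped}$`);
}

function isIgnored(path: string, patterns: string[]): boolean {
  const segments = path.split("/");
  return patterns.some((raw) => {
    const dirOnly = raw.endsWith("/");
    const pattern = dirOnly ? raw.slice(0, -1) : raw;
    const re = globToRegExp(pattern);
    // Directory-only patterns may match parent directories, never the file itself.
    const limit = dirOnly ? segments.length - 1 : segments.length;
    if (pattern.includes("/")) {
      // Inner slash: match a path prefix starting at the project root.
      for (let i = 1; i <= limit; i++) {
        if (re.test(segments.slice(0, i).join("/"))) return true;
      }
      return false;
    }
    // No inner slash: match any single path segment at any depth.
    return segments.slice(0, limit).some((s) => re.test(s));
  });
}

interface FileEntry {
  path: string; // relative to the project root
  bytes: number;
}

// Rough rule of thumb: ~4 characters per token for English text and code.
const estimateTokens = (bytes: number): number => Math.ceil(bytes / 4);

function summarize(files: FileEntry[], patterns: string[]) {
  let before = 0;
  let after = 0;
  for (const f of files) {
    const tokens = estimateTokens(f.bytes);
    before += tokens;
    if (!isIgnored(f.path, patterns)) after += tokens;
  }
  return { before, after };
}

// Hypothetical project snapshot.
const files: FileEntry[] = [
  { path: "app/page.tsx", bytes: 4_000 },
  { path: "lib/db.ts", bytes: 2_000 },
  { path: ".next/server/app/page.js", bytes: 60_000 },
  { path: "node_modules/react/index.js", bytes: 8_000 },
  { path: "prisma/migrations/0001_init/migration.sql", bytes: 6_000 },
  { path: "public/logo.png", bytes: 12_000 },
];
const patterns = [".next/", "node_modules/", "prisma/migrations/", "*.png"];

console.log(summarize(files, patterns)); // { before: 23000, after: 1500 }
```

The numbers describe what the patterns put off-limits, not what Claude Code would otherwise have read in a given session.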
How Does Plan Mode Cut Token Consumption?
Plan mode (cycle to it with Shift+Tab) tells Claude Code to produce a plan without making any changes. This is one of the most effective techniques because it targets a major source of token waste: trial-and-error execution.
In normal mode, Claude Code will try things, hit errors, and iterate. Each iteration costs tokens. In Plan mode, it first outputs a step-by-step plan — which files it will touch, what changes it will make, in what order. You review the plan, cut anything unnecessary, and only then switch back to normal mode to execute.
# Plan mode workflow
Press Shift+Tab until the mode indicator shows plan mode
→ Give your task
→ Claude outputs a plan (no files changed)
→ Review and adjust the plan
→ Press Shift+Tab to cycle out of Plan mode
→ Execute with the refined plan in context
For a prompt like "add user authentication," skipping Plan mode means Claude Code dives in, potentially picks the wrong approach, and you're correcting it across five messages. Plan mode surfaces those decisions upfront, before any tokens are spent on execution. The bigger the task, the larger the savings.
How Much Do Vague Prompts Multiply Your Token Costs?
Prompt quality has a direct, measurable effect on token consumption. Consider this example:
Costly prompt:
"Add a login feature"
Claude Code will ask: Which auth library? Cookie-based sessions or JWT? Which directory? What about the UI component? That's four back-and-forth exchanges before any code is written.
Efficient prompt:
"Add Google OAuth login using NextAuth.js v5.
JWT sessions. Implement in /app/auth/.
Add auth guards to the existing middleware.ts."
This resolves in one pass.
The framework I use is to answer the 5W1H before I send anything: What exactly, Where in the codebase, How (which library or pattern), When (any ordering constraints), Who (which user role, if relevant), and Why (the goal, so Claude Code can make sensible tradeoffs). If I can answer those myself, I write them into the prompt instead of letting Claude Code ask.
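If you want to make the checklist mechanical, a small helper can assemble the answers into one prompt and flag the essential ones you left blank. This is a hypothetical sketch: the `TaskSpec` shape and `buildPrompt` name are illustrative, not part of Claude Code. It treats What, Where, and How as required and the rest (including Why, the sixth element of 5W1H) as optional, and it only formats text you then paste in.

```typescript
// Hypothetical helper: turns 5W1H answers into a single, specific prompt.
interface TaskSpec {
  what: string;  // exactly what to build or change
  where: string; // files or directories involved
  how: string;   // library, pattern, or approach
  when?: string; // ordering constraints, if any
  who?: string;  // affected user role, if relevant
  why?: string;  // goal behind the change, if it informs tradeoffs
}

const REQUIRED: (keyof TaskSpec)[] = ["what", "where", "how"];

function buildPrompt(spec: TaskSpec): { text: string; missing: string[] } {
  // Required answers that are absent or blank.
  const missing = REQUIRED.filter((k) => !(spec[k] ?? "").trim());
  const labeled: [string, string | undefined][] = [
    ["", spec.what],
    ["Where: ", spec.where],
    ["How: ", spec.how],
    ["Order: ", spec.when],
    ["Who: ", spec.who],
    ["Why: ", spec.why],
  ];
  // Keep only the answered items, one per line.
  const text = labeled
    .filter(([, v]) => v && v.trim())
    .map(([label, v]) => label + (v as string).trim())
    .join("\n");
  return { text, missing };
}

const result = buildPrompt({
  what: "Add Google OAuth login using NextAuth.js v5 with JWT sessions.",
  where: "/app/auth/; add auth guards to the existing middleware.ts",
  how: "NextAuth.js v5 Google provider",
});
console.log(result.missing.length === 0 ? result.text : `Answer first: ${result.missing.join(", ")}`);
```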
One more rule: one task per message. "Add login, write tests, and update the README" sent as one prompt causes Claude Code to hold all of that in context simultaneously. Sending them separately reduces total token cost, counterintuitive as that sounds.
How Do You Run MCP Servers Only When You Actually Need Them?
MCP servers are powerful, but each connected server adds its tool definitions to every message's context. If you have five servers connected, you're paying that overhead on every single exchange — even when you're not using any of them.
Check what you have running with the /mcp slash command inside Claude Code. It lists all connected servers and lets you toggle them on and off during a session.
To add or remove servers from the command line:
# Add a server
claude mcp add postgres -- npx -y @modelcontextprotocol/server-postgres postgresql://localhost/mydb
# Remove it when you're done
claude mcp remove postgres
# List all configured servers
claude mcp list
The fix is a simple policy: keep only the servers you use every session. Add database, GitHub, or other specialized servers when you need them, then remove them when you're done. The per-message savings are small, but they compound over a full day of work.
A good default is zero MCP servers connected. When you need to query the database for analytics, add the Postgres server, finish the task, and remove it. This keeps your per-message token overhead to a minimum.
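The arithmetic behind that default is tool-definition tokens multiplied by messages. The sketch below compares keeping servers always on with attaching each one only for the messages that need it. The per-server token counts and message counts are hypothetical placeholders; /context in your own session shows the real numbers. It also ignores prompt caching, which lowers the price of a repeated prefix but not the space it occupies in the context window.

```typescript
interface McpServer {
  name: string;
  toolDefTokens: number;  // hypothetical: tokens its tool list adds to each message
  messagesNeeded: number; // hypothetical: messages in the day that actually use it
}

// Always-on: every server's definitions ride along on every message.
function alwaysOnOverhead(servers: McpServer[], messagesPerDay: number): number {
  return servers.reduce((sum, s) => sum + s.toolDefTokens * messagesPerDay, 0);
}

// On-demand: a server is connected only for the messages that need it.
function onDemandOverhead(servers: McpServer[]): number {
  return servers.reduce((sum, s) => sum + s.toolDefTokens * s.messagesNeeded, 0);
}

const servers: McpServer[] = [
  { name: "postgres", toolDefTokens: 800, messagesNeeded: 10 },
  { name: "github", toolDefTokens: 3_000, messagesNeeded: 5 },
];

console.log(alwaysOnOverhead(servers, 200)); // 760000
console.log(onDemandOverhead(servers)); // 23000
```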
When Should You Use /compact vs /clear?
Claude Code gives you two commands for managing conversation history. Using them at the right moments makes a meaningful difference on long sessions.
/compact: Summarizes the conversation history to reduce token count while preserving context. The conversation continues, just more efficiently.
/clear: Resets the conversation entirely. Clean slate.
Here's how I decide:
Use /compact when:
- You're still on the same task but the conversation is getting long
- You need the context from earlier in the session
- You're midway through a long session and /context shows the window filling up
Use /clear when:
- You're switching to a completely different task
- Previous conversation context is irrelevant
- You're starting a new feature from scratch
- You're resuming work the next day
The failure mode to avoid is letting a long session drift. As conversation history grows, Claude Code tries to stay consistent with everything said earlier — including things that are no longer relevant. Old context can actually degrade response quality while simultaneously increasing cost.
When in doubt, /compact. When switching tasks, /clear.
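Written out as a rule, the decision above looks like this. It is only the checklist encoded as a function; the 60% threshold for "getting long" is an arbitrary placeholder, so read the fraction off /context and choose a cutoff that suits you.

```typescript
type ResetAction = "continue" | "/compact" | "/clear";

interface SessionState {
  switchingTask: boolean;      // new feature, unrelated task, or next day
  needEarlierContext: boolean; // does the current task depend on earlier messages?
  contextUsedFraction: number; // 0..1, as read from /context
}

// Hypothetical threshold for "the conversation is getting long".
const LONG_SESSION = 0.6;

function chooseReset(s: SessionState): ResetAction {
  // Old context is irrelevant: start clean.
  if (s.switchingTask || !s.needEarlierContext) return "/clear";
  // Same task, still need the history, but it is getting long: summarize it.
  if (s.contextUsedFraction >= LONG_SESSION) return "/compact";
  return "continue";
}

console.log(chooseReset({ switchingTask: false, needEarlierContext: true, contextUsedFraction: 0.7 })); // /compact
```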
How Do You Keep CLAUDE.md Lean?
CLAUDE.md is loaded into context at the start of every session. If it's 600 lines long, you're paying for those 600 lines before you've even started work.
Cut this:
- Historical context ("we decided X because of Y back in November")
- Completed task details
- Links to external docs Claude Code can't access anyway
- Long disclaimers or policy statements
Keep this:
- Tech stack (a bulleted list, three lines max)
- Directory structure overview
- The most important coding conventions
- Current active task
Target: under 200 lines. If you need more, split the detail into separate files under docs/ and have Claude Code read them on demand.
# CLAUDE.md
## Stack
- Next.js 15 + TypeScript + Tailwind v4
- Prisma + PostgreSQL
- NextAuth.js v5
## Structure
- /app — App Router pages
- /components — UI components
- /lib — Utilities and helpers
## Conventions
- Default to Server Components
- Data fetching via Server Actions or Route Handlers
- Tests with Vitest + Testing Library
## Active Task
- Building user dashboard
- Details: docs/current-sprint.md
The detail lives in docs/current-sprint.md. Claude Code reads it when needed, not on every session start.
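A small check, run by hand or in a pre-commit hook, can keep the file from creeping back up. This sketch counts lines and estimates tokens with the rough ~4 characters per token rule of thumb. The 200-line target comes from above; the 200,000-token context window and the 10% share threshold are assumptions to adjust for your model and plan. In practice you would pass in the file contents (for example via Node's fs.readFileSync).

```typescript
interface LeanReport {
  lines: number;
  estTokens: number;
  shareOfWindow: number;
  warnings: string[];
}

function checkClaudeMd(
  text: string,
  contextWindow = 200_000, // assumption: adjust to your model
  maxLines = 200,          // target from the article: under 200 lines
  maxShare = 0.1,          // assumption: flag above 10% of the window
): LeanReport {
  // Ignore a single trailing newline so it doesn't count as an extra line.
  const lines = text.replace(/\n$/, "").split("\n").length;
  const estTokens = Math.ceil(text.length / 4); // rough heuristic, not a real tokenizer
  const shareOfWindow = estTokens / contextWindow;
  const warnings: string[] = [];
  if (lines >= maxLines) warnings.push(`${lines} lines (target: under ${maxLines})`);
  if (shareOfWindow > maxShare) warnings.push(`~${(shareOfWindow * 100).toFixed(1)}% of the context window`);
  return { lines, estTokens, shareOfWindow, warnings };
}

const sample = "# CLAUDE.md\n## Stack\n- Next.js 15 + TypeScript + Tailwind v4\n";
const report = checkClaudeMd(sample);
console.log(report.warnings.length === 0 ? "CLAUDE.md looks lean" : report.warnings);
```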
How Much Can You Actually Expect to Save?
Here's a realistic breakdown for a developer spending 3–4 hours daily on a mid-sized Next.js project:
| Technique | Token Reduction |
|---|---|
| .claudeignore setup | 30–40% |
| Plan mode habit | 20–30% |
| Prompt precision | 15–25% |
| MCP server cleanup | 5–10% |
| /compact usage | 10–15% |
| CLAUDE.md trimming | 5–10% |
These compound rather than stack additively, so the combined result in practice is a 40–55% reduction from an unoptimized baseline. "50% reduction" is achievable and not a stretch goal.
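Compounding means each technique removes a fraction of what the others left, so the fractions combine as 1 − Π(1 − rᵢ) rather than by addition (adding even the low ends of the table gives 85%, and the high ends exceed 100%). The sketch below computes that combination assuming the techniques act independently. Several of them shrink the same context (for example .claudeignore and /compact), so real overlap pulls the result below this upper bound, which is consistent with the lower real-world figure.

```typescript
// Combine fractional reductions that compound multiplicatively,
// assuming each acts independently on whatever the others left.
function combinedReduction(reductions: number[]): number {
  const remaining = reductions.reduce((left, r) => left * (1 - r), 1);
  return 1 - remaining;
}

// Low and high ends of the table above, as fractions, in table order.
const low = [0.3, 0.2, 0.15, 0.05, 0.1, 0.05];
const high = [0.4, 0.3, 0.25, 0.1, 0.15, 0.1];

console.log(combinedReduction(low).toFixed(3)); // 0.613
console.log(combinedReduction(high).toFixed(3)); // 0.783
```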
Claude Code's cost scales directly with how you use it. Cut the waste, and the same monthly spend gets you twice the productive work.
How Can Subagent Delegation Distribute Token Usage?
Claude Code has an "Agent" tool that launches subagents, each with its own context window. This keeps the main context window clean while the subagents handle research and exploration.
When subagents are effective:
- File exploration ("Where is this function used?")
- Cross-file searches
- Dependency investigation
- Test execution and result summarization
Subagents run in separate context windows. If you run a 10-file investigation in the main window, all file contents consume context. Delegating to a subagent means only the summary returns to the main window.
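A rough model makes the tradeoff concrete. Inline, the file contents land in the main history and are re-sent on every later turn; delegated, only the summary is. This sketch counts main-window input tokens only, ignores prompt caching, and uses hypothetical token counts. The subagent's own reads still cost tokens, just once and in a context that is discarded afterward.

```typescript
interface Investigation {
  fileTokens: number;    // hypothetical: total tokens of the files read
  summaryTokens: number; // hypothetical: size of the summary a subagent returns
}

// Main-window input tokens spent carrying a payload through later turns.
function mainWindowCarry(payload: number, laterTurns: number): number {
  return payload * laterTurns;
}

function compare(inv: Investigation, laterTurns: number) {
  const inline = mainWindowCarry(inv.fileTokens, laterTurns);
  const delegated = mainWindowCarry(inv.summaryTokens, laterTurns);
  return { inline, delegated, saved: inline - delegated };
}

// A hypothetical 10-file investigation: ~30k tokens read, ~1.5k-token summary.
console.log(compare({ fileTokens: 30_000, summaryTokens: 1_500 }, 20));
// { inline: 600000, delegated: 30000, saved: 570000 }
```

With only one or two small files, the payload and the summary are close in size, so the subagent's startup overhead can outweigh the difference.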
CLAUDE.md example to encourage subagent usage:
# Cost Optimization
- Delegate exploration of 3+ files to subagents
- Run tests via subagents and return results only
- For code reviews, launch parallel subagents per file
Frequently Asked Questions
How much does Claude Code actually cost per month?
It depends on usage. API users pay per token — a heavy day on a mid-sized project can run $5–15. Anthropic also offers Claude Max subscription plans that include Claude Code usage. The optimizations in this article apply regardless of your billing model, because less token consumption means either lower bills or more headroom within your plan.
Does .claudeignore hurt response quality?
No — as long as you're excluding files Claude Code doesn't need. Build artifacts, node_modules, and binary files add noise, not signal. Excluding them actually improves response quality because Claude Code focuses on relevant source files. The only risk is over-excluding: don't ignore source files you want Claude Code to edit.
What's the difference between /compact and /clear?
/compact summarizes your conversation history to reduce token count while preserving context — the session continues with the summary. /clear wipes everything and starts fresh. Use /compact mid-task when the conversation gets long. Use /clear when switching to a completely different task. See the Claude Code Commands Cheatsheet for more built-in commands.
Can I set a monthly spending limit for Claude Code?
Yes. API users can set spend limits in the Anthropic Console under billing settings. You can configure both a hard limit and a notification threshold. Subscription plan users have their usage governed by the plan's rate limits instead.
Do MCP servers consume tokens even when I'm not using them?
Yes — every connected MCP server injects its tool definitions into each message's context. Even if you never call a server's tools, the definitions are still there. That's why disconnecting servers you're not actively using is an easy win.
Is Plan mode slower than just letting Claude Code execute directly?
In wall-clock time, Plan mode adds one extra step — reviewing the plan. But it almost always saves total time because it prevents Claude Code from going down the wrong path and having to backtrack. For small changes (renaming a variable, fixing a typo), skip Plan mode. For anything that touches 3+ files, use it.
How do I know if my CLAUDE.md is too long?
Run /context at the start of a fresh session. If CLAUDE.md is consuming more than 10-15% of your context, it's worth trimming. The CLAUDE.md Design Patterns article covers five patterns for keeping it lean. A good target is under 200 lines, with details split into separate docs/ files that Claude Code reads on demand.
Does using subagents actually save tokens overall?
It depends on the task size. Subagent startup has overhead, so delegating a simple 1–2 file read wastes tokens. But for investigations spanning 3+ files (dependency audits, cross-file searches, test execution), subagents keep the results out of your main context window, and only the summary comes back. The subagent still spends tokens reading those files; the savings come from not carrying their contents through every later turn of the main session.
Wrapping Up
In order of impact, here's what to do:
- Add .claudeignore: exclude node_modules, .next/, and binaries. Biggest single gain.
- Use Plan mode for large tasks: review the plan before any execution happens.
- Make prompts specific: answer the 5W1H yourself before sending.
- Trim your MCP servers: keep always-on only the servers you use every session.
- Use /compact mid-session: don't let conversation history accumulate unchecked.
- Keep CLAUDE.md under 200 lines: move detail to separate files.
Start with .claudeignore. You'll feel the difference within the first session.