My first month using Claude Code, the API bill came in at three times what I expected. After digging into what was eating all those tokens and making targeted changes, I cut consumption in half the following month. Here's everything I did.
By the end of this article, you'll know how to identify where your tokens are going, and you'll have concrete commands for the fixes that actually worked: .claudeignore, Plan mode, prompt discipline, MCP server management, and session hygiene.
What Are the 5 Biggest Token Drains in Claude Code?
Before optimizing anything, you need to know what you're fighting. Token waste tends to cluster around five sources.
1. Bloated context reads
Claude Code can attempt to read files you don't need it to touch — node_modules, .git, build artifacts. This is often the single largest waste.
2. Vague prompts causing back-and-forth "Make it look nicer" forces Claude Code to ask clarifying questions. A task that should take one round-trip ends up taking four.
3. Always-on MCP servers Every connected MCP server adds its tool list to your context on every message. Five servers running constantly adds up to hundreds of tokens per turn, before you've said anything.
4. Bloated CLAUDE.md If your project instructions file contains every decision, every background note, and every piece of context from the last six months, Claude Code loads all of it at the start of every session.
5. Long sessions left running Conversation history accumulates. The longer a session runs without a reset, the more tokens each new message costs.
Start with /usage to get a sense of your consumption patterns, then identify which of these is your biggest offender.
How Do You Stop Unnecessary File Reads with .claudeignore?
The highest-ROI fix is adding a .claudeignore file. It works exactly like .gitignore and tells Claude Code which paths to skip entirely.
Create .claudeignore at your project root:
# .claudeignore
# Build artifacts
.next/
dist/
build/
out/
# Dependencies
node_modules/
.pnp/
.pnp.js
# Caches
.cache/
.turbo/
*.tsbuildinfo
# Logs
*.log
npm-debug.log*
# Test output
coverage/
.nyc_output/
# Environment files (security too)
.env
.env.local
.env.*.local
# Database files
*.db
*.sqlite
prisma/migrations/
# Media and binaries
public/images/
*.png
*.jpg
*.gif
*.mp4
Restart claude after saving. In a Next.js project, excluding .next/ alone typically cuts context size by 30–40%. The mindset shift is to exclude everything Claude Code doesn't need to read, not just the obvious stuff. Generated type definitions, test fixtures, documentation that's already in your CLAUDE.md — all fair game to exclude.
How Does Plan Mode Cut Token Consumption in Half?
Plan mode (toggle with Shift+Tab) tells Claude Code to produce a plan without making any changes. This is one of the most effective techniques because it eliminates the biggest source of token waste: trial-and-error execution.
In normal mode, Claude Code will try things, hit errors, and iterate. Each iteration costs tokens. In Plan mode, it first outputs a step-by-step plan — which files it will touch, what changes it will make, in what order. You review the plan, cut anything unnecessary, and only then switch back to normal mode to execute.
# Plan mode workflow
Press Shift+Tab to enable Plan mode
→ Give your task
→ Claude outputs a plan (no files changed)
→ Review and adjust the plan
→ Press Shift+Tab to disable Plan mode
→ Execute with the refined plan in context
For a prompt like "add user authentication," skipping Plan mode means Claude Code dives in, potentially picks the wrong approach, and you're correcting it across five messages. Plan mode surfaces those decisions upfront, before any tokens are spent on execution. The bigger the task, the larger the savings.
How Much Do Vague Prompts Multiply Your Token Costs?
Prompt quality has a direct, measurable effect on token consumption. Consider this example:
Costly prompt:
"Add a login feature"
Claude Code will ask: Which auth library? Cookie-based sessions or JWT? Which directory? What about the UI component? That's four back-and-forth exchanges before any code is written.
Efficient prompt:
"Add Google OAuth login using NextAuth.js v5.
JWT sessions. Implement in /app/auth/.
Add auth guards to the existing middleware.ts."
This resolves in one pass.
The framework I use is to answer the 5W1H before I send anything: What exactly, Where in the codebase, How (which library or pattern), When (any ordering constraints), Who (which user role, if relevant). If I can answer those myself, I write them into the prompt instead of letting Claude Code ask.
One more rule: one task per message. "Add login, write tests, and update the README" sent as one prompt causes Claude Code to hold all of that in context simultaneously. Sending them separately reduces total token cost, counterintuitive as that sounds.
How Do You Run MCP Servers Only When You Actually Need Them?
MCP servers are powerful, but each connected server adds its tool definitions to every message's context. If you have five servers connected, you're paying that overhead on every single exchange — even when you're not using any of them.
Check what you have running:
cat ~/.claude.json | jq '.mcpServers'
The fix is a simple policy: always-on for tools you use every session, everything else disabled by default.
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
},
"postgres": {
"disabled": true,
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
},
"github": {
"disabled": true,
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"]
}
}
}
Enable database or GitHub MCP only when you're doing that kind of work, then disable it again. The per-message savings are small, but they compound over a full day of work.
When Should You Use /compact vs /clear?
Claude Code gives you two commands for managing conversation history. Using them at the right moments makes a meaningful difference on long sessions.
/compact: Summarizes the conversation history to reduce token count while preserving context. The conversation continues, just more efficiently.
/clear: Resets the conversation entirely. Clean slate.
Here's how I decide:
Use /compact when:
- You're still on the same task but the conversation is getting long
- You need the context from earlier in the session
- You're midway through a session (roughly 500+ exchanges)
Use /clear when:
- You're switching to a completely different task
- Previous conversation context is irrelevant
- You're starting a new feature from scratch
- You're resuming work the next day
The failure mode to avoid is letting a long session drift. As conversation history grows, Claude Code tries to stay consistent with everything said earlier — including things that are no longer relevant. Old context can actually degrade response quality while simultaneously increasing cost.
When in doubt, `/compact`. When switching tasks, `/clear`.
## How Do You Keep CLAUDE.md Lean?
CLAUDE.md is loaded into context at the start of every session. If it's 600 lines long, you're paying for those 600 lines before you've even started work.
**Cut this:**
- Historical context ("we decided X because of Y back in November")
- Completed task details
- Links to external docs Claude Code can't access anyway
- Long disclaimers or policy statements
**Keep this:**
- Tech stack (a bulleted list, three lines max)
- Directory structure overview
- The most important coding conventions
- Current active task
Target: **under 200 lines**. If you need more, split the detail into separate files under `docs/` and have Claude Code read them on demand.
```markdown
# CLAUDE.md
## Stack
- Next.js 15 + TypeScript + Tailwind v4
- Prisma + PostgreSQL
- NextAuth.js v5
## Structure
- /app — App Router pages
- /components — UI components
- /lib — Utilities and helpers
## Conventions
- Default to Server Components
- Data fetching via Server Actions or Route Handlers
- Tests with Vitest + Testing Library
## Active Task
- Building user dashboard
- Details: docs/current-sprint.md
The detail lives in docs/current-sprint.md. Claude Code reads it when needed, not on every session start.
How Much Can You Actually Expect to Save?
Here's a realistic breakdown for a developer spending 3–4 hours daily on a mid-sized Next.js project:
| Technique | Token Reduction |
|---|---|
.claudeignore setup | 30–40% |
| Plan mode habit | 20–30% |
| Prompt precision | 15–25% |
| MCP server cleanup | 5–10% |
/compact usage | 10–15% |
| CLAUDE.md trimming | 5–10% |
These compound rather than stack additively, but the real-world result is 40–55% reduction from a baseline of no optimization. "50% reduction" is achievable and not a stretch goal.
Claude Code's cost scales directly with how you use it. Cut the waste, and the same monthly spend gets you twice the productive work.
Wrapping Up
In order of impact, here's what to do:
- Add
.claudeignore— excludenode_modules,.next/, and binaries. Biggest single gain. - Use Plan mode for large tasks — review the plan before any execution happens.
- Make prompts specific — answer the 5W1H yourself before sending.
- Trim your MCP servers — only keep always-on what you use every session.
- Use
/compactmid-session — don't let conversation history accumulate unchecked. - Keep CLAUDE.md under 200 lines — move detail to separate files.
Start with .claudeignore. You'll feel the difference within the first session.