4 ways I cut Claude Code token usage after actually seeing my context

After watching my own Claude Code sessions in claude-devtools, I found four token-drain patterns I never would have caught from the terminal — heavy MCPs, lazy @-mentions, probabilistic skills, and monolithic CLAUDE.md files.

The Claude Code terminal shows a three-segment progress bar for context usage. No numbers, no breakdown, no idea which file or tool call is eating your budget. So like most people, I just trusted it — until I built claude-devtools and started watching my own sessions tick token-by-token.

These four patterns surfaced fast once I could actually see what was in the window. None of them are novel best practices — they're floating around the community — but reading about them is one thing. Watching your own session bleed tokens in real time hits different.

1. Heavy MCPs and large files crash the context

Some MCPs return enormous payloads. The TypeScript LSP MCP would routinely dump 10k+ tokens into the context on a single call. Because that payload stays in the window for every later turn, Claude effectively loses its mind for the rest of the session: quality drops, follow-ups get confused, costs spike. The same thing happened with context7 MCP responses and with Read calls on massive files (large .tsx files, generated bundles, lockfiles).

Seeing this visualized — a single tool call ballooning the context bar — forced two changes in my workflow:

  • Refactor large files. A 3,000-line component file isn't just a code-quality problem; it's a context-budget problem. Once I started seeing those Read calls light up the token chart, I had a concrete reason to split them.
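  • Audit what's getting read under the hood. claude-devtools showed Claude reading lockfiles and generated artifacts that I'd never have noticed in the terminal. Those went straight into .claudeignore. (Confirm your Claude Code version honors that file; the documented way to block reads is Read deny rules under permissions.deny in .claude/settings.json.)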

If a single tool call is consuming a large share of your window, fix the source instead of trying to compact your way out of it. Per-turn token attribution makes the offender obvious.
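One way to spot likely offenders before Claude reads them is to rank files by a rough token estimate. The sketch below is an approximation only: it assumes about 4 characters per token (a common rule of thumb, not Claude's actual tokenizer), and the names heaviest_files and SKIP_DIRS and the 10,000-token threshold are illustrative choices, not part of claude-devtools or Claude Code.

```python
import os
from pathlib import Path

# Assumption: ~4 characters per token. This is a heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4
# Assumption: directories that are never worth scanning in this illustration.
SKIP_DIRS = {".git", "node_modules"}


def estimate_tokens(path: Path) -> int:
    """Approximate a file's token cost from its size in bytes."""
    return path.stat().st_size // CHARS_PER_TOKEN


def heaviest_files(root: str, threshold: int = 10_000) -> list[tuple[int, str]]:
    """Return (estimated_tokens, relative_path) pairs at or above threshold, largest first."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_file():  # skips broken symlinks
                continue
            tokens = estimate_tokens(path)
            if tokens >= threshold:
                hits.append((tokens, os.path.relpath(path, root)))
    return sorted(hits, reverse=True)
```

Calling heaviest_files(".") from the repo root gives a shortlist of files to split or exclude. The estimate will be off for minified or non-English content, so treat it as a ranking, not a measurement.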

2. The hidden cost of lazy @-mentions

I used to skip @-mentioning files when I knew Claude could "figure it out." It's just laziness: typing "Claude, look at the auth flow and fix the bug" instead of "Claude, look at @apps/web/auth.ts and @apps/web/middleware.ts."

What I didn't realize until I watched a session play out: skipping the @-mention forces Claude to use Grep + Read to hunt for the right file. Every Grep returns a chunk of matches. Every Read returns the file's contents. By the time Claude has located the file you meant, you've already paid for 3–5 tool calls plus their outputs.

@-mentioning loads the file's contents directly: no intermediate tool calls, no false matches, no overhead. For tasks that need several specific files, the difference is dramatic. In my sessions, task completion rate went up and total token spend went down at the same time.

The fix: be explicit. If you know the path, type the @-mention. If you don't, run a quick glob or find in your own shell first, outside Claude.
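Looking up paths outside Claude can be as small as a glob that prints ready-to-paste mentions. This is a minimal sketch: mention_candidates is a hypothetical helper name, and it assumes @-mention paths are written relative to the project root, as in the examples above.

```python
from pathlib import Path


def mention_candidates(root: str, pattern: str) -> list[str]:
    """Return '@relative/path' strings for files under root that match a glob pattern."""
    base = Path(root)
    return sorted(
        "@" + p.relative_to(base).as_posix()
        for p in base.glob(pattern)
        if p.is_file()
    )


# Example call (hypothetical layout): mention_candidates(".", "apps/web/*.ts")
```

The search still costs something, but it runs on your machine and none of its output lands in the context window.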

3. Skill activation is probabilistic

Custom skills sound like a clean abstraction — Claude is supposed to recognize when a skill applies and invoke it. In practice, it doesn't always do that. Skills get skipped on tasks where they'd obviously help, or they get invoked too late, after Claude has already explored the wrong direction.

Once you can see which skills actually fired in a session (in the subagents and skills view), the pattern becomes clear: relying on automatic skill matching is hit-or-miss. The token-efficient move is to invoke the skill explicitly from the start: "Use the /refactor-component skill on @apps/web/foo.tsx."

This isn't a knock on skills — they're great when invoked. It's a knock on assuming Claude's autonomous skill-matching is reliable enough to be your default.

4. A layered CLAUDE.md beats one giant file

A single 800-line CLAUDE.md at the project root sounds organized. It also gets loaded into context on every single turn. Even when the current task is "fix a typo in the footer," Claude is still carrying the entire global instructions document.

The token-efficient pattern is a layered CLAUDE.md system: a short global file at the root with project-wide rules, plus directory-specific CLAUDE.md files inside apps/, packages/, and infra/, each one short, scoped, and only relevant when work happens in that subtree. Claude Code resolves these layer by layer: the root file loads at startup, and a subdirectory's file is pulled in only when Claude reads files in that subtree.

The visualization that drove this home for me was the context breakdown chart — the "CLAUDE.md" segment used to be one of the larger ones turn after turn. Splitting it dropped that segment to almost nothing on most turns, with directory layers only loading when relevant.
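To get a feel for what a layered layout saves, you can estimate which CLAUDE.md files apply to a given file and what they cost together. The sketch below is a simplified model, not Claude Code's actual loader: it assumes only the CLAUDE.md files on the directory chain from the project root down to the target file's folder are in play, and it reuses the rough 4-characters-per-token heuristic. layers_for and estimated_tokens are hypothetical names.

```python
from pathlib import Path

# Assumption: ~4 characters per token. This is a heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4


def layers_for(root: str, target: str) -> list[Path]:
    """CLAUDE.md files on the directory chain from root down to target's folder."""
    base = Path(root).resolve()
    folder = (base / target).resolve().parent
    # Build the chain of directories: root, root/apps, root/apps/web, ...
    chain = [base]
    for part in folder.relative_to(base).parts:
        chain.append(chain[-1] / part)
    return [d / "CLAUDE.md" for d in chain if (d / "CLAUDE.md").is_file()]


def estimated_tokens(files: list[Path]) -> int:
    """Approximate combined token cost of the given files."""
    return sum(f.stat().st_size for f in files) // CHARS_PER_TOKEN
```

Comparing estimated_tokens(layers_for(".", "apps/web/footer.tsx")) against the size of a single monolithic file gives a rough before-and-after number to check against the CLAUDE.md segment in the breakdown chart.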

How to find your own patterns

None of this is invented. The lessons sit in blog posts, Discord threads, and HN comments. But reading them is abstract. Watching your own session in claude-devtools — seeing the token bar grow with each MCP call, each unnecessary Read, each redundant CLAUDE.md reload — turns "best practice I read once" into "thing I'll never do again."

The setup is one command:

brew install --cask claude-devtools

Open any session you've run, find the heaviest turn, and hover over the token breakdown. The pattern that's costing you the most will be obvious within a minute. ("Why did Claude forget?" walks through the same flow for compaction-driven memory loss.)

FAQ: questions, answered

Common questions about claude-devtools, session transcripts, and Claude Code logs on disk.

  1. Find the biggest per-turn consumers first. claude-devtools breaks down each turn across CLAUDE.md, skills, @-mentions, tool I/O, thinking, team overhead, and user text. Once you can see which category is dominating, you can act — refactor large files, add entries to .claudeignore, switch lazy prompts to explicit @-mentions, split a monolithic CLAUDE.md into layered directory files, or invoke skills explicitly instead of hoping for auto-detection.

  2. Three usual suspects fill the context fastest: heavy MCP responses (some MCPs return 10k+ tokens per call), Read calls on large files (long components, lockfiles, generated artifacts), and a monolithic CLAUDE.md loaded on every turn. The terminal's progress bar hides which one is at fault. claude-devtools' per-turn token attribution shows each category as an explicit number.

  3. @-mentioning files saves tokens, often substantially. Without the @-mention, Claude uses Grep + Read to locate the file you meant, and each step adds its own tool I/O. @-mentioning loads the file content directly with no intermediate tool calls. For multi-file tasks, the savings compound.

  4. Prefer a layered CLAUDE.md over one monolithic file. A single large CLAUDE.md is loaded into context on every turn even when most of it is irrelevant to the current task. Splitting into a small project-root file plus directory-specific CLAUDE.md files in apps/, packages/, etc. keeps the loaded-per-turn portion small and scoped.

  5. Claude invokes skills on its own sometimes, but not reliably. Automatic skill matching is probabilistic: Claude may skip a skill that would obviously help, or invoke it after exploring the wrong direction first. For token efficiency, invoke skills explicitly in your prompt rather than relying on auto-detection.

  6. A .claudeignore file (similar to .gitignore) is meant to tell Claude Code to skip listed paths when reading or globbing. If your version doesn't honor it, Read deny rules under permissions.deny in .claude/settings.json can cover the same paths. Good candidates: lockfiles (pnpm-lock.yaml, yarn.lock), generated bundles, build artifacts, large auto-generated TypeScript output, vendor directories. If you see Claude Reading these files in claude-devtools and they don't matter for your task, add them.

  7. To see what an MCP call returned, expand the relevant tool call in claude-devtools. MCP responses render with their full payload and token cost. The per-turn token breakdown attributes tool I/O as a separate category, so a turn dominated by tool I/O usually points to a heavy MCP response or a large Read.
