Claude Code invisible token burn (April 2026): what's happening and how to detect it
- Multiple users on r/ClaudeAI and the Anthropic community forum describe roughly 20,000 extra tokens per request appearing in their billing logs that they never sent.
- This inflates session costs without any visible change in the Claude Code interface: your /context readout and your actual bill tell different stories.
- Four runnable detection patterns are below. If your API-side token counts are diverging from local ccusage estimates, you are likely affected.
1. What's happening
Starting in early April 2026, a cluster of developers began noticing that their Anthropic API spend was climbing faster than their actual work could explain. The divergence wasn't random — it was consistent, roughly proportional to request volume, and invisible in the Claude Code UI.
Coverage from Efficienist (April 2026) described the pattern as server-side token inflation: requests were arriving at Anthropic's API carrying a significantly larger token footprint than the client-side session data accounted for. The publication noted that affected users could reproduce the gap reliably by comparing their local ccusage totals against their Anthropic dashboard in the same 15-minute window.
DevClass (April 1, 2026) reported that developers were observing "consistent billing anomalies tied to request counts rather than context length," suggesting the inflation was applied per-call rather than scaling with actual prompt size. This per-call characteristic is what makes it particularly expensive for agentic workflows with high tool-call frequency.
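To see why the per-call characteristic matters for agentic workloads, here is a back-of-envelope sketch. The ~20,000-token figure is the unconfirmed number from user reports, and the $3.00 per million input tokens is Claude Sonnet's published input rate; both are illustrative assumptions, not measured values.

```python
# Back-of-envelope: cost of a fixed per-request token overhead.
# Assumptions (from user reports, not confirmed): ~20,000 extra
# input tokens per request; $3.00 per million input tokens.
EXTRA_TOKENS_PER_REQUEST = 20_000
PRICE_PER_MILLION_INPUT = 3.00

def overhead_cost(requests: int) -> float:
    """Dollar cost of the hypothetical overhead across N requests."""
    return requests * EXTRA_TOKENS_PER_REQUEST / 1_000_000 * PRICE_PER_MILLION_INPUT

# A single agentic session firing 500 tool calls:
print(f"${overhead_cost(500):.2f}")  # → $30.00
```

Because the overhead is per-call rather than per-token, a chatty agent loop making hundreds of small requests pays far more than one long-context request doing the same work.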
The sharpest user signal came from r/ClaudeAI, where one thread accumulated significant upvotes around a single observation:
"Every request appears to carry around 20,000 extra tokens that the user never sent."
— r/ClaudeAI, April 2026

Separately, threads on the Anthropic community forum documented users correlating timestamp-specific rate changes: costs that spiked not when context grew, but after a specific date in early April, holding roughly flat at the elevated rate regardless of prompt complexity. Multiple users in these threads noted the issue appeared after an Anthropic server-side update and was not reproducible on API clients that bypassed the Claude Code CLI layer.
To be direct about what we do and don't know: Anthropic has not issued a public statement confirming this as an infrastructure bug. What we have is a consistent pattern across independent user reports. The mechanism — whether it's system prompt inflation, caching metadata, or something in the tool-use scaffolding — is not yet publicly documented. We're reporting what's observable, not what's confirmed.
2. How to detect it in your own logs
Four runnable patterns. Each takes under five minutes. Together they tell you whether you're affected and, if so, the approximate magnitude.
Pattern 1: ccusage diff against the Anthropic dashboard
Install ccusage if you haven't already (npm i -g ccusage). Run a short Claude Code session — a single file edit, a question, anything with at least five tool calls. Then immediately compare:
# Local estimate from session files
ccusage daily --date today
# Then open: console.anthropic.com → Usage → filter to the last 30 minutes
A healthy session shows ccusage within 5–10% of the dashboard (the gap is normal caching overhead). If the dashboard shows 1.5x or more of the ccusage estimate, you have a divergence worth investigating. Users in the r/ClaudeAI thread were describing multipliers of 1.3x to 2.1x on affected accounts.
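If you want the comparison as a number rather than an eyeball judgment, a minimal sketch follows. Both inputs are values you read off manually, one from ccusage and one from the console usage view for the same window; the 1.10 cutoff reflects the 5–10% normal caching overhead described above.

```python
# Compare billed tokens (Anthropic dashboard) against the local
# ccusage estimate for the same time window. Both numbers are
# entered manually from the two tools.
def divergence(ccusage_tokens: int, dashboard_tokens: int) -> float:
    """Multiplier of billed tokens over locally estimated tokens."""
    return dashboard_tokens / ccusage_tokens

ratio = divergence(ccusage_tokens=120_000, dashboard_tokens=180_000)
print(f"{ratio:.2f}x")  # → 1.50x
if ratio > 1.10:  # anything past ~10% exceeds normal caching overhead
    print("divergence worth investigating")
```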
Pattern 2: /context inspection per request
During an active Claude Code session, type /context to see the current window usage. Note the token count. Then run a single simple tool call (e.g., read one small file). Run /context again immediately after. The delta should approximate: file tokens + response tokens + modest overhead.
# Before tool call
/context
# → note "X tokens used"
# Run: read one 50-line file
# After tool call
/context
# → new count should be X + ~300–500 for a 50-line file
If the delta is 20,000+ tokens on a trivial file read, you're looking at the same inflation pattern described in community reports. The /context view reflects the client-side state — if this is also inflated, the overhead is being injected before the context window is assembled on your machine.
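To put a rough number on what the delta *should* be, here is a sanity-check sketch using the common ~4 characters-per-token heuristic. The response and overhead defaults are illustrative assumptions; real tokenizer counts will differ, but not by a factor of 25.

```python
# Rough estimate of the expected /context delta after reading one
# file. Uses the ~4 chars/token heuristic; the response and overhead
# defaults below are illustrative assumptions, not measured values.
def expected_delta(path: str, response_tokens: int = 150,
                   overhead_tokens: int = 100) -> int:
    with open(path, encoding="utf-8") as f:
        file_tokens = len(f.read()) // 4
    return file_tokens + response_tokens + overhead_tokens

# A 50-line file (~2,000 chars) comes out around 750 tokens.
# A measured delta of 20,000+ is more than 25x any plausible estimate.
```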
Pattern 3: API-side token count via direct comparison
Run the identical prompt twice — once via the Claude Code CLI, once directly via the Anthropic API using the same model. Compare usage.input_tokens in the raw API response.
# Direct API call (Python, uses anthropic SDK)
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.usage)
# → input_tokens: ~12 (the actual prompt)

# Now run the equivalent in Claude Code and check ccusage:
# ccusage session --last 1 | grep input_tokens
A significant gap between direct API input_tokens and Claude Code session input_tokens for an equivalent prompt points to overhead being added by the CLI layer's scaffolding — system prompt, tool definitions, or session context injected server-side.
Pattern 4: Rate-change timestamp correlation
Pull your Anthropic billing API or export your usage CSV from the console. Plot daily token spend per session (not per message) against calendar date. If you see a step-change in cost-per-session that is not correlated with a change in your own prompting behavior, and if that step-change occurred in the first week of April 2026, it's consistent with the server-side update window described in community reports.
# From Anthropic console: Usage → Export CSV
# In your terminal (requires csvkit or similar):
csvstat usage_export.csv | grep -A3 "input_tokens"
# Or in Python:
import pandas as pd
df = pd.read_csv("usage_export.csv", parse_dates=["date"])
df.groupby("date")["input_tokens"].sum().plot()
A clean visual step-change on or around April 1–3, 2026 with no corresponding change in your codebase or prompting patterns is the clearest signal that the inflation is external to your workflow.
Running agents overnight? The token burn compounds fast.
Septim Subagent Cost Guard is a PreToolUse hook that hard-halts agents mid-flight when spend crosses your threshold. $29 founding rate, 50 seats, ships May 2026. Zero-dollar reservation now.
3. What to do about it right now
Three actions you can take today. None require waiting for Anthropic to issue a fix.
Action 1: Lower concurrency on multi-agent sessions
If the overhead is per-request, the most direct cost lever is request volume. Reduce the number of parallel subagents in your current workflows. A session running 4 agents in parallel instead of 12 doesn't cut output by 66% — most of that parallelism is latency optimization, not throughput — but it does cut the number of inflated requests by 66%.
In your CLAUDE.md, add a temporary directive:
# Temporary cost constraint (April 2026)
# Due to active token inflation reports, limit parallel subagent
# spawns to a maximum of 3 concurrent at any time.
# Prefer sequential tool calls over parallel where latency allows.
Action 2: Disable autocompact during the investigation window
Claude Code's autocompact feature summarizes long conversations to keep context manageable. When autocompact fires, it generates a new summary request — which, under current conditions, carries the same per-request overhead. Disabling it prevents those compaction requests from adding to the inflated bill while the situation is active.
In your Claude Code settings (~/.claude/settings.json):
{
"autoCompact": false
}
Note: disabling autocompact means long sessions may run into context limits. Break long sessions manually rather than relying on autocompact to rescue them.
Action 3: Set an hourly spend alert in the Anthropic console
Anthropic's billing alerts default to monthly thresholds. That's too slow a feedback loop for an active per-request inflation issue. Set a tight hourly alert to catch runaway spend before it compounds:
- Go to console.anthropic.com → Settings → Billing → Usage alerts.
- Create a new alert at a threshold that represents roughly one hour of your normal usage. If you typically spend $5/day, set an alert at $1 — that's a 5x daily-rate signal within a single hour.
- Set the notification to email AND, if available, SMS or webhook. Email alone can arrive 5–20 minutes after the threshold fires.
This doesn't stop the burn, but it cuts the discovery window from "morning coffee" to "within the hour." That difference is several hundred dollars on an active agentic session.
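The threshold arithmetic above can be written down as a pair of helpers. The 8-hour active day is an assumption; adjust it to your own schedule.

```python
# Alert-threshold arithmetic from the rule of thumb above.
# Assumes an 8-hour active working day (adjust to taste).
def hourly_alert(daily_spend: float, hours_active: int = 8) -> float:
    """Alert threshold ~= one active hour of normal usage."""
    return round(daily_spend / hours_active, 2)

def daily_rate_multiple(alert: float, daily_spend: float) -> float:
    """How many times the 24h-average hourly rate the alert represents."""
    return alert / (daily_spend / 24)

print(hourly_alert(5.00))                          # → 0.62
print(round(daily_rate_multiple(1.00, 5.00), 1))   # → 4.8
```

At $5/day of normal usage, a $1 alert firing inside a single hour means you are burning at nearly 5x your average rate, which is the "5x daily-rate signal" described above.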
4. Septim's perspective
We can't fix Anthropic's infrastructure, and we're not going to pretend otherwise. If the token inflation is server-side, the only entity that can resolve it is Anthropic's engineering team.

What we can do, and what we're actively building, is close the gap on the agent-side multiplier: the part of this problem that lives on your machine, in your session files, and in the number of requests your agents are firing per hour. Septim Subagent Cost Guard puts a hard ceiling on that number. It runs as a PreToolUse hook, reads your local session state, and halts the agent before the next request goes out if your cumulative spend has crossed your threshold. It won't un-inflate a token that's already been billed, but it will stop the 11th request from compounding on the first 10. The founding rate is $29 at septimlabs.vercel.app/subagent-cost-guard.

If what you need right now is a human to diagnose your specific billing logs and triage your agent configuration, that's the Septim Rescue engagement at septimlabs.vercel.app/septim-rescue: $299, one working session, a written diagnosis and a concrete fix plan.
Bill already spiked? Septim Rescue.
One focused session: we audit your Claude Code billing logs, identify the inflation source, and give you a written fix plan. $299. Booked and completed within one business day.
Further reading
- The Tokenocalypse: why your Claude subagents burned $47K (and how to stop it) — the prior incident (April 1–3, 2026) and what the existing monitoring tools missed.
- Septim Subagent Cost Guard — the PreToolUse hook that hard-halts agents on budget breach. $29 founding rate.
- Septim Rescue — one-session billing audit and fix plan. $299.