The Physics of Context: More Tokens, Less Intelligence | Dhruvil Patel

May 10, 2026

Q: Why does my RAG system get worse when I retrieve more chunks?

Attention is normalized across all context tokens, so extra chunks dilute the attention available to the load-bearing ones. Worse, retrieved chunks are near-misses by construction - semantically similar to the answer - which makes them active distractors, not passive padding. Raise your relevance threshold instead of your top-k.
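
A minimal sketch of the dilution argument above, assuming a single softmax over raw relevance scores. The scores (4.0 for the gold token, 0.0 for off-topic padding, 3.0 for a near-miss) and the helper names are illustrative assumptions, not measurements from any model.

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gold_weight(gold_score, distractor_score, n_distractors):
    """Weight left on one gold token when n distractors share the budget."""
    scores = [gold_score] + [distractor_score] * n_distractors
    return softmax(scores)[0]

# Illustrative scores only: gold = 4.0, padding = 0.0, near-miss = 3.0.
for n in (8, 40):
    padding = gold_weight(4.0, 0.0, n)
    near_miss = gold_weight(4.0, 3.0, n)
    print(f"{n:>2} extra tokens: padding={padding:.3f} near_miss={near_miss:.3f}")
```

Under these assumed scores, the gold token's share falls as the count rises, and it falls much faster when the extra tokens score close to the gold one, which is the "active distractor" case.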

Q: Is long-context degradation the same as "lost in the middle"?

Related but distinct. Lost-in-the-middle is positional - facts in the middle of long contexts get less reliable attention. Dilution and interference are compositional - they depend on how much competes for attention and how confusable it is. You can suffer all three at once.

Q: Do bigger context windows make these problems go away?

No. Bigger windows raise the ceiling on how much you *can* include; they don't add attention capacity. Newer models degrade more slowly, but the slope stays negative for low-signal-density context, and interference between contradictory near-misses is an information problem no window size fixes.

Q: How do I measure signal density in my system?

Ablation: remove context segments and check whether the answer changes. Segments whose removal doesn't affect correct answers are non-load-bearing. Even a coarse version - running evals at several top-k values and history lengths - reveals where your quality peak actually sits. Most teams discover it's far below their current inclusion level.
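
The ablation described above can be sketched roughly as follows. `answer_fn` is a hypothetical stand-in for whatever calls your model, token counts are approximated by whitespace splitting, and `toy_answer_fn` exists only so the example runs without a model.

```python
from typing import Callable, List, Sequence

AnswerFn = Callable[[str, Sequence[str]], str]

def load_bearing_segments(question: str, segments: Sequence[str],
                          gold: str, answer_fn: AnswerFn) -> List[bool]:
    """Flag segments whose removal turns a correct answer into a wrong one."""
    if answer_fn(question, segments) != gold:
        raise ValueError("baseline answer is wrong; ablate only correct answers")
    flags = []
    for i in range(len(segments)):
        ablated = [s for j, s in enumerate(segments) if j != i]
        flags.append(answer_fn(question, ablated) != gold)
    return flags

def signal_density(segments: Sequence[str], flags: Sequence[bool]) -> float:
    """Fraction of whitespace-split tokens that sit in load-bearing segments."""
    total = sum(len(s.split()) for s in segments)
    useful = sum(len(s.split()) for s, keep in zip(segments, flags) if keep)
    return useful / total if total else 0.0

# Toy stand-in for a model call, for demonstration only.
def toy_answer_fn(question: str, segments: Sequence[str]) -> str:
    for s in segments:
        if "capital of France is" in s:
            return s.rsplit(" ", 1)[-1].rstrip(".")
    return "unknown"

segments = [
    "The capital of France is Paris.",
    "France borders Spain and Italy.",
    "Paris hosts many museums.",
]
flags = load_bearing_segments("What is the capital of France?",
                              segments, "Paris", toy_answer_fn)
density = signal_density(segments, flags)
print(flags, density)
```

Leave-one-out ablation costs one model call per segment, so the coarse version (sweeping top-k and history length) is usually the cheaper first pass.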

Q: Should agents keep full history in context?

Almost never. Agent context should be actively curated per step: goal pinned at a high-attention position, completed subtasks compressed to conclusions, superseded observations dropped. Passing full transcripts between agents or across long horizons trades a small summarization cost for a large attention tax.
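
One way to picture per-step curation is a small context object that is rebuilt before every model call. This is a sketch under assumptions: the class and method names are hypothetical, and "high-attention position" is taken here to mean the start of the prompt, which may not be the best slot for every model.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class AgentContext:
    goal: str
    # Completed subtasks, kept only as one-line conclusions.
    conclusions: List[str] = field(default_factory=list)
    # Latest observation per key; older values are overwritten.
    observations: Dict[str, str] = field(default_factory=dict)

    def observe(self, key: str, value: str) -> None:
        """Record an observation; a newer value supersedes the old one."""
        self.observations[key] = value

    def complete_subtask(self, conclusion: str, consumed_keys: List[str]) -> None:
        """Keep the conclusion and drop the working observations behind it."""
        self.conclusions.append(conclusion)
        for key in consumed_keys:
            self.observations.pop(key, None)

    def render(self) -> str:
        """Build the prompt for the next step with the goal pinned first."""
        lines = [f"GOAL: {self.goal}"]
        lines += [f"DONE: {c}" for c in self.conclusions]
        lines += [f"OBS[{k}]: {v}" for k, v in self.observations.items()]
        return "\n".join(lines)

ctx = AgentContext(goal="Find why checkout latency rose")
ctx.observe("p95_latency", "410 ms")
ctx.observe("p95_latency", "620 ms")  # supersedes the earlier reading
ctx.observe("db_cpu", "92%")
ctx.complete_subtask("Database CPU is saturated", ["db_cpu"])
prompt = ctx.render()
print(prompt)
```

The rendered prompt grows with the number of open observations and conclusions, not with the length of the transcript.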

Q: What was the upgrade that made things worse?

The most confusing week of my engineering life started with a model upgrade that should have been free. We moved a production knowledge assistant from a model with a 32k context window to one with 200k. Same family, newer generation, better on every benchmark we could find. The migration was one config change and a round of smoke tests.

Q: What changed in the pipeline under "Just Include Everything"?

Retrieval top-k went from 8 chunks to 40, because why not - recall goes up, and the model can ignore what it doesn't need. Conversation history went from a rolling summary to full verbatim transcripts, because summarizing was lossy and now we didn't have to. Tool outputs stopped being truncated, because truncation had caused a bug once and deleting the truncation code felt like progress.
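
For contrast with the fixed top-k of 40 described above, here is a rough sketch of threshold-first selection, the alternative the essay recommends for RAG. The function name, the `(score, chunk)` pair format, and the 0.75 cutoff are assumptions for illustration; a real cutoff would come from your own evals.

```python
from typing import List, Sequence, Tuple

def select_chunks(scored_chunks: Sequence[Tuple[float, str]],
                  min_score: float, max_chunks: int) -> List[str]:
    """Keep chunks at or above min_score, best first, capped at max_chunks."""
    kept = [(score, chunk) for score, chunk in scored_chunks if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in kept[:max_chunks]]

# Hypothetical retrieval scores; the cap stays at 40 but rarely binds.
retrieved = [(0.91, "refund policy v3"), (0.42, "shipping FAQ"), (0.80, "refund form")]
selected = select_chunks(retrieved, min_score=0.75, max_chunks=40)
print(selected)
```

With a threshold in front, the cap becomes a safety limit and stops being a target, so a question with two relevant chunks gets two chunks.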

Q: What did the chart with the wrong slope measure?

Debugging this meant building the eval I should have built from the start. We took 200 questions with known answers and ran them at increasing context sizes. Same question, same gold chunk always present, but padded with progressively more retrieved-but-unnecessary material - the exact material our loosened pipeline was now including.
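
The shape of that eval can be sketched as below. This is not the harness from the essay: `sweep_context_sizes`, the case dictionary keys, and `answer_fn` are hypothetical names, and `toy_answer_fn` is a deliberately distractible stand-in so the example runs without a model.

```python
import random
from typing import Callable, Dict, List, Sequence

AnswerFn = Callable[[str, Sequence[str]], str]

def sweep_context_sizes(cases: List[dict], padding_sizes: Sequence[int],
                        answer_fn: AnswerFn, seed: int = 0) -> Dict[int, float]:
    """Accuracy per padding level, with the gold chunk always present."""
    rng = random.Random(seed)
    results = {}
    for n in padding_sizes:
        correct = 0
        for case in cases:
            # Same question, same gold chunk, progressively more padding.
            context = [case["gold_chunk"]] + list(case["distractors"][:n])
            rng.shuffle(context)  # avoid always placing gold first
            correct += answer_fn(case["question"], context) == case["answer"]
        results[n] = correct / len(cases)
    return results

# Toy model that trusts whichever chunk comes first; demonstration only.
def toy_answer_fn(question: str, context: Sequence[str]) -> str:
    return context[0].split("=")[-1]

cases = [{
    "question": "What is x?",
    "answer": "42",
    "gold_chunk": "x=42",
    "distractors": ["x=41", "x=43", "x=40", "x=44"],
}]
results = sweep_context_sizes(cases, padding_sizes=[0, 2, 4], answer_fn=toy_answer_fn)
print(results)
```

Plotting `results` against padding size gives the quality-versus-context curve; holding the gold chunk constant means any drop is attributable to the padding and not to retrieval misses.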

Attention is a conserved quantity. Every token you add taxes every token already there. A physical model for why bigger contexts make LLMs dumber.

Key Takeaways

Context windows grow; attention doesn't.
Attention is normalized - a fixed budget split across every token present - so each token you add taxes all the others.
Quality versus context size is a peaked curve: rising while you add needed facts, falling as low-value tokens dilute the budget.
Near-miss content is worse than noise: retrieval-shaped distractors attract real attention and interfere destructively, producing grounded-sounding wrong answers.
Needle-in-a-haystack benchmarks test the zero-interference corner of the space and say little about production behavior.
