Context Engineering for AI Agents — Part 2
On December 4, 2025, a follow-up to earlier work on context engineering refined practical patterns for building resilient agent harnesses. Peak Ji (Manus) and Lance Martin (LangChain) walked through how real systems confront Context Rot, multi-agent coordination, and action-space complexity, and how a harness can shoulder most of the work so models stay focused on reasoning. The conversation builds on prior writing and on a recent webinar featuring Peak Ji's talk, available on YouTube.
A shared vocabulary
Several terms frame the recommendations:
- Context Engineering: designing systems that supply the right information and tools, in the right format, so an LLM can complete a task.
- Agent Harness: the software around the model that executes tool calls, manages message history, and implements context engineering.
- Context Rot: performance degradation as the context window fills, even well before the model’s technical token limit (the effective window is often < 256k tokens).
- Context Pollution: irrelevant, redundant, or conflicting information in context that reduces reasoning accuracy.
- Context Confusion: failure modes where the model cannot distinguish between instructions, data, and structural markers.
Context compaction and summarization to prevent Context Rot
Two distinct approaches emerge for reducing context cost: Context Compaction (reversible) and Summarization (lossy).
- Context Compaction favors storing pointers to artifacts instead of full content. For example, if an agent writes a 500-line file, the chat history should show only the path (e.g., `Output saved to /src/main.py`) rather than inlining the file. Compaction is reversible because the harness can read the file back later.
- Summarization uses an LLM to compress older history when the context approaches a pre-rot threshold (Manus often triggers at ~128k tokens). Manus preserves the most recent tool calls in raw form to maintain the model's momentum and formatting consistency.
The recommended ordering is raw > compaction > summarization: fall back to summarization only when compaction no longer frees sufficient space.
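A minimal sketch of this ordering, assuming a plain list-of-dict message history; the 2,000-character cutoff and the `count_tokens` and `summarize` callables are illustrative placeholders, not Manus's actual implementation:

```python
from typing import Callable
import hashlib
import os

PRE_ROT_THRESHOLD = 128_000   # trigger well before the technical token limit
KEEP_RAW_TAIL = 5             # most recent tool calls stay raw for momentum

def compact(messages: list[dict], artifact_dir: str = "/tmp/artifacts") -> list[dict]:
    """Reversible: swap bulky tool outputs for file pointers the harness can re-read."""
    os.makedirs(artifact_dir, exist_ok=True)
    head, tail = messages[:-KEEP_RAW_TAIL], messages[-KEEP_RAW_TAIL:]
    compacted = []
    for msg in head:
        content = msg.get("content", "")
        if msg.get("role") == "tool" and len(content) > 2_000:
            name = hashlib.sha1(content.encode()).hexdigest()[:12] + ".txt"
            path = os.path.join(artifact_dir, name)
            with open(path, "w") as f:
                f.write(content)                      # full content stays on disk
            msg = {**msg, "content": f"Output saved to {path}"}  # pointer only
        compacted.append(msg)
    return compacted + tail

def reduce_context(messages: list[dict],
                   count_tokens: Callable[[list[dict]], int],
                   summarize: Callable[[list[dict]], str]) -> list[dict]:
    """Prefer raw > compaction > summarization, in that order."""
    if count_tokens(messages) < PRE_ROT_THRESHOLD:
        return messages                               # raw is fine
    messages = compact(messages)
    if count_tokens(messages) < PRE_ROT_THRESHOLD:
        return messages                               # compaction freed enough
    head, tail = messages[:-KEEP_RAW_TAIL], messages[-KEEP_RAW_TAIL:]
    summary = {"role": "system", "content": summarize(head)}
    return [summary] + tail                           # lossy last resort
```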
Multi-agent coordination: share by communicating
Multi-agent setups often suffer from Context Pollution when every sub-agent shares a common context. Manus applies a concurrency principle from Go: "Share memory by communicating, don't communicate by sharing memory." Two patterns stand out:
- For discrete tasks (clear inputs/outputs), spawn a fresh sub-agent with a minimal, task-specific context.
- For complex reasoning (where the full trajectory matters), share the entire memory only when strictly necessary. Treat shared context as an expensive dependency; forked contexts avoid cache penalties. Both patterns are sketched below.
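A small sketch of the two patterns, assuming a generic `run_agent(messages, tools)` harness loop; every name here is illustrative, not drawn from Manus:

```python
from typing import Callable

def spawn_discrete(run_agent: Callable, task_spec: str, tools: list) -> str:
    """Discrete task: fresh sub-agent seeded only with minimal, task-specific context."""
    messages = [
        {"role": "system", "content": "Solve exactly one well-scoped task."},
        {"role": "user", "content": task_spec},  # clear inputs/outputs, no history
    ]
    return run_agent(messages, tools)

def spawn_with_trajectory(run_agent: Callable, history: list[dict],
                          instruction: str, tools: list) -> str:
    """Complex reasoning: share the full trajectory only when strictly necessary."""
    forked = [*history, {"role": "user", "content": instruction}]  # fork, don't mutate
    return run_agent(forked, tools)  # the shared prefix stays cache-friendly
```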
Keep the toolset small with a Hierarchical Action Space
Large tool inventories cause Context Confusion and hallucinated parameters. Manus recommends a Hierarchical Action Space:
- Level 1 (Atomic): expose ~20 core tools (e.g., `file_write`, `browser_navigate`, `bash`, `search`) that are stable and cache-friendly.
- Level 2 (Sandbox Utilities): instead of many narrowly defined tools (like `grep`), use a general `bash` tool or a CLI wrapper (e.g., `mcp-cli <command>`) so definitions remain out of the context window.
- Level 3 (Code/Packages): for multi-step logic, provide libraries or functions so complex sequences run without multiple LLM roundtrips; let the agent write a dynamic script when needed.
This structure reduces ambiguity and keeps the model's action space tractable; a sketch of the three levels follows.
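A hedged sketch of the layering, using an OpenAI-style function-tool schema; the specific definitions and dispatcher are assumptions, not Manus's actual code:

```python
import subprocess

# Level 1 (Atomic): ~20 stable, cache-friendly tool definitions (two shown).
ATOMIC_TOOLS = [
    {"type": "function", "function": {
        "name": "file_write",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"},
                                      "content": {"type": "string"}},
                       "required": ["path", "content"]}}},
    {"type": "function", "function": {
        "name": "bash",
        "parameters": {"type": "object",
                       "properties": {"command": {"type": "string"}},
                       "required": ["command"]}}},
]

# Level 2 (Sandbox Utilities): grep, sed, etc. ride through the single `bash`
# tool, so their definitions never occupy the context window.
def run_bash(command: str) -> str:
    result = subprocess.run(["bash", "-lc", command],
                            capture_output=True, text=True)
    return result.stdout + result.stderr

# Level 3 (Code/Packages): multi-step logic runs as one script -- one LLM
# roundtrip instead of many chained tool calls.
def run_script(path: str) -> str:
    return run_bash(f"python {path}")
```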
Treat agents as tools with structured schemas
Rather than creating elaborate org charts of agents, treat a sub-agent as a deterministic function call. The main model invokes something like `call_planner(goal="...")`; the harness runs a temporary sub-agent loop and returns a structured JSON result. This MapReduce-style pattern ensures sub-agent outputs are immediately usable and avoids additional conversational parsing.
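A minimal sketch of the pattern, assuming an `llm(messages) -> str` callable and a hypothetical Plan shape; neither is drawn from Manus:

```python
import json
from typing import Callable

PLAN_SCHEMA_HINT = (
    'Respond ONLY with JSON matching: '
    '{"steps": [{"id": <int>, "action": "<str>"}], "rationale": "<str>"}'
)

def call_planner(goal: str, llm: Callable[[list[dict]], str]) -> dict:
    """Expose a sub-agent behind a deterministic, function-call-like surface."""
    messages = [
        {"role": "system", "content": "You are a planner. " + PLAN_SCHEMA_HINT},
        {"role": "user", "content": goal},
    ]
    raw = llm(messages)        # temporary sub-agent loop; no shared history leaks in
    plan = json.loads(raw)     # structured result, immediately usable by the caller
    if "steps" not in plan:    # fail fast instead of conversational re-parsing
        raise ValueError("planner returned no steps")
    return plan
```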
Best practices & implementation tips
- Avoid using RAG to manage tool definitions. Dynamically fetching tool definitions creates a shifting context that breaks KV-cache behavior and confuses models.
- Avoid training custom models prematurely. The field moves rapidly; harness-level flexibility is a better long-term investment than fine-tuning for a specific action space (echoing the sentiment of the Bitter Lesson).
- Define a "Pre-Rot Threshold." Monitor token counts and trigger compaction or summarization well before rot (effective windows often degrade < 256k tokens).
- Use Agent-as-a-Tool for planning. Returning a structured Plan object from a planner sub-agent is more token-efficient than constantly rewriting a `todo.md`.
- Security and manual confirmation. Browser or shell access requires stronger controls than sandboxing alone; enforce token isolation and human-in-the-loop confirmation when necessary.
- Measure with verifiable metrics. Prefer binary, environment-verified success/fail metrics (did the code compile? did the file exist?) over subjective LLM-as-judge scores; a sketch follows this list.
- Embrace iterative rewrites. Manus was rewritten multiple times in months; harnesses will evolve as models improve. Removing complexity often yields the biggest gains.
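For the metrics point above, a short sketch of an environment-verified check; the `python -m compileall` probe and the file-existence test are illustrative choices, not a prescribed harness API:

```python
import os
import subprocess

def task_succeeded(repo_dir: str, expected_file: str) -> bool:
    """Binary, environment-verified success: no LLM-as-judge in the loop."""
    # Did the code compile? Exit code 0 is a hard, reproducible signal.
    compiled = subprocess.run(
        ["python", "-m", "compileall", "-q", repo_dir],
        capture_output=True).returncode == 0
    # Did the expected artifact actually get written?
    produced = os.path.exists(os.path.join(repo_dir, expected_file))
    return compiled and produced
```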
Conclusion
The guidance centers on minimal effective context: remove unnecessary scaffolding, keep action spaces small and structured, and make sub-agents behave like deterministic tools. As models increasingly handle complex reasoning, the harness should reduce friction rather than add layers. For continued reading, the full post and references are available at the original write-up and related links, including the webinar and previous Manus installment.
Original article: https://www.philschmid.de/context-engineering-part-2
Further reading: Manus post on context engineering — https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus and the webinar — https://www.youtube.com/watch?v=6_BcCthVvb8.
