We define Context as Code — a design pattern in which structured context injection into a stateless language model creates persistent agent runtime behavior without training, fine-tuning, or custom infrastructure. We show that this pattern is supported by converging evidence: Anthropic's persona vectors research demonstrates that context injection creates measurable neural activation patterns; in-context learning closes the performance gap with fine-tuning to 3%; fine-tuning degrades safety alignment while context injection preserves it; and three independent billion-dollar-scale agent systems have converged on plain files as their primary behavioral specification.
We survey the landscape of agent configuration (AGENTS.md, Cursor rules, MCP servers, Codified Context) and find that while individual components of Context as Code are increasingly common, the integrated pattern — hooks as lifecycle, rules as behavior, skills as operations, files as state — is undocumented in any of the 1,400+ papers surveyed. We coin the term and define the architecture.
A companion paper documents a reference implementation that has run daily for eight months (the ALIVE Technical Whitepaper). This paper defines the pattern it implements.
This paper exists because of a problem identified in a companion paper: Personal Context Management.
The PCM paper argues that personal knowledge management (PKM) solves the wrong problem — it files knowledge for human retrieval when what the agent era requires is provisioning context for agent action. PCM systems need to capture context (not just content), maintain temporal state, scope context to prevent cross-pollination, compound across sessions, and provision context to agents at the right scope and time.
A third companion paper argues that personal context is property — exportable, inheritable, deletable with certainty, owned by the individual.
These two theses — context should be provisioned, and context should be owned — create a design constraint: the system that manages personal context must be built from sovereign, portable, human-readable infrastructure. No platform dependency. No proprietary format. No cloud service requirement.
Context as Code is the pattern that meets this constraint. It is the engineering answer to a philosophical and categorical question.
Every existing approach to agent behavior treats context as configuration for an engineered runtime. AGENTS.md configures Copilot. CLAUDE.md configures Claude Code. Cursor rules configure Cursor. The tool is the runtime. The file is its settings.
Context as Code inverts this. The files are not configuration. They are the runtime. There is no engineered system underneath — only a foundation model and a context window. Structure the context precisely enough, inject it at the right lifecycle boundaries, and runtime behavior emerges from the model's interpretation.
The model is the execution engine. The context is the program.
This inversion has a fifteen-year precedent.
Before Infrastructure as Code, engineers manually configured servers. Terraform changed the paradigm: declare the desired state in version-controlled files, let the system converge. GitOps extended it: the repo is the source of truth.
Context as Code makes the same move for agent behavior. Files on disk declare the desired behavioral state. The model converges toward it. Change a markdown file, change the behavior. Version the context, version the behavior.
| GitOps Principle | Context as Code Equivalent |
|---|---|
| Declarative specification | Rules declare constraints, not per-turn instructions |
| Version-controlled source of truth | Context files in Git with full diff history |
| Automated reconciliation | Agent reads files each session, converges to spec |
| Observable state | Human-readable plain text |
Three capabilities converged: lifecycle hooks (injection points at session boundaries), long context windows (room for a runtime alongside actual work), and plugin architectures (structured capability injection). None individually enables Context as Code. Together, they make it inevitable.
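The first of these capabilities is mechanically simple. A minimal sketch of what a session-start lifecycle hook injects, assuming a flat directory of context files — the file names (`IDENTITY.md`, `RULES.md`, `PREFERENCES.md`) are illustrative, not a fixed layout:

```python
from pathlib import Path

def session_start_context(root: Path) -> str:
    """Assemble the injection a session-start hook would perform.

    Reads plain markdown files from disk and concatenates them in a
    fixed order. File names here are hypothetical, not a spec.
    """
    parts = []
    for name in ("IDENTITY.md", "RULES.md", "PREFERENCES.md"):
        path = root / name
        if path.exists():
            parts.append(f"<!-- {name} -->\n{path.read_text()}")
    return "\n\n".join(parts)
```

Edit `RULES.md` and the next session starts with the changed behavior: version the context, version the behavior.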
| Approach | Mechanism | Persistence | Cost | Safety | Reversibility |
|---|---|---|---|---|---|
| Training | Weight modification | Permanent | $millions | Depends | Irreversible |
| Fine-tuning | Weight adjustment | Semi-permanent | $thousands | Degrades (10 examples can break alignment) | Retrain |
| Prompt engineering | Per-turn instructions | Ephemeral | Fractions of a cent | Preserved | Instant |
| Context injection | Structured context per session | Session-persistent, file-permanent | ~$0.15-0.50/session | Preserved | Edit a file |
Context injection uniquely combines fine-tuning's persistence with prompting's reversibility and safety preservation.
Anthropic's persona vectors research (2025) showed that context injection creates measurable neural activation patterns: "the persona vector activates before the response — it predicts the persona the model will adopt in advance." This is not instruction-following. It is architectural behavioral shaping at the representation level.
Research on in-context representation learning confirms that "when supplied with structured in-context examples, transformers dynamically reorganize the geometry of their latent representation space." The model's internal state physically reconfigures based on the structure of provided context.
Anthropic's many-shot research (NeurIPS 2024) showed that behavioral change from context injection follows a power law. More context, more behavioral change, on a predictable curve. Crucially: "larger models show greater susceptibility." Context as Code becomes more effective as models become more capable.
Fine-tuning with ten harmful examples is sufficient to "undermine safety guardrails substantially" (ICLR 2024). Context injection operates within existing safety boundaries because no weights are modified. Anthropic's preventative steering via context caused "little-to-no degradation in model capabilities."
This is not a minor advantage. It is the difference between a behavioral system that respects its safety training and one that has been partially untrained.
Stanford's TART closed the ICL-fine-tuning gap to 3%. Google DeepMind found ICL achieves better generalization than fine-tuning in data-matched settings. Within 3% of fine-tuning. Better generalization. No safety degradation. Instantly reversible. Orders of magnitude cheaper.
Context injection is not "dump everything in." Chroma Research found every model degrades as input length increases. A 1M-token window "still rots at 50K tokens." The "Lost in the Middle" finding (TACL 2024) showed 30%+ performance drops at certain positions.
The governing principle (Anthropic): "find the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome." A 200-token skill file replacing 50,000 tokens of MCP context is the exemplar. Structure matters more than quantity.
Anthropic formalized progressive disclosure for agent skills in early 2026 — loading metadata at startup (~200-500 tokens), full instructions on trigger, and resources dynamically. Their implementation reports 72% token reduction.
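Progressive disclosure is simple to mechanize. A sketch, assuming each skill is a markdown file whose YAML frontmatter carries a `description:` field (an assumption about layout, not Anthropic's exact format): startup scans only the frontmatter; the full instruction body loads on trigger.

```python
from pathlib import Path

def skill_manifest(skills_dir: Path) -> dict[str, str]:
    """At startup, read only each skill's frontmatter description —
    a few hundred tokens total across all skills."""
    manifest = {}
    for f in sorted(skills_dir.glob("*.md")):
        lines = f.read_text().splitlines()
        desc = ""
        if lines and lines[0] == "---":
            for line in lines[1:]:
                if line == "---":
                    break  # end of frontmatter
                if line.startswith("description:"):
                    desc = line.split(":", 1)[1].strip()
        manifest[f.stem] = desc
    return manifest

def load_skill(skills_dir: Path, name: str) -> str:
    """On trigger, load the full instruction body."""
    return (skills_dir / f"{name}.md").read_text()
```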
Context as Code operationalizes this pattern at the personal level through what the ALIVE system's creator describes as "two points where you orient the agent":
"The first is at runtime. You orient it to your world in a lightweight way. And then the prompt that you put in is the second orientation — let's load the context from a walnut and check what needs to be done."
A third level emerges when work begins inside a specific bundle:
"The capsule then becomes the rolling sum total of the external context gathered."
The three tiers map to a deliberate narrowing of scope:
| Level | Trigger | What loads | Token cost |
|---|---|---|---|
| Runtime | Session start (hooks) | World identity, rules, preferences | ~9,000 tokens |
| Walnut | Skill invocation (load-context) | Kernel files, bundle manifests, tasks | ~2,000-5,000 tokens |
| Bundle | Work begins on specific deliverable | Manifest, observations, raw sources, bundle skills | Variable |
Each level is a contextual narrowing. The agent doesn't receive everything at once — it receives what it needs when it needs it. This is not just efficient (attention budget conservation); it is architectural (preventing context pollution across domains).
A Context as Code system has four layers. The specific implementation varies; the pattern is constant.
Hooks inject context at lifecycle boundaries. The minimum viable hook set:

- Session start: inject identity, rules, and preferences (the runtime orientation).
- Context threshold: re-inject rules as the window fills.
- Session save: checkpoint state to disk so the next session can recover.
The critical architectural insight is dual-tier enforcement:
Rules that must be absolutely enforced → hook guards. Rules that benefit from judgment → context injection. This split is what separates Context as Code from "a really long system prompt."
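The hard side of that split can be sketched as a pre-write guard. The protected `rules/` prefix is a hypothetical layout choice; the point is that absolute rules live in code like this, while judgment-based rules live in injected context.

```python
from pathlib import Path

PROTECTED_PREFIXES = ("rules",)  # hypothetical protected top-level dirs

def pre_write_guard(repo_root: Path, target: Path) -> None:
    """Infrastructure-tier enforcement: block any write under a
    protected directory before it happens, regardless of what the
    model decided. Raises PermissionError on violation."""
    rel = target.resolve().relative_to(repo_root.resolve())
    if rel.parts and rel.parts[0] in PROTECTED_PREFIXES:
        raise PermissionError(f"write blocked to protected path: {rel}")
```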
Declarative behavioral constraints in plain markdown. Not per-turn instructions — persistent behavioral constitution. Injected at every session start and re-injected at context thresholds.
A well-structured rules layer covers:

- Identity: who the agent is in this world.
- Hard constraints: what must never happen, mirrored by hook guards.
- Working preferences: how the agent operates by default.
These are not suggestions. They are the operating system's kernel. English, not code.
Procedural knowledge encoded as markdown files, loaded when invoked. Skills are step-by-step protocols for complex operations — a save protocol, a context loading sequence, a search procedure.
Skills are the API surface. Rules define behavior. Hooks manage lifecycle. Skills define operations.
Plain files on disk. No database. No service. The filesystem is the database. YAML for session state. Markdown for content. JSON for computed projections.
The key property: every piece of state is human-readable, version-controllable, and portable. Copy the files, move the runtime.
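A checkpoint sketch under these conventions (YAML-style key/value lines for the session record, JSON for a computed projection; the field names are illustrative):

```python
import json
from pathlib import Path

def checkpoint(state_dir: Path, state: dict[str, str]) -> None:
    """Persist session state as plain files.

    session.yaml stays human-readable and diff-friendly; state.json
    is a computed projection regenerated on every save. Both are
    portable: copy the directory, move the state.
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    yaml_lines = [f"{key}: {value}" for key, value in state.items()]
    (state_dir / "session.yaml").write_text("\n".join(yaml_lines) + "\n")
    (state_dir / "state.json").write_text(json.dumps(state, indent=2) + "\n")
```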
A survey of 1,400+ papers on context engineering found no reference to "Context as Code" as a named pattern. The individual components are increasingly common: rules files (AGENTS.md, Cursor rules), structured capability injection (MCP servers), and retrieval-backed context (Codified Context).
As of March 2026, the integrated four-layer pattern is emerging but rare. GitAgent (open-gitagent) and AgentSys represent early convergence, though neither treats context as user-owned portable property — the PCM differentiator:
| System | Hooks | Rules-as-Behavior | Skills-as-Operations | Files-as-State | Integrated Runtime |
|---|---|---|---|---|---|
| AGENTS.md | No | Static config | No | No | No |
| Cursor rules | No | Scoped config | No | No | No |
| Hermes Agent | No | No | No | 3,575 chars total | No |
| Codified Context (paper) | Retrieval | Via domain agents | No | Partial | Partial |
| Context as Code pattern | Yes | Yes | Yes | Yes | Yes |
The closest conceptual articulation — "Markdown as an Operating System" (LeverageAI) — describes the idea without implementing it. The closest academic work — "Codified Context" (arXiv:2602.20478) — uses a more static, retrieval-based architecture without lifecycle hooks or behavioral emergence.
Three independent billion-dollar-scale systems converged on plain files:
Manus (acquired for ~$2B) chose context engineering over fine-tuning, citing "improvements in hours instead of weeks," and treats "the file system as the ultimate context."
OpenClaw (247K+ GitHub stars) uses markdown files as primary memory and exposes a pluggable context engine slot.
Claude Code uses CLAUDE.md and MEMORY.md as its behavioral specification.
Three teams. Three architectures. Same answer: plain files.
The defining characteristic of Context as Code is that the combination of layers produces behavior that no individual layer specifies.
The model has no memory between sessions. But the hook system creates effective persistence: session records on disk, state snapshots regenerated on save, rules re-injected on every context reset. The model "forgets" but the system remembers.
Infrastructure-tier hooks prevent the model from modifying its own rules. The behavioral constitution is tamper-proof from inside. Customizations flow through a separate override channel.
As the context window fills, monitoring hooks re-inject rules at progressive thresholds. The model's behavioral consistency is actively maintained rather than silently eroding.
File timestamps and session records enable concurrent sessions to detect each other's activity — primitive multi-agent coordination using the filesystem as a shared channel.
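The coordination primitive is small enough to sketch; the marker-file naming and the staleness window are assumptions:

```python
import time
from pathlib import Path

STALE_AFTER = 120.0  # seconds; an illustrative liveness window

def heartbeat(sessions_dir: Path, session_id: str) -> None:
    """Touch this session's marker file to signal liveness."""
    sessions_dir.mkdir(parents=True, exist_ok=True)
    (sessions_dir / f"{session_id}.alive").touch()

def other_live_sessions(sessions_dir: Path, session_id: str) -> list[str]:
    """Detect concurrent sessions from marker-file mtimes — the
    filesystem as a shared coordination channel."""
    now = time.time()
    live = []
    for f in sessions_dir.glob("*.alive"):
        if f.stem != session_id and now - f.stat().st_mtime < STALE_AFTER:
            live.append(f.stem)
    return sorted(live)
```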
Continuous state checkpointing means any session can recover from a crash, compaction, or abrupt termination. The next session reads the recovery state and continues.
None of these are "features" in the traditional sense. They are emergent properties of correctly structured context.
Context as Code suggests a different design philosophy. Instead of engineering runtime systems and configuring them, structure context so precisely that runtime behavior emerges. The engineering effort moves from system design to context design.
If context injection achieves within 3% of fine-tuning while preserving safety, the economics change. You don't need a custom model. You need structured context. The moat moves from weights to context.
Anthropic's own research supports this: system prompts produce "stronger behavioral patterns" than user prompts, and larger models follow them more faithfully. Context as Code becomes more powerful as models improve.
OpenClaw's pluggable context engine slot is the right architecture. But the engines that fill it should not just be RAG retrievers. They should be Context as Code runtimes — structured behavioral contexts that create agent identity, enforce constraints, and compound across sessions.
Does Context as Code scale beyond a single user? Subagent dispatch, brief packs, and tiered loading suggest it can. The proof at enterprise scale does not yet exist.
Context as Code is the engineering pattern that makes Personal Context Management implementable and Context as Property achievable. It is the answer to a constraint: how do you build an agent runtime that is sovereign, portable, human-readable, and requires no platform infrastructure?
The answer: you don't build a runtime at all. You structure context so precisely that a runtime emerges.
The mechanism is supported by neural evidence (persona vectors), performance data (3% gap), safety research (fine-tuning breaks, injection preserves), and convergent evolution (three independent billion-dollar systems chose the same approach).
A reference implementation — the ALIVE Context System — has run daily for eight months, documented in a companion whitepaper. But the pattern is not the implementation. Anyone with a foundation model, a hook system, and a text editor can build a Context as Code runtime.
Context is not configuration for a runtime. Context is the runtime.