We define Context as Code — a design pattern in which structured context injection into a stateless language model creates persistent agent runtime behavior without training, fine-tuning, or custom infrastructure. We show that this pattern is supported by converging evidence: Anthropic's persona vectors research demonstrates that context injection creates measurable neural activation patterns; in-context learning closes the performance gap with fine-tuning to 3%; fine-tuning degrades safety alignment while context injection preserves it; and three independent billion-dollar-scale agent systems have converged on plain files as their primary behavioral specification.
We survey the landscape of agent configuration (AGENTS.md, Cursor rules, MCP servers, Codified Context) and find that while individual components of Context as Code are increasingly common, the integrated pattern — hooks as lifecycle, rules as behavior, skills as operations, files as state — is undocumented in any of the 1,400+ papers surveyed. We coin the term and define the architecture.
A companion paper documents a reference implementation that has run daily for eight months (the ALIVE Technical Whitepaper). This paper defines the pattern it implements.
This paper exists because of a problem identified in a companion paper: Personal Context Management.
The PCM paper argues that personal knowledge management (PKM) solves the wrong problem — it files knowledge for human retrieval when what the agent era requires is provisioning context for agent action. PCM systems need to capture context (not just content), maintain temporal state, scope context to prevent cross-pollination, compound across sessions, and provision context to agents at the right scope and time.
A third companion paper argues that personal context is property — exportable, inheritable, deletable with certainty, owned by the individual.
These two theses — context should be provisioned, and context should be owned — create a design constraint: the system that manages personal context must be built from sovereign, portable, human-readable infrastructure. No platform dependency. No proprietary format. No cloud service requirement.
Context as Code is the pattern that meets this constraint. It is the engineering answer to a philosophical and categorical question.
Every existing approach to agent behavior treats context as configuration for an engineered runtime. AGENTS.md configures Copilot. CLAUDE.md configures Claude Code. Cursor rules configure Cursor. The tool is the runtime. The file is its settings.
Context as Code inverts this. The files are not configuration. They are the runtime. There is no engineered system underneath — only a foundation model and a context window. Structure the context precisely enough, inject it at the right lifecycle boundaries, and runtime behavior emerges from the model's interpretation.
The model is the execution engine. The context is the program.
This inversion has a fifteen-year precedent.
Before Infrastructure as Code, engineers manually configured servers. Terraform changed the paradigm: declare the desired state in version-controlled files, let the system converge. GitOps extended it: the repo is the source of truth.
Context as Code makes the same move for agent behavior. Files on disk declare the desired behavioral state. The model converges toward it. Change a markdown file, change the behavior. Version the context, version the behavior.
| GitOps Principle | Context as Code Equivalent |
|---|---|
| Declarative specification | Rules declare constraints, not per-turn instructions |
| Version-controlled source of truth | Context files in Git with full diff history |
| Automated reconciliation | Agent reads files each session, converges to spec |
| Observable state | Human-readable plain text |
Three capabilities converged: lifecycle hooks (injection points at session boundaries), long context windows (room for a runtime alongside actual work), and plugin architectures (structured capability injection). None individually enables Context as Code. Together, they make it inevitable.
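The first of these capabilities is mechanically simple. A minimal sketch of what a session-start lifecycle hook injects, assuming a flat directory of context files — the file names (`IDENTITY.md`, `RULES.md`, `PREFERENCES.md`) are illustrative, not a fixed layout:

```python
from pathlib import Path

def session_start_context(root: Path) -> str:
    """Assemble the injection a session-start hook would perform.

    Reads plain markdown files from disk and concatenates them in a
    fixed order. File names here are hypothetical, not a spec.
    """
    parts = []
    for name in ("IDENTITY.md", "RULES.md", "PREFERENCES.md"):
        path = root / name
        if path.exists():
            parts.append(f"<!-- {name} -->\n{path.read_text()}")
    return "\n\n".join(parts)
```

Edit `RULES.md` and the next session starts with the changed behavior: version the context, version the behavior.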
| Approach | Mechanism | Persistence | Cost | Safety | Reversibility |
|---|---|---|---|---|---|
| Training | Weight modification | Permanent | $millions | Depends | Irreversible |
| Fine-tuning | Weight adjustment | Semi-permanent | $thousands | Degrades (10 examples can break alignment) | Retrain |
| Prompt engineering | Per-turn instructions | Ephemeral | Fractions of a cent | Preserved | Instant |
| Context injection | Structured context per session | Session-persistent, file-permanent | ~$0.15-0.50/session | Preserved | Edit a file |
Context injection uniquely combines fine-tuning's persistence with prompting's reversibility and safety preservation.
Anthropic's persona vectors research (2025) showed that context injection creates measurable neural activation patterns: "the persona vector activates before the response — it predicts the persona the model will adopt in advance." This is not instruction-following. It is architectural behavioral shaping at the representation level.
Research on in-context representation learning confirms that "when supplied with structured in-context examples, transformers dynamically reorganize the geometry of their latent representation space." The model's internal state physically reconfigures based on the structure of provided context.
Anthropic's many-shot research (NeurIPS 2024) showed that behavioral change from context injection follows a power law. More context, more behavioral change, on a predictable curve. Crucially: "larger models show greater susceptibility." Context as Code becomes more effective as models become more capable.
Fine-tuning with ten harmful examples is sufficient to "undermine safety guardrails substantially" (ICLR 2024). Context injection operates within existing safety boundaries because no weights are modified. Anthropic's preventative steering via context caused "little-to-no degradation in model capabilities."
This is not a minor advantage. It is the difference between a behavioral system that respects its safety training and one that has been partially untrained.
Stanford's TART closed the ICL-fine-tuning gap to 3%. Google DeepMind found ICL achieves better generalization than fine-tuning in data-matched settings. Within 3% of fine-tuning. Better generalization. No safety degradation. Instantly reversible. Orders of magnitude cheaper.
Context injection is not "dump everything in." Chroma Research found every model degrades as input length increases. A 1M-token window "still rots at 50K tokens." The "Lost in the Middle" finding (TACL 2024) showed 30%+ performance drops at certain positions.
The governing principle (Anthropic): "find the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome." A 200-token skill file replacing 50,000 tokens of MCP context is the exemplar. Structure matters more than quantity.
Anthropic formalized progressive disclosure for agent skills in early 2026 — loading metadata at startup (~200-500 tokens), full instructions on trigger, and resources dynamically. Their implementation reports 72% token reduction.
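Progressive disclosure is simple to mechanize. A sketch, assuming each skill is a markdown file whose YAML frontmatter carries a `description:` field (an assumption about layout, not Anthropic's exact format): startup scans only the frontmatter; the full instruction body loads on trigger.

```python
from pathlib import Path

def skill_manifest(skills_dir: Path) -> dict[str, str]:
    """At startup, read only each skill's frontmatter description —
    a few hundred tokens total across all skills."""
    manifest = {}
    for f in sorted(skills_dir.glob("*.md")):
        lines = f.read_text().splitlines()
        desc = ""
        if lines and lines[0] == "---":
            for line in lines[1:]:
                if line == "---":
                    break  # end of frontmatter
                if line.startswith("description:"):
                    desc = line.split(":", 1)[1].strip()
        manifest[f.stem] = desc
    return manifest

def load_skill(skills_dir: Path, name: str) -> str:
    """On trigger, load the full instruction body."""
    return (skills_dir / f"{name}.md").read_text()
```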
Context as Code operationalizes this pattern at the personal level through what the ALIVE system's creator describes as "two points where you orient the agent":
"The first is at runtime. You orient it to your world in a lightweight way. And then the prompt that you put in is the second orientation — let's load the context from a walnut and check what needs to be done."
A third level emerges when work begins inside a specific bundle:
"The capsule then becomes the rolling sum total of the external context gathered."
The three tiers map to a deliberate narrowing of scope:
| Level | Trigger | What loads | Token cost |
|---|---|---|---|
| Runtime | Session start (hooks) | World identity, rules, preferences | ~9,000 tokens |
| Walnut | Skill invocation (load-context) | Kernel files, bundle manifests, tasks | ~2,000-5,000 tokens |
| Bundle | Work begins on specific deliverable | Manifest, observations, raw sources, bundle skills | Variable |
Each level is a contextual narrowing. The agent doesn't receive everything at once — it receives what it needs when it needs it. This is not just efficient (attention budget conservation); it is architectural (preventing context pollution across domains).
A Context as Code system has four layers. The specific implementation varies; the pattern is constant.
Hooks inject context at lifecycle boundaries. The minimum viable hook set:

- Session start: inject identity, rules, and preferences (the runtime orientation).
- Context threshold: re-inject rules as the window fills.
- Session save: checkpoint state to disk so the next session can recover.
The critical architectural insight is dual-tier enforcement:
Rules that must be absolutely enforced → hook guards. Rules that benefit from judgment → context injection. This split is what separates Context as Code from "a really long system prompt."
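The hard side of that split can be sketched as a pre-write guard. The protected `rules/` prefix is a hypothetical layout choice; the point is that absolute rules live in code like this, while judgment-based rules live in injected context.

```python
from pathlib import Path

PROTECTED_PREFIXES = ("rules",)  # hypothetical protected top-level dirs

def pre_write_guard(repo_root: Path, target: Path) -> None:
    """Infrastructure-tier enforcement: block any write under a
    protected directory before it happens, regardless of what the
    model decided. Raises PermissionError on violation."""
    rel = target.resolve().relative_to(repo_root.resolve())
    if rel.parts and rel.parts[0] in PROTECTED_PREFIXES:
        raise PermissionError(f"write blocked to protected path: {rel}")
```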
Declarative behavioral constraints in plain markdown. Not per-turn instructions — persistent behavioral constitution. Injected at every session start and re-injected at context thresholds.
A well-structured rules layer covers:

- Identity: who the agent is in this world.
- Hard constraints: what must never happen, mirrored by hook guards.
- Working preferences: how the agent operates by default.
These are not suggestions. They are the operating system's kernel. English, not code.
Procedural knowledge encoded as markdown files, loaded when invoked. Skills are step-by-step protocols for complex operations — a save protocol, a context loading sequence, a search procedure.
Skills are the API surface. Rules define behavior. Hooks manage lifecycle. Skills define operations.
Plain files on disk. No database. No service. The filesystem is the database. YAML for session state. Markdown for content. JSON for computed projections.
The key property: every piece of state is human-readable, version-controllable, and portable. Copy the files, move the runtime.
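A checkpoint sketch under these conventions (YAML-style key/value lines for the session record, JSON for a computed projection; the field names are illustrative):

```python
import json
from pathlib import Path

def checkpoint(state_dir: Path, state: dict[str, str]) -> None:
    """Persist session state as plain files.

    session.yaml stays human-readable and diff-friendly; state.json
    is a computed projection regenerated on every save. Both are
    portable: copy the directory, move the state.
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    yaml_lines = [f"{key}: {value}" for key, value in state.items()]
    (state_dir / "session.yaml").write_text("\n".join(yaml_lines) + "\n")
    (state_dir / "state.json").write_text(json.dumps(state, indent=2) + "\n")
```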
A survey of 1,400+ papers on context engineering found no reference to "Context as Code" as a named pattern. The individual components are increasingly common: rules files (AGENTS.md, Cursor rules), structured capability injection (MCP servers), and retrieval-backed context (Codified Context).
As of March 2026, the integrated four-layer pattern is emerging but rare. GitAgent (open-gitagent) and AgentSys represent early convergence, though neither treats context as user-owned portable property — the PCM differentiator:
| System | Hooks | Rules-as-Behavior | Skills-as-Operations | Files-as-State | Integrated Runtime |
|---|---|---|---|---|---|
| AGENTS.md | No | Static config | No | No | No |
| Cursor rules | No | Scoped config | No | No | No |
| Hermes Agent | No | No | No | 3,575 chars total | No |
| Codified Context (paper) | Retrieval | Via domain agents | No | Partial | Partial |
| Context as Code pattern | Yes | Yes | Yes | Yes | Yes |
The closest conceptual articulation — "Markdown as an Operating System" (LeverageAI) — describes the idea without implementing it. The closest academic work — "Codified Context" (arXiv:2602.20478) — uses a more static, retrieval-based architecture without lifecycle hooks or behavioral emergence.
Three independent billion-dollar-scale systems converged on plain files:
Manus (acquired for ~$2B) chose context engineering over fine-tuning, citing "improvements in hours instead of weeks," and treats "the file system as the ultimate context."
OpenClaw (247K+ GitHub stars) uses markdown files as primary memory and exposes a pluggable context engine slot.
Claude Code uses CLAUDE.md and MEMORY.md as its behavioral specification.
Three teams. Three architectures. Same answer: plain files.
The defining characteristic of Context as Code is that the combination of layers produces behavior that no individual layer specifies.
The model has no memory between sessions. But the hook system creates effective persistence: session records on disk, state snapshots regenerated on save, rules re-injected on every context reset. The model "forgets" but the system remembers.
Infrastructure-tier hooks prevent the model from modifying its own rules. The behavioral constitution is tamper-proof from inside. Customizations flow through a separate override channel.
As the context window fills, monitoring hooks re-inject rules at progressive thresholds. The model's behavioral consistency is actively maintained rather than silently eroding.
File timestamps and session records enable concurrent sessions to detect each other's activity — primitive multi-agent coordination using the filesystem as a shared channel.
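The coordination primitive is small enough to sketch; the marker-file naming and the staleness window are assumptions:

```python
import time
from pathlib import Path

STALE_AFTER = 120.0  # seconds; an illustrative liveness window

def heartbeat(sessions_dir: Path, session_id: str) -> None:
    """Touch this session's marker file to signal liveness."""
    sessions_dir.mkdir(parents=True, exist_ok=True)
    (sessions_dir / f"{session_id}.alive").touch()

def other_live_sessions(sessions_dir: Path, session_id: str) -> list[str]:
    """Detect concurrent sessions from marker-file mtimes — the
    filesystem as a shared coordination channel."""
    now = time.time()
    live = []
    for f in sessions_dir.glob("*.alive"):
        if f.stem != session_id and now - f.stat().st_mtime < STALE_AFTER:
            live.append(f.stem)
    return sorted(live)
```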
Continuous state checkpointing means any session can recover from a crash, compaction, or abrupt termination. The next session reads the recovery state and continues.
None of these are "features" in the traditional sense. They are emergent properties of correctly structured context.
Context as Code suggests a different design philosophy. Instead of engineering runtime systems and configuring them, structure context so precisely that runtime behavior emerges. The engineering effort moves from system design to context design.
If context injection achieves within 3% of fine-tuning while preserving safety, the economics change. You don't need a custom model. You need structured context. The moat moves from weights to context.
Anthropic's own research supports this: system prompts produce "stronger behavioral patterns" than user prompts, and larger models follow them more faithfully. Context as Code becomes more powerful as models improve.
OpenClaw's pluggable context engine slot is the right architecture. But the engines that fill it should not just be RAG retrievers. They should be Context as Code runtimes — structured behavioral contexts that create agent identity, enforce constraints, and compound across sessions.
Does Context as Code scale beyond a single user? Subagent dispatch, brief packs, and tiered loading suggest it can. The proof at enterprise scale does not yet exist.
Context as Code is the engineering pattern that makes Personal Context Management implementable and Context as Property achievable. It is the answer to a constraint: how do you build an agent runtime that is sovereign, portable, human-readable, and requires no platform infrastructure?
The answer: you don't build a runtime at all. You structure context so precisely that a runtime emerges.
The mechanism is supported by neural evidence (persona vectors), performance data (3% gap), safety research (fine-tuning breaks, injection preserves), and convergent evolution (three independent billion-dollar systems chose the same approach).
A reference implementation — the ALIVE Context System — has run daily for eight months, documented in a companion whitepaper. But the pattern is not the implementation. Anyone with a foundation model, a hook system, and a text editor can build a Context as Code runtime.
Context is not configuration for a runtime. Context is the runtime.