AI-Native Transformation Framework

AI Execution Standards

The rules and expectations for delegating work to AI across the organization.


Organizational Expectations Policy

The operating principle behind these standards is the Universal Translation Rule: replace "human produces artifact" with "human defines spec → system produces artifact."

Core Principle

AI is treated as an autonomous worker, not a chatbot.

All work assigned to AI must be executable without real-time human supervision during execution. Pre-execution clarification — the agent surfacing assumptions and asking calibrated questions before producing output — is allowed and, at higher maturity, expected. See AI Lab § The Five Stages for the operational pattern.


Mandatory Work Layers

Every AI-enabled workflow must define the four input layers (1–4). Layer 5 (Process Design) extends them with the operational pipeline that consumes the inputs; it's mandatory for Tier 3 / Rung 5 work and optional below that.

Layer 1 — Prompt Craft (baseline skill)

Employees must:

  • write clear instructions
  • specify format
  • include examples when useful
  • resolve ambiguity upfront

Minimum bar: AI output should require ≤20% correction.
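
The ≤20% bar is easier to enforce once "correction" has a measurable definition. The sketch below is one possible reading, not part of the standard: it assumes correction means the share of text that differs between the AI draft and the version a human finally accepted, approximated with Python's difflib. The names correction_rate and meets_minimum_bar are hypothetical.

```python
from difflib import SequenceMatcher

MAX_CORRECTION = 0.20  # the <=20% bar from Layer 1


def correction_rate(ai_draft: str, accepted: str) -> float:
    """Approximate share of the draft that had to change (0.0 = untouched)."""
    # ratio() is a similarity score in [0, 1]; using 1 - ratio as a proxy
    # for "correction" is an assumption made for this sketch.
    return 1.0 - SequenceMatcher(None, ai_draft, accepted).ratio()


def meets_minimum_bar(ai_draft: str, accepted: str) -> bool:
    """True when the estimated correction is within the Layer 1 bar."""
    return correction_rate(ai_draft, accepted) <= MAX_CORRECTION
```

A team could log this value per task and review the trend, rather than treating any single score as exact.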


Layer 2 — Context Engineering

Each team must maintain a structured context file containing:

  • goals
  • constraints
  • terminology
  • quality standards
  • relevant documents
  • tool access rules

Requirement: AI tasks must load this context before execution.
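
As an illustration, the context file and the load-before-execution requirement could be modeled as below. This is a minimal sketch under assumptions: the TeamContext class, its field names, and run_task are hypothetical, and a real team would more likely keep this content in a file such as CLAUDE.md or AGENTS.md.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TeamContext:
    """Hypothetical in-memory form of a team's structured context file."""
    goals: list[str]
    constraints: list[str]
    terminology: dict[str, str]
    quality_standards: list[str]
    relevant_documents: list[str]
    tool_access_rules: list[str]


def missing_sections(ctx: TeamContext) -> list[str]:
    """Return the names of required sections that are empty."""
    return [f.name for f in fields(ctx) if not getattr(ctx, f.name)]


def run_task(task: str, ctx: Optional[TeamContext]) -> str:
    """Refuse to execute unless a complete context has been loaded."""
    if ctx is None:
        raise RuntimeError("context not loaded")
    gaps = missing_sections(ctx)
    if gaps:
        raise ValueError(f"context incomplete: {', '.join(gaps)}")
    # A real implementation would pass ctx to the model here.
    return f"executing {task!r} with {len(ctx.goals)} goal(s) in context"
```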


Layer 3 — Intent Engineering

Every workflow must define:

  • objective hierarchy
  • tradeoff rules
  • escalation conditions
  • what AI may decide vs must escalate

No agent may run without defined intent.
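
The four intent elements and the "no intent, no run" rule might be encoded roughly as follows. The Intent class and the decide function are hypothetical names introduced for this sketch; the standard does not prescribe a data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intent:
    """Hypothetical record of a workflow's defined intent."""
    objective_hierarchy: tuple[str, ...]    # highest priority first
    tradeoff_rules: tuple[str, ...]
    escalation_conditions: tuple[str, ...]
    may_decide: frozenset[str]              # decisions the AI owns


def decide(intent: Optional[Intent], decision: str) -> str:
    """Return 'decide' if the AI owns this decision, else 'escalate'."""
    if intent is None:
        raise PermissionError("no agent may run without defined intent")
    # Anything not explicitly delegated is escalated (an assumed default).
    return "decide" if decision in intent.may_decide else "escalate"
```

Defaulting unknown decisions to escalation is a design choice of this sketch: it keeps the agent's authority limited to what was explicitly granted.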


Layer 4 — Specification Engineering (highest standard)

All non-trivial tasks must have a written specification.

Required spec components:

  • problem statement
  • scope
  • inputs
  • constraints
  • acceptance criteria
  • failure conditions
  • success tests
  • completion definition

Rule: If success cannot be verified objectively, the task is not spec-ready.
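
A spec-readiness check along these lines could catch incomplete specs before delegation. It is a sketch only: the snake_case keys are hypothetical renderings of the component names above, and a non-empty success_tests entry is used as a stand-in for "success can be verified objectively", which a human still has to judge.

```python
REQUIRED_COMPONENTS = (
    "problem_statement",
    "scope",
    "inputs",
    "constraints",
    "acceptance_criteria",
    "failure_conditions",
    "success_tests",
    "completion_definition",
)


def spec_gaps(spec: dict) -> list[str]:
    """Return required components that are missing or empty."""
    return [name for name in REQUIRED_COMPONENTS if not spec.get(name)]


def is_spec_ready(spec: dict) -> bool:
    """A spec with any gap is treated as not spec-ready."""
    return not spec_gaps(spec)
```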

For brownfield codebases, the inversion matters: the code already exists, and specifications must be reverse-engineered from it before new spec-first work can resume. See brownfield engineering strategy for the spec-from-code workflow.


Layer 5 — Process Design

The operational layer. Once a spec works, the next question is the pipeline that runs it. Process design is the discipline of designing constrained, phased workflows within which AI can operate consistently; it is distinct from both prompt engineering and spec-writing. It's the layer that distinguishes Tier 3 / Rung 5 work from Tier 2 / Rung 4.

The vocabulary, drawn from Anthropic's Building Effective Agents:

  • Prompt chaining — sequential single-prompt steps with intermediate validation
  • Routing — classify the task, dispatch to the appropriate specialized prompt or workflow
  • Parallelization — run independent subtasks concurrently, aggregate results
  • Orchestrator-workers — a lead agent decomposes the task and dispatches workers
  • Evaluator-optimizer — a generator paired with a separate evaluator that scores and iterates
  • Autonomous agents — open-ended exploration with tool use and feedback loops
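
To make the first pattern concrete, here is a minimal prompt-chaining sketch with a validation gate after every step. The helper run_chain and the outline and draft functions are hypothetical; the two step functions stand in for model calls so the example runs without any API.

```python
from typing import Callable

# Each step pairs a transformation with a check on its output.
Step = tuple[Callable[[str], str], Callable[[str], bool]]


def run_chain(steps: list[Step], initial: str) -> str:
    """Run steps in order; stop at the first output that fails validation."""
    value = initial
    for index, (run, validate) in enumerate(steps, start=1):
        value = run(value)
        if not validate(value):
            raise ValueError(f"step {index} failed validation")
    return value


def outline(text: str) -> str:
    """Stand-in for a model call that produces an outline."""
    return f"OUTLINE: {text}"


def draft(text: str) -> str:
    """Stand-in for a model call that expands the outline into a draft."""
    return text.replace("OUTLINE", "DRAFT", 1)


result = run_chain(
    [
        (outline, lambda s: s.startswith("OUTLINE")),
        (draft, lambda s: s.startswith("DRAFT")),
    ],
    "quarterly report",
)
```

Because each step is validated separately, a failure points at one step instead of the whole pipeline.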

Decision rule: start with a single prompt; add a workflow when needed; add multi-agent only when the value per task justifies the token premium (agents typically use about 4× the tokens of a chat interaction, and multi-agent systems about 15×, per Anthropic, 2025). Don't reach for multi-agent because it sounds sophisticated.
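
The token premium can be turned into a rough go/no-go check. The sketch below rests on two stated assumptions: cost scales linearly with tokens, and the value of a task can be expressed in the same unit as its cost. The function names are hypothetical; the multipliers are the approximate figures quoted above.

```python
# Approximate token multipliers relative to a chat interaction.
TOKEN_MULTIPLIER = {"chat": 1, "agent": 4, "multi_agent": 15}


def estimated_cost(chat_cost: float, mode: str) -> float:
    """Scale the cost of a chat interaction by the mode's token multiplier."""
    return chat_cost * TOKEN_MULTIPLIER[mode]


def is_justified(value_per_task: float, chat_cost: float, mode: str) -> bool:
    """True when the task's value exceeds the estimated cost of the mode."""
    return value_per_task > estimated_cost(chat_cost, mode)
```

For example, if a chat interaction costs 2 cents, the same task costs roughly 8 cents as an agent and 30 cents as a multi-agent system. A task worth 20 cents would justify the agent but not the multi-agent setup.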

Anti-patterns:

  • Single megaprompt: combines failure modes and is impossible to debug.
  • Multi-agent for its own sake: expensive and fragile; a single agent often does as well.
  • Context dumping: more context is often worse context.
  • Skipping evals: without evals, AI systems degrade silently.
  • Optimizing prompts when the problem is context: the failure is attributed to the wrong layer.

The four input layers (Prompt → Context → Intent → Spec) describe what the human prepares before delegation. Layer 5 describes the pipeline that consumes those inputs. At T3/R5 the pipeline is also a designed artifact — and validation gates within it are risk-graded (HITL / HOTL / HOOTL).


Specification Primitives (learnable skills)

Specification engineering is built from five primitives. Each is a distinct skill to practice. For examples, templates, and worked specs for different roles, see the Specification Guide.

Primitive 1 — Self-Contained Problem Statements

State the problem with enough context that the task is solvable without the agent fetching more information. Surface hidden assumptions. Articulate constraints you normally leave implicit.

Training exercise: Take a request you'd normally make conversationally and rewrite it as if the recipient has never seen your project, doesn't know your terminology, and has access to nothing beyond what you include.

Primitive 2 — Acceptance Criteria

Define what done looks like so that an independent observer can verify the output without asking questions. If you can't write three sentences that verify completion, you don't understand the task well enough to delegate it.

Primitive 3 — Constraint Architecture

Define four categories for every task:

  • Must — non-negotiable requirements
  • Must not — forbidden actions or outputs
  • Prefer — guidance when multiple valid approaches exist
  • Escalate — conditions where the agent must stop and ask
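
One way to keep all four categories visible is to give each a slot and flag any that were left empty. The Constraints class below is a hypothetical sketch, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Constraints:
    """Hypothetical container for the four constraint categories."""
    must: list[str] = field(default_factory=list)
    must_not: list[str] = field(default_factory=list)
    prefer: list[str] = field(default_factory=list)
    escalate: list[str] = field(default_factory=list)

    def undefined_categories(self) -> list[str]:
        """Return the categories that have no entries yet."""
        return [name for name, items in vars(self).items() if not items]
```

An empty category is not always an error, but surfacing it forces an explicit decision that nothing belongs there.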

Primitive 4 — Decomposition

Break tasks into components that can be executed independently, tested independently, and integrated predictably. Target granularity: subtasks of ≤2 hours with clear input/output boundaries, each verifiable on its own.
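
The granularity target can be checked mechanically. In this sketch, the Subtask fields and the needs_further_split function are hypothetical, and the hour estimate is assumed to come from whoever writes the decomposition.

```python
from dataclasses import dataclass

MAX_HOURS = 2.0  # target granularity from Primitive 4


@dataclass
class Subtask:
    """Hypothetical description of one decomposed unit of work."""
    name: str
    estimate_hours: float
    inputs: list[str]
    outputs: list[str]
    check: str  # how this subtask is verified on its own


def needs_further_split(task: Subtask) -> list[str]:
    """Return the reasons a subtask does not yet meet the target."""
    reasons = []
    if task.estimate_hours > MAX_HOURS:
        reasons.append("estimate exceeds 2 hours")
    if not task.inputs or not task.outputs:
        reasons.append("unclear input/output boundary")
    if not task.check.strip():
        reasons.append("not independently verifiable")
    return reasons
```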

Primitive 5 — Evaluation Design

For every recurring AI task, build 3-5 test cases with known-good outputs. Run them after model updates to catch regressions. Outputs are judged by metrics, not appearance.
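
A regression check of this kind can be very small. The sketch below assumes exact-match scoring against known-good outputs, which suits only deterministic tasks; pass_rate is a hypothetical name, and the task argument stands in for whatever function calls the model.

```python
from typing import Callable


def pass_rate(task: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases whose output equals the known-good output."""
    if len(cases) < 3:
        raise ValueError("need at least 3 known-good cases")
    passed = sum(1 for prompt, expected in cases if task(prompt) == expected)
    return passed / len(cases)
```

Recording the pass rate before and after a model update turns "it looks fine" into a comparison of two numbers.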

A valid spec must pass all five: self-contained, testable, constrained, decomposable, evaluable.


Delegation Readiness Checklist

Before assigning work to AI, employees must confirm:

  • I understand the task completely
  • I can define success objectively
  • I can list failure cases
  • I can describe constraints
  • I can verify results without interpretation

If any answer = no → do not delegate yet.
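
The checklist reduces to a single all-or-nothing gate, sketched here with hypothetical names. A missing answer is treated as "no".

```python
CHECKLIST = (
    "understand the task completely",
    "can define success objectively",
    "can list failure cases",
    "can describe constraints",
    "can verify results without interpretation",
)


def ready_to_delegate(answers: dict[str, bool]) -> bool:
    """True only when every checklist item is answered yes."""
    return all(answers.get(item, False) for item in CHECKLIST)
```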


Failure Responsibility Model

Failure is attributed by layer:

Failure Type                         | Root Cause
Bad output                           | Prompt issue
Irrelevant output                    | Context issue
Wrong direction                      | Intent issue
Incomplete output                    | Spec issue
Damaging output                      | Permission / blast-radius issue: the agent shouldn't have been able to take this action
Unnoticed output                     | Validation-gate issue: the wrong oversight stance for this action's blast radius (see risk-graded gates)
Wrong direction defended confidently | Clarification skipped: the agent committed before assumptions were surfaced

Teams must fix the responsible layer, not retry prompts.
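
For incident triage, the attribution model can be kept as a simple lookup so that a reported failure type points straight at the layer to fix. The dictionary mirrors the model above; the short layer labels and the layer_to_fix function are hypothetical.

```python
RESPONSIBLE_LAYER = {
    "bad output": "prompt",
    "irrelevant output": "context",
    "wrong direction": "intent",
    "incomplete output": "spec",
    "damaging output": "permissions",
    "unnoticed output": "validation gate",
    "wrong direction defended confidently": "clarification",
}


def layer_to_fix(failure_type: str) -> str:
    """Map a failure type to the layer that should be repaired."""
    key = failure_type.strip().lower()
    if key not in RESPONSIBLE_LAYER:
        raise KeyError(f"unknown failure type: {failure_type!r}")
    return RESPONSIBLE_LAYER[key]
```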


Organizational Roles

Each production AI system must have:

  • Spec Owner — accountable for specification quality, acceptance criteria, and what "done" means
  • Context Owner — accountable for context files (CLAUDE.md / AGENTS.md), context freshness, and tool/skill scope
  • Evaluation Owner — accountable for the eval suite, regression detection, and quality metrics
  • Permissions Owner — accountable for what each agent can and cannot do, and for the validation gating tier (HITL / HOTL / HOOTL) per action class

One person may hold multiple roles initially. The Permissions Owner role becomes load-bearing as soon as agents touch production systems with irreversible side effects.

These four roles govern a production AI system. They are distinct from the three roles required to govern a team's AI transformation — see Leading the Transformation § Organizational Roles.


Documentation Standard

All internal documents must be written as if an agent will execute them.

Documents must:

  • state assumptions
  • define terms
  • specify outcomes
  • include constraints
  • avoid implicit knowledge

Executive Summary Rule

Clear thinking precedes AI execution.

If you cannot specify it, you cannot automate it.

