AI-Native Transformation Framework

Glossary

Definitions of concepts used in the AI transformation.


AI Maturity

AI-Assisted — AI is a personal tool; nothing structural changes if it disappears. See the reference framework.

AI-Integrated — AI is embedded in workflows; roles shift from doing to directing. See the reference framework.

AI-Native — Work design assumes AI as a first-class resource; roles defined by judgment, not execution. See the reference framework.

AI-Supportive — Leadership endorses AI personally without pushing organizational adoption. See the reference framework.

AI-Operational — Leadership sets role-based expectations and funds automation before hiring. See the reference framework.

AI-Strategic — Leadership redesigns the organization around AI and makes AI literacy a condition of leadership. See the reference framework.

Unexposed (Tier 0) — AI is not part of work. No experimentation, no awareness of capabilities. See the reference framework.

AI-Curious (Tier 0.5) — Has tried AI but it hasn't changed how work gets done. The gap to Tier 1 is not knowledge but the habit of reaching for AI when work starts. See the reference framework.

AI-Aware (Tier 1) — Individual uses AI as a personal tool without changing workflows. See the reference framework.

AI-Building (Tier 1.5) — Actively designing and testing AI workflows. Building prompts, iterating, experimenting. The construction phase between ad-hoc use and established workflows. This is where most people stall. See the reference framework.

AI-Augmented (Tier 2) — Individual integrates AI into recurring workflows systematically. See the reference framework.

AI-Advanced (Tier 2.5) — Building systems where AI handles most execution. Multiple processes redesigned. The role title hasn't changed but the work inside has. See the reference framework.

AI-Native (Tier 3) — Role redesigned around judgment and direction. The person forecasts where the human-agent boundary will move and allocates attention where it creates the most value. See the reference framework.


AI Engineering

Autonomous production (Rung 5)

Engineering model where the spec goes in and software comes out without human intervention on the code. The human defines architecture, constraints, and scenarios; AI produces, tests, and iterates the code. Also known as dark factory. See the AI Lab.

Assisted coding (Rung 0)

Development mode where the human codes and AI suggests completions. The lowest level of AI assistance in software engineering.

Non-interactive development

Working mode where specifications and scenarios drive autonomous agents. The human doesn't code and doesn't converse with the agent during execution. See the AI Lab.

Scenarios

End-to-end user journeys that describe expected behavior from the user's perspective. Favored over unit tests because they are harder for agents to circumvent. See the AI Lab.

Satisfaction metric

Evaluation approach that measures the fraction of trajectories across all scenarios that satisfy the user, rather than a binary green/red test result. See the AI Lab.
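A minimal Python sketch of the metric. It assumes each trajectory has already been judged satisfied or not by some evaluator; how that judgment is made is outside this entry. The names `Trajectory`, `satisfaction`, `per_scenario`, and `binary_result` are hypothetical. The sketch applies both the fraction and the binary green/red view to the same runs so they can be compared.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trajectory:
    scenario: str    # which end-to-end user journey this run exercised
    satisfied: bool  # whether the user would be satisfied (assumed to come from a separate evaluator)


def satisfaction(trajectories: list[Trajectory]) -> float:
    """Fraction of all trajectories, across every scenario, that satisfied the user."""
    if not trajectories:
        raise ValueError("no trajectories to score")
    return sum(t.satisfied for t in trajectories) / len(trajectories)


def per_scenario(trajectories: list[Trajectory]) -> dict[str, float]:
    """Same fraction, broken down by scenario, to show where satisfaction is lost."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # scenario -> [satisfied, runs]
    for t in trajectories:
        totals[t.scenario][0] += int(t.satisfied)
        totals[t.scenario][1] += 1
    return {name: ok / count for name, (ok, count) in totals.items()}


def binary_result(trajectories: list[Trajectory]) -> bool:
    """The green/red view the metric replaces: one unsatisfied run turns everything red."""
    return all(t.satisfied for t in trajectories)


# Hypothetical runs: the same scenario is executed several times because agent output varies.
runs = [
    Trajectory("checkout", True),
    Trajectory("checkout", True),
    Trajectory("checkout", False),
    Trajectory("refund", True),
]
```

On these sample runs the binary view is red, while the metric reports 0.75 and identifies the checkout scenario as the place where satisfaction is lost.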

Deliberate naivety

The stance of removing traditional development conventions and systematically asking: "Why am I doing this? The model should be doing it instead." See the AI Lab.

Greenfield

A project started from scratch, with no existing code. The most natural terrain for non-interactive development. See the AI Lab.

Brownfield

A project with existing code and habits, transitioned to the autonomous production model. Harder than greenfield, but more impactful. See the AI Lab.


AI Skills

AI literacy — Structured use of AI tools and the ability to distinguish ad hoc usage from workflow integration. See the employee guide.

Prompt craft — Clear instructions, specified format, examples, resolved ambiguity. See the execution standards.

Context engineering — Structured context file loaded before AI tasks. See the execution standards.

Intent engineering — Defined objective hierarchy, tradeoff rules, and escalation conditions. See the execution standards.

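Specification engineering — Every non-trivial task has a complete written specification built from five primitives: a self-contained problem statement, acceptance criteria, constraint architecture, decomposition, and evaluation design. See the execution standards and the Specification Guide for practical examples.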

Specification — A document defining a problem precisely enough for an agent to solve it autonomously. See the execution standards and the Specification Guide.

Self-contained problem statements — Problem stated with enough context to be solvable without additional information. See the execution standards.

Acceptance criteria — What done looks like, verifiable by an independent observer. See the execution standards.

Constraint architecture — Four categories per task: Must, Must not, Prefer, Escalate. See the execution standards.

Decomposition — Tasks broken into independently executable, testable, and integrable components. See the execution standards.

Evaluation design — Test cases with known-good outputs to validate and catch regressions. See the execution standards.
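The five primitives can be sketched as a single data structure. This is an illustrative Python sketch, not a format the execution standards prescribe. The class names, the field names, and the crude completeness check are all assumptions, and the sample task is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Constraints:
    """Constraint architecture: the four categories defined per task."""
    must: list[str] = field(default_factory=list)
    must_not: list[str] = field(default_factory=list)
    prefer: list[str] = field(default_factory=list)
    escalate: list[str] = field(default_factory=list)  # conditions that hand control back to a human

    def is_empty(self) -> bool:
        return not (self.must or self.must_not or self.prefer or self.escalate)


@dataclass
class EvalCase:
    given: str     # input to the system under test
    expected: str  # known-good output


@dataclass
class Spec:
    problem: str              # self-contained problem statement
    acceptance: list[str]     # criteria an independent observer can verify
    constraints: Constraints  # Must / Must not / Prefer / Escalate
    subtasks: list[str]       # decomposition into independently executable pieces
    evals: list[EvalCase]     # evaluation design

    def missing_primitives(self) -> list[str]:
        """Crude completeness check: flags primitives that are entirely absent."""
        checks = [
            ("problem statement", bool(self.problem.strip())),
            ("acceptance criteria", bool(self.acceptance)),
            ("constraint architecture", not self.constraints.is_empty()),
            ("decomposition", bool(self.subtasks)),
            ("evaluation design", bool(self.evals)),
        ]
        return [name for name, present in checks if not present]


def regressions(spec: Spec, system: Callable[[str], str]) -> list[EvalCase]:
    """Eval cases whose current output no longer matches the known-good answer."""
    return [case for case in spec.evals if system(case.given) != case.expected]


# Hypothetical spec for a small data-cleanup task.
spec = Spec(
    problem="Upper-case every customer code before export. Codes are ASCII, 4 characters.",
    acceptance=["Every exported code is upper case", "Record count is unchanged"],
    constraints=Constraints(
        must=["Preserve code length"],
        must_not=["Drop or reorder records"],
        prefer=["Standard library only"],
        escalate=["Any non-ASCII code"],
    ),
    subtasks=["Normalize codes", "Write export file"],
    evals=[EvalCase("ab12", "AB12"), EvalCase("XY99", "XY99")],
)
```

With `str.upper` as the system under test, `regressions` returns an empty list. Swapping in `str.lower` returns both eval cases, which is the regression signal that evaluation design is meant to provide.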

Seam design

The practice of structuring work so that transitions between human and agent phases are clean, verifiable, and recoverable. A good seam defines the handoff artifact, allows checking agent output at the transition point, and enables intervention without starting over. The seams shift as capabilities evolve. See the employee guide.


Transformation Economics

Value migration

Technology reassigns value to the scarcest layer. In the AI transformation, value leaves execution (commodity) and concentrates on judgment, framing, and risk ownership (premium). See the vision.

The 5 human functions

Direction, Judgment, Taste, Relationship, Accountability. The functions that remain irreplaceable in an AI-native organization. See the vision.


Role Evolution

Convergence — Multiple roles merge because AI removes the coordination overhead that justified separating them. The converged role retains the combined judgment surface. See Role Evolution.

Specialization — A role narrows to its irreducible human core as AI absorbs the routine layer. The role becomes sharper, not smaller. See Role Evolution.

Elevation — Humans shift from producing artifacts to specifying and evaluating them. Maps to the Universal Translation Rule. See Role Evolution.

Absorption — A role's responsibilities get absorbed into adjacent roles or systems. The responsibilities redistribute; the role contracts or disappears. See Role Evolution.

Emergence — Structurally new roles arise from the AI-native organizational structure. Named for their responsibility, not the technology. See Role Evolution.

Role Decision Matrix — A structured tool mapping observable conditions to the most likely evolution pattern and recommended action. See Role Evolution.


Adoption and Transition

Adoption J-curve

The predictable productivity dip during AI adoption. Productivity drops before it rises. Organizations that climb out are the ones that redesign their workflows around AI capabilities. See the manager guide.

Transition brief

A structured document delivered by an employee that describes their current role, their AI-first vision for it, the gap between the two, the systems to build, metrics, and a 30/60/90-day plan. See the employee guide.

AI clinics

Regular sessions (weekly or biweekly) where the team shares discoveries, blockers, and workflows. Short format (30 min). The goal is peer learning. See the manager guide.

Six-month wall

Failure pattern where AI-driven projects without strong human involvement (specs, scenarios, architecture) accumulate structural debt that explodes after roughly six months. Scenarios are the primary defense. See the AI Lab.

Calibration decay

AI skills expire as capabilities evolve. A person who calibrated their sense of the human-agent boundary six months ago is now either over-trusting or under-using current models. The antidote is feedback density: frequent delegate-evaluate-adjust cycles with current models, not one-time training. See the manager guide.


Cognitive Cost

Cognitive J-curve

The mental-energy counterpart to the productivity J-curve. Cognitive load rises sharply during the Tier 1→2 transition (learning to specify, evaluating unreliable output, maintaining normal workload) and drops back down once workflows stabilize at Tier 2. The exhaustion concentrates in the transition, not the end state. See Cognitive Cost.

Cognitive overload (brain fry)

Mental fatigue from AI oversight that exceeds cognitive capacity. Symptoms: mental fog, slower decisions, rising error rates. The BCG/UC Riverside study found productivity gains reverse past three concurrent AI tools. See Cognitive Cost.

Decision fatigue

Depletion from the volume of micro-decisions AI introduces. Every AI output is a decision — good enough, edit, regenerate, trust, verify — and the volume degrades the quality of the decisions that actually matter. See Cognitive Cost.

Vigilance fatigue

Exhaustion from sustained monitoring of AI systems that are usually correct. Structurally similar to aviation automation complacency: attention drifts because the system works well most of the time, and errors look plausible. See Cognitive Cost.

Work intensification

The pattern where AI expands scope rather than reducing it. Three mechanisms: task expansion (people take on work they previously wouldn't have), blurred boundaries (AI tools feel informal, work spills), and multitasking (AI generates in parallel while humans monitor). See Cognitive Cost.

Workload inflation

The organizational temptation to increase output quotas proportionally to AI-enabled speed. Production capacity scales with AI; judgment capacity doesn't. Doubling output quotas because drafts come out faster is how the most engaged people burn out. See Cognitive Cost.

AI anxiety

Anticipatory stress driven by uncertainty about job security, skill relevance, and career trajectory. Distinct from brain fry: it hits people who fear AI, including those who haven't started using it. See Cognitive Cost.

Identity disruption

Loss of professional identity when AI performs skills that defined the self-image. Even when roles improve objectively, workers report feelings of obsolescence, loss of purpose, and reduced self-worth. See Cognitive Cost.

Learned helplessness

The pattern of withdrawal when AI systems make decisions workers don't understand, can't control, or can't override. People stop thinking critically about AI output and defer even when they disagree. It is the most dangerous pattern for the transformation because it looks like compliance. See Cognitive Cost.

Transformation fatigue

Cumulative exhaustion from constant change — new tools, new workflows, new expectations — on top of normal workload. Not specific to AI but compounded by it. A rational response to sustained cognitive demand without sufficient recovery. See Cognitive Cost.


Codebase Readiness

Codebase readiness levels

A six-level model (L0 to L5) for evaluating whether a codebase supports AI-native development: Opaque (L0), Instrumented (L1), Validated (L2), Legible (L3), Specified (L4), Scenario-governed (L5). Each level is defined by the feedback mechanism it adds. A codebase's readiness level is the ceiling on the engineering Rung that can operate reliably on it. See Codebase readiness.

Codebase Readiness Grid

The nine-dimension diagnostic at the core of the Codebase readiness model. Each dimension is scored 1–5 against its own rubric. The Grid is a vector, not a scalar: it is never summarized with an average. The lowest dimension score acts as the ceiling and sets the readiness level, and blocking dimensions (D1, D2, D5) take priority over constraining ones. An open-source Claude Code skill runs the Grid on any repo.
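A minimal sketch of how the Grid's scoring rules could be applied. It assumes scores arrive as a mapping from `D1`..`D9` to integers 1–5. The glossary names only some of the dimensions, so the remaining labels stay generic. Treating a score of 2 or less as "low" for constraining dimensions is an assumption borrowed from the blocking rule, and the mapping from scores to L0–L5 is not reproduced. This is not the open-source Claude Code skill.

```python
# Blocking dimensions per the glossary: D1 test coverage and feedback latency,
# D2 type strictness, D5 API directness. All other dimensions are constraining.
BLOCKING = {"D1", "D2", "D5"}
DIMENSIONS = {f"D{i}" for i in range(1, 10)}


def grid_summary(scores: dict[str, int]) -> dict:
    """Summarize a nine-dimension Grid without ever averaging it."""
    if set(scores) != DIMENSIONS:
        raise ValueError("the Grid has exactly nine dimensions, D1..D9")
    for dim, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{dim} scored {score}, outside 1-5")
    floor = min(scores.values())
    # Assumed threshold for "low", taken from the 1-2 rule for blocking dimensions.
    low = {d for d, s in scores.items() if s <= 2}
    blockers = sorted(low & BLOCKING)
    constrainers = sorted(low - BLOCKING)
    return {
        "vector": dict(sorted(scores.items())),  # keep the full vector; no scalar summary
        "floor": floor,                           # the lowest score acts as the readiness ceiling
        "weakest": sorted(d for d, s in scores.items() if s == floor),
        "fix_first": blockers or constrainers,    # blocking dimensions take priority
    }


# Hypothetical scores for one repository.
example = {"D1": 2, "D2": 4, "D3": 5, "D4": 1, "D5": 3, "D6": 5, "D7": 4, "D8": 5, "D9": 3}
summary = grid_summary(example)
```

In this example D4 has the lowest score, but D1 is fixed first because it is a blocking dimension scoring 2. No average is computed anywhere.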

Harness

The infrastructure surrounding an AI coding agent that constrains and validates its output. Two parts: guides (feedforward — types, conventions, docs, architecture) and sensors (feedback — tests, CI, observability). Framed by Fowler as "Agent = Model + Harness." In brownfield codebases, building the harness is the leverage point, not choosing a better model. See Codebase readiness.

Ambient affordances

Structural properties of a codebase that make it legible to an AI agent without explicit instruction: strong typing, clear module boundaries, consistent naming, established frameworks, explicit dependency boundaries. Their absence forces agents to invent structure or inject inconsistency. See Codebase readiness.

Feedback loop topology

The density and latency of feedback mechanisms across a codebase. The ACMM finding: maturity is defined by the presence of feedback, not by tool sophistication. Fast CI (under 30 minutes), useful tests, and structured observability close the agent's correction loop. A 72-hour test suite is not a sensor — it's a report.

Dependency and runtime currency

A Codebase readiness dimension that measures whether the stack matches patterns current AI models are trained on. An EOL runtime, abandoned libraries (Enzyme, jQuery in React), or a framework two major versions behind current make a codebase less legible to agents even when the code itself is well-structured — agents produce code for the current version's idioms while the codebase follows an older version. See Codebase readiness — dimension 9.

Blocking dimensions

The three Codebase readiness dimensions whose low scores compromise agent work fundamentally and cannot be compensated by high scores elsewhere: test coverage and feedback latency (D1), type strictness (D2), and API directness (D5). A codebase scoring 1–2 on any of these is not rescued by scoring 5 on everything else — agents are blind, hallucinate shapes, or produce confidently wrong code at opaque call sites. See Codebase readiness — how scoring works.

Constraining dimensions

The six Codebase readiness dimensions that degrade agent output quality when they score low, but don't block agent work outright: file size and context legibility, module boundary clarity, documented intent, observability, dev and deploy simplicity, and dependency and runtime currency. Low scores here mean more human review per change and more cleanup, but agents can still produce reliable value. See Codebase readiness — how scoring works.


Brownfield Strategy

The four brownfield modes

Remediate in place, strangler-fig migration, full rebuild, isolate and bypass. Each fits a different combination of architectural soundness, seam clarity, business continuity constraints, team capacity, and remaining value in the legacy. Choosing the wrong mode is expensive. See Brownfield engineering strategy.

Isolate and bypass

A brownfield mode where the legacy is frozen in maintenance mode and new value is delivered as Level 5-ready standalone apps alongside it. The right choice when remediation cost exceeds the value remaining in the legacy. Buys time but doesn't solve the underlying problem — eventually something forces the replacement decision. See Brownfield engineering strategy.

Research, Review, Rebuild

A phase-gated methodology for AI-assisted brownfield modernization (Fowler/EPAM): Research (AI reconstructs intent from existing code), Review (domain experts validate the intent map), Rebuild (AI generates replacement code with minimal ambiguity). Skipping Research and Review produces confidently wrong output faster. Human review, not AI generation, is the throughput bottleneck. See Brownfield engineering strategy.

Spec-from-code

The brownfield inversion of spec-driven development. In greenfield, specs precede code; in brownfield, specs must be reverse-engineered from existing code before new spec-first work can resume. Extracting the implicit specification is the hardest and most human work in the transition: agents can document what the system does, but only humans can distinguish intentional behavior from historical accident. See Brownfield engineering strategy.

Strangler-fig migration

The pattern of replacing a legacy system piece by piece, with new parts running alongside the old behind a facade, until the old system can be retired. Seam identification (finding where responsibilities can be cleanly extracted) is the critical skill. AI makes replacement cheaper but doesn't eliminate the need to find the seams. See Brownfield engineering strategy and Martin Fowler's original Strangler Fig Pattern.
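A minimal sketch of the facade at the center of the pattern. Callers always go through one entry point. Each capability is routed to the new implementation once it has been migrated, and to the legacy system otherwise. `StranglerFacade`, the request shape, and the handlers are hypothetical. A real facade is usually an HTTP gateway or proxy rather than an in-process object.

```python
from typing import Callable

Request = dict
Handler = Callable[[Request], dict]


class StranglerFacade:
    """Single entry point that sends each capability to new code once it has been migrated."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: dict[str, Handler] = {}

    def migrate(self, capability: str, handler: Handler) -> None:
        # Called once a seam has been found and the replacement passes its scenarios.
        self._migrated[capability] = handler

    def handle(self, request: Request) -> dict:
        handler = self._migrated.get(request["capability"], self._legacy)
        return handler(request)

    def legacy_retirable(self, capabilities: set[str]) -> bool:
        # The old system can be retired only when every capability it served has moved.
        return capabilities.issubset(self._migrated)


def legacy_system(request: Request) -> dict:
    return {"source": "legacy", "capability": request["capability"]}


def new_invoicing(request: Request) -> dict:
    return {"source": "new", "capability": request["capability"]}


gateway = StranglerFacade(legacy_system)
gateway.migrate("invoicing", new_invoicing)
```

The design choice is that callers never change. Migration becomes a routing decision inside the facade, and `legacy_retirable` states the retirement condition explicitly.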

Technical Debt Quadrant

Fowler's four-way categorization of technical debt by intent (deliberate vs. inadvertent) and discipline (prudent vs. reckless). The quadrant informs remediation strategy. Prudent-inadvertent debt is often remediable, while reckless-inadvertent debt is typically a rebuild candidate because the structure reflects ignorance that later knowledge cannot unwind in place. See Brownfield engineering strategy.

Seam identification

The practice of finding places in a legacy codebase where responsibilities can be cleanly extracted for strangler-fig migration. Popularized by Michael Feathers in Working Effectively with Legacy Code. The critical skill that determines whether a strangler-fig approach produces one cleaner system or two coupled ones.

Black Box to Blueprint techniques

Five reverse-engineering techniques (Fowler) for opaque legacy systems: UI-layer reconstruction, change data capture, server logic inference, binary archaeology, and progressive multi-pass enrichment. Two non-negotiable disciplines: triangulation (confirm every hypothesis across two independent sources) and lineage tracking (record the evidence every claim is based on). See Brownfield engineering strategy.
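The two disciplines can be sketched as a claim record that keeps its evidence (lineage tracking) and counts as confirmed only once two independent sources support it (triangulation). Treating "independent" as "gathered with a different technique" is a simplifying assumption. The source labels and class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    source: str     # hypothetical technique label, e.g. "ui", "cdc", "server-logic"
    reference: str  # where exactly: screen, table, function, binary offset


@dataclass
class Claim:
    statement: str
    evidence: list[Evidence] = field(default_factory=list)  # lineage: the evidence behind the claim

    def add_evidence(self, source: str, reference: str) -> None:
        self.evidence.append(Evidence(source, reference))

    @property
    def confirmed(self) -> bool:
        # Triangulation: at least two independent sources. Independence is
        # approximated here as "different technique", which is an assumption.
        return len({e.source for e in self.evidence}) >= 2


claim = Claim("Orders above the approval limit need a manager sign-off")
claim.add_evidence("ui", "approval dialog on the order screen")
claim.add_evidence("cdc", "row written to the approvals table after large orders")
```

Two observations from the same technique leave a claim unconfirmed. Evidence is appended and never replaced, so every claim can be traced back to the evidence it rests on.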


Operational Reality at T3 / R5

Five-stage operational unit

The recurring operational unit at Tier 3 / Rung 5: Context → Clarification → Execution → Validation → Recovery. Humans concentrate at the boundaries (front: specification and clarification; back: validation and recovery); the agent runs inside. The same shape applies across discrete-task domains regardless of substrate. See AI Lab § The Five Stages.

Two-boundary work

The structural pattern of Tier 3 / Rung 5 work: human attention concentrates at the front boundary (Context preparation + Clarification) and the back boundary (Validation + Recovery). Inside the loop, the agent runs without supervision. The shift is from per-line review to per-loop direction-and-judgment.

Discrete-task pattern

The category of work where AI operates as the execution layer: a clear unit (story, ticket, transaction, query, contract clause), verifiable outputs, gradable risk. Engineering, customer service, finance operations, legal review, and knowledge research fit. The framework's v3 patterns apply across this category. Continuous / creative / interpersonal work (sales, marketing creative, design, HR) requires a different framework — deferred to a future v4+ augmentation track.

Clarification dialogue

A discrete operational stage at Tier 3 / Rung 5 where the agent reviews the spec, exposes its assumptions, and asks calibrated questions before executing. Spec-kit's /speckit.clarify and Anthropic's plan mode + AskUserQuestion tool ship the pattern in production. Cost rule: clarification cost is bounded by minutes; correction cost scales with execution depth. See Specification Guide § Clarification dialogue.

Process design for AI

The discipline of designing constrained, phased workflows for AI to operate consistently within — distinct from prompt engineering and from spec-writing per se. Layer 5 of the AI Execution Standards. Distinguishes Tier 3 / Rung 5 from Tier 2 / Rung 4 work. See AI Execution Standards § Layer 5.

Process topologies (the six)

Anthropic's vocabulary for how the pipeline that runs a spec is structured: prompt chaining (sequential single-prompt steps with intermediate validation), routing (classify and dispatch to specialized prompts), parallelization (run independent subtasks concurrently), orchestrator-workers (lead agent decomposes and dispatches workers), evaluator-optimizer (generator paired with separate evaluator), and autonomous agents (open-ended exploration with tool use and feedback loops). Decision rule: start single-prompt; add complexity only when value-per-task justifies the token premium.
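A minimal sketch of prompt chaining, the simplest of these topologies, with a validation gate after each step. Plain functions stand in for model calls, so no real model API is assumed. `run_chain`, `ChainError`, and the two steps are hypothetical. Under the decision rule, even this much structure should replace a single prompt only when the value per task justifies it.

```python
from typing import Callable

# A step is (name, transform, check). In a real pipeline each transform would be
# a single prompt to a model; plain functions stand in for them here.
Step = tuple[str, Callable[[str], str], Callable[[str], bool]]


class ChainError(Exception):
    """Raised when an intermediate output fails its validation gate."""


def run_chain(steps: list[Step], text: str) -> str:
    for name, transform, check in steps:
        text = transform(text)
        if not check(text):
            # Stop at the failing step instead of passing bad output downstream.
            raise ChainError(f"validation failed after step {name!r}")
    return text


steps: list[Step] = [
    ("outline", lambda topic: f"OUTLINE: {topic}", lambda out: out.startswith("OUTLINE:")),
    ("draft", lambda outline: f"{outline} | DRAFT", lambda out: out.endswith("DRAFT")),
]
```

The intermediate checks are what distinguish chaining from one long prompt: a bad outline is caught before a draft is built on it.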


Risk-Graded Validation

Risk-graded validation gates

The principle that validation at Rung 5 is not monolithic — different action classes get different gates depending on blast radius, reversibility, and consequence. Three operational stances (HITL / HOTL / HOOTL) describe the gradient. A mature Rung 5 team operates all three concurrently, picking the gate per action class. See AI Lab § Risk-Graded Validation Gates.

HITL — Human-in-the-Loop

A validation stance where human approval is required before an AI action executes. Default for irreversible high-impact actions: financial transactions, production deploys, customer-facing communications, anything that creates legal or financial obligation. Throughput-bounded by human review capacity. See AI Lab § Risk-Graded Validation Gates.

HOTL — Human-on-the-Loop

A validation stance where the AI acts autonomously but the human monitors with intervention authority (kill switch, rollback, override). Default for reversible production work with strong eval coverage. Operationally fragile when treated as passive monitoring — vigilance fatigue makes nominal HOTL into compliance theater. See AI Lab § Risk-Graded Validation Gates.

HOOTL — Human-out-of-the-Loop

A validation stance where the AI acts within pre-defined boundaries with no real-time human involvement. Reserved for sandboxed, reversible work with strong tests and an agent reviewer on every artifact. Code merges to a well-tested repo with an agent reviewer typically run HOOTL. See AI Lab § Risk-Graded Validation Gates.

Operational Design Domain (ODD)

The conditions under which an AI agent is designed to function. Drawn from SAE J3016 (driving) as the cleanest analog. Outside the ODD, the agent makes no claims; the gate falls back to human. Defining the ODD is part of process design — what tools the agent has, what data it can access, what actions it can take. See AI Lab § Risk-Graded Validation Gates.
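A sketch of gate selection per action class, including the fallback when a request falls outside the ODD. The rules encode the defaults described in the HITL, HOTL, and HOOTL entries in a deliberately simplified form. The field names, the representation of the ODD as sets of allowed tools and actions, and the sample action classes are assumptions. In practice a named owner decides the tier for each action class.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    HITL = "human approves before the action executes"
    HOTL = "agent acts; human monitors with kill switch, rollback, override"
    HOOTL = "agent acts within boundaries; no real-time human"


@dataclass(frozen=True)
class ActionClass:
    name: str
    reversible: bool
    creates_obligation: bool  # legal or financial obligation, customer-facing communication
    sandboxed: bool
    strong_evals: bool
    agent_reviewer: bool


@dataclass(frozen=True)
class ODD:
    """Operational Design Domain: what the agent may use and do (simplified)."""
    tools: frozenset[str]
    actions: frozenset[str]


def choose_gate(action: ActionClass, odd: ODD, tool: str) -> Gate:
    # Outside the ODD the agent makes no claims, so the gate falls back to a human.
    if action.name not in odd.actions or tool not in odd.tools:
        return Gate.HITL
    # Irreversible or obligation-creating actions default to human approval.
    if not action.reversible or action.creates_obligation:
        return Gate.HITL
    # Sandboxed, reversible, well-tested, agent-reviewed work can run unattended.
    if action.sandboxed and action.strong_evals and action.agent_reviewer:
        return Gate.HOOTL
    # Reversible production work with strong eval coverage: human on the loop.
    if action.strong_evals:
        return Gate.HOTL
    return Gate.HITL


# Hypothetical ODD and action classes.
odd = ODD(tools=frozenset({"git", "ticketing"}), actions=frozenset({"merge_pr", "update_ticket"}))
merge_pr = ActionClass("merge_pr", reversible=True, creates_obligation=False,
                       sandboxed=True, strong_evals=True, agent_reviewer=True)
update_ticket = ActionClass("update_ticket", reversible=True, creates_obligation=False,
                            sandboxed=False, strong_evals=True, agent_reviewer=False)
issue_payment = ActionClass("issue_payment", reversible=False, creates_obligation=True,
                            sandboxed=False, strong_evals=True, agent_reviewer=True)
```

One team can run all three gates concurrently. Merges go HOOTL, ticket updates go HOTL, and payments, or any request outside the ODD, go to a human.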

Agent-as-reviewer

The pattern of pairing a generator agent with a separate evaluator agent (different context, sometimes a different model) that reviews the output before merge or commit. Now the production default for code review (CodeRabbit, Graphite Diamond, Greptile, GitHub Copilot review) and being adopted in customer service, document processing, and other discrete-task domains. Replaces synchronous human review at scale because the cost-per-merged-unit math works in a way that human review at scale doesn't. See Engineering for unreliability § Agent-as-reviewer.
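The generator/reviewer pairing can be sketched as a bounded loop. The generator revises using the reviewer's findings, and nothing is approved until the reviewer returns no findings. The toy generator and reviewer are hypothetical stand-ins for two separately prompted agents. No review product's API is assumed.

```python
from typing import Callable, Optional

Generate = Callable[[str, list[str]], str]  # (task, reviewer findings so far) -> artifact
Review = Callable[[str], list[str]]         # artifact -> findings; an empty list means approve


def generate_and_review(task: str, generate: Generate, review: Review,
                        max_rounds: int = 3) -> tuple[Optional[str], list[str]]:
    findings: list[str] = []
    for _ in range(max_rounds):
        artifact = generate(task, findings)
        # Separate reviewer: it sees only the artifact, not the generator's reasoning.
        findings = review(artifact)
        if not findings:
            return artifact, []  # approved: eligible to merge
    return None, findings        # never approved: escalate rather than merge


def toy_generate(task: str, findings: list[str]) -> str:
    artifact = f"patch for {task}"
    if "add tests" in findings:
        artifact += " with tests"
    return artifact


def toy_review(artifact: str) -> list[str]:
    return [] if artifact.endswith("with tests") else ["add tests"]
```

Bounding the rounds matters. An artifact that never passes review comes back unapproved for escalation instead of being merged.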

Permissions Owner

A named organizational role in any organization running production-grade AI systems. Accountable for what each agent can and cannot do, and for the validation gating tier (HITL / HOTL / HOOTL) per action class. Becomes load-bearing as soon as agents touch production systems with irreversible side effects. See AI Execution Standards § Organizational Roles.


Failure Modes and Recovery

Stuck-state protocol

The Rung 5 / Tier 3 procedure for handling a deliverable where the agent has hit a structural limit. Detect the stuck state (iteration limit reached, same failure pattern recurring, or subjective issue raised by a user); stop iterating; convene a recalibration session; re-spec or re-context; restart the loop from Context, not Execution. The Lab rule is explicit: don't take the work back manually. See AI Lab § Stuck-State Protocol.
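A minimal sketch of the protocol's control flow. It assumes each execution attempt reports pass or fail plus a failure signature. The iteration limit, repeat threshold, and restart budget are illustrative parameters, not values from the AI Lab. `recalibrate` stands for the human recalibration session, and the user-flag path is shown only in `detect_stuck`. The code never edits the artifact directly, and every restart goes back to Context.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Attempt:
    passed: bool
    failure: Optional[str] = None  # signature of the failure, e.g. the failing scenario


def detect_stuck(attempts: list[Attempt], max_iterations: int, repeat_threshold: int,
                 user_flagged: bool = False) -> Optional[str]:
    """Return the reason the loop is stuck, or None if it should keep iterating."""
    if user_flagged:
        return "subjective issue raised by a user"
    if len(attempts) >= max_iterations:
        return "iteration limit reached"
    recent = [a.failure for a in attempts[-repeat_threshold:]]
    if len(recent) == repeat_threshold and recent[0] is not None and len(set(recent)) == 1:
        return "same failure recurring"
    return None


def run_with_protocol(execute: Callable[[str], Attempt],
                      recalibrate: Callable[[str, str], str],
                      context: str, max_iterations: int = 5,
                      repeat_threshold: int = 3, max_restarts: int = 2) -> tuple[str, list[Attempt]]:
    for _ in range(max_restarts + 1):
        attempts: list[Attempt] = []
        while True:
            attempt = execute(context)
            attempts.append(attempt)
            if attempt.passed:
                return context, attempts
            reason = detect_stuck(attempts, max_iterations, repeat_threshold)
            if reason is not None:
                # Stop iterating. Recalibration (re-spec or re-context) happens in a
                # human session; nobody patches the artifact by hand.
                context = recalibrate(context, reason)
                break  # restart from Context, not from the last Execution
    raise RuntimeError("still stuck after recalibration; escalate")


def toy_execute(context: str) -> Attempt:
    if "clarified" in context:
        return Attempt(passed=True)
    return Attempt(passed=False, failure="scenario: partial refund")


def toy_recalibrate(context: str, reason: str) -> str:
    return context + " (clarified: partial refunds allowed)"
```

In the toy run, the same scenario fails three times, the loop stops, the context is recalibrated, and the next pass from Context succeeds on the first attempt.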

AI bottleneck

The Tier 2.5+ failure mode where a deliverable misses its deadline because the agent has hit a structural limit (wrong direction, ambiguous spec, subjective edge case it cannot resolve alone), not because human capacity is short. Cemri et al. (Why Do Multi-Agent LLM Systems Fail?, 2025) found 41.8% of multi-agent failures fit this pattern. The leadership response is recalibration time, not work redistribution or added headcount. See Leading the Transformation § AI bottleneck.

Sycophancy

The tendency of LLMs to reliably and confidently defend wrong positions. Measured across Sharma et al. (2023), Wen et al. (2024), and OpenAI's hallucination work (2025). The literature genuinely disagrees on whether it's a tractable training fix or a structural artifact of RLHF. For engineering purposes, the framework treats sycophancy as a structural concern regardless of training trajectory. Build process safeguards (external signal, agent-as-reviewer, ground-truth retrieval, executable tests) into every loop. See Engineering for unreliability § Sycophancy.

Subjective edge case

A failure surfaced by a user, not by tests or monitoring: the AI got something qualitatively wrong (tone, intent, brand voice, customer alignment) but the technical output passed all checks. The dominant failure mode at higher maturity. Recovery is conversation, not patching — talk to the user, understand what they were trying to accomplish, update the spec or context. See Engineering for unreliability § Subjective edge cases.

Recalibration vs debugging

Two operationally distinct responses when the AI is wrong. Recalibration rebuilds the agent's understanding via fresh context, re-articulated spec, or multi-perspective brainstorm. Debugging fixes the artifact the agent produced. The literature on intrinsic self-correction is unanimous that a model that committed to a wrong direction will not reliably notice on its own — which means most non-trivial T3 / R5 failures are recalibration problems disguised as debugging problems. See Engineering for unreliability § Recalibration vs debugging.


AI Economics at Maturity

Cost per unit of output

The Level 3 measurement metric that replaces "time saved by AI" — cost per merged PR, cost per resolved ticket, cost per processed transaction, cost per customer served. The unit varies by domain; the principle is consistent: total AI spend without a denominator is meaningless at maturity. See Business Case § AI economics at maturity.
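A worked example of why the denominator matters. The numbers are illustrative, not benchmarks, and `cost_per_unit` is a hypothetical helper.

```python
def cost_per_unit(ai_spend: float, units_delivered: int) -> float:
    """AI spend divided by the domain's unit of output (merged PRs, resolved tickets, ...)."""
    if units_delivered <= 0:
        raise ValueError("no units delivered: cost per unit is undefined")
    return ai_spend / units_delivered


# Illustrative numbers only: spend went up, but each merged PR got cheaper.
march = cost_per_unit(ai_spend=12_000, units_delivered=400)  # 30.0 per merged PR
april = cost_per_unit(ai_spend=18_000, units_delivered=900)  # 20.0 per merged PR
```

Total spend rose by half between the two months, but cost per merged PR fell from 30 to 20. Total spend alone hides that signal.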

AI gross margin

The share of value produced that remains after inference spend, measured at the team or business level. Application-layer AI businesses run at 40–55% gross margin against 70–90% for traditional SaaS. The gap is structural because inference is a variable cost that scales with usage. Whether the gap closes over time is contested; the floor is real and AI-native businesses must plan around it. See Business Case § AI economics at maturity.
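A sketch of the calculation. It assumes gross margin is computed the standard way, as (revenue - cost of goods sold) / revenue, with inference counted in cost of goods sold. The figures are illustrative. Flat subscription revenue is an assumption chosen to isolate the effect of usage.

```python
def gross_margin(revenue: float, inference_cost: float, other_cogs: float = 0.0) -> float:
    """Share of revenue left after inference and other cost of goods sold."""
    if revenue <= 0:
        raise ValueError("gross margin is undefined without revenue")
    return (revenue - inference_cost - other_cogs) / revenue


# Illustrative: flat subscription revenue while usage, and therefore inference, grows.
light_usage = gross_margin(revenue=100_000, inference_cost=30_000, other_cogs=15_000)
heavy_usage = gross_margin(revenue=100_000, inference_cost=60_000, other_cogs=15_000)
```

With revenue held flat, doubling inference spend takes the margin from 55% to 25%. That sensitivity to usage is the structural gap the entry describes.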

Token economics

The discipline of measuring AI as production infrastructure: cost per task, cost per merged unit, agent throughput per dollar, AI gross margin. Replaces "time saved" as the binding metric at Level 3. Per-token costs are falling 10–40× per year, but per-task costs are often rising because reasoning models, agent loops, and longer contexts consume 10–100× the tokens of one-shot completions (Jevons paradox applied to inference). See Business Case § AI economics at maturity and AI Lab § Token Economics.
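A worked example of how per-task cost can rise while per-token cost falls. The prices and token counts are illustrative assumptions chosen inside the ranges quoted in the entry. `cost_per_task` is a hypothetical helper.

```python
def cost_per_task(tokens_per_task: int, usd_per_million_tokens: float) -> float:
    """Dollar cost of one task given its token consumption and the per-token price."""
    return tokens_per_task * usd_per_million_tokens / 1_000_000


# Illustrative: the per-token price falls 10x while an agent loop uses 50x the tokens.
one_shot = cost_per_task(tokens_per_task=2_000, usd_per_million_tokens=10.0)
agent_loop = cost_per_task(tokens_per_task=100_000, usd_per_million_tokens=1.0)
ratio = agent_loop / one_shot  # about 5: the task got more expensive as tokens got cheaper
```

A token that is 10× cheaper still yields a task roughly 5× more expensive. This is why the binding metric is cost per task rather than cost per token.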

