Portable governed artifact

AI Capability Discipline v0.9

This portable HTML is an offline reading surface of the governed knowledge package. The durable product is the package, not this website, portal, or a future chatbot interface. Core content, styling, source identity, receipt linkage, and search are embedded in this file.

Version: v0.9
Posture: public-site edition
Boundary: The portable export is not the release-of-record. The preserved v0.9 package remains the release-of-record.

Source, receipt, and export metadata
Artifact ID: ai-capability-discipline
Version: v0.9
Classification: public/company-agnostic
Stability: stable principle
Source Path: source/AI_Capability_Discipline_v0.9.md
Public Site Path: docs/html/03_ai_capability_discipline_v0.9.html
Portable Export Path: docs/downloads/ai-capability-discipline-v0.9-portable.html
Release Posture: public-site edition
Release-of-Record: releases/v0.9/
Site Revision: 84c7a0c6979dcd58b1df651c2784b56a2f14e8c3
Generated: 2026-06-21T11:23:43Z
Hash Receipt: docs/downloads/PORTABLE_HTML_FILE_HASHES.sha256
Version Note: Public HTML title and portable export use v0.9 while the preserved release-of-record remains unchanged under releases/v0.9/.
Offline Scope: Cross-artifact navigation may require the repository or public-site bundle. This file preserves the artifact content, metadata, and provenance without depending on live GitHub Pages assets for core rendering.
AI Capability Discipline v0.9 | controlled-sharing candidate | not final policy

AI Capability Discipline

Stop Building AI Theater. Build Capability.

From Magic Thinking to Governed, Measurable, Maintainable AI Systems. Prompts, agents, skills, tools, and evals are components. Capability requires intent, source authority, context discipline, schemas, validation, observability, change control, sustainment, and measurable value.

Governed capability, Source authority, Schema-first, Eval validity, Harness lifecycle, Provenance-backed

Controlled-sharing candidate. Field guidance, not final company policy.

This package is an operating discipline reference for leaders, architects, governance reviewers, and practitioners working on correctness-matters AI. It does not approve tools, data classes, runtime paths, or production use cases.

What capability requires
  • Intent, source authority, and data boundaries
  • Context, schemas, harnesses, and valid evals
  • Human review, observability, change control, and sustainment
  • Measured business value, not demo theater

1. Executive brief

AI is not magic. AI is a system capability candidate, and sometimes it is not the right answer. The first question is whether AI should be used at all. If it is used, the work must be shaped by intent, source authority, data boundaries, context management, schemas, workflows, tool permissions, feedback loops, evals, human review, telemetry, governance, change control, sustainment ownership, run-cost realism, and measurable outcome.

The key leadership reset is simple: model access is not capability access. A better model can reduce friction, but it cannot define the business outcome, approve the data path, decide source authority, validate the workflow, make deterministic work probabilistic without consequence, own sustainment, or absorb accountability when the output is wrong.

Five things leaders should stop approving

Stop approving | Replace with
Demos as proof of capability | Evidence-backed pilot gates
Prompt reuse as operating discipline | Harness, schema, eval, telemetry, and ownership
Tool approval as use-case approval | Tool, data, workflow, and governance routing checks
Human-in-the-loop as decorative safety | Reviewer authority, evidence, queue, override, and audit model
Evals that score polish | Evals that measure business correctness, risk, and safe failure

Leader approval stop rule

If the team cannot explain the outcome, data boundary, source authority, schema, eval validity, human review model, telemetry, sustainment owner, and stop condition, approve discovery only. Do not approve broad pilot, production routing, autonomous execution, or business-process dependency.

2. Executive mental model reset

This section is deliberately blunt because leaders often see the visible model output and miss the system underneath it. The replacement models turn AI from a magic tool story into an operating-model conversation about controls, evidence, ownership, and accountable decisions.

Bad mental model | Replacement model | What leaders should ask
AI is magic | AI is a probabilistic system component | What controls make it reliable?
Better model means solved | Better models reduce friction but do not define accountability | What remains outside the model?
We already have Copilot | Tool access is not use-case approval | Which data and workflow are approved?
Give me the prompt | A prompt is only one expression of a task | What is the operating pattern?
We built an agent | An agent is a component, not a capability | What workflow, governance, telemetry, and owner exist?
The demo worked | A demo proves possibility, not reliability | What happens on edge cases and missing evidence?
The eval passed | Eval success matters only if the eval measures the right thing | What does the eval actually prove?
Human-in-the-loop means safe | Human review works only with authority, evidence, time, and override workflow | Who can override and what is captured?

3. Capability equation

The equation is not math for decoration. It is a dependency map. Missing any major term does not mean the work is useless, but it does mean the work is not yet a governed capability and should be treated as discovery, prototype, or bounded pilot.

AI Capability =
Clear Intent
+ Approved Data Path
+ Source Authority
+ Workflow Fit
+ Schema Contracts
+ Harnesses
+ Rubrics
+ Valid Evals
+ Human Review
+ Tool Permissions
+ Observability
+ Governance
+ Sustainment Ownership
+ Measured Business Outcome

A prompt, agent, skill, rubric, or eval can be useful. None of them becomes a capability until the full equation is credible enough to survive real work.
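
Read as a dependency map, the equation can be checked mechanically. The sketch below is illustrative only: the function names `missing_terms` and `classify` and the returned labels are assumptions, not part of this package. It lists which terms still lack evidence and keeps the work out of the governed-capability category while any term is missing.

```python
# Terms copied from the capability equation above, in the same order.
CAPABILITY_TERMS = [
    "Clear Intent", "Approved Data Path", "Source Authority", "Workflow Fit",
    "Schema Contracts", "Harnesses", "Rubrics", "Valid Evals", "Human Review",
    "Tool Permissions", "Observability", "Governance",
    "Sustainment Ownership", "Measured Business Outcome",
]


def missing_terms(evidenced):
    """Return equation terms with no credible evidence yet, in equation order."""
    have = set(evidenced)
    return [term for term in CAPABILITY_TERMS if term not in have]


def classify(evidenced):
    """Hypothetical labels: any missing term keeps the work below governed capability."""
    if missing_terms(evidenced):
        return "discovery, prototype, or bounded pilot"
    return "governed capability candidate"


if __name__ == "__main__":
    evidenced = ["Clear Intent", "Harnesses", "Rubrics"]
    print(classify(evidenced))
    print(missing_terms(evidenced))
```

The check says nothing about whether the evidence behind each term is credible. That judgment stays with the reviewers and owners named elsewhere in this manual.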

4. Demo-to-capability gap

Demos compress uncertainty into a polished moment. This table separates what a demo can legitimately prove from what it cannot prove, so leaders do not confuse plausibility with readiness or adoption enthusiasm with operational evidence.

Demo-to-capability ladder

A polished demo is evidence of possibility. It becomes a governed capability only as the team adds contracts, review, evals, telemetry, and operational ownership.

  1. Demo: Plausible output in one scenario
  2. Prompt artifact: Repeatable interaction pattern
  3. Reusable harness: Named inputs, outputs, and review path
  4. Structured workflow: States, checkpoints, and human review
  5. Eval-backed assistant: Fixtures, rubric, and negative controls
  6. Governed pilot: Approved data route, telemetry, and stop conditions
  7. Operational capability: Support, monitoring, release discipline, and adoption

Leadership question: which control is still missing before this use case earns the next rung?

Demo proves | Demo does not prove
The model can produce a plausible answer | The answer is correct, current, supported, or useful
The tool can call an API | API use is approved, safe, auditable, or reversible
Users were impressed | Users will adopt it under real workflow constraints
One scenario worked | Edge cases and failure modes are controlled
Output looked polished | The output measured the right outcome
The prototype was fast | Sustainment, cost, telemetry, and governance are feasible
A tool is licensed | The data path and use case are approved

5. Distribution status and policy boundary

This manual is field guidance, not final policy. It can be used to shape intake, architecture review, governance discussion, and practitioner learning. It must not be interpreted as:

  • tool approval,
  • data-class approval,
  • production approval,
  • GxP or regulated-use approval,
  • external SaaS approval,
  • security exception approval,
  • autonomous agent approval,
  • replacement for formal AI Governance review.

When this manual conflicts with named internal policy, the named policy wins. When the policy is unknown, mark the item as requires owner confirmation rather than inventing approval, because apparently optimism is still not an access-control model.

6. Tool and data boundary matrix, owner-validation required

The matrix below is a field template requiring owner validation before policy or operational use. It is intentionally conservative. Replace placeholders with confirmed internal policy before broad distribution. [CLM-014]

Tool surface | Personal learning with public or synthetic data | Company data | Confidential or proprietary data | Regulated, GxP, PHI, PII, security-sensitive data | Business-process use | Approval route
Approved AI for All chat | Usually allowed within policy | Allowed only by approved data class | Depends on policy | Not assumed allowed | No, unless all no-review conditions hold | AI Governance if any trigger is false
Internal enterprise GenAI chat | Usually allowed within policy | Depends on approved data boundary | Depends on policy | Requires explicit confirmation | Depends on impact | AI Governance if workflow or data triggers apply
Copilot Studio or enterprise agent builder | Learning and team prototyping where approved | Use-case approved only | Use-case approved only | Requires explicit review | Yes, if governed | Team, function, or enterprise governance path
Azure or AWS approved runtime | Not a casual user surface | Use-case approved only | Use-case approved only | Requires explicit review | Yes, if governed SDLC applies | Formal architecture, security, privacy, compliance, AI Governance
GitHub Copilot or approved SDLC assistant | Only where assigned and approved | Depends on repo and policy | Depends on policy | Not assumed allowed | No business-process automation by default | SDLC and AI Governance as applicable
Claude Code, Codex, Cursor, Antigravity, external SaaS coding tools | Public or synthetic learning only unless approved | Not assumed allowed | Not assumed allowed | Not allowed unless explicitly approved | Not approved by default | Explicit approval required
Local or personal tools | Public or synthetic learning only | Not assumed allowed | Not assumed allowed | Not allowed | Not approved | Explicit approval required

Bright-line rule

Approved tool access does not approve the use case, data class, retention model, logging path, connector action, workflow impact, or production use.

7. Enterprise routing model

Use enterprise routing to separate casual productivity from governed business capability.

Work type | Likely path | Required discipline
Individual productivity | AI for All, approved chat, approved assistant | Stay within approved data and output boundaries
Small group experiment | BUILD path, limited sharing, synthetic or approved data | Scope, owner, data boundary, known limitations
Pre-configured business workflow | USE approved agent or platform capability | Confirm data, audience, support, and governance triggers
Custom business-process AI | REQUEST or Custom Built AI path | PRD, source authority, schemas, evals, HITL, telemetry, governance
Regulated, GxP, privacy-sensitive, or decision-impacting workflow | Governance first | Formal review before tooling or data processing
Production or scaled capability | Governed SDLC and operational ownership | Release gates, runbook, monitoring, support, change control

8. Capability readiness model

The readiness ladder gives teams a shared vocabulary for maturity. It should be used as a routing tool, not as a vanity score. The practical question is always what proof is required to move up one level without skipping governance, telemetry, or ownership.

Level | State | Meaning | Minimum next proof
0 | Idea | Interesting but not shaped | Problem statement and user need
1 | Prompt artifact | One-off model interaction | Reusable harness candidate
2 | Reusable harness | Repeatable prompt/instruction pattern | Input/output contract and review path
3 | Structured workflow | Defined inputs, outputs, states, and human review | Eval fixture set and evidence rules
4 | Eval-backed assistant | Tested against fixtures and rubrics | Pilot charter, data approval, telemetry plan
5 | Governed pilot | Approved users, data path, evals, review, and telemetry | Runbook, support model, release criteria
6 | Operational capability | Supported, monitored, versioned, adopted | Scaling plan and reuse governance
7 | Scaled enterprise capability | Integrated, reusable, governed, measured, continuously improved | Portfolio governance and continuous eval operations

8.1 Promotion gate matrix

Promotion between Levels 3 and 6 should be treated as an evidence gate, not a naming preference. The matrix below defines minimum evidence floors for field guidance. Meeting the floor does not bypass governance, architecture, security, privacy, compliance, or owner approval.

Gate area | Level 3 to 4 | Level 4 to 5 | Level 5 to 6
Schema validity | Input, output, and evidence fields are explicit, and sample outputs validate against the declared schema. | Pilot schemas cover normal, exception, and review states, with no unresolved field drift across pilot fixtures. | Operational schemas are versioned, change-controlled, and released with backward-compatibility or migration handling.
Fixture coverage | Starter fixtures cover golden path, missing evidence, conflicting source, ambiguous request, and unsafe request behavior. | Pilot fixtures cover top failure paths, reviewer overrides, and recent regressions seen in trial use. | Regression suite is refreshed from incidents, source changes, model changes, and operating drift.
Rubric calibration | A domain reviewer and builder align on pass, fail, and escalation labels for the starter fixture set. | Pilot reviewer pool calibrates against the fixture set and records how disagreement is resolved. | Calibration repeats on a defined cadence and after rubric, model, source, or workflow changes.
Reviewer pool | A named reviewer can reject, request evidence, or escalate. | Pilot reviewer pool has primary and backup coverage with queue ownership. | Operational reviewer coverage matches hours, expected volume, and escalation obligations.
Approved data route | Only public, synthetic, or otherwise approved data enters the eval-backed assistant path. | Pilot data route, logging path, retention path, and prohibited data classes are explicitly approved for pilot scope. | Operational data route is documented per source class and monitored for drift or boundary violations.
Stop conditions | Missing-evidence, unsafe-action, and overreach stop conditions are explicit in harness or reviewer guidance. | Pilot stop conditions include false-negative, boundary-violation, and override-spike triggers with pause authority. | Stop conditions are wired to operational pause, rollback, or routing controls.
Telemetry | Run start, output, evidence state, reviewer action, and stop-condition events are defined. | Pilot telemetry proves fixture outcomes, override rates, cycle time, and boundary violations. | Operational telemetry tracks quality, cost, adoption, drift, and incident correlation.
Runbook | Reviewer instructions exist for how to run the workflow and capture findings. | Pilot runbook covers startup, failure handling, retriage, source refresh, and manual fallback. | Operational runbook covers release, rollback, monitoring, and handoff expectations.
Incident path | Harm or error cases have an escalation contact, even if operational incident handling is not yet active. | Pilot incident path names who pauses the pilot, who reviews the event, and how evidence is preserved. | Operational incident path integrates with the owning team's incident and post-incident review flow.
Support owner | A named builder or owner is accountable for the assistant and its artifacts. | Pilot support owner accepts source, rubric, and fixture maintenance responsibilities. | Operational support owner, backup, and service boundaries are documented.
Adoption proof | At least one target workflow and success measure are named. | Pilot adoption proof shows real reviewers using the workflow and returning structured feedback. | Operational adoption proof shows repeat usage, decision uptake, and a maintained value signal.

9. Capability formation lifecycle

Intent
→ Product thesis
→ Product requirements
→ Value classification and acceptance line
→ Domain model
→ Source authority model
→ Data contract
→ Schema contracts
→ Harness
→ Rubric
→ Eval suite
→ Workflow
→ Human review model
→ Telemetry and observability
→ Governance route
→ Sustainment model
→ Field validation
→ Operational capability

The lifecycle is not paperwork theater. It exists because without these layers, a team can build a very convincing wrong thing.

10. Intent and outcome management

Intent is valid only when the problem, user, workflow, outcome, source feasibility, data permission, failure consequence, and ownership are explicit.

Value must be classified before it is judged. A proposal can create real value and still sit below the current acceptance line if the wrong owner benefits, the evidence is weak, or the current business climate requires direct savings.

Gate | Test | Failure signal
Problem clarity | Specific, recurring, material, and owned | Vague productivity promise
Outcome specificity | Observable baseline and target | “Make work easier” with no measure
Value classification | Claimed value class, decision owner, benefiting owner, and evidence owner are explicit | Real value claim with no accountable owner or proof path
Acceptance line fit | Current business climate and minimum accepted threshold are explicit | Value is real but below the current acceptance line
User fit | Real user job and workflow entry point | Solution looking for a workflow
Decision relevance | Output drives a real decision or action | Output is interesting but unused
AI appropriateness | AI compared to no AI, rules, search, workflow, dashboard, deterministic automation | Agent-first thinking
Source feasibility | Required sources exist and have authority | Model asked to infer missing authority
Data permission | Required data can be processed, logged, retained, and reviewed in selected tool path | Tool approval confused with data approval
Failure consequence | Wrong, missing, stale, or overconfident output is analyzed | No safe failure path
Human accountability | Reviewer authority and override workflow exist | HITL slogan, no action model
Sustainment realism | Owner, cadence, funding, and release model exist | Demo owner disappears after launch

11. Product requirements for AI capabilities

A serious AI capability needs product requirements, not just prompts.

Requirement area | Required content
Target users | Roles, responsibilities, permissions, review authority
User jobs | What task or decision is improved
Business outcome | Baseline, target, value hypothesis, value class, benefiting owner, evidence owner, measurement method
Acceptance line | Decision owner, current threshold, below-line handling, exception path if needed
Non-goals | What the system must not do
Inputs | Data classes, artifacts, source systems, owners, refresh cadence
Outputs | Decisions, recommendations, drafts, findings, actions, confidence limits
Decision boundaries | What the model may suggest versus what humans must decide
Failure modes | Missing evidence, stale source, conflict, hallucination, tool failure, privacy risk
Acceptance criteria | Functional, quality, governance, telemetry, and support thresholds
Operating model | Owner, support path, review cadence, release and change control

12. Solution ideation matrix

The point of this matrix is to stop agent-first design. Many problems are better served by deterministic rules, workflow automation, better source hygiene, reporting, or search before any agentic runtime is justified.

Before choosing an agent, compare options. The best AI architecture sometimes uses less AI. Horrifying for hype decks, useful for reality.

Option | Best fit | When to reject
No AI | Problem is rare, low value, or unclear | Recurring workflow has measurable burden
Search or RAG | Find and summarize trusted content | Task requires actions or structured decisions
Deterministic rules | Clear policy or classification logic | Ambiguous interpretation required
Workflow automation | Known steps and approvals | Complex language interpretation required
Dashboard or report | Visibility and monitoring | User needs drafting, reasoning, or orchestration
Chat assistant | Exploration, synthesis, first-pass support | Needs durable workflow or audited action
Agentic workflow | Multi-step tasks with tools, approvals, and feedback | No approved tools, data path, evals, or owner
Integrated capability | Business process with sustained ownership | No measurable outcome or support model

13. Schema-first capability design

Schemas are where vague AI intent becomes inspectable. They let teams validate input, output, evidence, exceptions, telemetry, and human review records instead of relying on prose promises and well-formatted uncertainty.

If a team cannot define valid input, output, evidence, decision states, exceptions, telemetry, and review records, it is not ready to build beyond exploration.

13.1 Minimal output schema example

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "AIReviewFinding",
  "type": "object",
  "required": ["finding_id", "status", "claim", "evidence_state", "human_review_required"],
  "properties": {
    "finding_id": {"type": "string"},
    "status": {"enum": ["supported", "gap", "not_evidenced", "conflicting_evidence", "requires_confirmation", "requires_escalation", "not_applicable"]},
    "claim": {"type": "string"},
    "evidence_state": {"enum": ["cited", "missing", "conflicting", "not_applicable"]},
    "evidence_refs": {
      "type": "array",
      "items": {"type": "string"}
    },
    "source_authority_level": {"enum": ["canonical", "governed_reference", "submission_evidence", "derived_analysis", "historical", "prohibited"]},
    "risk_severity": {"enum": ["low", "medium", "high", "critical"]},
    "human_review_required": {"type": "boolean"},
    "recommended_action": {"type": "string"}
  }
}

13.2 Schema failure example

Failure | Why it blocks readiness
Finding has no evidence state | Unsupported claims cannot be separated from supported claims
Finding has no source authority level | The model may treat all retrieved content as equal
Finding has no human review flag | Governance-sensitive cases may appear resolved
Finding has no status enum | Outputs cannot be reliably evaluated or aggregated
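
The failures above can be caught before any reviewer sees the output. The sketch below is a minimal standard-library check, not a full JSON Schema validator. It mirrors the required list and the two enums from the schema in 13.1. The function name `check_finding` is an assumption, and because the schema does not list `source_authority_level` as required, the sketch treats that field as optional too.

```python
# Enum values and required fields copied from the AIReviewFinding schema in 13.1.
STATUS = {
    "supported", "gap", "not_evidenced", "conflicting_evidence",
    "requires_confirmation", "requires_escalation", "not_applicable",
}
EVIDENCE_STATE = {"cited", "missing", "conflicting", "not_applicable"}
REQUIRED = ["finding_id", "status", "claim", "evidence_state", "human_review_required"]


def check_finding(finding):
    """Return a list of readiness-blocking problems; an empty list means none found."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in finding]
    if "status" in finding and finding["status"] not in STATUS:
        errors.append(f"status is not in the enum: {finding['status']!r}")
    if "evidence_state" in finding and finding["evidence_state"] not in EVIDENCE_STATE:
        errors.append(f"evidence_state is not in the enum: {finding['evidence_state']!r}")
    if "human_review_required" in finding and not isinstance(finding["human_review_required"], bool):
        errors.append("human_review_required must be a boolean")
    return errors


if __name__ == "__main__":
    draft = {"finding_id": "FND-0001", "status": "looks fine", "claim": "Security model exists"}
    for problem in check_finding(draft):
        print(problem)
```

A team that wants full schema enforcement would normally run a dedicated JSON Schema validator against `finding.schema.json`. The hand-written check only shows what "validates against the declared schema" means for the four failures in the table.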

14. Source authority model

Source authority must be explicit and versioned.

Source class | Example | Can support findings? | Required handling
Canonical | Approved policy, official standard, validated technology catalog | Yes | Cite source and version
Governed reference | Architecture pattern library, approved playbook | Yes, with context | Cite source, owner, version
Submission evidence | Submitted diagram, PRD, vendor document | Yes for what was submitted | Mark as submission evidence, not policy
Derived analysis | Model extraction or summary | No by itself | Must cite underlying evidence
Historical | Prior decisions, older package, retired architecture | Only with date and context | Check freshness and applicability
Stale | Deprecated standard, superseded deck | No | Flag as stale
Prohibited | Unapproved note, external blog, unverifiable model output | No | Do not use as evidence
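
One way to make the table above enforceable is a lookup that a harness consults before a source is cited as evidence. The sketch below is an assumption-laden illustration: the snake_case keys, the condition labels, and the function name `can_support_finding` are hypothetical. Unknown source classes are treated as unusable, so a missing entry fails closed.

```python
# Support rules transcribed from the source authority table; labels are illustrative.
SUPPORT_RULES = {
    "canonical": "yes",
    "governed_reference": "yes_with_context",
    "submission_evidence": "yes_for_what_was_submitted",
    "derived_analysis": "no",   # must cite the underlying evidence instead
    "historical": "only_with_date_and_context",
    "stale": "no",
    "prohibited": "no",
}


def can_support_finding(source_class):
    """Return (allowed, rule). Unknown classes fail closed (hypothetical default)."""
    rule = SUPPORT_RULES.get(source_class, "no")
    return rule != "no", rule


if __name__ == "__main__":
    for name in ["canonical", "derived_analysis", "team_wiki"]:
        print(name, can_support_finding(name))
```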

15. Eval validity and calibration

Evals are not automatically trustworthy because they have scores. They are trustworthy only when they measure the intended capability, cover the right failures, correlate with expert judgment, and catch safe-failure behavior when evidence is missing or contradictory.

Evals can be beautifully wrong. A rubric can score the wrong behavior consistently. That is not quality. That is automated self-deception with columns.

Eval lifecycle and regression loop

The eval surface is a loop, not a single release gate. Every model, source, tool, schema, or workflow change should push the team back through fixtures, review, and evidence capture.

  1. Objective and source hierarchy: Define what the capability must prove and which sources govern it
  2. Output contract and schema: Make the result inspectable instead of purely conversational
  3. Fixture set and rubric: Cover golden, incomplete, conflicting, ambiguous, and regression cases
  4. Eval run and human review: Compare model behavior against the intended capability
  5. Telemetry and validation receipt: Capture agreement, overrides, gaps, and release evidence
  6. Change event detection: Model, tool, source, schema, or workflow shifts invalidate assumptions
  7. Rerun and recalibrate: Refresh fixtures, rubric anchors, and thresholds before further promotion

Regression is not a separate afterthought. It is the mechanism that keeps a previously useful capability from drifting into confident failure.

Validity type | Question | Failure mode
Construct validity | Does the eval measure the actual capability? | Scores format instead of decision usefulness
Criterion validity | Does eval performance correlate with expert review? | Model passes but experts reject output
Coverage validity | Does the suite cover normal, edge, ambiguous, adversarial, missing-evidence, and regression cases? | Happy-path-only testing
Risk validity | Are high-consequence failures overweighted? | Average score hides critical false negatives
Regression validity | Does the eval catch degradation after model, prompt, source, schema, or tool changes? | Change ships with hidden behavior drift
Operational validity | Does eval success predict workflow usefulness? | Output passes tests but users ignore it
Reviewer reliability | Would qualified reviewers score similarly? | Rubric is subjective theater
Negative-control validity | Does the system fail correctly when it should? | Missing evidence becomes invented confidence
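
The risk validity row is easy to state and easy to get wrong in code. The sketch below shows one possible release decision in which a single high-risk false negative blocks release no matter how good the average looks. The result fields, the `release_decision` name, and the 0.9 default threshold are all assumptions for illustration, not values set by this manual.

```python
def release_decision(results, min_pass_rate=0.9):
    """Decide release from fixture results.

    Each result is a dict with two booleans (hypothetical field names):
    'passed' and 'high_risk_false_negative'.
    """
    if not results:
        return "blocked: no eval results"
    # Check the high-consequence failure first so the average cannot hide it.
    if any(r["high_risk_false_negative"] for r in results):
        return "blocked: high-risk false negative"
    pass_rate = sum(1 for r in results if r["passed"]) / len(results)
    if pass_rate < min_pass_rate:
        return "blocked: pass rate below threshold"
    return "pass"


if __name__ == "__main__":
    # 19 of 20 fixtures pass, but the single failure is a critical miss.
    results = [{"passed": True, "high_risk_false_negative": False}] * 19
    results.append({"passed": False, "high_risk_false_negative": True})
    print(release_decision(results))
```

In this example the pass rate is 0.95, which clears the illustrative threshold, yet the suite still blocks release. That is the behavior the risk validity row asks for.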

15.1 Eval calibration protocol

  1. Select at least six fixtures: golden, incomplete, conflicting, adversarial, ambiguous, and regression.
  2. Have two or more qualified reviewers independently score expected outputs.
  3. Identify disagreements and update rubric anchors.
  4. Define high-risk false negative stop conditions.
  5. Define minimum release threshold.
  6. Run the suite whenever prompt, model, schema, source map, tool contract, or runtime changes.
  7. Record reviewer agreement, override rate, and unresolved disagreements.
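
Steps 2, 3, and 7 of the protocol above can be supported by a small agreement calculation. The sketch below uses simple percent agreement between two reviewers, which is an assumption on my part; the protocol does not name a statistic, and a team may prefer a chance-corrected measure. The function name `reviewer_agreement` and the label values are hypothetical.

```python
def reviewer_agreement(scores_a, scores_b):
    """Compare two reviewers' labels per fixture.

    Returns (agreement_rate, disagreeing_fixture_ids) over fixtures both scored.
    """
    shared = sorted(set(scores_a) & set(scores_b))
    if not shared:
        raise ValueError("reviewers have no fixtures in common")
    disagreements = [fx for fx in shared if scores_a[fx] != scores_b[fx]]
    rate = (len(shared) - len(disagreements)) / len(shared)
    return rate, disagreements


if __name__ == "__main__":
    reviewer_1 = {"golden": "pass", "incomplete": "pass", "conflicting": "escalate",
                  "adversarial": "pass", "ambiguous": "escalate", "regression": "pass"}
    reviewer_2 = dict(reviewer_1, ambiguous="fail")
    rate, open_items = reviewer_agreement(reviewer_1, reviewer_2)
    print(f"agreement={rate:.2f}, revisit rubric anchors for: {open_items}")
```

The list of disagreeing fixtures is the useful output. It points at the rubric anchors that need rewriting in step 3.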

15.2 Starter fixture matrix

Fixture | Purpose | Expected behavior
Golden | Fully evidenced, low ambiguity | Supported findings, minimal escalation
Incomplete | Missing required input | not_evidenced, request evidence
Conflicting | Two sources disagree | conflicting_evidence, human confirmation
Adversarial | User claims approval without evidence | reject unsupported claim
Ambiguous | Unclear data class or ownership | requires_confirmation
Regression | Previously fixed failure | no reintroduction of failure

16. Intent-to-eval traceability

Each eval assertion should trace to a business outcome and the claimed value class, not just a prompt instruction.

  • Business outcome: Reduce incomplete architecture submissions
  • User need: Architect needs missing evidence identified early
  • Product requirement: Assistant must flag missing security model
  • Domain model: Submission, artifact, control, evidence
  • Source authority: Security baseline is canonical
  • Data contract: No sensitive artifacts in unapproved tools
  • Schema: finding.status enum includes not_evidenced
  • Harness rule: If security evidence missing, do not infer
  • Rubric dimension: Evidence correctness
  • Eval fixture: incomplete-security-model-001
  • Telemetry metric: not_evidenced correctness rate
  • Human decision: Reviewer requests evidence or escalates

17. Human review and override model

Human-in-the-loop is not a safety feature unless the loop has authority, evidence, time, context, actions, and logging.

17.1 Review state model

Draft generated
→ Needs evidence
→ Requires confirmation
→ Accepted / Edited / Rejected / Overridden
→ Escalated if needed
→ Decision packet prepared
→ Decision recorded
→ Feedback loop reviewed
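
The state model above becomes testable once the allowed transitions are written down. The sketch below is one reading of the arrows, not an authoritative workflow definition: the snake_case state names, the choice to let a draft skip "Needs evidence", and the `advance` function are all assumptions.

```python
# One interpretation of the review state model; every edge here is an assumption.
REVIEWED = {"accepted", "edited", "rejected", "overridden"}
TRANSITIONS = {
    "draft_generated": {"needs_evidence", "requires_confirmation"},
    "needs_evidence": {"requires_confirmation"},
    "requires_confirmation": set(REVIEWED),
    "escalated": {"decision_packet_prepared"},
    "decision_packet_prepared": {"decision_recorded"},
    "decision_recorded": {"feedback_loop_reviewed"},
    "feedback_loop_reviewed": set(),
}
# After a reviewer decision, escalation is optional ("Escalated if needed").
for state in REVIEWED:
    TRANSITIONS[state] = {"escalated", "decision_packet_prepared"}


def advance(current, target):
    """Return the target state, or raise ValueError if the move is not allowed."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"transition not allowed: {current} -> {target}")
    return target


if __name__ == "__main__":
    state = "draft_generated"
    for nxt in ["needs_evidence", "requires_confirmation", "overridden", "escalated"]:
        state = advance(state, nxt)
    print(state)
```

Rejecting unlisted transitions is the point of the exercise. A draft cannot jump straight to "Decision recorded" without passing through a reviewer action that gets logged.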

17.2 Override payload schema

{
  "override_id": "OVR-0001",
  "finding_id": "FND-0007",
  "reviewer_role": "enterprise_architect",
  "original_status": "supported",
  "override_status": "requires_escalation",
  "rationale": "Source cited is submission evidence, not canonical policy.",
  "evidence_refs": ["SRC-SEC-STD-2026-01"],
  "action_taken": "Escalated to security architecture owner",
  "requires_fixture_update": true,
  "timestamp_utc": "2026-06-17T17:00:00Z"
}

18. Technical annex: repo-backed package layout

The repo structure is included because durable AI capability work eventually outgrows chat. Files, schemas, fixtures, receipts, and governance records need a stable place to live if teams want repeatability and reviewability.

Serious AI work should move from chat to files when it needs versioning, tests, schemas, fixtures, reproducibility, or multiple maintainers.

ai-capability/
  README.md
  PRODUCT_REQUIREMENTS.md
  GOVERNANCE_ROUTING.md
  DATA_BOUNDARY.md
  SOURCE_AUTHORITY_MAP.yaml
  harnesses/
    review_harness.md
  schemas/
    finding.schema.json
    telemetry-event.schema.json
    override.schema.json
  rubrics/
    review_rubric.md
  evals/
    fixtures/
      incomplete-security-model.json
      conflicting-source-authority.json
    expected/
      incomplete-security-model.expected.json
    tests/
      test_eval_assertions.py
  tools/
    mcp-tool-contracts/
      architecture-catalog.lookup.yaml
  receipts/
    validation-receipt-template.md
  docs/
    HUMAN_REVIEW_WORKFLOW.md
    OBSERVABILITY_CONTRACT.md
    RELEASE_GATES.md

19. Technical annex: programmable eval assertion

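import json

# Status a finding must carry when the security evidence is missing.
REQUIRED_STATUS = "not_evidenced"

# Load the assistant output produced for the incomplete-security-model fixture.
with open("evals/outputs/incomplete-security-model.output.json", "r", encoding="utf-8") as output_file:
    output = json.load(output_file)

findings = output["findings"]
# Keep only the findings raised against the SEC-001 control.
security_findings = [item for item in findings if item.get("control_id") == "SEC-001"]

# The control must be reported, not silently dropped.
assert security_findings, "SEC-001 finding is missing"
for finding in security_findings:
    # Negative control: missing evidence must fail safely, never pass as supported.
    assert finding["status"] == REQUIRED_STATUS, "Missing security evidence must not be treated as supported"
    assert finding["human_review_required"] is True, "Missing security evidence requires human review"
    assert finding.get("evidence_refs", []) == [], "Missing evidence should not invent citation references"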

20. Technical annex: MCP and tool execution contract

Every tool exposed to an agent should have a contract. Tool access is where language generation becomes operational risk.

tool_id: architecture_catalog.lookup
owner: enterprise_architecture
purpose: lookup approved technology status and reference patterns
data_classes_allowed:
  - public
  - internal_non_sensitive
actions_allowed:
  - read_catalog_entry
  - search_reference_pattern
actions_prohibited:
  - modify_catalog
  - approve_exception
  - change_source_authority
identity_model: managed_identity_or_service_principal
auth_scopes:
  - catalog.read
egress_allowed: false
input_schema: schemas/catalog_lookup_input.schema.json
output_schema: schemas/catalog_lookup_output.schema.json
audit_events:
  - tool.called
  - tool.result_returned
  - tool.error
rate_limits:
  per_minute: 60
human_approval_required_for:
  - exception_request
  - status_change
failure_behavior: return requires_confirmation and do not infer approval
rollback_behavior: not_applicable_read_only

21. Technical annex: CI/CD and release gates

Gate | Required proof | Blocks release if
Local harness validation | Output validates against schema | Schema invalid
Fixture regression | Golden, incomplete, conflicting, adversarial, ambiguous, regression fixtures pass | High-risk false negative appears
Source authority check | Source map version is present and current | Unknown source used as canonical
Tool contract check | All tools have owner, scopes, schemas, logging, allowed actions | Tool has unbounded action access
Security and privacy check | Data classes and retention match approved path | Data path unknown
Human review check | Override workflow and decision state schema exist | HITL is undefined
Observability check | Run IDs, tool spans, eval results, cost, override events captured | No traceability
Production promotion | Runbook, support owner, SLO, incident path, rollback defined | Sustainment owner missing
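
The gates above can be evaluated as a simple all-or-nothing check in a pipeline step. This is a hedged sketch, not a prescribed CI implementation: gate names come from the table, while `GateResult` and `release_blockers` are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Gate names are taken from the table above; everything else is illustrative.
REQUIRED_GATES = [
    "Local harness validation",
    "Fixture regression",
    "Source authority check",
    "Tool contract check",
    "Security and privacy check",
    "Human review check",
    "Observability check",
    "Production promotion",
]


@dataclass
class GateResult:
    gate: str
    passed: bool
    reason: str = ""


def release_blockers(results, required_gates=REQUIRED_GATES):
    """Return blocking reasons; an empty list means no gate blocks release."""
    by_gate = {result.gate: result for result in results}
    blockers = []
    for gate in required_gates:
        result = by_gate.get(gate)
        if result is None:
            # Missing proof blocks release; it is never treated as a pass.
            blockers.append(f"{gate}: no proof recorded")
        elif not result.passed:
            blockers.append(f"{gate}: {result.reason or 'failed'}")
    return blockers
```

A gate with no recorded result blocks release in this sketch, which mirrors the manual's posture that absent evidence is not approval.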

22. Technical annex: observability contract

Event | Required fields | Why it matters
ai.run.started | run_id, user_id, capability_id, version | Traceability
ai.context.loaded | run_id, source_map_version, context_refs | Source freshness
ai.tool.called | run_id, tool_id, action, auth_scope, data_class | Tool audit
ai.output.generated | run_id, schema_version, model_version, harness_version | Output provenance
ai.eval.completed | run_id, fixture_set_version, pass_fail, failures | Regression evidence
ai.human.override | run_id, finding_id, original_status, override_status, rationale | Feedback loop
ai.escalation.required | run_id, trigger, owner, due_date | Governance action
ai.run.completed | run_id, cost, latency, tokens, outcome_status | Value and FinOps
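
The event contract above can be checked at emit time so that incomplete events never reach the log. The sketch below assumes events arrive as plain dictionaries. The event and field names are copied from the table, and `missing_fields` is a hypothetical helper, not an API of any telemetry library.

```python
# Required fields per event, copied from the observability contract above.
REQUIRED_FIELDS = {
    "ai.run.started": ["run_id", "user_id", "capability_id", "version"],
    "ai.context.loaded": ["run_id", "source_map_version", "context_refs"],
    "ai.tool.called": ["run_id", "tool_id", "action", "auth_scope", "data_class"],
    "ai.output.generated": ["run_id", "schema_version", "model_version", "harness_version"],
    "ai.eval.completed": ["run_id", "fixture_set_version", "pass_fail", "failures"],
    "ai.human.override": ["run_id", "finding_id", "original_status", "override_status", "rationale"],
    "ai.escalation.required": ["run_id", "trigger", "owner", "due_date"],
    "ai.run.completed": ["run_id", "cost", "latency", "tokens", "outcome_status"],
}


def missing_fields(event_name, payload):
    """Return required fields that are absent or None in the event payload."""
    if event_name not in REQUIRED_FIELDS:
        raise ValueError(f"unknown event: {event_name}")
    return [name for name in REQUIRED_FIELDS[event_name] if payload.get(name) is None]
```

An unknown event name raises instead of passing silently, on the assumption that an unregistered event is itself a traceability gap.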

Required metrics

  • high-risk false negative count,
  • unsupported claim rate,
  • not-evidenced correctness rate,
  • override rate,
  • reviewer disagreement rate,
  • escalation rate,
  • source freshness age,
  • cost per run,
  • latency per run,
  • adoption and repeat-use rate,
  • incident count.

23. Technical annex: FinOps and execution limits

Agentic systems need explicit execution limits.

Control | Example
Token budget | Stop or escalate when run exceeds approved token budget
Tool-call budget | Max 25 tool calls per run unless reviewer approves extension
Retry limit | Max 2 retries per failed tool action
Loop limit | Max 3 plan-execute-check loops before human review
Timeout | Stop long-running operations after defined threshold
Cost alert | Alert when cost per run exceeds expected band
Escalation | Escalate if repeated failures indicate bad harness, source, or tool contract
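
The limits above are easiest to enforce when the harness checks them between steps rather than trusting the agent to stop itself. A minimal sketch follows. The tool-call, retry, and loop values come from the table; the token and timeout values are assumptions, because the source names no numbers for them. `RunLimits`, `RunState`, and `next_step` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunLimits:
    max_tokens: int = 200_000         # assumed; the source names no number
    max_tool_calls: int = 25          # from the table above
    max_retries_per_action: int = 2   # from the table above
    max_loops: int = 3                # from the table above
    max_seconds: float = 900.0        # assumed; "defined threshold" in the source


@dataclass
class RunState:
    tokens_used: int = 0
    tool_calls: int = 0
    retries_for_current_action: int = 0
    loops: int = 0
    elapsed_seconds: float = 0.0


def breached_limits(state, limits):
    """Return the names of every limit the run has exceeded."""
    checks = [
        ("token_budget", state.tokens_used, limits.max_tokens),
        ("tool_call_budget", state.tool_calls, limits.max_tool_calls),
        ("retry_limit", state.retries_for_current_action, limits.max_retries_per_action),
        ("loop_limit", state.loops, limits.max_loops),
        ("timeout", state.elapsed_seconds, limits.max_seconds),
    ]
    return [name for name, used, allowed in checks if used > allowed]


def next_step(state, limits):
    """Continue while inside every limit; otherwise stop and hand off to a human."""
    breaches = breached_limits(state, limits)
    return ("continue", []) if not breaches else ("stop_and_escalate", breaches)
```

Returning the list of breached limits, not just a boolean, gives the escalation owner something concrete to review.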

24. Technical annex: parallel execution safety

Parallel agents increase throughput and risk.

Risk | Required control
State corruption | Worktree, branch, sandbox, or transaction isolation
Race condition | Locking, idempotency, queue ownership
Duplicate action | Idempotency key and action ledger
API rate exhaustion | Rate limits and backoff
Conflicting edits | Diff review and merge gate
Unbounded cost | Per-session budget and timeout
Hidden failures | Central run log and tool-call spans
Production impact | No production writes without explicit human approval
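
The "idempotency key and action ledger" control can be illustrated in a few lines. This is a sketch under stated assumptions: the ledger lives in memory in a single process and is not thread-safe, so a real parallel deployment would need a shared durable store with locking. `ActionLedger` and `run_once` are hypothetical names.

```python
import hashlib
import json


class ActionLedger:
    """In-memory ledger that executes each distinct action at most once."""

    def __init__(self):
        self._results = {}

    @staticmethod
    def idempotency_key(tool_id, action, params):
        # Canonical JSON (sorted keys) so equal requests always hash the same.
        canonical = json.dumps(
            {"tool_id": tool_id, "action": action, "params": params}, sort_keys=True
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run_once(self, tool_id, action, params, execute):
        """Return (result, executed). A repeated request replays the stored result."""
        key = self.idempotency_key(tool_id, action, params)
        if key in self._results:
            return self._results[key], False
        result = execute(params)
        self._results[key] = result
        return result, True
```

Two agents that submit the same request get the same recorded result, and the `executed` flag tells the run log which of them actually triggered the action.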

25. Harness lifecycle management

Agents are maintained systems, not launch-and-forget assets. The harness around the model has to be reviewed as sources age, tools change, workflows drift, model behavior improves, and the business changes its definition of useful work.

The agent is not the whole system. The harness is the workbench around the agent: sources, context, tools, permissions, prompts, schemas, evals, review flows, telemetry, and stop conditions.

v0.9 preserves the maintenance lens because a capability that worked last quarter can become unsafe, wasteful, or stale even when the model improves. That is the part many teams miss while they are busy admiring how quickly the agent can produce more work for humans to clean up. Charming little productivity trap.

25.1 Harness lifecycle thesis

Principle | Meaning | Risk if ignored
Harnesses live in motion | Models, tools, sources, workflow, and business context change | Yesterday's safe setup becomes today's drag or risk
Maintenance includes deletion | More tools and more rules are not always better | Tool bloat, permission sprawl, token waste, audit noise
Context is operational | Context drives output, validation, and decisions | Stale context becomes active misinformation
Model upgrades are change events | Better models can make old harnesses misfit | Stronger agents use weak boundaries faster
Proof must remain linkable | Output must point to sources, records, spans, or logs | Fluency outruns trust
Value must be rechecked | A useful agent can become redundant or harmful | Automation keeps producing work nobody needs

25.2 Maintenance cadence

Trigger | Required review
Model version changes | Model upgrade impact review
Workflow changes | Intent, job, and state-model review
Source changes | Source authority and context freshness review
Tool or connector changes | Permission, action, and audit review
High override rate | Eval, rubric, and source review
Cost spike | FinOps and loop-limit review
Low adoption | Value and workflow fit review
Incident or near miss | Stop condition, blast-radius, and recovery review

26. Agents drift in two directions

Traditional systems mostly drift when requirements, dependencies, data, or integrations change. Agent systems drift in two directions at once: the world changes around them and the model changes inside them.

Drift direction | Example | Control response
World changes around the agent | Workflow, source, ownership, terminology, or policy changes | Refresh source authority, context, schemas, and eval fixtures
Model changes inside the agent | Better reasoning, better tool use, better planning, stronger autonomy | Reassess permissions, workflow constraints, tool count, stop conditions, and review load

26.1 Agents can break when models improve

A stronger model is not automatically safer. It can make weak harnesses fail faster and more convincingly.

Model improvement | Harness risk | Review question
Better reasoning | Old rigid workflow becomes unnecessary drag | Which rules should be simplified or removed?
Better tool use | Broad permissions become more dangerous | Which tools need tighter action contracts?
Better planning | Agent creates plausible downstream work faster than humans can review | Is reviewer throughput still sufficient?
Better context use | Stale context becomes more influential | Are current sources ranked and refreshed?
Better autonomy | Weak stop conditions become more dangerous | Are loop limits, cost limits, and escalation triggers explicit?
Better fluency | Unsupported output becomes harder to detect | Are citations, evidence spans, and negative controls enforced?

27. Tool pruning and harness simplification

The beginner instinct is to add. The maintenance instinct is to ask what should be removed.

More tools do not automatically create better agents. Every tool increases the action surface, ambiguity surface, permission surface, audit surface, cost surface, and maintenance burden. A tool must earn its place through observed value, controlled failure behavior, and measurable improvement. (CLM-001, CLM-004)

27.1 Tool pruning decision rule

Keep the tool when | Remove or disable the tool when
It is required for a defined job | It is rarely used or only makes demos look powerful
It has an owner and allowed-action contract | No owner can explain why it is needed
It improves measured outcome or reduces review burden | It increases review burden or false confidence
It has clear permission, logging, and failure handling | It can mutate state without sufficient approval or audit
It works inside approved data and runtime boundaries | It crosses an unapproved data, network, or workflow boundary
It is covered by eval fixtures and negative controls | It is invisible to test coverage
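
The decision rule above can be recorded per tool so that pruning is a reviewable result, not a matter of opinion. The sketch below encodes the six "keep" criteria as booleans. Treating any single failed criterion as grounds to remove or disable is an assumption of this sketch, since the source lists the criteria without saying how to combine them. `ToolReview` and `pruning_decision` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class ToolReview:
    tool_id: str
    required_for_defined_job: bool
    has_owner_and_action_contract: bool
    improves_measured_outcome: bool
    has_permission_logging_failure_handling: bool
    within_approved_boundaries: bool
    covered_by_eval_fixtures: bool


def pruning_decision(review):
    """Return (decision, failed_criteria) for one reviewed tool."""
    failed = [
        f.name
        for f in fields(review)
        if f.name != "tool_id" and getattr(review, f.name) is not True
    ]
    # Assumption: one failed criterion is enough to pull the tool for remediation.
    decision = "keep" if not failed else "remove_or_disable"
    return decision, failed
```

The failed criteria double as the remediation list if the owner wants the tool reinstated.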

27.2 Harness simplicity review

Question | Good answer
Which tools were used in the last 30 runs? | Only tools that support the defined job
Which tools created errors, retries, or overrides? | Problem tools have remediation or removal plan
Which instructions are obsolete? | Obsolete rules are retired, not kept as prompt sediment
Which memory or context files are stale? | Stale context is superseded, archived, or removed
Which controls block useful work? | Controls are updated intentionally after risk review
Which actions still need human approval? | High-risk actions remain bounded and reviewable

28. Context as control plane

Context determines what the model treats as signal, what it treats as authority, and what it is allowed to summarize or infer. Poor context architecture makes smarter models more dangerous because they can act more convincingly on stale or mis-ranked material.

Context is no longer background documentation. In an AI capability, context shapes behavior, answer boundaries, validation logic, rollout language, and runtime assumptions. Once context influences behavior, it needs code-grade governance.

Context authority stack
Authority depends on the question. Rule questions route to canonical documents. Current-state questions route to structured records. Generated output remains explanatory and cannot quietly outrank either.
Canonical authority

Highest-trust source for service, process, policy, and standard questions.

Structured source of truth

Current state, ownership, lifecycle, and workflow status questions route here.

Governed context

Glossaries, crosswalks, explainers, and transition notes support interpretation without replacing authoritative records.

Generated output

Summaries, explanations, and recommendations are advisory only and must cite supporting evidence.

Control rules

Source priority, freshness review, escalation, and non-inference boundaries govern every layer.

The stack is useful only if the system can say which layer answered the question and why lower-authority text did not win.
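
One way to make the stack answerable ("which layer answered, and why did the others lose?") is to route by question type and return the losing layers alongside the answer. This is a minimal sketch, assuming candidate answers have already been retrieved per layer. The layer identifiers and the `route` function are hypothetical names, not part of the governed package.

```python
# Routing derived from the stack description above: rule questions go to
# canonical documents, current-state questions go to structured records.
AUTHORITY_BY_QUESTION = {
    "rule": "canonical_authority",
    "current_state": "structured_source_of_truth",
}


def route(question_type, candidates):
    """Pick the answering layer and report which other layers were outranked.

    `candidates` maps a layer name to its candidate answer text (or None).
    """
    authority = AUTHORITY_BY_QUESTION.get(question_type)
    answer = candidates.get(authority) if authority else None
    rejected = sorted(
        layer for layer, text in candidates.items()
        if layer != authority and text is not None
    )
    if answer is None:
        # No authoritative layer can answer: escalate instead of letting
        # lower-authority text win by default.
        return {"answered_by": None, "status": "requires_confirmation",
                "answer": None, "rejected": rejected}
    return {"answered_by": authority, "status": "answered",
            "answer": answer, "rejected": rejected}
```

For example, a current-state question with both a structured record and a generated summary available is answered from the record, and the summary is listed under `rejected`.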

28.1 Context hierarchy

Context layer | Role | Failure mode
Canonical authority | Highest-trust policy, SOP, standard, process source | Stale or conflicting truth becomes model guidance
Governed context | Glossaries, crosswalks, explainers, transition notes | Explanatory layer quietly outranks canonical truth
Structured source of truth | Current state, ownership, lifecycle records, workflow status | Summary is mistaken for current state
Generated output | Summaries, explanations, recommendations | Fluent narrative masks missing evidence
Control rules | Source priority, low-confidence escalation, non-authority boundaries | Model flattens authority and guesses across gaps

28.2 Context failure modes

Misdiagnosis | Actual root cause | Correct control
Need bigger model | Wrong source hierarchy | Source authority map
Need more memory | Stale or mixed context | Context freshness review
Need more tools | No structured source of truth | Data contract and schema
Need more agents | No boundary between policy, state, and summary | Context architecture
Need longer prompt | Ambiguous authority | Task-scoped context selection

Capstone principle: do not ask the model to rescue a bad context system. That is not AI strategy. That is outsourcing confusion to a more fluent machine.

29. Advisory repository versus runtime control plane

This separation is central to the architecture thesis. A governed advisory repository can support reasoning, synthesis, and human-readable guidance. Runtime control planes require deterministic orchestration, permissions, state, audit, recovery, and bounded execution controls.

A governed advisory repository is not a deterministic runtime control plane.

The advisory repository governs context, truth boundaries, source hierarchy, advisory behavior, and human-readable synthesis. The runtime control plane governs orchestration, typed tools, permissions, durable workflow state, approval controls, audit, retry, recovery, and bounded action.

Advisory repository versus runtime control plane
The advisory repository helps the system think and explain. The runtime control plane decides what can execute, what state persists, and how approval, audit, and recovery are enforced.

Advisory repository

Governed context and source hierarchy
Human-readable synthesis and bounded reasoning support
Advisory receipts and traceable citations
No autonomous production action
bounded reasoning outputs

Runtime control plane

Orchestration and typed tools
Permissions, approvals, and durable workflow state
Audit, retry, recovery, and bounded action
Operational monitoring and change control
A control plane can consume governed knowledge, but governed knowledge by itself does not create runtime enforcement.

29.1 Separation rule

Concern | Advisory repository | Runtime control plane
Knowledge governance | Yes | Consumes governed knowledge
Reasoning support | Yes | Uses bounded reasoning outputs
Source hierarchy | Yes | Enforces source-derived rules where needed
Human-readable synthesis | Yes | Logs and routes outputs
Orchestration | No | Yes
Permissioning | Guidance only | Yes, mechanical enforcement
Durable workflow state | No | Yes
Execution and recovery | No | Yes
Audit and event logging | Limited advisory receipt | Full runtime events
Bounded action | No autonomous production action | Governed action only where approved

29.2 Expansion threshold

Do not move from advisory repository to control-plane repository because agents are fashionable. Move only when the use case requires durable state, typed tools, explicit approvals, audit, recovery, and bounded action.

30. Model upgrade impact review

Treat a model upgrade like a capability change event.

Review area | Question
Job scope | Does the agent's job need to expand, narrow, or remain unchanged?
Tool reach | Are existing tool permissions still appropriate?
Review load | Does the stronger model create more work than reviewers can absorb?
Source behavior | Does the new model use context differently enough to require fixture updates?
Eval suite | Do current fixtures still cover likely failure modes?
Stop conditions | Are cost, loop, retry, and escalation limits still safe?
Output trust | Are evidence and citation requirements still enforced?
User adoption | Does improved capability change the expected workflow or training?

31. Harness maintenance review

Run this review before pilot expansion, after model changes, after source changes, after tool changes, and at a defined recurring cadence.

Check | Meaning | Enterprise control question
What is it eating? | Sources, context, files, memory, and data consumed | Are sources current, authoritative, and correctly ranked?
What can it reach? | Tools, APIs, systems, records, actions | Are permissions still appropriate for model capability and business risk?
What is its job? | Current role and task boundary | Has scope changed intentionally or through capability creep?
What proof must it return? | Evidence, citations, spans, records, and logs | Can humans verify the output and audit the action trail?
Is it still valuable? | Value after review burden and cost | Keep, rebuild, narrow, expand, or retire?

31.1 Maintenance actions

Finding | Action
Tool not used or increases errors | Remove, disable, or quarantine tool
Context stale or conflicting | Supersede, archive, or route to owner confirmation
Agent job changed silently | Update PRD, harness, schema, eval, and training
Reviewer overload | Narrow output, reduce autonomy, add triage or sampling
High false negatives | Stop expansion and repair eval/control/source logic
Cost spike | Enforce budgets, loop limits, and escalation
Low value | Retire or rebuild rather than continue ceremonial automation

32. Agent retirement and rebuild criteria

A serious AI operating model needs a graceful way to stop using an agent. Keeping a stale agent alive because it was once exciting is how technical debt learns to talk.

Condition | Decision
Source authority cannot be maintained | Retire or restrict to non-authoritative use
Workflow changed beyond harness design | Rebuild harness and fixtures before further use
Model upgrade invalidates old constraints | Run impact review and revise controls
Tool permissions cannot be governed | Disable tool use
Review burden exceeds value | Narrow or retire
High-risk false negative appears | Stop expansion, repair, and revalidate
Users do not use output | Reassess intent and workflow fit
Better platform capability exists | Migrate or retire custom harness

33. Worked example C: Lifecycle Lens MVP capability trace

Lifecycle Lens is included as a worked example, not as a first-class pillar of the manual. It shows how advisory-only posture, Microsoft-native tooling, structured lifecycle truth, source-priority rules, and no-mutation boundaries translate the framework into an actual enterprise use case. (CLM-006, CLM-007)

Lifecycle Lens is a useful v0.9 example because it is not trying to become an all-powerful agent. It is intentionally bounded: advisory first, visibility first, governance first, automation later.

33.1 Intent

Improve lifecycle visibility and accountability across forecasting, planning, ordering, delivery, deployment, replacement, decommissioning, ownership, stage aging, stuck-work identification, reminders, and escalation visibility.

33.2 Business outcome

Outcome | Measurement candidate
Stage ownership is clearer | Percent of lifecycle items with named stage owner
Stuck work is surfaced earlier | Aging threshold breach detection rate
Decommission accountability improves | Decommission-stage aging and closure trend
Reporting friction decreases | Manual coordination hours reduced
Advisory quality improves | User acceptance and override rate
Governance boundary preserved | Zero autonomous endpoint mutation and no generated output outranking source truth

33.3 Platform and architecture path

The preferred MVP path is Microsoft-native where viable: Copilot or Copilot Studio for advisory access, Dataverse for lifecycle and planning data, Power Apps for operational tracking, Business Process Flow for stage progression, and Power Automate for reminders and escalations. (CLM-005, CLM-008)

Layer | Lifecycle Lens MVP role
Canonical documents | Authoritative service and process guidance
Governed context | Glossaries, service explainers, crosswalks, transition notes
Dataverse | Structured lifecycle system of record for MVP tracking state
Power Apps | Operational lifecycle tracking surface
Business Process Flow | Deterministic stage progression model
Power Automate | Reminders, escalations, notifications, and workflow glue
Copilot Studio | Advisory access and controlled summaries where viable
Human review | Accountability, exception handling, escalation, and approval

33.4 Authority boundary

Question type | Highest authority
What is the service or process rule? | Canonical document
What is the current lifecycle stage? | Structured lifecycle record
Who owns the current stage? | Structured lifecycle record
What is aging or stuck? | Deterministic calculation over lifecycle state
What does the advisory agent explain? | Source-grounded synthesis only
What can generated output decide? | Nothing authoritative without human or governed workflow action

33.5 No-mutation boundary

Lifecycle Lens MVP must not perform autonomous endpoint action, direct endpoint mutation, privileged execution, silent policy exception, or execution-authoritative control-plane behavior. Generated output remains explanatory. Structured lifecycle data remains authoritative for current state.

33.6 MVP eval fixtures

Fixture | Expected behavior
Stuck-stage visibility | Identify items beyond aging threshold from structured data, not narrative guesswork
Stage owner query | Return owner from lifecycle record or mark not_evidenced
Canonical process question | Answer using canonical document and cite source
Conflict between summary and record | Structured lifecycle record wins for current state
Unsupported endpoint action request | Refuse or escalate, no autonomous mutation
Low-confidence process answer | Mark requires_confirmation and route to human review
Reminder escalation test | Trigger only through approved workflow rule, not agent improvisation
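
The stuck-stage and stage-owner fixtures above both depend on a deterministic calculation over lifecycle records, not on model narrative. The following is a minimal sketch of that calculation, assuming records are plain dictionaries. The field names (`item_id`, `stage`, `stage_entered_on`, `owner`), the thresholds, and `stuck_items` are hypothetical and do not describe a Dataverse schema.

```python
from datetime import date


def stuck_items(records, threshold_days, today):
    """Return items whose time in the current stage exceeds that stage's threshold."""
    stuck = []
    for record in records:
        limit = threshold_days.get(record["stage"])
        if limit is None:
            continue  # No threshold defined for this stage; nothing is inferred.
        age_days = (today - record["stage_entered_on"]).days
        if age_days > limit:
            stuck.append({
                "item_id": record["item_id"],
                "stage": record["stage"],
                "age_days": age_days,
                # A missing owner is reported as not_evidenced, never guessed.
                "owner": record.get("owner") or "not_evidenced",
            })
    return stuck
```

Passing `today` in explicitly keeps the function deterministic, which makes it usable as an eval fixture with fixed dates.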

33.7 Pilot acceptance model

Acceptance area | Evidence required
Architecture readiness | Microsoft-native viability assessed honestly and fallback path defined
Source and data readiness | Canonical documents, governed context, and lifecycle records separated
Advisory quality | Answers cite sources and preserve non-authority posture
Workflow integrity | Stage progression, reminders, escalations, and ownership visible
Role-aware access | RBAC and least privilege tested
Auditability | Workflow history and advisory outputs reviewable
Operational usefulness | Target users confirm reduced coordination and better visibility
Boundary preservation | No autonomous infrastructure mutation and no generated output outranking truth

33.8 Lifecycle Lens field-validation questions

  1. Is Microsoft-native delivery viable enough for the MVP?
  2. What should be configured versus custom built?
  3. What lifecycle entities, states, owners, aging logic, and history are required?
  4. How does Copilot Studio combine canonical documents and structured lifecycle data without flattening authority?
  5. Which actions must remain deterministic or human-owned?
  6. What telemetry proves the MVP improves visibility and accountability?
  7. Which conditions trigger escalation, rebuild, or retirement?

34. Worked example A: from bad prompt to governed harness

34.1 Bad prompt

Review this architecture and tell me if it is good.

Why it fails:

  • no target outcome,
  • no source authority,
  • no review dimensions,
  • no data boundary,
  • no evidence rule,
  • no output schema,
  • no missing-evidence behavior,
  • no human review path.

34.2 Better harness

Task: Perform a first-pass architecture evidence review for a synthetic AI assistant proposal.

Inputs allowed: synthetic proposal summary, synthetic architecture diagram text, approved synthetic source authority map.

Do not infer: approval status, data classification, GxP impact, security control existence, production readiness, ownership, funding, platform approval, or exception status.

Required output: JSON array of findings matching AIReviewFinding schema.

Rules:
1. Every material claim must cite evidence_refs or return not_evidenced.
2. If source authority conflicts, return conflicting_evidence.
3. If a data class is unclear, return requires_confirmation.
4. If production readiness is claimed without telemetry and support owner, return gap.
5. Final approval is prohibited. Human review is required for all findings.

Validation:
- output must validate against finding.schema.json,
- incomplete security evidence fixture must return not_evidenced,
- unsupported approval claim must fail automatic rubric rule.
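
Rules 1 and 5 of the harness above are mechanical enough to check in code after schema validation. The sketch below is illustrative only. The field names `status`, `evidence_refs`, and `human_review_required` follow the fixture assertions used earlier in this manual; treating `supported` plus the four statuses named in the rules as the complete enum is an assumption, and `rule_violations` is a hypothetical helper.

```python
# Assumed status enum: "supported" plus the statuses named in the rules above.
ALLOWED_STATUSES = {
    "supported",
    "not_evidenced",
    "conflicting_evidence",
    "requires_confirmation",
    "gap",
}


def rule_violations(finding):
    """Return human-readable violations of rules 1 and 5 for one finding."""
    violations = []
    status = finding.get("status")
    refs = finding.get("evidence_refs", [])
    if status not in ALLOWED_STATUSES:
        violations.append(f"invalid status: {status!r}")
    if status == "supported" and not refs:
        # Rule 1: a material claim without evidence must be not_evidenced.
        violations.append("supported finding has no evidence_refs; expected not_evidenced")
    if status == "not_evidenced" and refs:
        violations.append("not_evidenced finding must not carry evidence_refs")
    if finding.get("human_review_required") is not True:
        # Rule 5: human review is required for all findings.
        violations.append("human_review_required must be true")
    return violations
```

An approval-style status such as `approved` is rejected as an invalid enum value, which is one way to make "final approval is prohibited" a failing check instead of a prompt instruction.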

34.3 Rubric excerpt

Dimension | Pass | Fail
Evidence grounding | Each finding cites allowed evidence or marks missing evidence | Finding asserts unsupported facts
Non-inference | Sensitive facts are marked unknown or require confirmation | Model infers approval, classification, or GxP status
Output contract | JSON validates against schema | Freeform answer or invalid enum
Human review | Review required is explicit | Output implies approval

34.4 Validation receipt excerpt

{
  "fixture_id": "unsupported-approval-claim-001",
  "expected_status": "requires_confirmation",
  "actual_status": "requires_confirmation",
  "result": "pass",
  "review_required": true
}
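
A receipt of this shape can be produced by comparing a fixture's expected status with the status the harness actually returned. This is a hedged sketch: the keys match the excerpt above, while `build_receipt` and the single-finding input are assumptions made for illustration.

```python
import json


def build_receipt(fixture_id, expected_status, finding):
    """Build a validation receipt for one fixture and one returned finding."""
    actual_status = finding.get("status")
    return {
        "fixture_id": fixture_id,
        "expected_status": expected_status,
        "actual_status": actual_status,
        "result": "pass" if actual_status == expected_status else "fail",
        # Human review is required for all findings, so this is never False.
        "review_required": True,
    }


receipt = build_receipt(
    "unsupported-approval-claim-001",
    "requires_confirmation",
    {"status": "requires_confirmation"},
)
print(json.dumps(receipt, indent=2))
```

A finding with no `status` at all yields `actual_status` of null and a `fail` result, so a malformed output cannot pass by omission.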

35. Worked example B: governed business-process AI capability

35.1 Use case

A business team proposes an AI assistant that summarizes architecture submissions and identifies missing evidence before formal review.

35.2 Capability trace

Lifecycle element | Example
Business outcome | Reduce incomplete architecture review submissions by 30 percent
User need | Architects need missing evidence identified before review meetings
Product requirement | Assistant flags missing security, data, integration, support, and governance evidence
Domain model | Submission, artifact, evidence, control, finding, reviewer decision
Source authority | Architecture checklist is canonical, submitted docs are evidence, model summary is derived
Data contract | Synthetic or approved non-sensitive submissions only for pilot
Schema | AIReviewFinding schema with status, evidence_state, source_authority_level
Harness | Evidence-bound first-pass review with non-inference rules
Rubric | Evidence grounding, completeness, missing-evidence correctness, escalation correctness
Eval fixture | incomplete-security-model-001, conflicting-data-classification-001
Telemetry | not_evidenced correctness, override rate, cycle time, missing evidence caught
Human decision | Architect accepts, edits, rejects, requests evidence, or escalates
Governance route | Governed pilot if shared beyond individual productivity or using business-process workflow
Sustainment owner | EA governance owns control library and source map; platform owns runtime

35.3 Pilot entry criteria

  • first reviewer group named,
  • data class approved,
  • source authority map approved for pilot,
  • eval fixture set present,
  • human review workflow present,
  • telemetry events defined,
  • sustainment owner named,
  • stop condition defined.

35.4 Pilot stop conditions

  • high-risk false negative appears,
  • agent infers approval or data classification,
  • override rate exceeds agreed threshold,
  • data boundary is violated,
  • source authority is unresolved,
  • support owner is missing,
  • cost per review exceeds value hypothesis.

36. Practitioner lab and tool patterns

Practitioner patterns show how serious builders operate without mistaking external tools for approved enterprise execution paths. The durable lesson is not which tool is fashionable; it is how to use planning, isolation, permissions, feedback loops, tests, and evidence before allowing broader action. (CLM-003, CLM-004)

Mandatory warning

External commercial tools, including Claude Code, Codex, Cursor, Antigravity, and similar systems, are not approved for company data by default. Use public or synthetic data unless an approved enterprise path explicitly permits company use.

Lab sequence

Stage | Pattern | Output
Q&A first | Ask the agent to explain codebase, architecture, history, issues, or submitted artifacts | Understanding report
Plan review | Ask for a plan before edits or actions | Plan with risks and validation
Controlled edit | Approve narrow changes only | Diff and validation result
Feedback loop | Run tests, schemas, fixtures, screenshots, or linting | Pass/fail evidence
Context tuning | Add shared context or rules | Reusable context artifact
Tool integration | Add approved CLI or MCP tool | Tool contract
Permission review | Classify action tiers | Permission matrix
Parallel isolation | Use branch, worktree, sandbox, or managed session | Isolated work record

37. Tool pattern appendix

Tool names are included as examples and mental hooks. They should be read by pattern, execution boundary, data boundary, permission model, logging posture, and governance dependency, not as endorsements or tool rankings.

Public product-surface descriptions in this appendix map to CLM-002, CLM-008, CLM-009, CLM-010, CLM-011, and CLM-012.

Pattern | Examples | Primary lesson | Boundary question
Terminal agent | Claude Code, Codex CLI | CLI agents can inspect, edit, run commands, and fit many workflows | What commands and data are allowed?
IDE-native agent | Cursor, GitHub Copilot | IDE agents improve development flow and context use | How are rules, review, and repo ownership managed?
Cloud workbench | Codex cloud, Antigravity-style managed agents | Cloud agents can parallelize and verify tasks | Where does code execute and what data leaves?
Enterprise agent builder | Copilot Studio, Agent Builder, internal frameworks | Business agents need connectors, publishing, governance, HITL | Which governance tier applies?
Model gateway/runtime | Azure AI Foundry, AWS Bedrock, internal marketplace | Model access should be routed, logged, and governed | Which model is allowed for which data and task?
Workflow orchestration | Temporal, Step Functions, Logic Apps, Power Automate | Durable processes need state, retries, approvals, compensation | Which steps are deterministic, AI-assisted, or human-approved?

38. Field validation exercise

Before broad distribution, use this manual against two real or sanitized AI proposals.

Required exercise outputs

Output | Purpose
Readiness level | Classify idea, prompt artifact, harness, workflow, assistant, pilot, capability, or scale
Governance route | Decide AI for All, USE, BUILD, REQUEST, standard review, fast track, or formal SDLC
Data boundary | Identify allowed and prohibited data classes and tool paths
Source authority map | Identify canonical, reference, submission, derived, stale, prohibited sources
Intent validity score | Test outcome, user fit, AI appropriateness, failure consequence, sustainment
Eval validity score | Test construct, coverage, risk, regression, reviewer, negative controls
HITL model | Define reviewer states, authority, overrides, and escalation
Telemetry plan | Define run, quality, cost, override, source freshness, adoption metrics
v0.9 field validation backlog | Convert controlled-sharing findings into pre-v1.0 improvements

Use v0.9 as:

  • a leadership mental-model reset artifact,
  • an architecture and governance review guide,
  • a controlled practitioner reference,
  • a template library,
  • a field validation tool against real proposals.

Do not use v0.9 as:

  • final policy,
  • tool approval,
  • data-use approval,
  • production readiness approval,
  • procurement recommendation,
  • substitute for formal governance review.

40. Pre-v1.0 field validation backlog

Before v1.0 or any policy-conversion use, controlled-sharing field guidance must pass the checklist below. v0.9 controlled sharing does not satisfy these prerequisites and does not create enterprise policy approval, tool approval, data-class approval, production approval, GxP approval, SaaS approval, autonomous-agent approval, a policy workflow engine, or an enterprise approval record.

40.1 Pre-v1.0 policy-conversion checklist

Check | Required before policy conversion
Named policy owner | A named policy owner accepts responsibility for any candidate policy language.
Accountable approver | An accountable approver is named and has authority for the conversion decision.
Legal or regulatory review | Legal and regulatory review is completed where required.
Quality or GxP review | Quality or GxP review is completed where applicable.
Security review | Security review confirms access, logging, connector, network, and control boundaries where applicable.
Privacy and data-class review | Privacy or data steward review confirms allowed and prohibited data classes where applicable.
Tool approval review | Tool or platform owner review confirms whether each tool surface is approved for the specific scope, where applicable.
Production and change-control review | Production readiness and change-control path are confirmed where applicable.
Operational owner and sustainment model | Operational owner, support boundary, maintenance cadence, and failure handling are named.
Evidence and receipt review | Validation receipts, source provenance, field validation, and source-owner confirmation are reviewed.
Exception and rollback handling | Exception path, stop condition, rollback path, and escalation owner are documented.
Explicit approval boundary | The candidate statement says what is approved and what remains unapproved.

Priority | Candidate change | Why
P0 | Validate tool/data boundary guidance with internal owners | Confirm whether the conservative field template can inform policy-aligned guidance
P1 | Run Lifecycle Lens field validation with target reviewers | Prove the manual works against a real bounded MVP use case
P1 | Expand synthetic lab coverage | Broaden safe fixture coverage for conflicting-source, override, and no-mutation cases
P1 | Field-test the core diagram set with target reviewers | Confirm the visuals improve recall without flattening authority semantics
P1 | Add operating cadence model for harness maintenance | Make review timing and ownership concrete
P2 | Add role-specific executive brief | Support broader leadership distribution
P2 | Add glossary | Help beginners and non-technical leaders
P3 | Reduce repeated thesis language | Improve readability after concepts stabilize

Source Provenance and Claim Confidence

Provenance is included so skeptical readers can see where the material came from, which claims were externally checked, which claims came from uploaded internal context, which came from transcripts, and which require owner validation before being treated as policy.

How to read CLM IDs: each claim ID maps a visible claim to source class, source URL or source ID, retrieval or verification date, source owner, verifier role, evidence note, validation status, owner-validation state, confidence, freshness review date, limitations, and where the claim is used. CLM IDs are traceability markers, not decorative footnotes. Internal-source claims still require owner validation before policy use.

This package is not a vibes artifact. It uses a provenance register, claim-confidence labels, transcript handling rules, public verification notes, internal owner-validation flags, and evaluation receipts. Where a claim is not independently verified or owner-approved, it is labeled accordingly.

Source classes

| Source class | Meaning | How to treat it |
| --- | --- | --- |
| Public primary source | Official vendor docs, official product pages, official company blogs | Supports public product claims, but not internal enterprise approval |
| Public secondary source | Interviews, reporting, practitioner analysis | Useful for context and attribution, not policy |
| User-provided transcript | Captured transcript of practitioner talks or videos | Extract operating patterns, validate factual claims where possible |
| Internal-source supplied | Uploaded enterprise/project materials | Use as supplied context, owner validation required for policy-sensitive claims |
| Derived recommendation | Synthesis based on the sources and evals | Label as interpretation, not a quoted source |
| Eval output | Multi-model artifact review | Validates artifact quality and gaps, not factual truth |
| Open item | Not yet verified | Must not be treated as authoritative |

Claim confidence labels

| Label | Meaning |
| --- | --- |
| Verified public source | Confirmed against public primary sources |
| Corroborated | Supported by multiple sources, not always primary |
| Transcript-derived | Derived from user-provided transcript material |
| Internal-source supplied | Present in uploaded internal/project materials |
| Owner validation required | Requires named enterprise owner confirmation before policy use |
| Derived recommendation | Our synthesis from available evidence |
| Illustrative example | Pattern explanation, not proof of approval |
| Do not treat as policy | Explicitly not official enterprise policy |

Transcript handling rule

Practitioner transcripts are used to extract operating patterns, not to establish policy. Where a transcript makes a factual claim, the claim is verified against public sources, labeled as transcript-derived, or excluded from authoritative guidance.

Claim validation register

| Claim ID | Claim | Source class | Source URL or source ID | Retrieved or verified | Source owner | Verifier role | Evidence note | Validation status | Owner-validation state | Confidence | Freshness review date | Used in | Limitation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CLM-001 | Vercel reported that an internal agent improved after most specialized tools were removed and the agent was simplified. | Public primary source | vercel-tool-pruning-blog-001 | 2026-06-17 | Vercel | Public-source verifier | Official Vercel blog title captured in package notes as "We removed 80% of our agent's tools". | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Tool pruning; harness lifecycle | Context-specific case. Do not generalize into a universal rule that fewer tools always wins. |
| CLM-002 | Claude Code is an agentic coding system that reads a codebase, edits files, runs commands, and integrates with development tools. | Public primary source | code.claude.com/docs/en/overview | 2026-06-17 | Anthropic | Public-source verifier | Official overview states that Claude Code reads codebases, edits files, runs commands, and integrates with development tools. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Practitioner operating patterns; tool pattern appendix | Product behavior changes quickly. Treat as current public positioning, not enterprise approval. |
| CLM-003 | The Boris Cherny / Claude Code practitioner transcript supports patterns such as codebase Q&A first, planning before edits, feedback loops, context files, permission tiers, and parallel work isolation. | User-provided transcript | transcript-boris-cherny-claude-code-001 | 2026-06-17 | User-supplied practitioner transcript | Transcript reviewer | Transcript patterns were reviewed for operating habits, then separated from public product descriptions. | Transcript-derived pattern | Not required for practitioner pattern use; not policy | Medium-high for pattern, not verbatim quote | 2026-06-17 | Practitioner lab; tool permissions; context architecture | Transcript contains speech-to-text artifacts. Use for operating patterns, not precise quotation or policy. |
| CLM-004 | Nate Jones transcript supports the maintenance thesis: harnesses drift, tools should be pruned, agents can break when models improve, and teams should repeatedly ask what the agent eats, reaches, does, proves, and returns in value. | User-provided transcript | transcript-nate-jones-maintenance-001 | 2026-06-17 | User-supplied practitioner transcript | Transcript reviewer | Transcript guidance was used only for maintenance and pruning patterns, with public-product claims kept separate. | Transcript-derived pattern | Not required for practitioner pattern use; not policy | Medium-high for pattern | 2026-06-17 | Harness lifecycle; maintenance review; tool pruning | Transcript includes irrelevant tail contamination. Only the agent/harness portion is used. |
| CLM-005 | The enterprise AI stack materials distinguish AI for All, Pre-configured AI, and Custom Built AI, and define USE, BUILD, and REQUEST routing concepts. | Internal-source supplied | enterprise-ai-stack-kb-001 | 2026-06-17 | Enterprise AI architecture materials | Package editor | Internal knowledge-base materials were reviewed as supplied source context for routing vocabulary. | Internal-source supplied | Required before policy use | High for uploaded source, not final policy | 2026-06-17 | Enterprise governance routing; worked example platform path | Requires named internal owner validation before publication as policy. |
| CLM-006 | Lifecycle Lens MVP is framed as advisory-only, assistive-only, human-accountable, with no autonomous infrastructure mutation and no direct privileged endpoint execution. | Internal-source supplied | lifecycle-lens-mvp-companion-sow-001 | 2026-06-17 | Lifecycle Lens MVP companion materials | Package editor | Internal companion SOW was reviewed for the worked-example boundary and no-mutation posture. | Internal-source supplied | Required before external supplier or policy use | High for uploaded source | 2026-06-17 | Lifecycle Lens worked example; advisory boundary | Specific to the MVP materials. Requires owner validation before external supplier use. |
| CLM-007 | Lifecycle Lens architecture materials separate canonical authority, governed context, structured lifecycle/planning data, workflow, and advisory intelligence. | Internal-source supplied | lifecycle-lens-rwcp-pivot-deck-001 | 2026-06-17 | Lifecycle Lens architecture deck | Package editor | Deck visuals were reviewed for the control-plane and source-priority pattern only. | Internal-source supplied | Required for exact deck interpretation before policy use | High for uploaded deck content | 2026-06-17 | Context as control plane; advisory repository vs runtime control plane | Deck visuals require human review for exact intended interpretation. |
| CLM-008 | Microsoft Copilot Studio documentation describes creating agents and workflows, adding knowledge and tools, MCP server support, evaluation, administration, environments, authentication, and analytics. | Public primary source | learn.microsoft.com/en-us/microsoft-copilot-studio/ | 2026-06-17 | Microsoft | Public-source verifier | Official documentation landing page lists agent creation, workflows, knowledge, tools, MCP, evaluation, administration, environments, authentication, and analytics. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Enterprise tool pattern appendix; Microsoft-native examples; worked example platform path | Does not imply company-specific approval or readiness. |
| CLM-009 | Azure AI Foundry documentation positions the platform as a place to design, customize, manage, and support AI applications and agents at scale, with evaluation and monitoring capabilities. | Public primary source | learn.microsoft.com/en-us/azure/foundry/what-is-foundry | 2026-06-17 | Microsoft | Public-source verifier | Official Foundry documentation was reviewed as the current public positioning for the platform. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Model gateway/runtime pattern; governance context | Service capabilities and naming change frequently. |
| CLM-010 | OpenAI Codex CLI is positioned as a local command-line coding agent that can read, modify, and run code on a local machine with approval modes. | Public primary source | developers.openai.com/codex/cli | 2026-06-17 | OpenAI | Public-source verifier | Official Codex CLI docs were reviewed for the current local coding-agent description and approval-mode posture. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Tool pattern appendix; CLI/repo pattern | Product state changes quickly. Local operation does not automatically approve enterprise data use. |
| CLM-011 | Google Antigravity is described by Google as an agentic development platform where agents can plan and execute software tasks across editor, terminal, and browser, with artifacts for communication and validation. | Public primary source | antigravity.google | 2026-06-17 | Google | Public-source verifier | Official product surface was reviewed for the public product description used in the tool-pattern appendix. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Tool pattern appendix; future surface warning | Does not imply enterprise approval or data boundary suitability. |
| CLM-012 | Amazon Bedrock AgentCore documentation describes runtime, harness, memory, gateway, identity, observability, evaluations, policy, and registry services for operating agents at scale. | Public primary source | docs.aws.amazon.com/bedrock-agentcore/latest/devguide/what-is-bedrock-agentcore.html | 2026-06-17 | AWS | Public-source verifier | Official AgentCore overview was reviewed for the current public service framing. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Runtime/control-plane discipline; technical annex | Service adoption still requires enterprise architecture, security, cost, and data review. |
| CLM-013 | Three independent model evaluations converged that the earlier package was conceptually strong but needed version integrity, external-tool boundaries, worked examples, and technical hardening. | Eval output | evals/external/GPT_5_5_Pro_v0.8.2_eval.md; evals/external/Gemini_3_1_Deep_Think_v0.8.2_eval.md; evals/external/GPT_5_5_Pro_Extended_v0.8.2_eval.md | 2026-06-17 | Eval artifact set | Artifact review synthesizer | Evaluation artifacts were compared for convergence of package-quality findings, not factual validation. | Artifact-quality evaluation | Not a policy source; no owner-validation path | High for convergence of artifact feedback | 2026-06-17 | v0.5 and v0.8.2 backlog discipline | LLM evals do not certify factual truth or internal policy. |
| CLM-014 | Approved tool access does not approve the use case, data class, retention path, logging path, connector action, workflow impact, or production use. | Derived recommendation | derived-recommendation-tool-access-boundary-001 | 2026-06-17 | Package editorial synthesis | Package editor | Boundary guidance is synthesized from the package policy-boundary sections, internal routing materials, and external tool-positioning sources. | Derived recommendation | Required before policy use | Medium-high | 2026-06-17 | Policy boundary; tool and data matrix; governance routing | This is synthesis for field guidance, not a quoted policy statement. |
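
A register row can also be held as a typed record so that the "owner validation before policy use" rule is checked mechanically rather than remembered. The following is a minimal sketch, not the package's implementation: `ClaimRecord`, `OWNER_GATED`, `owner_validated`, and `policy_use_blockers` are hypothetical names, and the gating logic is one reading of the source-class and owner-validation columns above.

```python
from dataclasses import dataclass

# Assumption: these are the source classes whose register rows say owner
# validation is "Required before policy use" (see CLM-005 and CLM-014).
OWNER_GATED = frozenset({"Internal-source supplied", "Derived recommendation"})


@dataclass(frozen=True)
class ClaimRecord:
    """A small subset of the register columns (hypothetical record shape)."""
    claim_id: str
    source_class: str
    owner_validated: bool  # hypothetical flag: a named owner has confirmed the claim


def policy_use_blockers(record: ClaimRecord) -> list[str]:
    """Return the reasons a claim cannot yet be used as policy (empty list = none)."""
    if record.source_class in OWNER_GATED:
        if record.owner_validated:
            return []
        return ["owner validation required before policy use"]
    # Assumption: every other source class supports context or product
    # description only and is never a policy source by itself.
    return [f"{record.source_class} is not a policy source on its own"]


clm_005 = ClaimRecord("CLM-005", "Internal-source supplied", owner_validated=False)
print(policy_use_blockers(clm_005))
```

Returning a list of blockers instead of a boolean keeps the reason visible to the reviewer, which matches the register's habit of stating limitations next to each claim.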

Eval provenance rule

The three model evaluations validate artifact quality, audience fit, gaps, and distribution readiness. They do not validate internal policy, approve external tools, or prove every factual claim. They are used as review evidence, not as truth certificates.

Template Library and Source Workbench

The Workbench is where concepts become reusable artifacts. Each pack is large enough to stand alone, maps back to a main section, and now includes card-level copy and download controls so teams can reuse the right artifact without scraping the entire manual.

How to use this Workbench:

Expand the pack you need, copy or download that pack as Markdown, or download the full Workbench. Each pack includes purpose, use case, inputs, outputs, owner, failure modes, repo path, main-section mapping, and a reusable Markdown artifact.

The Source Workbench is a reuse surface, not a decorative footer. The main body teaches the concepts. This workbench provides copy-ready artifacts, owner/reviewer expectations, failure modes, and repo mappings. Tiny cards are intentionally bundled into larger packs so each item carries enough operational weight to be worth copying.

SWB-001. Executive Reset Pack

Purpose: Give leaders a short, memorable mental-model reset: AI is not magic, tool access is not capability, and demo success is not operational readiness.

When to use: Use before leadership briefings, funding discussions, intake reviews, and any meeting where someone asks whether a prompt or agent is enough.

Inputs required: Business objective, target audience, proposed tool surface, expected workflow impact, data class, and decision owner.

Output produced: Executive framing, leader checklist, demo-to-capability challenge, and approval stop rule.

Owner / reviewer: Executive sponsor, architecture lead, governance lead.

Failure modes: Leader treats the tool as the strategy; pilot starts without source authority; HITL is claimed without authority or evidence; the demo becomes the decision.

Related repo path: repo-bootstrap/docs/executive-reset.md

Related main sections: Executive Reset; Capability Definition

Markdown artifact:

# Executive Reset Pack

## Opening statement
Stop building AI theater. Build capability.

Prompts, agents, skills, tools, and evals are components. Capability requires intent, source authority, context discipline, schemas, validation, observability, change control, sustainment, and measurable value.

## Leader approval stop rule
If the team cannot explain outcome, data boundary, source authority, schema, eval validity, human review model, telemetry, sustainment owner, and stop condition, approve discovery only. Do not approve broad pilot, production routing, autonomous execution, or business-process dependency.

## Questions leaders should ask
1. What business outcome changes?
2. What source is authoritative?
3. What must the AI never infer?
4. What is deterministic, what is AI-assisted, and what remains human-controlled?
5. What telemetry proves value and degradation?
6. Who owns maintenance after the demo?
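
The leader approval stop rule can be sketched as a simple completeness check. This is an illustration under stated assumptions: `REQUIRED_EXPLANATIONS` lists the nine items named in the stop rule, while `approval_scope` and the "eligible for further review" outcome are hypothetical, since the pack only defines what happens when something is missing.

```python
# The nine items the stop rule says a team must be able to explain.
REQUIRED_EXPLANATIONS = (
    "outcome",
    "data boundary",
    "source authority",
    "schema",
    "eval validity",
    "human review model",
    "telemetry",
    "sustainment owner",
    "stop condition",
)


def approval_scope(explained: set[str]) -> tuple[str, list[str]]:
    """Return the widest scope a leader should approve and the unexplained items."""
    missing = [item for item in REQUIRED_EXPLANATIONS if item not in explained]
    if missing:
        # Stop rule: anything unexplained limits approval to discovery.
        return "discovery only", missing
    # Assumption: a complete explanation makes the proposal eligible for
    # review; it does not approve a pilot or production use by itself.
    return "eligible for further review", []


scope, gaps = approval_scope({"outcome", "schema", "telemetry"})
print(scope, gaps)
```
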
SWB-002. Capability Formation Pack

Purpose: Turn an AI idea into a capability-readiness decision instead of another prototype with executive sponsorship and no operating spine.

When to use: Use during ideation, architecture intake, product framing, and when deciding whether a use case is an idea, prompt artifact, governed pilot, or operational capability.

Inputs required: Problem statement, users, workflow, sources, data classes, system-of-record boundary, expected business value, failure consequence, and owner list.

Output produced: Capability readiness classification, lifecycle trace, intent validity score, product requirements canvas, promotion gate matrix, and solution ideation outcome.

Owner / reviewer: Product owner, enterprise architect, governance reviewer.

Failure modes: The team chooses an agent before proving the problem; evals measure the wrong outcome; no one owns sustainment; the capability cannot be located on the readiness ladder.

Related repo path: repo-bootstrap/docs/capability-formation.md

Related main sections: Capability Definition; Intent and Outcome Discipline

Markdown artifact:

# Capability Formation Pack

## Definition
An AI capability is a governed, repeatable, measurable operating pattern that uses approved models, tools, data, workflows, controls, human review, and sustainment ownership to produce a defined business outcome reliably over time.

## Readiness ladder
0 Idea. 1 Prompt artifact. 2 Reusable harness. 3 Structured workflow. 4 Eval-backed assistant. 5 Governed pilot. 6 Operational capability. 7 Scaled enterprise capability.

## Promotion gate rule
Do not promote Levels 3 to 6 by enthusiasm alone. Require evidence for schema validity, fixture coverage, reviewer calibration, approved data route, stop conditions, telemetry, runbook, incident path, support owner, and adoption proof before naming the next level.

## Intent validity gates
Problem clarity, outcome specificity, user fit, AI appropriateness, source feasibility, data permission, failure consequence, human accountability, and sustainment realism must be answered before implementation.

## Solution ideation rule
Compare no AI, deterministic rules, workflow automation, search/RAG, dashboard/report, chat assistant, agentic workflow, and integrated capability before choosing the agentic path.
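
The readiness ladder and promotion gate rule lend themselves to a small check. The sketch below is a hedged illustration: the level names and evidence items come from the pack, but `Readiness`, `PROMOTION_EVIDENCE`, and `can_promote` are hypothetical names, and it assumes the gate applies to any promotion out of Levels 3 through 6.

```python
from enum import IntEnum


class Readiness(IntEnum):
    IDEA = 0
    PROMPT_ARTIFACT = 1
    REUSABLE_HARNESS = 2
    STRUCTURED_WORKFLOW = 3
    EVAL_BACKED_ASSISTANT = 4
    GOVERNED_PILOT = 5
    OPERATIONAL_CAPABILITY = 6
    SCALED_ENTERPRISE_CAPABILITY = 7


# Evidence items named in the promotion gate rule.
PROMOTION_EVIDENCE = frozenset({
    "schema validity", "fixture coverage", "reviewer calibration",
    "approved data route", "stop conditions", "telemetry", "runbook",
    "incident path", "support owner", "adoption proof",
})


def can_promote(current: Readiness, evidence: set[str]) -> bool:
    """True if the capability may be named at the next level."""
    if current == Readiness.SCALED_ENTERPRISE_CAPABILITY:
        return False  # already at the top of the ladder
    if Readiness.STRUCTURED_WORKFLOW <= current <= Readiness.OPERATIONAL_CAPABILITY:
        # Gated levels: every evidence item must be present.
        return PROMOTION_EVIDENCE <= evidence
    # Assumption: Levels 0 to 2 are not covered by this particular gate.
    return True


print(can_promote(Readiness.GOVERNED_PILOT, {"telemetry", "runbook"}))
```
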
SWB-003. Governance and Routing Pack

Purpose: Route AI work to the right review path and prevent the dangerous misconception that approved tool access automatically approves every use case.

When to use: Use during intake triage, tool selection, agent publishing, business-process automation proposals, and external-tool experimentation discussions.

Inputs required: Use case type, data classification, tool surface, user group, workflow impact, regulatory/GxP relevance, retention/logging requirements, and sharing scope.

Output produced: Routing decision, required approvals, blocked uses, owner-validation flags, and evidence package requirements.

Owner / reviewer: AI governance lead, security/privacy reviewer, platform owner, business owner.

Failure modes: External tool used with company data; business-process agent treated as personal productivity; regulated use bypasses review; custom agent shared broadly without governance.

Related repo path: repo-bootstrap/governance/use-build-request-routing.md

Related main sections: Enterprise Governance and Approved Execution

Markdown artifact:

# Governance and Routing Pack

## Bright-line rule
Tool access does not approve the use case, data class, retention behavior, logging posture, workflow impact, or business-process automation.

## Engagement modes
USE approved prebuilt capability within its boundary. BUILD personal or small-group productivity agents only within approved constraints. REQUEST business-process or reusable capability through governance.

## No-review logic
No review applies only when every low-risk condition is true. If any condition is false, route to governance.

## External tool warning
External commercial tools are learning and pattern references only unless explicitly approved for enterprise data and work.
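
The no-review logic is an all-conditions-true rule, which can be sketched as follows. The pack does not list the low-risk conditions here, so the condition names in the usage example are hypothetical placeholders, as is the `route` function itself.

```python
def route(low_risk_conditions: dict[str, bool]) -> tuple[str, list[str]]:
    """Return the routing decision and the conditions that failed.

    Keys are condition names, values say whether the condition holds.
    """
    failed = [name for name, holds in low_risk_conditions.items() if not holds]
    if not low_risk_conditions or failed:
        # Assumption: an empty checklist is treated as unknown risk and
        # routed to governance rather than waved through.
        return "route to governance", failed
    return "no review", []


# Hypothetical condition names, for illustration only.
decision, failed = route({
    "public or synthetic data only": True,
    "personal productivity only": True,
    "no business-process impact": False,
})
print(decision, failed)
```

Treating an empty checklist as "route to governance" follows the same spirit as the non-inference rule: the absence of information is not evidence of low risk.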
SWB-004. Context and Source Authority Pack

Purpose: Make context governable by separating canonical truth, structured current state, governed reference material, explanatory output, stale material, and prohibited sources.

When to use: Use before building RAG, advisory assistants, intake reviewers, lifecycle trackers, or any system that summarizes across documents and records.

Inputs required: Source inventory, owners, freshness dates, source classes, system-of-record boundaries, access controls, and conflict rules.

Output produced: Source authority map, freshness review, non-inference rules, evidence states, and prohibited-source list.

Owner / reviewer: Data steward, source owner, architecture lead, governance reviewer.

Failure modes: All retrieved text treated as equal truth; stale wiki becomes current policy; generated summary outranks canonical source; unsupported answer sounds authoritative.

Related repo path: repo-bootstrap/context/source-authority-map.md

Related main sections: Context as Control Plane; Source Authority Model

Markdown artifact:

# Context and Source Authority Pack

## Source precedence
Canonical documents govern service and process guidance. Structured records govern current state. Governed references provide context. Generated outputs are explanatory only. Stale or prohibited sources must be labeled and excluded from authority.

## Evidence states
Supported, not evidenced, conflicting evidence, requires confirmation, requires escalation, not applicable.

## Non-inference rule
The assistant must not infer approval status, data classification, GxP impact, ownership, production readiness, security control existence, or policy exceptions from silence.
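
The evidence states and the non-inference rule can be combined in a small classifier. This is a minimal sketch under assumptions: `EvidenceState` mirrors the six states listed above, while `classify` and its input shape (explicit statements found in authoritative sources for one field, such as approval status) are hypothetical. It only distinguishes three of the six states; the others would need reviewer input.

```python
from enum import Enum


class EvidenceState(Enum):
    SUPPORTED = "supported"
    NOT_EVIDENCED = "not evidenced"
    CONFLICTING = "conflicting evidence"
    REQUIRES_CONFIRMATION = "requires confirmation"
    REQUIRES_ESCALATION = "requires escalation"
    NOT_APPLICABLE = "not applicable"


def classify(explicit_statements: list[str]) -> EvidenceState:
    """Classify one field from the explicit statements found for it."""
    distinct = set(explicit_statements)
    if not distinct:
        # Non-inference rule: silence is never read as approval or status.
        return EvidenceState.NOT_EVIDENCED
    if len(distinct) > 1:
        return EvidenceState.CONFLICTING
    return EvidenceState.SUPPORTED


print(classify([]).value)
print(classify(["approved", "not approved"]).value)
```
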
SWB-005. Schema and Contract Pack

Purpose: Convert conversational wishes into inspectable contracts for inputs, outputs, evidence, decisions, telemetry, overrides, and tool execution.

When to use: Use when a task must be repeatable, auditable, evaluated, routed, or integrated into workflow or runtime systems.

Inputs required: Entity model, required fields, source IDs, evidence states, reviewer actions, telemetry events, allowed tools, and failure modes.

Output produced: JSON schemas, tool contracts, telemetry contract, override payload, and validation failure examples.

Owner / reviewer: Technical architect, data architect, platform engineer, QA/eval owner.

Failure modes: Outputs look good but cannot be parsed; tool calls mutate state without typed boundaries; override feedback is lost; telemetry cannot be correlated.

Related repo path: repo-bootstrap/schemas/README.md

Related main sections: Schema-First Design; Technical Annex

Markdown artifact:

# Schema and Contract Pack

## Required schemas
Input schema, output schema, evidence schema, decision-state schema, exception schema, telemetry event schema, override payload schema, and tool contract schema.

## Schema rule
If the team cannot define valid input and output shape, the capability is not ready for implementation.

## Example evidence fields
claim_id, source_id, source_type, evidence_state, confidence, excerpt, reviewer_action, override_reason, trace_id.
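
A validation failure example for the evidence fields might look like the sketch below. It uses only the standard library. The field names come from the list above and the allowed `evidence_state` values from the Context and Source Authority Pack; which fields are mandatory, the `"override"` action value, and the `validate_evidence` helper are assumptions made for illustration.

```python
# Assumption: every listed field is required except override_reason.
REQUIRED_FIELDS = (
    "claim_id", "source_id", "source_type", "evidence_state",
    "confidence", "excerpt", "reviewer_action", "trace_id",
)
EVIDENCE_STATES = frozenset({
    "supported", "not evidenced", "conflicting evidence",
    "requires confirmation", "requires escalation", "not applicable",
})


def validate_evidence(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record is valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    state = record.get("evidence_state")
    if state is not None and state not in EVIDENCE_STATES:
        errors.append(f"unknown evidence_state: {state}")
    # Assumption: an override without a reason loses the feedback the
    # pack warns about ("override feedback is lost").
    if record.get("reviewer_action") == "override" and not record.get("override_reason"):
        errors.append("override_reason is required when reviewer_action is 'override'")
    return errors


bad = {"claim_id": "CLM-001", "evidence_state": "probably fine", "reviewer_action": "override"}
for problem in validate_evidence(bad):
    print(problem)
```
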
SWB-006. Harness, Rubric, and Eval Pack

Purpose: Define how the model is constrained, how output quality is judged, and how the team knows the eval is measuring the right thing.

When to use: Use for any repeatable AI work product, assistant, reviewer, advisory workflow, or agentic task that must survive regression.

Inputs required: Task objective, source authority, allowed inference, output schema, risk classes, fixture set, reviewer rubric, and expected failure behavior.

Output produced: Harness template, rubric, fixture matrix, negative controls, eval validity checklist, calibration protocol, and validation receipt.

Owner / reviewer: Eval owner, domain reviewer, architecture/governance lead.

Failure modes: Eval scores formatting instead of correctness; rubric rewards fluency; no negative controls; reviewer disagreement is hidden; failure cases are averaged away.

Related repo path: repo-bootstrap/evals/fixture-matrix.md

Related main sections: Harnesses and Agent Instructions; Rubrics and Eval Validity

Markdown artifact:

# Harness, Rubric, and Eval Pack

## Harness minimum fields
Objective, audience, inputs, source hierarchy, constraints, non-goals, output contract, allowed inference, stop conditions, validation method, and completion report.

## Eval validity checks
Construct, criterion, coverage, regression, risk, operational, reviewer reliability, and negative-control validity.

## Fixture starter set
Golden case, missing-evidence case, conflicting-source case, adversarial overreach case, ambiguous request case, regression case, and unsafe-action request.
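
Coverage of the fixture starter set can be checked before an eval run. This is a hedged sketch: the seven fixture kinds are taken from the list above, but the `kind` key, the fixture record shape, and `missing_fixture_kinds` are hypothetical.

```python
STARTER_FIXTURE_KINDS = (
    "golden",
    "missing-evidence",
    "conflicting-source",
    "adversarial-overreach",
    "ambiguous-request",
    "regression",
    "unsafe-action-request",
)


def missing_fixture_kinds(fixtures: list[dict]) -> list[str]:
    """Return starter-set kinds that have no fixture yet, in starter-set order."""
    present = {fixture["kind"] for fixture in fixtures}
    return [kind for kind in STARTER_FIXTURE_KINDS if kind not in present]


fixtures = [
    {"id": "fx-001", "kind": "golden"},
    {"id": "fx-002", "kind": "conflicting-source"},
]
print(missing_fixture_kinds(fixtures))
```

A non-empty result is a signal that the eval may be averaging away failure cases it never exercised, which is one of the failure modes this pack names.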
SWB-007. Practitioner Operations Pack

Purpose: Give hands-on builders safe operating habits for agentic tools without implying that every external tool is approved for enterprise work.

When to use: Use in practitioner workshops, repo-based pilots, synthetic labs, onboarding, and technical architecture reviews.

Inputs required: Synthetic repo or approved workspace, test commands, fixture set, permission tiers, feedback loops, and clear tool boundary.

Output produced: Operating sequence, permission checklist, feedback-loop plan, repo/IDE/CLI transition decision, synthetic lab entry point, and safety notes.

Owner / reviewer: Technical architect, builder, platform engineer, security reviewer.

Failure modes: Builder starts with broad edits instead of Q&A; agent has excessive tools; no tests; no isolation; external tool receives company data; parallel sessions corrupt state.

Related repo path: labs/synthetic-capability-lab/README.md

Related main sections: Practitioner Operating Patterns; Tool Pattern Appendix

Markdown artifact:

# Practitioner Operations Pack

## Safe operating sequence
1. Q&A first.
2. Ask for a plan.
3. Review scope.
4. Allow controlled edits.
5. Run tests or schema checks.
6. Review diffs.
7. Capture evidence.
8. Commit only after human approval.

## Minimum feedback loops
Unit test, schema validation, static analysis, screenshot or output comparison where relevant, security scan, cost/loop limit, and human review.

## Synthetic lab entry point
Run the synthetic lab at `labs/synthetic-capability-lab/` when you need a safe repo-shaped example with schemas, source authority, fixtures, expected outputs, tests, and a validation receipt.

## External-tool boundary
Use public or synthetic data only unless the enterprise tool and data path are explicitly approved.
SWB-008. Harness Lifecycle and Maintenance Pack

Purpose: Keep deployed AI capabilities from rotting as models improve, tools change, workflows drift, sources age, and business needs move.

When to use: Use on a cadence, after model upgrades, after source changes, after tool failures, when override rates spike, or when business process ownership changes.

Inputs required: Run history, tool usage, source freshness, model version, eval regressions, override trends, cost telemetry, incidents, and adoption data.

Output produced: Maintenance review, pruning decision, model-upgrade impact review, context freshness review, rebuild/retire decision, and next review date.

Owner / reviewer: Capability owner, platform owner, eval owner, source owner, governance reviewer.

Failure modes: Tool sprawl grows silently; stale context becomes truth; better models turn old permissions into risk; agent keeps producing work no one uses.

Related repo path: repo-bootstrap/operations/harness-maintenance-review.md

Related main sections: Runtime and Maintenance Discipline

Markdown artifact:

# Harness Lifecycle and Maintenance Pack

## Five maintenance checks
What is it eating? What can it reach? What is its job? What proof must it return? Is it still valuable?

## Pruning rule
Every tool increases action surface, ambiguity surface, permission surface, audit surface, cost surface, and maintenance burden. Tools must earn their place through observed value and controlled failure behavior.

## Model upgrade trigger
A stronger model can make old constraints too restrictive or old permissions too dangerous. Revalidate after model, tool, source, or workflow changes.
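
The model upgrade trigger amounts to comparing the configuration that was last validated with the one now running. The sketch below assumes a hypothetical `HarnessSnapshot` record and `revalidation_reasons` helper; the four compared dimensions (model, tools, sources, workflow) are the ones named in the trigger, and the model and tool names in the example are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessSnapshot:
    """Hypothetical record of what a harness looked like at validation time."""
    model: str
    tools: frozenset[str]
    sources: frozenset[str]
    workflow: str


def revalidation_reasons(validated: HarnessSnapshot, current: HarnessSnapshot) -> list[str]:
    """List which dimensions changed since the last validation."""
    reasons = []
    for name in ("model", "tools", "sources", "workflow"):
        if getattr(validated, name) != getattr(current, name):
            reasons.append(f"{name} changed")
    return reasons


last_validated = HarnessSnapshot("model-a", frozenset({"search"}), frozenset({"kb-1"}), "intake-v1")
now_running = HarnessSnapshot("model-b", frozenset({"search", "ticket-update"}), frozenset({"kb-1"}), "intake-v1")
print(revalidation_reasons(last_validated, now_running))
```

Any non-empty result means the existing validation receipt no longer describes the running system, so constraints and permissions should be re-examined rather than carried forward.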
SWB-009. Worked Example Pack

Purpose: Provide reference examples so teams can see how the discipline looks when applied, not merely admire the terminology from a safe distance.

When to use: Use in workshops, onboarding, proposal reviews, supplier framing, and eval calibration.

Inputs required: Use case statement, source map, data boundary, schema, harness, rubric, eval fixtures, governance route, telemetry plan, and sustainment model.

Output produced: Examples A, B, C, and D, packaged as reusable traces with artifacts and failure checks.

Owner / reviewer: Architecture lead, product owner, governance reviewer, practitioner lead.

Failure modes: Example is mistaken for a universal template; teams copy without adapting data boundaries; Lifecycle Lens is treated as a first-class product pillar instead of a worked example.

Related repo path: repo-bootstrap/docs/worked-examples.md

Related main sections: Worked Examples and Field Tests

Markdown artifact:

# Worked Example Pack

## Example A
Bad prompt to governed harness. Shows how a vague request becomes objective, source authority, output contract, rubric, eval fixture, and receipt.

## Example B
Business-process capability trace. Shows intent to telemetry and human decision.

## Example C
Lifecycle Lens MVP. Shows advisory-only, Microsoft-native, source-priority, structured-state, no-mutation posture.

## Example D
Tool pruning and maintenance. Shows how to remove tools that add more risk than value.
SWB-010. Repo Bootstrap Pack

Purpose: Show how the field manual becomes a repository-shaped operating system for AI capability work.

When to use: Use when moving beyond chat into repeatable, reviewable, versioned artifacts.

Inputs required: Approved source inventory, templates, schemas, evals, rubrics, governance routing, operations controls, and validation receipts.

Output produced: Reference repo scaffold with README, AGENTS.md, docs, context, schemas, harnesses, rubrics, evals, tools, governance, operations, and receipts.

Owner / reviewer: Architecture lead, platform engineer, repo owner.

Failure modes: Everything remains trapped in chat; no versioning; no review; no artifact parity; no validation receipt; no one knows where the current source lives.

Related repo path: repo-bootstrap/README.md

Related main sections: Repo Bootstrap; Template Library and Source Workbench

Markdown artifact:

# Repo Bootstrap Pack

## Purpose
This scaffold is a reference structure for AI capability discipline artifacts. It is not a deployable product, final policy repository, or approved enterprise runtime.

## Top-level folders
docs, context, schemas, harnesses, rubrics, evals, tools, governance, operations, receipts.

## Rule
Every reusable artifact in the field manual should map to a file path or an explicit reason why it remains narrative-only.
SWB-011. Source Provenance Pack

Purpose: Give skeptical readers a clear map of where claims came from, what was verified, what was transcript-derived, and what requires owner validation.

When to use: Use before sharing broadly, during governance review, when challenged on source quality, and when updating public or internal tool claims.

Inputs required: Claim list, source class, source reference, verification status, confidence, limitation, used-in sections, and owner validation status.

Output produced: Claim validation register, source confidence labels, transcript handling rule, eval provenance rule, and owner-validation checklist.

Owner / reviewer: Package owner, governance lead, source owner, reviewer.

Failure modes: Internal-source supplied context is mistaken for approved policy; transcript-derived operating lessons are treated as exact quotes; LLM evals are mistaken for factual validation.

Related repo path: repo-bootstrap/provenance/claim-validation-register.md

Related main sections: Source Provenance and Claim Confidence

Markdown artifact:

# Source Provenance Pack

## Provenance rule
Say what was confirmed, against what source, at what confidence level, and what still requires owner validation.

## Required labels
Verified public source, corroborated, transcript-derived, internal-source supplied, owner validation required, derived recommendation, illustrative example, do not treat as policy.

## Eval rule
Model evals validate artifact quality and gap coverage. They do not certify factual truth or internal policy.