AI Capability Discipline
From Magic Thinking to Governed, Measurable, Maintainable AI Systems.
This package is an operating discipline reference for leaders, architects, governance reviewers, and practitioners working on correctness-matters AI. It does not approve tools, data classes, runtime paths, or production use cases.
- Intent, source authority, and data boundaries
- Context, schemas, harnesses, and valid evals
- Human review, observability, change control, and sustainment
- Measured business value, not demo theater
1. Executive brief
AI is not magic. AI is a system capability candidate, and sometimes it is not the right answer. The first question is whether AI should be used at all. If it is used, the work must be shaped by intent, source authority, data boundaries, context management, schemas, workflows, tool permissions, feedback loops, evals, human review, telemetry, governance, change control, sustainment ownership, run-cost realism, and measurable outcome.
The key leadership reset is simple: model access is not capability access. A better model can reduce friction, but it cannot define the business outcome, approve the data path, decide source authority, validate the workflow, make deterministic work probabilistic without consequence, own sustainment, or absorb accountability when the output is wrong.
Five things leaders should stop approving
| Stop approving | Replace with |
|---|---|
| Demos as proof of capability | Evidence-backed pilot gates |
| Prompt reuse as operating discipline | Harness, schema, eval, telemetry, and ownership |
| Tool approval as use-case approval | Tool, data, workflow, and governance routing checks |
| Human-in-the-loop as decorative safety | Reviewer authority, evidence, queue, override, and audit model |
| Evals that score polish | Evals that measure business correctness, risk, and safe failure |
Leader approval stop rule
If the team cannot explain the outcome, data boundary, source authority, schema, eval validity, human review model, telemetry, sustainment owner, and stop condition, approve discovery only. Do not approve broad pilot, production routing, autonomous execution, or business-process dependency.
2. Executive mental model reset
This section is deliberately blunt because leaders often see the visible model output and miss the system underneath it. The replacement models turn AI from a magic tool story into an operating-model conversation about controls, evidence, ownership, and accountable decisions.
| Bad mental model | Replacement model | What leaders should ask |
|---|---|---|
| AI is magic | AI is a probabilistic system component | What controls make it reliable? |
| Better model means solved | Better models reduce friction but do not define accountability | What remains outside the model? |
| We already have Copilot | Tool access is not use-case approval | Which data and workflow are approved? |
| Give me the prompt | A prompt is only one expression of a task | What is the operating pattern? |
| We built an agent | An agent is a component, not a capability | What workflow, governance, telemetry, and owner exist? |
| The demo worked | A demo proves possibility, not reliability | What happens on edge cases and missing evidence? |
| The eval passed | Eval success matters only if the eval measures the right thing | What does the eval actually prove? |
| Human-in-the-loop means safe | Human review works only with authority, evidence, time, and override workflow | Who can override and what is captured? |
3. Capability equation
The equation is not math for decoration. It is a dependency map. Missing any major term does not mean the work is useless, but it does mean the work is not yet a governed capability and should be treated as discovery, prototype, or bounded pilot.
AI Capability =
Clear Intent
+ Approved Data Path
+ Source Authority
+ Workflow Fit
+ Schema Contracts
+ Harnesses
+ Rubrics
+ Valid Evals
+ Human Review
+ Tool Permissions
+ Observability
+ Governance
+ Sustainment Ownership
+ Measured Business Outcome
A prompt, agent, skill, rubric, or eval can be useful. None of them becomes a capability until the full equation is credible enough to survive real work.
4. Demo-to-capability gap
Demos compress uncertainty into a polished moment. This table separates what a demo can legitimately prove from what it cannot prove, so leaders do not confuse plausibility with readiness or adoption enthusiasm with operational evidence.
| Demo proves | Demo does not prove |
|---|---|
| The model can produce a plausible answer | The answer is correct, current, supported, or useful |
| The tool can call an API | API use is approved, safe, auditable, or reversible |
| Users were impressed | Users will adopt it under real workflow constraints |
| One scenario worked | Edge cases and failure modes are controlled |
| Output looked polished | The output measured the right outcome |
| The prototype was fast | Sustainment, cost, telemetry, and governance are feasible |
| A tool is licensed | The data path and use case are approved |
5. Distribution status and policy boundary
This manual is field guidance, not final policy. It can be used to shape intake, architecture review, governance discussion, and practitioner learning. It must not be interpreted as:
- tool approval,
- data-class approval,
- production approval,
- GxP or regulated-use approval,
- external SaaS approval,
- security exception approval,
- autonomous agent approval,
- replacement for formal AI Governance review.
When this manual conflicts with named internal policy, the named policy wins. When the policy is unknown, mark the item as "requires owner confirmation" rather than inventing approval, because apparently optimism is still not an access-control model.
6. Tool and data boundary matrix, owner-validation required
The matrix below is a field template requiring owner validation before policy or operational use. It is intentionally conservative. Replace placeholders with confirmed internal policy before broad distribution.
| Tool surface | Personal learning with public or synthetic data | Company data | Confidential or proprietary data | Regulated, GxP, PHI, PII, security-sensitive data | Business-process use | Approval route |
|---|---|---|---|---|---|---|
| Approved AI for All chat | Usually allowed within policy | Allowed only by approved data class | Depends on policy | Not assumed allowed | No, unless all no-review conditions hold | AI Governance if any trigger is false |
| Internal enterprise GenAI chat | Usually allowed within policy | Depends on approved data boundary | Depends on policy | Requires explicit confirmation | Depends on impact | AI Governance if workflow or data triggers apply |
| Copilot Studio or enterprise agent builder | Learning and team prototyping where approved | Use-case approved only | Use-case approved only | Requires explicit review | Yes, if governed | Team, function, or enterprise governance path |
| Azure or AWS approved runtime | Not a casual user surface | Use-case approved only | Use-case approved only | Requires explicit review | Yes, if governed SDLC applies | Formal architecture, security, privacy, compliance, AI Governance |
| GitHub Copilot or approved SDLC assistant | Only where assigned and approved | Depends on repo and policy | Depends on policy | Not assumed allowed | No business-process automation by default | SDLC and AI Governance as applicable |
| Claude Code, Codex, Cursor, Antigravity, external SaaS coding tools | Public or synthetic learning only unless approved | Not assumed allowed | Not assumed allowed | Not allowed unless explicitly approved | Not approved by default | Explicit approval required |
| Local or personal tools | Public or synthetic learning only | Not assumed allowed | Not assumed allowed | Not allowed | Not approved | Explicit approval required |
Bright-line rule
Approved tool access does not approve the use case, data class, retention model, logging path, connector action, workflow impact, or production use.
7. Enterprise routing model
Use enterprise routing to separate casual productivity from governed business capability.
| Work type | Likely path | Required discipline |
|---|---|---|
| Individual productivity | AI for All, approved chat, approved assistant | Stay within approved data and output boundaries |
| Small group experiment | BUILD path, limited sharing, synthetic or approved data | Scope, owner, data boundary, known limitations |
| Pre-configured business workflow | USE approved agent or platform capability | Confirm data, audience, support, and governance triggers |
| Custom business-process AI | REQUEST or Custom Built AI path | PRD, source authority, schemas, evals, HITL, telemetry, governance |
| Regulated, GxP, privacy-sensitive, or decision-impacting workflow | Governance first | Formal review before tooling or data processing |
| Production or scaled capability | Governed SDLC and operational ownership | Release gates, runbook, monitoring, support, change control |
8. Capability readiness model
The readiness ladder gives teams a shared vocabulary for maturity. It should be used as a routing tool, not as a vanity score. The practical question is always what proof is required to move up one level without skipping governance, telemetry, or ownership.
| Level | State | Meaning | Minimum next proof |
|---|---|---|---|
| 0 | Idea | Interesting but not shaped | Problem statement and user need |
| 1 | Prompt artifact | One-off model interaction | Reusable harness candidate |
| 2 | Reusable harness | Repeatable prompt/instruction pattern | Input/output contract and review path |
| 3 | Structured workflow | Defined inputs, outputs, states, and human review | Eval fixture set and evidence rules |
| 4 | Eval-backed assistant | Tested against fixtures and rubrics | Pilot charter, data approval, telemetry plan |
| 5 | Governed pilot | Approved users, data path, evals, review, and telemetry | Runbook, support model, release criteria |
| 6 | Operational capability | Supported, monitored, versioned, adopted | Scaling plan and reuse governance |
| 7 | Scaled enterprise capability | Integrated, reusable, governed, measured, continuously improved | Portfolio governance and continuous eval operations |
8.1 Promotion gate matrix
Promotion between Levels 3 and 6 should be treated as an evidence gate, not a naming preference. The matrix below defines minimum evidence floors for field guidance. Meeting the floor does not bypass governance, architecture, security, privacy, compliance, or owner approval.
| Gate area | Level 3 to 4 | Level 4 to 5 | Level 5 to 6 |
|---|---|---|---|
| Schema validity | Input, output, and evidence fields are explicit, and sample outputs validate against the declared schema. | Pilot schemas cover normal, exception, and review states, with no unresolved field drift across pilot fixtures. | Operational schemas are versioned, change-controlled, and released with backward-compatibility or migration handling. |
| Fixture coverage | Starter fixtures cover golden path, missing evidence, conflicting source, ambiguous request, and unsafe request behavior. | Pilot fixtures cover top failure paths, reviewer overrides, and recent regressions seen in trial use. | Regression suite is refreshed from incidents, source changes, model changes, and operating drift. |
| Rubric calibration | A domain reviewer and builder align on pass, fail, and escalation labels for the starter fixture set. | Pilot reviewer pool calibrates against the fixture set and records how disagreement is resolved. | Calibration repeats on a defined cadence and after rubric, model, source, or workflow changes. |
| Reviewer pool | A named reviewer can reject, request evidence, or escalate. | Pilot reviewer pool has primary and backup coverage with queue ownership. | Operational reviewer coverage matches hours, expected volume, and escalation obligations. |
| Approved data route | Only public, synthetic, or otherwise approved data enters the eval-backed assistant path. | Pilot data route, logging path, retention path, and prohibited data classes are explicitly approved for pilot scope. | Operational data route is documented per source class and monitored for drift or boundary violations. |
| Stop conditions | Missing-evidence, unsafe-action, and overreach stop conditions are explicit in harness or reviewer guidance. | Pilot stop conditions include false-negative, boundary-violation, and override-spike triggers with pause authority. | Stop conditions are wired to operational pause, rollback, or routing controls. |
| Telemetry | Run start, output, evidence state, reviewer action, and stop-condition events are defined. | Pilot telemetry proves fixture outcomes, override rates, cycle time, and boundary violations. | Operational telemetry tracks quality, cost, adoption, drift, and incident correlation. |
| Runbook | Reviewer instructions exist for how to run the workflow and capture findings. | Pilot runbook covers startup, failure handling, retriage, source refresh, and manual fallback. | Operational runbook covers release, rollback, monitoring, and handoff expectations. |
| Incident path | Harm or error cases have an escalation contact, even if operational incident handling is not yet active. | Pilot incident path names who pauses the pilot, who reviews the event, and how evidence is preserved. | Operational incident path integrates with the owning team's incident and post-incident review flow. |
| Support owner | A named builder or owner is accountable for the assistant and its artifacts. | Pilot support owner accepts source, rubric, and fixture maintenance responsibilities. | Operational support owner, backup, and service boundaries are documented. |
| Adoption proof | At least one target workflow and success measure are named. | Pilot adoption proof shows real reviewers using the workflow and returning structured feedback. | Operational adoption proof shows repeat usage, decision uptake, and a maintained value signal. |
9. Capability formation lifecycle
Intent
→ Product thesis
→ Product requirements
→ Value classification and acceptance line
→ Domain model
→ Source authority model
→ Data contract
→ Schema contracts
→ Harness
→ Rubric
→ Eval suite
→ Workflow
→ Human review model
→ Telemetry and observability
→ Governance route
→ Sustainment model
→ Field validation
→ Operational capability
The lifecycle is not paperwork theater. It exists because without these layers, a team can build a very convincing wrong thing.
10. Intent and outcome management
Intent is valid only when the problem, user, workflow, outcome, source feasibility, data permission, failure consequence, and ownership are explicit.
Value must be classified before it is judged. A proposal can create real value and still sit below the current acceptance line if the wrong owner benefits, the evidence is weak, or the current business climate requires direct savings.
| Gate | Test | Failure signal |
|---|---|---|
| Problem clarity | Specific, recurring, material, and owned | Vague productivity promise |
| Outcome specificity | Observable baseline and target | “Make work easier” with no measure |
| Value classification | Claimed value class, decision owner, benefiting owner, and evidence owner are explicit | Real value claim with no accountable owner or proof path |
| Acceptance line fit | Current business climate and minimum accepted threshold are explicit | Value is real but below the current acceptance line |
| User fit | Real user job and workflow entry point | Solution looking for a workflow |
| Decision relevance | Output drives a real decision or action | Output is interesting but unused |
| AI appropriateness | AI compared to no AI, rules, search, workflow, dashboard, deterministic automation | Agent-first thinking |
| Source feasibility | Required sources exist and have authority | Model asked to infer missing authority |
| Data permission | Required data can be processed, logged, retained, and reviewed in selected tool path | Tool approval confused with data approval |
| Failure consequence | Wrong, missing, stale, or overconfident output is analyzed | No safe failure path |
| Human accountability | Reviewer authority and override workflow exist | HITL slogan, no action model |
| Sustainment realism | Owner, cadence, funding, and release model exist | Demo owner disappears after launch |
11. Product requirements for AI capabilities
A serious AI capability needs product requirements, not just prompts.
| Requirement area | Required content |
|---|---|
| Target users | Roles, responsibilities, permissions, review authority |
| User jobs | What task or decision is improved |
| Business outcome | Baseline, target, value hypothesis, value class, benefiting owner, evidence owner, measurement method |
| Acceptance line | Decision owner, current threshold, below-line handling, exception path if needed |
| Non-goals | What the system must not do |
| Inputs | Data classes, artifacts, source systems, owners, refresh cadence |
| Outputs | Decisions, recommendations, drafts, findings, actions, confidence limits |
| Decision boundaries | What the model may suggest versus what humans must decide |
| Failure modes | Missing evidence, stale source, conflict, hallucination, tool failure, privacy risk |
| Acceptance criteria | Functional, quality, governance, telemetry, and support thresholds |
| Operating model | Owner, support path, review cadence, release and change control |
12. Solution ideation matrix
The point of this matrix is to stop agent-first design. Many problems are better served by deterministic rules, workflow automation, better source hygiene, reporting, or search before any agentic runtime is justified.
Before choosing an agent, compare options. The best AI architecture sometimes uses less AI. Horrifying for hype decks, useful for reality.
| Option | Best fit | When to reject |
|---|---|---|
| No AI | Problem is rare, low value, or unclear | Recurring workflow has measurable burden |
| Search or RAG | Find and summarize trusted content | Task requires actions or structured decisions |
| Deterministic rules | Clear policy or classification logic | Ambiguous interpretation required |
| Workflow automation | Known steps and approvals | Complex language interpretation required |
| Dashboard or report | Visibility and monitoring | User needs drafting, reasoning, or orchestration |
| Chat assistant | Exploration, synthesis, first-pass support | Needs durable workflow or audited action |
| Agentic workflow | Multi-step tasks with tools, approvals, and feedback | No approved tools, data path, evals, or owner |
| Integrated capability | Business process with sustained ownership | No measurable outcome or support model |
13. Schema-first capability design
Schemas are where vague AI intent becomes inspectable. They let teams validate input, output, evidence, exceptions, telemetry, and human review records instead of relying on prose promises and well-formatted uncertainty.
If a team cannot define valid input, output, evidence, decision states, exceptions, telemetry, and review records, it is not ready to build beyond exploration.
13.1 Minimal output schema example
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "AIReviewFinding",
  "type": "object",
  "required": ["finding_id", "status", "claim", "evidence_state", "human_review_required"],
  "properties": {
    "finding_id": {"type": "string"},
    "status": {"enum": ["supported", "gap", "not_evidenced", "conflicting_evidence", "requires_confirmation", "requires_escalation", "not_applicable"]},
    "claim": {"type": "string"},
    "evidence_state": {"enum": ["cited", "missing", "conflicting", "not_applicable"]},
    "evidence_refs": {
      "type": "array",
      "items": {"type": "string"}
    },
    "source_authority_level": {"enum": ["canonical", "governed_reference", "submission_evidence", "derived_analysis", "historical", "prohibited"]},
    "risk_severity": {"enum": ["low", "medium", "high", "critical"]},
    "human_review_required": {"type": "boolean"},
    "recommended_action": {"type": "string"}
  }
}
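One way to put the schema to work is to check every model output before it reaches a reviewer. The sketch below is a minimal, standard-library-only illustration: it hand-checks the required fields and enum values copied from the schema above rather than running a full JSON Schema validator, and the function name `validate_finding` is hypothetical.

```python
# Required fields and enum values copied from the AIReviewFinding schema.
REQUIRED = ("finding_id", "status", "claim", "evidence_state", "human_review_required")

ENUMS = {
    "status": ("supported", "gap", "not_evidenced", "conflicting_evidence",
               "requires_confirmation", "requires_escalation", "not_applicable"),
    "evidence_state": ("cited", "missing", "conflicting", "not_applicable"),
    "source_authority_level": ("canonical", "governed_reference", "submission_evidence",
                               "derived_analysis", "historical", "prohibited"),
    "risk_severity": ("low", "medium", "high", "critical"),
}


def validate_finding(finding: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in finding]
    for name, allowed in ENUMS.items():
        if name in finding and finding[name] not in allowed:
            errors.append(f"{name} has unexpected value: {finding[name]!r}")
    if "human_review_required" in finding and not isinstance(finding["human_review_required"], bool):
        errors.append("human_review_required must be a boolean")
    refs = finding.get("evidence_refs", [])
    if not isinstance(refs, list) or not all(isinstance(ref, str) for ref in refs):
        errors.append("evidence_refs must be a list of strings")
    return errors


# Illustrative finding: the status is not in the enum and the review flag is absent.
print(validate_finding({
    "finding_id": "FND-0002",
    "status": "approved",
    "claim": "Security model is documented.",
    "evidence_state": "missing",
}))
```

A team that already uses a JSON Schema validator would normally rely on that instead; the point here is only that an output failing these checks should never be treated as a finding.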
13.2 Schema failure example
| Failure | Why it blocks readiness |
|---|---|
| Finding has no evidence state | Unsupported claims cannot be separated from supported claims |
| Finding has no source authority level | The model may treat all retrieved content as equal |
| Finding has no human review flag | Governance-sensitive cases may appear resolved |
| Finding has no status enum | Outputs cannot be reliably evaluated or aggregated |
14. Source authority model
Source authority must be explicit and versioned.
| Source class | Example | Can support findings? | Required handling |
|---|---|---|---|
| Canonical | Approved policy, official standard, validated technology catalog | Yes | Cite source and version |
| Governed reference | Architecture pattern library, approved playbook | Yes, with context | Cite source, owner, version |
| Submission evidence | Submitted diagram, PRD, vendor document | Yes for what was submitted | Mark as submission evidence, not policy |
| Derived analysis | Model extraction or summary | No by itself | Must cite underlying evidence |
| Historical | Prior decisions, older package, retired architecture | Only with date and context | Check freshness and applicability |
| Stale | Deprecated standard, superseded deck | No | Flag as stale |
| Prohibited | Unapproved note, external blog, unverifiable model output | No | Do not use as evidence |
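The "Can support findings?" column can be turned into a default-deny check. The following is a rough sketch under simplifying assumptions: the function name and its two flags are hypothetical, and the nuances "with context" and "for what was submitted" are reduced to a plain yes, so a real implementation would need the owner's confirmed rules.

```python
def can_support_finding(
    source_class: str,
    *,
    cites_underlying_evidence: bool = False,
    has_date_and_context: bool = False,
) -> bool:
    """Decide whether a source of the given class may back a finding."""
    if source_class in ("canonical", "governed_reference", "submission_evidence"):
        return True
    if source_class == "derived_analysis":
        # Model extractions or summaries count only through the evidence they cite.
        return cites_underlying_evidence
    if source_class == "historical":
        return has_date_and_context
    # Stale, prohibited, and unrecognised classes never support a finding.
    return False


print(can_support_finding("canonical"))
print(can_support_finding("derived_analysis"))
print(can_support_finding("external_blog"))
```

Treating unknown classes as unusable keeps the model from quietly promoting retrieved content to evidence.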
15. Eval validity and calibration
Evals are not automatically trustworthy because they have scores. They are trustworthy only when they measure the intended capability, cover the right failures, correlate with expert judgment, and catch safe-failure behavior when evidence is missing or contradictory.
Evals can be beautifully wrong. A rubric can score the wrong behavior consistently. That is not quality. That is automated self-deception with columns.
| Validity type | Question | Failure mode |
|---|---|---|
| Construct validity | Does the eval measure the actual capability? | Scores format instead of decision usefulness |
| Criterion validity | Does eval performance correlate with expert review? | Model passes but experts reject output |
| Coverage validity | Does the suite cover normal, edge, ambiguous, adversarial, missing-evidence, and regression cases? | Happy-path-only testing |
| Risk validity | Are high-consequence failures weighted more heavily than low-consequence ones? | Average score hides critical false negatives |
| Regression validity | Does the eval catch degradation after model, prompt, source, schema, or tool changes? | Change ships with hidden behavior drift |
| Operational validity | Does eval success predict workflow usefulness? | Output passes tests but users ignore it |
| Reviewer reliability | Would qualified reviewers score similarly? | Rubric is subjective theater |
| Negative-control validity | Does the system fail correctly when it should? | Missing evidence becomes invented confidence |
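The risk-validity row is easiest to see with numbers. In the sketch below, a suite with a 95 percent pass rate still fails the release check because its single failure is a high-risk false negative. The result fields, the function name `summarize_eval`, and the 0.9 threshold are illustrative assumptions, not values from this manual.

```python
def summarize_eval(results: list[dict], min_pass_rate: float = 0.9) -> dict:
    """Summarize fixture results without letting an average hide critical misses."""
    if not results:
        raise ValueError("no eval results to summarize")
    pass_rate = sum(1 for r in results if r["passed"]) / len(results)
    critical = [r["fixture_id"] for r in results if r.get("high_risk_false_negative", False)]
    return {
        "pass_rate": pass_rate,
        "critical_false_negatives": critical,
        # Any high-risk false negative blocks release regardless of the average.
        "release_ok": pass_rate >= min_pass_rate and not critical,
    }


# 19 passing fixtures plus one failure that is a high-risk false negative.
results = [{"fixture_id": f"golden-{i:03d}", "passed": True} for i in range(19)]
results.append({
    "fixture_id": "incomplete-security-model-001",
    "passed": False,
    "high_risk_false_negative": True,
})
summary = summarize_eval(results)
print(summary["pass_rate"], summary["release_ok"])  # 0.95 False
```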
15.1 Eval calibration protocol
- Select at least six fixtures: golden, incomplete, conflicting, adversarial, ambiguous, and regression.
- Have two or more qualified reviewers independently score expected outputs.
- Identify disagreements and update rubric anchors.
- Define high-risk false negative stop conditions.
- Define minimum release threshold.
- Run the suite whenever prompt, model, schema, source map, tool contract, or runtime changes.
- Record reviewer agreement, override rate, and unresolved disagreements.
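Recording reviewer agreement needs a concrete measure. The sketch below computes raw percent agreement and Cohen's kappa for two reviewers who labelled the same fixtures; choosing kappa is an assumption made here for illustration, since the protocol does not name a statistic, and the labels and function names are hypothetical.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of fixtures on which both reviewers chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("both reviewers must label the same non-empty fixture set")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Agreement between two reviewers, corrected for chance agreement."""
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    n = len(a)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used one identical label throughout
    return (observed - expected) / (1 - expected)


reviewer_a = ["pass", "fail", "escalate", "pass", "fail", "pass"]
reviewer_b = ["pass", "fail", "pass", "pass", "fail", "escalate"]

# Fixtures where the reviewers disagree are the ones whose rubric anchors need work.
disagreements = [i for i, (x, y) in enumerate(zip(reviewer_a, reviewer_b)) if x != y]
print(disagreements, round(cohens_kappa(reviewer_a, reviewer_b), 3))
```

With more than two reviewers, a different statistic would be needed; the disagreement list is the part that feeds directly into updating rubric anchors.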
15.2 Starter fixture matrix
| Fixture | Purpose | Expected behavior |
|---|---|---|
| Golden | Fully evidenced, low ambiguity | Supported findings, minimal escalation |
| Incomplete | Missing required input | not_evidenced, request evidence |
| Conflicting | Two sources disagree | conflicting_evidence, human confirmation |
| Adversarial | User claims approval without evidence | reject unsupported claim |
| Ambiguous | Unclear data class or ownership | requires_confirmation |
| Regression | Previously fixed failure | no reintroduction of failure |
16. Intent-to-eval traceability
Each eval assertion should trace to a business outcome and the claimed value class, not just a prompt instruction.
| Business outcome | User need | Product requirement | Domain model | Source authority | Data contract | Schema | Harness rule | Rubric dimension | Eval fixture | Telemetry metric | Human decision |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Reduce incomplete architecture submissions | Architect needs missing evidence identified early | Assistant must flag missing security model | Submission, artifact, control, evidence | Security baseline is canonical | No sensitive artifacts in unapproved tools | finding.status enum includes not_evidenced | If security evidence missing, do not infer | Evidence correctness | incomplete-security-model-001 | not_evidenced correctness rate | Reviewer requests evidence or escalates |
17. Human review and override model
Human-in-the-loop is not a safety feature unless the loop has authority, evidence, time, context, actions, and logging.
17.1 Review state model
Draft generated
→ Needs evidence
→ Requires confirmation
→ Accepted / Edited / Rejected / Overridden
→ Escalated if needed
→ Decision packet prepared
→ Decision recorded
→ Feedback loop reviewed
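The review flow above can be enforced as a small state machine so that a finding cannot jump from draft to recorded decision. This is a sketch only: the state identifiers and the transition table are an assumed, linear reading of the flow, and a real workflow may allow other paths.

```python
REVIEW_OUTCOMES = ("accepted", "edited", "rejected", "overridden")

# Allowed next states for each state (assumed from the flow, not confirmed policy).
TRANSITIONS: dict[str, tuple[str, ...]] = {
    "draft_generated": ("needs_evidence",),
    "needs_evidence": ("requires_confirmation",),
    "requires_confirmation": REVIEW_OUTCOMES,
    "escalated": ("decision_packet_prepared",),
    "decision_packet_prepared": ("decision_recorded",),
    "decision_recorded": ("feedback_loop_reviewed",),
    "feedback_loop_reviewed": (),
}
# Every review outcome may escalate or go straight to the decision packet.
for outcome in REVIEW_OUTCOMES:
    TRANSITIONS[outcome] = ("escalated", "decision_packet_prepared")


def advance(current: str, target: str) -> str:
    """Move to the target state, or raise if the flow does not allow it."""
    if target not in TRANSITIONS.get(current, ()):
        raise ValueError(f"illegal review transition: {current} -> {target}")
    return target


state = "draft_generated"
for step in ("needs_evidence", "requires_confirmation", "overridden", "escalated"):
    state = advance(state, step)
print(state)  # escalated
```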
17.2 Override payload example
{
  "override_id": "OVR-0001",
  "finding_id": "FND-0007",
  "reviewer_role": "enterprise_architect",
  "original_status": "supported",
  "override_status": "requires_escalation",
  "rationale": "Source cited is submission evidence, not canonical policy.",
  "evidence_refs": ["SRC-SEC-STD-2026-01"],
  "action_taken": "Escalated to security architecture owner",
  "requires_fixture_update": true,
  "timestamp_utc": "2026-06-17T17:00:00Z"
}
18. Technical annex: repo-backed package layout
The repo structure is included because durable AI capability work eventually outgrows chat. Files, schemas, fixtures, receipts, and governance records need a stable place to live if teams want repeatability and reviewability.
Serious AI work should move from chat to files when it needs versioning, tests, schemas, fixtures, reproducibility, or multiple maintainers.
ai-capability/
  README.md
  PRODUCT_REQUIREMENTS.md
  GOVERNANCE_ROUTING.md
  DATA_BOUNDARY.md
  SOURCE_AUTHORITY_MAP.yaml
  harnesses/
    review_harness.md
  schemas/
    finding.schema.json
    telemetry-event.schema.json
    override.schema.json
  rubrics/
    review_rubric.md
  evals/
    fixtures/
      incomplete-security-model.json
      conflicting-source-authority.json
    expected/
      incomplete-security-model.expected.json
    tests/
      test_eval_assertions.py
  tools/
    mcp-tool-contracts/
      architecture-catalog.lookup.yaml
  receipts/
    validation-receipt-template.md
  docs/
    HUMAN_REVIEW_WORKFLOW.md
    OBSERVABILITY_CONTRACT.md
    RELEASE_GATES.md
19. Technical annex: programmable eval assertion
import json

# Status the assistant must return when security evidence is missing.
REQUIRED_STATUS = "not_evidenced"

# Load the assistant output produced for the incomplete-security-model fixture.
with open("evals/outputs/incomplete-security-model.output.json", "r", encoding="utf-8") as fh:
    output = json.load(fh)

findings = output["findings"]
security_findings = [item for item in findings if item.get("control_id") == "SEC-001"]

# The control must be reported, not silently omitted.
assert security_findings, "SEC-001 finding is missing"

for finding in security_findings:
    assert finding["status"] == REQUIRED_STATUS, "Missing security evidence must not be treated as supported"
    assert finding["human_review_required"] is True, "Missing security evidence requires human review"
    assert finding.get("evidence_refs", []) == [], "Missing evidence should not invent citation references"
20. Technical annex: MCP and tool execution contract
Every tool exposed to an agent should have a contract. Tool access is where language generation becomes operational risk.
tool_id: architecture_catalog.lookup
owner: enterprise_architecture
purpose: lookup approved technology status and reference patterns
data_classes_allowed:
  - public
  - internal_non_sensitive
actions_allowed:
  - read_catalog_entry
  - search_reference_pattern
actions_prohibited:
  - modify_catalog
  - approve_exception
  - change_source_authority
identity_model: managed_identity_or_service_principal
auth_scopes:
  - catalog.read
egress_allowed: false
input_schema: schemas/catalog_lookup_input.schema.json
output_schema: schemas/catalog_lookup_output.schema.json
audit_events:
  - tool.called
  - tool.result_returned
  - tool.error
rate_limits:
  per_minute: 60
human_approval_required_for:
  - exception_request
  - status_change
failure_behavior: return requires_confirmation and do not infer approval
rollback_behavior: not_applicable_read_only
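A contract only reduces risk if something checks it before each call. The sketch below mirrors a few fields of the contract above in a Python dict and applies a default-deny check. The function name `authorize_call` and the returned labels are hypothetical, and routing items under `human_approval_required_for` to a human is one possible reading of that field.

```python
# Subset of the contract above, copied into a dict for illustration.
CONTRACT = {
    "tool_id": "architecture_catalog.lookup",
    "data_classes_allowed": ["public", "internal_non_sensitive"],
    "actions_allowed": ["read_catalog_entry", "search_reference_pattern"],
    "actions_prohibited": ["modify_catalog", "approve_exception", "change_source_authority"],
    "human_approval_required_for": ["exception_request", "status_change"],
}


def authorize_call(contract: dict, action: str, data_class: str) -> str:
    """Return 'allowed', 'denied', or 'requires_human_approval' for a tool call."""
    if action in contract["human_approval_required_for"]:
        return "requires_human_approval"
    if action in contract["actions_prohibited"]:
        return "denied"
    if action not in contract["actions_allowed"]:
        return "denied"  # default-deny: unlisted actions are never inferred as approved
    if data_class not in contract["data_classes_allowed"]:
        return "denied"
    return "allowed"


print(authorize_call(CONTRACT, "read_catalog_entry", "public"))
print(authorize_call(CONTRACT, "modify_catalog", "public"))
print(authorize_call(CONTRACT, "status_change", "public"))
```

Each decision would also need to emit the audit events named in the contract; that part is omitted here.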
21. Technical annex: CI/CD and release gates
| Gate | Required proof | Blocks release if |
|---|---|---|
| Local harness validation | Output validates against schema | Schema invalid |
| Fixture regression | Golden, incomplete, conflicting, adversarial, ambiguous, regression fixtures pass | High-risk false negative appears |
| Source authority check | Source map version is present and current | Unknown source used as canonical |
| Tool contract check | All tools have owner, scopes, schemas, logging, allowed actions | Tool has unbounded action access |
| Security and privacy check | Data classes and retention match approved path | Data path unknown |
| Human review check | Override workflow and decision state schema exist | HITL is undefined |
| Observability check | Run IDs, tool spans, eval results, cost, override events captured | No traceability |
| Production promotion | Runbook, support owner, SLO, incident path, rollback defined | Sustainment owner missing |
22. Technical annex: observability contract
| Event | Required fields | Why it matters |
|---|---|---|
| ai.run.started | run_id, user_id, capability_id, version | Traceability |
| ai.context.loaded | run_id, source_map_version, context_refs | Source freshness |
| ai.tool.called | run_id, tool_id, action, auth_scope, data_class | Tool audit |
| ai.output.generated | run_id, schema_version, model_version, harness_version | Output provenance |
| ai.eval.completed | run_id, fixture_set_version, pass_fail, failures | Regression evidence |
| ai.human.override | run_id, finding_id, original_status, override_status, rationale | Feedback loop |
| ai.escalation.required | run_id, trigger, owner, due_date | Governance action |
| ai.run.completed | run_id, cost, latency, tokens, outcome_status | Value and FinOps |
Required metrics
- high-risk false negative count,
- unsupported claim rate,
- not-evidenced correctness rate,
- override rate,
- reviewer disagreement rate,
- escalation rate,
- source freshness age,
- cost per run,
- latency per run,
- adoption and repeat-use rate,
- incident count.
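Several of these metrics can be derived directly from the event stream. The sketch below computes override rate and cost per run from a list of event dicts. It assumes each event carries an `event` name field alongside the fields in the table, and it defines override rate as the share of completed runs with at least one override; both are assumptions that the owning team would need to confirm.

```python
def summarize_runs(events: list[dict]) -> dict:
    """Derive override rate and cost per run from telemetry events."""
    completed = [e for e in events if e["event"] == "ai.run.completed"]
    if not completed:
        return {"runs": 0, "override_rate": None, "cost_per_run": None}
    completed_ids = {e["run_id"] for e in completed}
    # Count a run once even if a reviewer overrode several findings in it.
    overridden_ids = {
        e["run_id"] for e in events if e["event"] == "ai.human.override"
    } & completed_ids
    return {
        "runs": len(completed_ids),
        "override_rate": len(overridden_ids) / len(completed_ids),
        "cost_per_run": sum(e["cost"] for e in completed) / len(completed),
    }


# Illustrative events; run IDs and costs are made up.
events = [
    {"event": "ai.run.completed", "run_id": "r1", "cost": 0.10},
    {"event": "ai.human.override", "run_id": "r2", "finding_id": "FND-0007"},
    {"event": "ai.human.override", "run_id": "r2", "finding_id": "FND-0008"},
    {"event": "ai.run.completed", "run_id": "r2", "cost": 0.30},
]
print(summarize_runs(events))
```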
23. Technical annex: FinOps and execution limits
Agentic systems need explicit execution limits.
| Control | Example |
|---|---|
| Token budget | Stop or escalate when run exceeds approved token budget |
| Tool-call budget | Max 25 tool calls per run unless reviewer approves extension |
| Retry limit | Max 2 retries per failed tool action |
| Loop limit | Max 3 plan-execute-check loops before human review |
| Timeout | Stop long-running operations after defined threshold |
| Cost alert | Alert when cost per run exceeds expected band |
| Escalation | Escalate if repeated failures indicate bad harness, source, or tool contract |
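These limits are simplest to enforce when the agent loop charges every step against a shared budget object. The following is a minimal sketch: the class and exception names are hypothetical, the tool-call and loop defaults come from the examples in the table, and the token budget has no default because the table does not give one.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run should stop or escalate to a human reviewer."""


@dataclass
class RunBudget:
    max_tokens: int            # approved token budget, set by the capability owner
    max_tool_calls: int = 25   # tool-call budget example from the table
    max_loops: int = 3         # plan-execute-check loop limit from the table
    tokens: int = 0
    tool_calls: int = 0
    loops: int = 0

    def charge(self, *, tokens: int = 0, tool_calls: int = 0, loops: int = 0) -> None:
        """Record usage, then stop the run if any limit is exceeded."""
        self.tokens += tokens
        self.tool_calls += tool_calls
        self.loops += loops
        for name, used, limit in (
            ("token", self.tokens, self.max_tokens),
            ("tool-call", self.tool_calls, self.max_tool_calls),
            ("loop", self.loops, self.max_loops),
        ):
            if used > limit:
                raise BudgetExceeded(f"{name} budget exceeded: {used} > {limit}")


budget = RunBudget(max_tokens=50_000)  # 50_000 is an arbitrary example value
budget.charge(tokens=1_200, tool_calls=1, loops=1)
print(budget.tokens, budget.tool_calls, budget.loops)
```

The caller decides what happens on `BudgetExceeded`: stop, or hand the run to a reviewer who can approve an extension.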
24. Technical annex: parallel execution safety
Parallel agents increase throughput and risk.
| Risk | Required control |
|---|---|
| State corruption | Worktree, branch, sandbox, or transaction isolation |
| Race condition | Locking, idempotency, queue ownership |
| Duplicate action | Idempotency key and action ledger |
| API rate exhaustion | Rate limits and backoff |
| Conflicting edits | Diff review and merge gate |
| Unbounded cost | Per-session budget and timeout |
| Hidden failures | Central run log and tool-call spans |
| Production impact | No production writes without explicit human approval |
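The "idempotency key and action ledger" control can be illustrated in a few lines. This sketch keeps the ledger in memory for a single process and is not thread-safe; agents running in parallel would need a shared store with an atomic insert. The class name, the `tracker` tool, and the `create_ticket` action are all hypothetical.

```python
import hashlib
import json
from typing import Any, Callable


class ActionLedger:
    """Remember executed actions so a repeated request does not run twice."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}

    @staticmethod
    def idempotency_key(tool_id: str, action: str, params: dict) -> str:
        # Sorted keys make the key independent of parameter ordering.
        payload = json.dumps(
            {"tool_id": tool_id, "action": action, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run_once(
        self, tool_id: str, action: str, params: dict, execute: Callable[[dict], Any]
    ) -> tuple[Any, bool]:
        """Return (result, executed); executed is False for a duplicate request."""
        key = self.idempotency_key(tool_id, action, params)
        if key in self._results:
            return self._results[key], False
        result = execute(params)
        self._results[key] = result
        return result, True


calls = []


def create_ticket(params: dict) -> dict:
    calls.append(params)
    return {"ticket": len(calls)}


ledger = ActionLedger()
first = ledger.run_once("tracker", "create_ticket", {"title": "Fix schema"}, create_ticket)
second = ledger.run_once("tracker", "create_ticket", {"title": "Fix schema"}, create_ticket)
print(first, second, len(calls))
```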
25. Harness lifecycle management
Agents are maintained systems, not launch-and-forget assets. The harness around the model has to be reviewed as sources age, tools change, workflows drift, model behavior improves, and the business changes its definition of useful work.
The agent is not the whole system. The harness is the workbench around the agent: sources, context, tools, permissions, prompts, schemas, evals, review flows, telemetry, and stop conditions.
v0.9 preserves the maintenance lens because a capability that worked last quarter can become unsafe, wasteful, or stale even when the model improves. That is the part many teams miss while they are busy admiring how quickly the agent can produce more work for humans to clean up. Charming little productivity trap.
25.1 Harness lifecycle thesis
| Principle | Meaning | Risk if ignored |
|---|---|---|
| Harnesses live in motion | Models, tools, sources, workflow, and business context change | Yesterday's safe setup becomes today's drag or risk |
| Maintenance includes deletion | More tools and more rules are not always better | Tool bloat, permission sprawl, token waste, audit noise |
| Context is operational | Context drives output, validation, and decisions | Stale context becomes active misinformation |
| Model upgrades are change events | Better models can make old harnesses misfit | Stronger agents use weak boundaries faster |
| Proof must remain linkable | Output must point to sources, records, spans, or logs | Fluency outruns trust |
| Value must be rechecked | A useful agent can become redundant or harmful | Automation keeps producing work nobody needs |
25.2 Maintenance cadence
| Trigger | Required review |
|---|---|
| Model version changes | Model upgrade impact review |
| Workflow changes | Intent, job, and state-model review |
| Source changes | Source authority and context freshness review |
| Tool or connector changes | Permission, action, and audit review |
| High override rate | Eval, rubric, and source review |
| Cost spike | FinOps and loop-limit review |
| Low adoption | Value and workflow fit review |
| Incident or near miss | Stop condition, blast-radius, and recovery review |
26. Agents drift in two directions
Traditional systems mostly drift when requirements, dependencies, data, or integrations change. Agent systems drift in two directions at once: the world changes around them and the model changes inside them.
| Drift direction | Example | Control response |
|---|---|---|
| World changes around the agent | Workflow, source, ownership, terminology, or policy changes | Refresh source authority, context, schemas, and eval fixtures |
| Model changes inside the agent | Better reasoning, better tool use, better planning, stronger autonomy | Reassess permissions, workflow constraints, tool count, stop conditions, and review load |
26.1 Agents can break when models improve
A stronger model is not automatically safer. It can make weak harnesses fail faster and more convincingly.
| Model improvement | Harness risk | Review question |
|---|---|---|
| Better reasoning | Old rigid workflow becomes unnecessary drag | Which rules should be simplified or removed? |
| Better tool use | Broad permissions become more dangerous | Which tools need tighter action contracts? |
| Better planning | Agent creates plausible downstream work faster than humans can review | Is reviewer throughput still sufficient? |
| Better context use | Stale context becomes more influential | Are current sources ranked and refreshed? |
| Better autonomy | Weak stop conditions become more dangerous | Are loop limits, cost limits, and escalation triggers explicit? |
| Better fluency | Unsupported output becomes harder to detect | Are citations, evidence spans, and negative controls enforced? |
27. Tool pruning and harness simplification
The beginner instinct is to add. The maintenance instinct is to ask what should be removed.
More tools do not automatically create better agents. Every tool increases the action surface, ambiguity surface, permission surface, audit surface, cost surface, and maintenance burden. A tool must earn its place through observed value, controlled failure behavior, and measurable improvement (CLM-001, CLM-004).
27.1 Tool pruning decision rule
| Keep the tool when | Remove or disable the tool when |
|---|---|
| It is required for a defined job | It is rarely used or only makes demos look powerful |
| It has an owner and allowed-action contract | No owner can explain why it is needed |
| It improves measured outcome or reduces review burden | It increases review burden or false confidence |
| It has clear permission, logging, and failure handling | It can mutate state without sufficient approval or audit |
| It works inside approved data and runtime boundaries | It crosses an unapproved data, network, or workflow boundary |
| It is covered by eval fixtures and negative controls | It is invisible to test coverage |
27.2 Harness simplicity review
| Question | Good answer |
|---|---|
| Which tools were used in the last 30 runs? | Only tools that support the defined job |
| Which tools created errors, retries, or overrides? | Problem tools have remediation or removal plan |
| Which instructions are obsolete? | Obsolete rules are retired, not kept as prompt sediment |
| Which memory or context files are stale? | Stale context is superseded, archived, or removed |
| Which controls block useful work? | Controls are updated intentionally after risk review |
| Which actions still need human approval? | High-risk actions remain bounded and reviewable |
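The first two review questions can be answered mechanically from run logs. The following sketch assumes a hypothetical log shape in which each run carries a `tool_calls` list with `tool`, `error`, and `retries` fields; none of these names come from a specific logging product.

```python
def review_tool_usage(registered_tools, runs, window=30):
    """Report tools unused in the last `window` runs and tools that caused trouble."""
    recent = runs[-window:]
    used = set()
    problem = set()
    for run in recent:
        for call in run["tool_calls"]:
            used.add(call["tool"])
            # A tool is flagged if any recent call errored or needed a retry.
            if call.get("error") or call.get("retries", 0) > 0:
                problem.add(call["tool"])
    return {
        "unused": sorted(set(registered_tools) - used),
        "problem": sorted(problem),
    }

# Synthetic example data.
registered = ["search_docs", "read_record", "send_email", "run_sql"]
runs = [
    {"run_id": "r1", "tool_calls": [{"tool": "search_docs"},
                                    {"tool": "read_record", "retries": 2}]},
    {"run_id": "r2", "tool_calls": [{"tool": "search_docs"},
                                    {"tool": "run_sql", "error": "timeout"}]},
]
report = review_tool_usage(registered, runs)
```

Tools in `unused` are candidates for removal or disabling; tools in `problem` need a remediation or removal plan. The report only surfaces candidates. The keep-or-remove decision still belongs to the tool owner.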
28. Context as control plane
Context determines what the model treats as signal, what it treats as authority, and what it is allowed to summarize or infer. Poor context architecture makes smarter models more dangerous because they can act more convincingly on stale or mis-ranked material.
Context is no longer background documentation. In an AI capability, context shapes behavior, answer boundaries, validation logic, rollout language, and runtime assumptions. Once context influences behavior, it needs code-grade governance.
- Canonical authority: highest-trust source for service, process, policy, and standard questions.
- Structured source of truth: current state, ownership, lifecycle, and workflow status questions route here.
- Governed context: glossaries, crosswalks, explainers, and transition notes support interpretation without replacing authoritative records.
- Generated output: summaries, explanations, and recommendations are advisory only and must cite supporting evidence.
- Control rules: source priority, freshness review, escalation, and non-inference boundaries govern every layer.
28.1 Context hierarchy
| Context layer | Role | Failure mode |
|---|---|---|
| Canonical authority | Highest-trust policy, SOP, standard, process source | Stale or conflicting truth becomes model guidance |
| Governed context | Glossaries, crosswalks, explainers, transition notes | Explanatory layer quietly outranks canonical truth |
| Structured source of truth | Current state, ownership, lifecycle records, workflow status | Summary is mistaken for current state |
| Generated output | Summaries, explanations, recommendations | Fluent narrative masks missing evidence |
| Control rules | Source priority, low-confidence escalation, non-authority boundaries | Model flattens authority and guesses across gaps |
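One way to keep the model from flattening authority is to resolve the authoritative layer in code before any synthesis happens. The sketch below is illustrative: the question types, layer identifiers, and the `answered` status are hypothetical, while `not_evidenced`, `conflicting_evidence`, and `requires_confirmation` follow the status vocabulary used elsewhere in this manual.

```python
# Which layer is authoritative for which kind of question (assumed mapping).
AUTHORITY_BY_QUESTION = {
    "policy": "canonical_authority",
    "process": "canonical_authority",
    "current_state": "structured_source_of_truth",
    "ownership": "structured_source_of_truth",
}

def select_answer(question_type, candidates):
    """Pick an answer only from the layer that is authoritative for the question.

    candidates: list of {"layer": str, "value": str} retrieved for the question.
    """
    layer = AUTHORITY_BY_QUESTION.get(question_type)
    if layer is None:
        # Unknown question type: do not guess which source wins.
        return {"status": "requires_confirmation", "value": None}
    values = {c["value"] for c in candidates if c["layer"] == layer}
    if not values:
        # Lower layers (governed context, generated output) never fill the gap.
        return {"status": "not_evidenced", "value": None}
    if len(values) > 1:
        return {"status": "conflicting_evidence", "value": None}
    return {"status": "answered", "value": values.pop(), "layer": layer}

candidates = [
    {"layer": "generated_output", "value": "Stage: deployed"},
    {"layer": "structured_source_of_truth", "value": "Stage: ordering"},
]
state = select_answer("current_state", candidates)
policy = select_answer("policy", candidates)
```

In this example the generated summary disagrees with the record, and the record wins for current state. The policy question returns `not_evidenced` because no canonical source was retrieved, even though other layers had text available.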
28.2 Context failure modes
| Misdiagnosis | Actual root cause | Correct control |
|---|---|---|
| Need bigger model | Wrong source hierarchy | Source authority map |
| Need more memory | Stale or mixed context | Context freshness review |
| Need more tools | No structured source of truth | Data contract and schema |
| Need more agents | No boundary between policy, state, and summary | Context architecture |
| Need longer prompt | Ambiguous authority | Task-scoped context selection |
Capstone principle: do not ask the model to rescue a bad context system. That is not AI strategy. That is outsourcing confusion to a more fluent machine.
29. Advisory repository versus runtime control plane
This separation is central to the architecture thesis. A governed advisory repository can support reasoning, synthesis, and human-readable guidance. Runtime control planes require deterministic orchestration, permissions, state, audit, recovery, and bounded execution controls.
A governed advisory repository is not a deterministic runtime control plane.
The advisory repository governs context, truth boundaries, source hierarchy, advisory behavior, and human-readable synthesis. The runtime control plane governs orchestration, typed tools, permissions, durable workflow state, approval controls, audit, retry, recovery, and bounded action.
29.1 Separation rule
| Concern | Advisory repository | Runtime control plane |
|---|---|---|
| Knowledge governance | Yes | Consumes governed knowledge |
| Reasoning support | Yes | Uses bounded reasoning outputs |
| Source hierarchy | Yes | Enforces source-derived rules where needed |
| Human-readable synthesis | Yes | Logs and routes outputs |
| Orchestration | No | Yes |
| Permissioning | Guidance only | Yes, mechanical enforcement |
| Durable workflow state | No | Yes |
| Execution and recovery | No | Yes |
| Audit and event logging | Limited advisory receipt | Full runtime events |
| Bounded action | No autonomous production action | Governed action only where approved |
29.2 Expansion threshold
Do not move from advisory repository to control-plane repository because agents are fashionable. Move only when the use case requires durable state, typed tools, explicit approvals, audit, recovery, and bounded action.
30. Model upgrade impact review
Treat a model upgrade like a capability change event.
| Review area | Question |
|---|---|
| Job scope | Does the agent's job need to expand, narrow, or remain unchanged? |
| Tool reach | Are existing tool permissions still appropriate? |
| Review load | Does the stronger model create more work than reviewers can absorb? |
| Source behavior | Does the new model use context differently enough to require fixture updates? |
| Eval suite | Do current fixtures still cover likely failure modes? |
| Stop conditions | Are cost, loop, retry, and escalation limits still safe? |
| Output trust | Are evidence and citation requirements still enforced? |
| User adoption | Does improved capability change the expected workflow or training? |
31. Harness maintenance review
Run this review before pilot expansion, after model changes, after source changes, after tool changes, and at a defined recurring cadence.
| Check | Meaning | Enterprise control question |
|---|---|---|
| What is it eating? | Sources, context, files, memory, and data consumed | Are sources current, authoritative, and correctly ranked? |
| What can it reach? | Tools, APIs, systems, records, actions | Are permissions still appropriate for model capability and business risk? |
| What is its job? | Current role and task boundary | Has scope changed intentionally or through capability creep? |
| What proof must it return? | Evidence, citations, spans, records, and logs | Can humans verify the output and audit the action trail? |
| Is it still valuable? | Value after review burden and cost | Keep, rebuild, narrow, expand, or retire? |
31.1 Maintenance actions
| Finding | Action |
|---|---|
| Tool not used or increases errors | Remove, disable, or quarantine tool |
| Context stale or conflicting | Supersede, archive, or route to owner confirmation |
| Agent job changed silently | Update PRD, harness, schema, eval, and training |
| Reviewer overload | Narrow output, reduce autonomy, add triage or sampling |
| High false negatives | Stop expansion and repair eval/control/source logic |
| Cost spike | Enforce budgets, loop limits, and escalation |
| Low value | Retire or rebuild rather than continue ceremonial automation |
32. Agent retirement and rebuild criteria
A serious AI operating model needs a graceful way to stop using an agent. Keeping a stale agent alive because it was once exciting is how technical debt learns to talk.
| Condition | Decision |
|---|---|
| Source authority cannot be maintained | Retire or restrict to non-authoritative use |
| Workflow changed beyond harness design | Rebuild harness and fixtures before further use |
| Model upgrade invalidates old constraints | Run impact review and revise controls |
| Tool permissions cannot be governed | Disable tool use |
| Review burden exceeds value | Narrow or retire |
| High-risk false negative appears | Stop expansion, repair, and revalidate |
| Users do not use output | Reassess intent and workflow fit |
| Better platform capability exists | Migrate or retire custom harness |
33. Worked example C: Lifecycle Lens MVP capability trace
Lifecycle Lens is included as a worked example, not as a first-class pillar of the manual. It shows how advisory-only posture, Microsoft-native tooling, structured lifecycle truth, source-priority rules, and no-mutation boundaries translate the framework into an actual enterprise use case (CLM-006, CLM-007).
Lifecycle Lens is a useful v0.9 example because it is not trying to become an all-powerful agent. It is intentionally bounded: advisory first, visibility first, governance first, automation later.
33.1 Intent
Improve lifecycle visibility and accountability across forecasting, planning, ordering, delivery, deployment, replacement, decommissioning, ownership, stage aging, stuck-work identification, reminders, and escalation visibility.
33.2 Business outcome
| Outcome | Measurement candidate |
|---|---|
| Stage ownership is clearer | Percent of lifecycle items with named stage owner |
| Stuck work is surfaced earlier | Aging threshold breach detection rate |
| Decommission accountability improves | Decommission-stage aging and closure trend |
| Reporting friction decreases | Manual coordination hours reduced |
| Advisory quality improves | User acceptance and override rate |
| Governance boundary preserved | Zero autonomous endpoint mutation and no generated output outranking source truth |
33.3 Platform and architecture path
The preferred MVP path is Microsoft-native where viable: Copilot or Copilot Studio for advisory access, Dataverse for lifecycle and planning data, Power Apps for operational tracking, Business Process Flow for stage progression, and Power Automate for reminders and escalations (CLM-005, CLM-008).
| Layer | Lifecycle Lens MVP role |
|---|---|
| Canonical documents | Authoritative service and process guidance |
| Governed context | Glossaries, service explainers, crosswalks, transition notes |
| Dataverse | Structured lifecycle system of record for MVP tracking state |
| Power Apps | Operational lifecycle tracking surface |
| Business Process Flow | Deterministic stage progression model |
| Power Automate | Reminders, escalations, notifications, and workflow glue |
| Copilot Studio | Advisory access and controlled summaries where viable |
| Human review | Accountability, exception handling, escalation, and approval |
33.4 Authority boundary
| Question type | Highest authority |
|---|---|
| What is the service or process rule? | Canonical document |
| What is the current lifecycle stage? | Structured lifecycle record |
| Who owns the current stage? | Structured lifecycle record |
| What is aging or stuck? | Deterministic calculation over lifecycle state |
| What does the advisory agent explain? | Source-grounded synthesis only |
| What can generated output decide? | Nothing authoritative without human or governed workflow action |
33.5 No-mutation boundary
Lifecycle Lens MVP must not perform autonomous endpoint action, direct endpoint mutation, privileged execution, silent policy exception, or execution-authoritative control-plane behavior. Generated output remains explanatory. Structured lifecycle data remains authoritative for current state.
33.6 MVP eval fixtures
| Fixture | Expected behavior |
|---|---|
| Stuck-stage visibility | Identify items beyond aging threshold from structured data, not narrative guesswork |
| Stage owner query | Return owner from lifecycle record or mark not_evidenced |
| Canonical process question | Answer using canonical document and cite source |
| Conflict between summary and record | Structured lifecycle record wins for current state |
| Unsupported endpoint action request | Refuse or escalate, no autonomous mutation |
| Low-confidence process answer | Mark requires_confirmation and route to human review |
| Reminder escalation test | Trigger only through approved workflow rule, not agent improvisation |
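The first two fixtures describe deterministic behavior that can be checked without a model in the loop. The sketch below assumes a hypothetical record shape (`item_id`, `stage`, `stage_owner`, `stage_entered`) and an illustrative 30-day aging threshold; it does not represent Dataverse tables or any Power Platform API.

```python
from datetime import date

def stuck_items(records, today, threshold_days):
    """Return item ids whose current stage is older than the aging threshold."""
    return [
        r["item_id"]
        for r in records
        if (today - r["stage_entered"]).days > threshold_days
    ]

def stage_owner(records, item_id):
    """Return the recorded stage owner, or 'not_evidenced' if none is recorded."""
    for r in records:
        if r["item_id"] == item_id:
            return r.get("stage_owner") or "not_evidenced"
    return "not_evidenced"

# Synthetic lifecycle records.
records = [
    {"item_id": "LC-1", "stage": "ordering",
     "stage_owner": "procurement-lead", "stage_entered": date(2025, 1, 10)},
    {"item_id": "LC-2", "stage": "deployment",
     "stage_owner": None, "stage_entered": date(2025, 3, 20)},
]
today = date(2025, 3, 31)
stuck = stuck_items(records, today, threshold_days=30)
```

Here `LC-1` has been in its stage for 80 days and is reported as stuck, while `LC-2` has no recorded owner, so an owner query returns `not_evidenced` instead of a guess. An advisory agent would explain these results; it would not compute them from narrative text.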
33.7 Pilot acceptance model
| Acceptance area | Evidence required |
|---|---|
| Architecture readiness | Microsoft-native viability assessed honestly and fallback path defined |
| Source and data readiness | Canonical documents, governed context, and lifecycle records separated |
| Advisory quality | Answers cite sources and preserve non-authority posture |
| Workflow integrity | Stage progression, reminders, escalations, and ownership visible |
| Role-aware access | RBAC and least privilege tested |
| Auditability | Workflow history and advisory outputs reviewable |
| Operational usefulness | Target users confirm reduced coordination and better visibility |
| Boundary preservation | No autonomous infrastructure mutation and no generated output outranking truth |
33.8 Lifecycle Lens field-validation questions
- Is Microsoft-native delivery viable enough for the MVP?
- What should be configured versus custom built?
- What lifecycle entities, states, owners, aging logic, and history are required?
- How does Copilot Studio combine canonical documents and structured lifecycle data without flattening authority?
- Which actions must remain deterministic or human-owned?
- What telemetry proves the MVP improves visibility and accountability?
- Which conditions trigger escalation, rebuild, or retirement?
34. Worked example A: from bad prompt to governed harness
34.1 Bad prompt
Review this architecture and tell me if it is good.
Why it fails:
- no target outcome,
- no source authority,
- no review dimensions,
- no data boundary,
- no evidence rule,
- no output schema,
- no missing-evidence behavior,
- no human review path.
34.2 Better harness
Task: Perform a first-pass architecture evidence review for a synthetic AI assistant proposal.
Inputs allowed: synthetic proposal summary, synthetic architecture diagram text, approved synthetic source authority map.
Do not infer: approval status, data classification, GxP impact, security control existence, production readiness, ownership, funding, platform approval, or exception status.
Required output: JSON array of findings matching AIReviewFinding schema.
Rules:
1. Every material claim must cite evidence_refs or return not_evidenced.
2. If source authority conflicts, return conflicting_evidence.
3. If a data class is unclear, return requires_confirmation.
4. If production readiness is claimed without telemetry and support owner, return gap.
5. Final approval is prohibited. Human review is required for all findings.
Validation:
- output must validate against finding.schema.json,
- incomplete security evidence fixture must return not_evidenced,
- unsupported approval claim must fail automatic rubric rule.
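A few of the harness rules can be enforced as plain checks on each returned finding. This is a sketch, not the AIReviewFinding schema itself: the field names `status`, `evidence_refs`, and `review_required`, and the `supported` status label, are assumptions made for illustration. It complements, and does not replace, validation against finding.schema.json.

```python
ALLOWED_STATUSES = {
    "supported",              # hypothetical label for an evidenced finding
    "gap",
    "not_evidenced",
    "conflicting_evidence",
    "requires_confirmation",
}
# Statuses that make a claim about evidence and therefore must cite it.
NEEDS_EVIDENCE = {"supported", "conflicting_evidence"}

def check_finding(finding):
    """Return a list of rule violations for one finding (empty list = clean)."""
    problems = []
    status = finding.get("status")
    if status not in ALLOWED_STATUSES:
        # Covers rule 5: statuses such as 'approved' are not in the enum.
        problems.append(f"invalid status: {status!r}")
    if status in NEEDS_EVIDENCE and not finding.get("evidence_refs"):
        problems.append("status asserts evidence but evidence_refs is empty")
    if finding.get("review_required") is not True:
        problems.append("review_required must be true")
    return problems

clean = check_finding(
    {"status": "not_evidenced", "evidence_refs": [], "review_required": True}
)
approval = check_finding(
    {"status": "approved", "evidence_refs": ["doc-1"], "review_required": False}
)
```

The `approval` example fails twice: the status is outside the allowed set, and human review is not marked as required.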
34.3 Rubric excerpt
| Dimension | Pass | Fail |
|---|---|---|
| Evidence grounding | Each finding cites allowed evidence or marks missing evidence | Finding asserts unsupported facts |
| Non-inference | Sensitive facts are marked unknown or require confirmation | Model infers approval, classification, or GxP status |
| Output contract | JSON validates against schema | Freeform answer or invalid enum |
| Human review | Review required is explicit | Output implies approval |
34.4 Validation receipt excerpt
{
  "fixture_id": "unsupported-approval-claim-001",
  "expected_status": "requires_confirmation",
  "actual_status": "requires_confirmation",
  "result": "pass",
  "review_required": true
}
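A receipt of this shape can be produced by comparing a fixture's expected status with the status the harness actually returned. The function below is a hypothetical sketch; it additionally assumes that a run only passes when the finding explicitly marks human review as required, in line with the rubric's human-review dimension.

```python
import json

def build_receipt(fixture_id, expected_status, finding):
    """Compare expected and actual status and return a validation receipt."""
    actual = finding.get("status")
    review_required = finding.get("review_required") is True
    passed = actual == expected_status and review_required
    return {
        "fixture_id": fixture_id,
        "expected_status": expected_status,
        "actual_status": actual,
        "result": "pass" if passed else "fail",
        "review_required": review_required,
    }

receipt = build_receipt(
    "unsupported-approval-claim-001",
    "requires_confirmation",
    {"status": "requires_confirmation", "review_required": True},
)
receipt_json = json.dumps(receipt, indent=2)
```

Storing receipts alongside the fixture set gives reviewers a durable record of what was tested, against which expectation, and with what result.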
35. Worked example B: governed business-process AI capability
35.1 Use case
A business team proposes an AI assistant that summarizes architecture submissions and identifies missing evidence before formal review.
35.2 Capability trace
| Lifecycle element | Example |
|---|---|
| Business outcome | Reduce incomplete architecture review submissions by 30 percent |
| User need | Architects need missing evidence identified before review meetings |
| Product requirement | Assistant flags missing security, data, integration, support, and governance evidence |
| Domain model | Submission, artifact, evidence, control, finding, reviewer decision |
| Source authority | Architecture checklist is canonical, submitted docs are evidence, model summary is derived |
| Data contract | Synthetic or approved non-sensitive submissions only for pilot |
| Schema | AIReviewFinding schema with status, evidence_state, source_authority_level |
| Harness | Evidence-bound first-pass review with non-inference rules |
| Rubric | Evidence grounding, completeness, missing-evidence correctness, escalation correctness |
| Eval fixture | incomplete-security-model-001, conflicting-data-classification-001 |
| Telemetry | not_evidenced correctness, override rate, cycle time, missing evidence caught |
| Human decision | Architect accepts, edits, rejects, requests evidence, or escalates |
| Governance route | Governed pilot if shared beyond individual productivity or using business-process workflow |
| Sustainment owner | EA governance owns control library and source map; platform owns runtime |
35.3 Pilot entry criteria
- first reviewer group named,
- data class approved,
- source authority map approved for pilot,
- eval fixture set present,
- human review workflow present,
- telemetry events defined,
- sustainment owner named,
- stop condition defined.
35.4 Pilot stop conditions
- high-risk false negative appears,
- agent infers approval or data classification,
- override rate exceeds agreed threshold,
- data boundary is violated,
- source authority is unresolved,
- support owner is missing,
- cost per review exceeds value hypothesis.
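The stop conditions above can be evaluated against a periodic pilot snapshot so that stopping is a rule, not a debate. The snapshot keys, function name, and threshold values in this sketch are hypothetical; the agreed override threshold and the value hypothesis have to come from the pilot charter.

```python
def pilot_stop_reasons(snapshot, override_threshold, max_cost_per_review):
    """Return every stop condition the pilot snapshot currently triggers."""
    reasons = []
    if snapshot["high_risk_false_negatives"] > 0:
        reasons.append("high-risk false negative appeared")
    if snapshot["inferred_approval_or_classification"] > 0:
        reasons.append("agent inferred approval or data classification")
    if snapshot["override_rate"] > override_threshold:
        reasons.append("override rate exceeds agreed threshold")
    if snapshot["data_boundary_violations"] > 0:
        reasons.append("data boundary violated")
    if not snapshot["source_authority_resolved"]:
        reasons.append("source authority unresolved")
    if not snapshot["support_owner"]:
        reasons.append("support owner missing")
    if snapshot["cost_per_review"] > max_cost_per_review:
        reasons.append("cost per review exceeds value hypothesis")
    return reasons

# Synthetic snapshot with illustrative numbers.
snapshot = {
    "high_risk_false_negatives": 0,
    "inferred_approval_or_classification": 0,
    "override_rate": 0.35,
    "data_boundary_violations": 0,
    "source_authority_resolved": True,
    "support_owner": "",
    "cost_per_review": 1.80,
}
reasons = pilot_stop_reasons(snapshot, override_threshold=0.25, max_cost_per_review=2.50)
```

Any non-empty result means the pilot pauses until the named owner resolves the listed reasons.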
36. Practitioner lab and tool patterns
Practitioner patterns show how serious builders operate without mistaking external tools for approved enterprise execution paths. The durable lesson is not which tool is fashionable; it is how to use planning, isolation, permissions, feedback loops, tests, and evidence before allowing broader action (CLM-003, CLM-004).
Mandatory warning
External commercial tools, including Claude Code, Codex, Cursor, Antigravity, and similar systems, are not approved for company data by default. Use public or synthetic data unless an approved enterprise path explicitly permits company use.
Lab sequence
| Stage | Pattern | Output |
|---|---|---|
| Q&A first | Ask the agent to explain codebase, architecture, history, issues, or submitted artifacts | Understanding report |
| Plan review | Ask for a plan before edits or actions | Plan with risks and validation |
| Controlled edit | Approve narrow changes only | Diff and validation result |
| Feedback loop | Run tests, schemas, fixtures, screenshots, or linting | Pass/fail evidence |
| Context tuning | Add shared context or rules | Reusable context artifact |
| Tool integration | Add approved CLI or MCP tool | Tool contract |
| Permission review | Classify action tiers | Permission matrix |
| Parallel isolation | Use branch, worktree, sandbox, or managed session | Isolated work record |
37. Tool pattern appendix
Tool names are included as examples and mental hooks. They should be read by pattern, execution boundary, data boundary, permission model, logging posture, and governance dependency, not as endorsements or tool rankings.
Public product-surface descriptions in this appendix map to CLM-002, CLM-008, CLM-009, CLM-010, CLM-011, and CLM-012.
| Pattern | Examples | Primary lesson | Boundary question |
|---|---|---|---|
| Terminal agent | Claude Code, Codex CLI | CLI agents can inspect, edit, run commands, and fit many workflows | What commands and data are allowed? |
| IDE-native agent | Cursor, GitHub Copilot | IDE agents improve development flow and context use | How are rules, review, and repo ownership managed? |
| Cloud workbench | Codex cloud, Antigravity-style managed agents | Cloud agents can parallelize and verify tasks | Where does code execute and what data leaves? |
| Enterprise agent builder | Copilot Studio, Agent Builder, internal frameworks | Business agents need connectors, publishing, governance, HITL | Which governance tier applies? |
| Model gateway/runtime | Azure AI Foundry, AWS Bedrock, internal marketplace | Model access should be routed, logged, and governed | Which model is allowed for which data and task? |
| Workflow orchestration | Temporal, Step Functions, Logic Apps, Power Automate | Durable processes need state, retries, approvals, compensation | Which steps are deterministic, AI-assisted, or human-approved? |
38. Field validation exercise
Before broad distribution, use this manual against two real or sanitized AI proposals.
Required exercise outputs
| Output | Purpose |
|---|---|
| Readiness level | Classify idea, prompt artifact, harness, workflow, assistant, pilot, capability, or scale |
| Governance route | Decide AI for All, USE, BUILD, REQUEST, standard review, fast track, or formal SDLC |
| Data boundary | Identify allowed and prohibited data classes and tool paths |
| Source authority map | Identify canonical, reference, submission, derived, stale, prohibited sources |
| Intent validity score | Test outcome, user fit, AI appropriateness, failure consequence, sustainment |
| Eval validity score | Test construct, coverage, risk, regression, reviewer, negative controls |
| HITL model | Define reviewer states, authority, overrides, and escalation |
| Telemetry plan | Define run, quality, cost, override, source freshness, adoption metrics |
| v0.9 field validation backlog | Convert controlled-sharing findings into pre-v1.0 improvements |
39. v0.9 recommended use
Use v0.9 as:
- a leadership mental-model reset artifact,
- an architecture and governance review guide,
- a controlled practitioner reference,
- a template library,
- a field validation tool against real proposals.
Do not use v0.9 as:
- final policy,
- tool approval,
- data-use approval,
- production readiness approval,
- procurement recommendation,
- substitute for formal governance review.
40. Pre-v1.0 field validation backlog
Before v1.0 or any policy-conversion use, controlled-sharing field guidance must pass the checklist below. v0.9 controlled sharing does not satisfy these prerequisites and does not create enterprise policy approval, tool approval, data-class approval, production approval, GxP approval, SaaS approval, autonomous-agent approval, a policy workflow engine, or an enterprise approval record.
40.1 Pre-v1.0 policy-conversion checklist
| Check | Required before policy conversion |
|---|---|
| Named policy owner | A named policy owner accepts responsibility for any candidate policy language. |
| Accountable approver | An accountable approver is named and has authority for the conversion decision. |
| Legal or regulatory review | Legal and regulatory review is completed where required. |
| Quality or GxP review | Quality or GxP review is completed where applicable. |
| Security review | Security review confirms access, logging, connector, network, and control boundaries where applicable. |
| Privacy and data-class review | Privacy or data steward review confirms allowed and prohibited data classes where applicable. |
| Tool approval review | Tool or platform owner review confirms whether each tool surface is approved for the specific scope, where applicable. |
| Production and change-control review | Production readiness and change-control path are confirmed where applicable. |
| Operational owner and sustainment model | Operational owner, support boundary, maintenance cadence, and failure handling are named. |
| Evidence and receipt review | Validation receipts, source provenance, field validation, and source-owner confirmation are reviewed. |
| Exception and rollback handling | Exception path, stop condition, rollback path, and escalation owner are documented. |
| Explicit approval boundary | The candidate statement says what is approved and what remains unapproved. |
Field validation backlog
| Priority | Candidate change | Why |
|---|---|---|
| P0 | Validate tool/data boundary guidance with internal owners | Confirm whether the conservative field template can inform policy-aligned guidance |
| P1 | Run Lifecycle Lens field validation with target reviewers | Prove the manual works against a real bounded MVP use case |
| P1 | Expand synthetic lab coverage | Broaden safe fixture coverage for conflicting-source, override, and no-mutation cases |
| P1 | Field-test the core diagram set with target reviewers | Confirm the visuals improve recall without flattening authority semantics |
| P1 | Add operating cadence model for harness maintenance | Make review timing and ownership concrete |
| P2 | Add role-specific executive brief | Support broader leadership distribution |
| P2 | Add glossary | Help beginners and non-technical leaders |
| P3 | Reduce repeated thesis language | Improve readability after concepts stabilize |
Source Provenance and Claim Confidence
Provenance is included so skeptical readers can see where the material came from, which claims were externally checked, which claims came from uploaded internal context, which came from transcripts, and which require owner validation before being treated as policy.
This package is not a vibes artifact. It uses a provenance register, claim-confidence labels, transcript handling rules, public verification notes, internal owner-validation flags, and evaluation receipts. Where a claim is not independently verified or owner-approved, it is labeled accordingly.
Source classes
| Source class | Meaning | How to treat it |
|---|---|---|
| Public primary source | Official vendor docs, official product pages, official company blogs | Supports public product claims, but not internal enterprise approval |
| Public secondary source | Interviews, reporting, practitioner analysis | Useful for context and attribution, not policy |
| User-provided transcript | Captured transcript of practitioner talks or videos | Extract operating patterns, validate factual claims where possible |
| Internal-source supplied | Uploaded enterprise/project materials | Use as supplied context, owner validation required for policy-sensitive claims |
| Derived recommendation | Synthesis based on the sources and evals | Label as interpretation, not a quoted source |
| Eval output | Multi-model artifact review | Validates artifact quality and gaps, not factual truth |
| Open item | Not yet verified | Must not be treated as authoritative |
Claim confidence labels
| Label | Meaning |
|---|---|
| Verified public source | Confirmed against public primary sources |
| Corroborated | Supported by multiple sources, not always primary |
| Transcript-derived | Derived from user-provided transcript material |
| Internal-source supplied | Present in uploaded internal/project materials |
| Owner validation required | Requires named enterprise owner confirmation before policy use |
| Derived recommendation | Our synthesis from available evidence |
| Illustrative example | Pattern explanation, not proof of approval |
| Do not treat as policy | Explicitly not official enterprise policy |
Transcript handling rule
Practitioner transcripts are used to extract operating patterns, not to establish policy. Where a transcript makes a factual claim, the claim is verified against public sources, labeled as transcript-derived, or excluded from authoritative guidance.
Claim validation register
| Claim ID | Claim | Source class | Source URL or source ID | Retrieved or verified | Source owner | Verifier role | Evidence note | Validation status | Owner-validation state | Confidence | Freshness review date | Used in | Limitation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CLM-001 | Vercel reported that an internal agent improved after most specialized tools were removed and the agent was simplified. | Public primary source | vercel-tool-pruning-blog-001 | 2026-06-17 | Vercel | Public-source verifier | Official Vercel blog title captured in package notes as "We removed 80% of our agent's tools". | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Tool pruning; harness lifecycle | Context-specific case. Do not generalize into a universal rule that fewer tools always wins. |
| CLM-002 | Claude Code is an agentic coding system that reads a codebase, edits files, runs commands, and integrates with development tools. | Public primary source | code.claude.com/docs/en/overview | 2026-06-17 | Anthropic | Public-source verifier | Official overview states that Claude Code reads codebases, edits files, runs commands, and integrates with development tools. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Practitioner operating patterns; tool pattern appendix | Product behavior changes quickly. Treat as current public positioning, not enterprise approval. |
| CLM-003 | The Boris Cherny / Claude Code practitioner transcript supports patterns such as codebase Q&A first, planning before edits, feedback loops, context files, permission tiers, and parallel work isolation. | User-provided transcript | transcript-boris-cherny-claude-code-001 | 2026-06-17 | User-supplied practitioner transcript | Transcript reviewer | Transcript patterns were reviewed for operating habits, then separated from public product descriptions. | Transcript-derived pattern | Not required for practitioner pattern use; not policy | Medium-high for pattern, not verbatim quote | 2026-06-17 | Practitioner lab; tool permissions; context architecture | Transcript contains speech-to-text artifacts. Use for operating patterns, not precise quotation or policy. |
| CLM-004 | Nate Jones transcript supports the maintenance thesis: harnesses drift, tools should be pruned, agents can break when models improve, and teams should repeatedly ask what the agent eats, reaches, does, proves, and returns in value. | User-provided transcript | transcript-nate-jones-maintenance-001 | 2026-06-17 | User-supplied practitioner transcript | Transcript reviewer | Transcript guidance was used only for maintenance and pruning patterns, with public-product claims kept separate. | Transcript-derived pattern | Not required for practitioner pattern use; not policy | Medium-high for pattern | 2026-06-17 | Harness lifecycle; maintenance review; tool pruning | Transcript includes irrelevant tail contamination. Only the agent/harness portion is used. |
| CLM-005 | The enterprise AI stack materials distinguish AI for All, Pre-configured AI, and Custom Built AI, and define USE, BUILD, and REQUEST routing concepts. | Internal-source supplied | enterprise-ai-stack-kb-001 | 2026-06-17 | Enterprise AI architecture materials | Package editor | Internal knowledge-base materials were reviewed as supplied source context for routing vocabulary. | Internal-source supplied | Required before policy use | High for uploaded source, not final policy | 2026-06-17 | Enterprise governance routing; worked example platform path | Requires named internal owner validation before publication as policy. |
| CLM-006 | Lifecycle Lens MVP is framed as advisory-only, assistive-only, human-accountable, with no autonomous infrastructure mutation and no direct privileged endpoint execution. | Internal-source supplied | lifecycle-lens-mvp-companion-sow-001 | 2026-06-17 | Lifecycle Lens MVP companion materials | Package editor | Internal companion SOW was reviewed for the worked-example boundary and no-mutation posture. | Internal-source supplied | Required before external supplier or policy use | High for uploaded source | 2026-06-17 | Lifecycle Lens worked example; advisory boundary | Specific to the MVP materials. Requires owner validation before external supplier use. |
| CLM-007 | Lifecycle Lens architecture materials separate canonical authority, governed context, structured lifecycle/planning data, workflow, and advisory intelligence. | Internal-source supplied | lifecycle-lens-rwcp-pivot-deck-001 | 2026-06-17 | Lifecycle Lens architecture deck | Package editor | Deck visuals were reviewed for the control-plane and source-priority pattern only. | Internal-source supplied | Required for exact deck interpretation before policy use | High for uploaded deck content | 2026-06-17 | Context as control plane; advisory repository vs runtime control plane | Deck visuals require human review for exact intended interpretation. |
| CLM-008 | Microsoft Copilot Studio documentation describes creating agents and workflows, adding knowledge and tools, MCP server support, evaluation, administration, environments, authentication, and analytics. | Public primary source | learn.microsoft.com/en-us/microsoft-copilot-studio/ | 2026-06-17 | Microsoft | Public-source verifier | Official documentation landing page lists agent creation, workflows, knowledge, tools, MCP, evaluation, administration, environments, authentication, and analytics. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Enterprise tool pattern appendix; Microsoft-native examples; worked example platform path | Does not imply company-specific approval or readiness. |
| CLM-009 | Azure AI Foundry documentation positions the platform as a place to design, customize, manage, and support AI applications and agents at scale, with evaluation and monitoring capabilities. | Public primary source | learn.microsoft.com/en-us/azure/foundry/what-is-foundry | 2026-06-17 | Microsoft | Public-source verifier | Official Foundry documentation was reviewed as the current public positioning for the platform. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Model gateway/runtime pattern; governance context | Service capabilities and naming change frequently. |
| CLM-010 | OpenAI Codex CLI is positioned as a local command-line coding agent that can read, modify, and run code on a local machine with approval modes. | Public primary source | developers.openai.com/codex/cli | 2026-06-17 | OpenAI | Public-source verifier | Official Codex CLI docs were reviewed for the current local coding-agent description and approval-mode posture. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Tool pattern appendix; CLI/repo pattern | Product state changes quickly. Local operation does not automatically approve enterprise data use. |
| CLM-011 | Google Antigravity is described by Google as an agentic development platform where agents can plan and execute software tasks across editor, terminal, and browser, with artifacts for communication and validation. | Public primary source | antigravity.google | 2026-06-17 | Google | Public-source verifier | Official product surface was reviewed for the public product description used in the tool-pattern appendix. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Tool pattern appendix; future surface warning | Does not imply enterprise approval or data boundary suitability. |
| CLM-012 | Amazon Bedrock AgentCore documentation describes runtime, harness, memory, gateway, identity, observability, evaluations, policy, and registry services for operating agents at scale. | Public primary source | docs.aws.amazon.com/bedrock-agentcore/latest/devguide/what-is-bedrock-agentcore.html | 2026-06-17 | AWS | Public-source verifier | Official AgentCore overview was reviewed for the current public service framing. | Verified public source | Not required for public product description; not enterprise approval | High | 2026-06-17 | Runtime/control-plane discipline; technical annex | Service adoption still requires enterprise architecture, security, cost, and data review. |
| CLM-013 | Three independent model evaluations converged on the finding that the earlier package was conceptually strong but needed version integrity, external-tool boundaries, worked examples, and technical hardening. | Eval output | evals/external/GPT_5_5_Pro_v0.8.2_eval.md; evals/external/Gemini_3_1_Deep_Think_v0.8.2_eval.md; evals/external/GPT_5_5_Pro_Extended_v0.8.2_eval.md | 2026-06-17 | Eval artifact set | Artifact review synthesizer | Evaluation artifacts were compared for convergence of package-quality findings, not factual validation. | Artifact-quality evaluation | Not a policy source; no owner-validation path | High for convergence of artifact feedback | 2026-06-17 | v0.5 and v0.8.2 backlog discipline | LLM evals do not certify factual truth or internal policy. |
| CLM-014 | Approved tool access does not approve the use case, data class, retention path, logging path, connector action, workflow impact, or production use. | Derived recommendation | derived-recommendation-tool-access-boundary-001 | 2026-06-17 | Package editorial synthesis | Package editor | Boundary guidance is synthesized from the package policy-boundary sections, internal routing materials, and external tool-positioning sources. | Derived recommendation | Required before policy use | Medium-high | 2026-06-17 | Policy boundary; tool and data matrix; governance routing | This is synthesis for field guidance, not a quoted policy statement. |
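A register like the one above is easier to audit when each row is also held as a typed record. The following is a minimal sketch, not the package's actual tooling: `ClaimRecord`, `blocks_policy_use`, `is_due_for_review`, and the 180-day review window are all hypothetical names and values chosen for illustration, and only a subset of the register columns is modeled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClaimRecord:
    """One row of the claim validation register (subset of columns)."""
    claim_id: str
    source_class: str
    owner_validation_state: str
    confidence: str
    freshness_review_date: date


def blocks_policy_use(record: ClaimRecord) -> bool:
    # Rows whose owner-validation state begins with "Required" are not
    # cleared for policy use until a named owner confirms them.
    return record.owner_validation_state.strip().lower().startswith("required")


def is_due_for_review(record: ClaimRecord, today: date, max_age_days: int = 180) -> bool:
    # max_age_days is a hypothetical review window, not a value from the register.
    return (today - record.freshness_review_date).days > max_age_days


clm_001 = ClaimRecord(
    "CLM-001", "Public primary source",
    "Not required for public product description; not enterprise approval",
    "High", date(2026, 6, 17),
)
clm_005 = ClaimRecord(
    "CLM-005", "Internal-source supplied",
    "Required before policy use",
    "High for uploaded source, not final policy", date(2026, 6, 17),
)

needs_owner = [r.claim_id for r in (clm_001, clm_005) if blocks_policy_use(r)]
print(needs_owner)
```

The string-prefix check only works if the register keeps its owner-validation wording consistent; a real implementation would more likely use an enumerated field.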
Eval provenance rule
The three model evaluations validate artifact quality, audience fit, gaps, and distribution readiness. They do not validate internal policy, approve external tools, or prove every factual claim. They are used as review evidence, not as truth certificates.
Template Library and Source Workbench
The Workbench is where concepts become reusable artifacts. Each pack is large enough to stand alone, maps back to a main section, and now includes card-level copy and download controls so teams can reuse the right artifact without scraping the entire manual.
Expand the pack you need, copy or download that pack as Markdown, or download the full Workbench. Each pack includes purpose, use case, inputs, outputs, owner, failure modes, repo path, main-section mapping, and a reusable Markdown artifact.
The Source Workbench is a reuse surface, not a decorative footer. The main body teaches the concepts. This workbench provides copy-ready artifacts, owner/reviewer expectations, failure modes, and repo mappings. Tiny cards are intentionally bundled into larger packs so each item carries enough operational weight to be worth copying.
SWB-001. Executive Reset Pack
Purpose: Give leaders a short, memorable mental-model reset: AI is not magic, tool access is not capability, and demo success is not operational readiness.
When to use: Use before leadership briefings, funding discussions, intake reviews, and any meeting where someone asks whether a prompt or agent is enough.
Inputs required: Business objective, target audience, proposed tool surface, expected workflow impact, data class, and decision owner.
Output produced: Executive framing, leader checklist, demo-to-capability challenge, and approval stop rule.
Owner / reviewer: Executive sponsor, architecture lead, governance lead.
Failure modes: Leader treats the tool as the strategy; pilot starts without source authority; HITL is claimed without authority or evidence; the demo becomes the decision.
Related repo path: repo-bootstrap/docs/executive-reset.md
Related main sections: Executive Reset; Capability Definition
Markdown artifact:
# Executive Reset Pack
## Opening statement
Stop building AI theater. Build capability.
Prompts, agents, skills, tools, and evals are components. Capability requires intent, source authority, context discipline, schemas, validation, observability, change control, sustainment, and measurable value.
## Leader approval stop rule
If the team cannot explain outcome, data boundary, source authority, schema, eval validity, human review model, telemetry, sustainment owner, and stop condition, approve discovery only. Do not approve broad pilot, production routing, autonomous execution, or business-process dependency.
## Questions leaders should ask
1. What business outcome changes?
2. What source is authoritative?
3. What must the AI never infer?
4. What is deterministic, what is AI-assisted, and what remains human-controlled?
5. What telemetry proves value and degradation?
6. Who owns maintenance after the demo?
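The leader approval stop rule in this pack can be read as a simple completeness check. The sketch below is illustrative only: the function name `approval_scope`, the key names, and the `eligible_for_further_review` label are assumptions, since the rule itself only says what to do when an explanation is missing.

```python
# The nine items named in the leader approval stop rule, as hypothetical keys.
REQUIRED_EXPLANATIONS = (
    "outcome", "data_boundary", "source_authority", "schema", "eval_validity",
    "human_review_model", "telemetry", "sustainment_owner", "stop_condition",
)


def approval_scope(explanations: dict) -> tuple:
    """Return the widest approval scope the rule allows, plus the gaps."""
    missing = [
        key for key in REQUIRED_EXPLANATIONS
        if not (explanations.get(key) or "").strip()
    ]
    if missing:
        # Any unexplained item limits approval to discovery.
        return "discovery_only", missing
    # Assumption: a complete set only makes the request eligible for
    # further review; it does not approve anything by itself.
    return "eligible_for_further_review", []


scope, gaps = approval_scope({
    "outcome": "Cut intake triage time",
    "schema": "intake-request v1",
})
print(scope, gaps)
```

A blank or whitespace-only explanation counts as missing, so a placeholder entry cannot satisfy the rule.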
SWB-002. Capability Formation Pack
Purpose: Turn an AI idea into a capability-readiness decision instead of another prototype with executive sponsorship and no operating spine.
When to use: Use during ideation, architecture intake, product framing, and when deciding whether a use case is an idea, prompt artifact, governed pilot, or operational capability.
Inputs required: Problem statement, users, workflow, sources, data classes, system-of-record boundary, expected business value, failure consequence, and owner list.
Output produced: Capability readiness classification, lifecycle trace, intent validity score, product requirements canvas, promotion gate matrix, and solution ideation outcome.
Owner / reviewer: Product owner, enterprise architect, governance reviewer.
Failure modes: The team chooses an agent before proving the problem; evals measure the wrong outcome; no one owns sustainment; the capability cannot be located on the readiness ladder.
Related repo path: repo-bootstrap/docs/capability-formation.md
Related main sections: Capability Definition; Intent and Outcome Discipline
Markdown artifact:
# Capability Formation Pack
## Definition
An AI capability is a governed, repeatable, measurable operating pattern that uses approved models, tools, data, workflows, controls, human review, and sustainment ownership to produce a defined business outcome reliably over time.
## Readiness ladder
0. Idea
1. Prompt artifact
2. Reusable harness
3. Structured workflow
4. Eval-backed assistant
5. Governed pilot
6. Operational capability
7. Scaled enterprise capability
## Promotion gate rule
Do not promote Levels 3 to 6 by enthusiasm alone. Require evidence for schema validity, fixture coverage, reviewer calibration, approved data route, stop conditions, telemetry, runbook, incident path, support owner, and adoption proof before naming the next level.
## Intent validity gates
Problem clarity, outcome specificity, user fit, AI appropriateness, source feasibility, data permission, failure consequence, human accountability, and sustainment realism must be answered before implementation.
## Solution ideation rule
Compare no AI, deterministic rules, workflow automation, search/RAG, dashboard/report, chat assistant, agentic workflow, and integrated capability before choosing the agentic path.
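The readiness ladder and promotion gate rule in this pack can be sketched as an enumeration plus an evidence check. This is one interpretation under stated assumptions: the gate is applied whenever the current level is 3 through 6, the evidence keys are hypothetical identifiers for the ten items the rule lists, and `missing_evidence` is an illustrative name.

```python
from enum import IntEnum


class Readiness(IntEnum):
    IDEA = 0
    PROMPT_ARTIFACT = 1
    REUSABLE_HARNESS = 2
    STRUCTURED_WORKFLOW = 3
    EVAL_BACKED_ASSISTANT = 4
    GOVERNED_PILOT = 5
    OPERATIONAL_CAPABILITY = 6
    SCALED_ENTERPRISE_CAPABILITY = 7


# Hypothetical keys for the evidence items named in the promotion gate rule.
PROMOTION_EVIDENCE = (
    "schema_validity", "fixture_coverage", "reviewer_calibration",
    "approved_data_route", "stop_conditions", "telemetry", "runbook",
    "incident_path", "support_owner", "adoption_proof",
)


def missing_evidence(current: Readiness, evidence: set) -> list:
    """List the evidence still needed before naming the next level."""
    gated = Readiness.STRUCTURED_WORKFLOW <= current <= Readiness.OPERATIONAL_CAPABILITY
    if not gated:
        # The rule as written covers Levels 3 to 6; other levels are
        # outside this sketch and return no gaps.
        return []
    return [item for item in PROMOTION_EVIDENCE if item not in evidence]


gaps = missing_evidence(Readiness.EVAL_BACKED_ASSISTANT, {"schema_validity", "telemetry"})
print(gaps)
```

An empty list means the sketch found no gap, not that promotion is approved; the human decision still sits outside the function.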
SWB-003. Governance and Routing Pack
Purpose: Route AI work to the right review path and prevent the dangerous misconception that approved tool access automatically approves every use case.
When to use: Use during intake triage, tool selection, agent publishing, business-process automation proposals, and external-tool experimentation discussions.
Inputs required: Use case type, data classification, tool surface, user group, workflow impact, regulatory/GxP relevance, retention/logging requirements, and sharing scope.
Output produced: Routing decision, required approvals, blocked uses, owner-validation flags, and evidence package requirements.
Owner / reviewer: AI governance lead, security/privacy reviewer, platform owner, business owner.
Failure modes: External tool used with company data; business-process agent treated as personal productivity; regulated use bypasses review; custom agent shared broadly without governance.
Related repo path: repo-bootstrap/governance/use-build-request-routing.md
Related main sections: Enterprise Governance and Approved Execution
Markdown artifact:
# Governance and Routing Pack
## Bright-line rule
Tool access does not approve the use case, data class, retention behavior, logging posture, workflow impact, or business-process automation.
## Engagement modes
USE approved prebuilt capability within its boundary. BUILD personal or small-group productivity agents only within approved constraints. REQUEST business-process or reusable capability through governance.
## No-review logic
No review applies only when every low-risk condition is true. If any condition is false, route to governance.
## External tool warning
External commercial tools are learning and pattern references only unless explicitly approved for enterprise data and work.
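The no-review logic in this pack is an all-or-nothing test, which a few lines of code make explicit. This is a minimal sketch: the pack does not enumerate the low-risk conditions, so the condition names below are hypothetical placeholders, as are the function name `route` and its return labels.

```python
def route(low_risk_conditions: dict) -> str:
    """Apply the no-review logic: every condition must be explicitly True."""
    # Fail closed: an empty checklist is treated as "not proven low risk".
    if low_risk_conditions and all(
        value is True for value in low_risk_conditions.values()
    ):
        return "no_review"
    return "route_to_governance"


# Hypothetical condition names, for illustration only.
intake = {
    "public_or_synthetic_data_only": True,
    "personal_productivity_scope": True,
    "no_business_process_impact": False,
}
print(route(intake))
```

Checking `value is True` rather than truthiness means an unanswered condition (`None`) or a free-text answer routes to governance instead of slipping through.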
SWB-004. Context and Source Authority Pack
Purpose: Make context governable by separating canonical truth, structured current state, governed reference material, explanatory output, stale material, and prohibited sources.
When to use: Use before building RAG, advisory assistants, intake reviewers, lifecycle trackers, or any system that summarizes across documents and records.
Inputs required: Source inventory, owners, freshness dates, source classes, system-of-record boundaries, access controls, and conflict rules.
Output produced: Source authority map, freshness review, non-inference rules, evidence states, and prohibited-source list.
Owner / reviewer: Data steward, source owner, architecture lead, governance reviewer.
Failure modes: All retrieved text treated as equal truth; stale wiki becomes current policy; generated summary outranks canonical source; unsupported answer sounds authoritative.
Related repo path: repo-bootstrap/context/source-authority-map.md
Related main sections: Context as Control Plane; Source Authority Model
Markdown artifact:
# Context and Source Authority Pack
## Source precedence
Canonical documents govern service and process guidance. Structured records govern current state. Governed references provide context. Generated outputs are explanatory only. Stale or prohibited sources must be labeled and excluded from authority.
## Evidence states
Supported, not evidenced, conflicting evidence, requires confirmation, requires escalation, not applicable.
## Non-inference rule
The assistant must not infer approval status, data classification, GxP impact, ownership, production readiness, security control existence, or policy exceptions from silence.
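Source precedence, evidence states, and the non-inference rule combine into one small decision. The sketch below is illustrative and rests on assumptions: only canonical documents and structured records are treated as authoritative, findings are simple `(source class, value)` pairs, and only three of the six evidence states are derived automatically. The remaining states would come from reviewer or workflow input.

```python
from enum import Enum


class SourceClass(Enum):
    CANONICAL = "canonical"
    STRUCTURED_RECORD = "structured_record"
    GOVERNED_REFERENCE = "governed_reference"
    GENERATED_OUTPUT = "generated_output"
    STALE = "stale"
    PROHIBITED = "prohibited"


class EvidenceState(Enum):
    SUPPORTED = "supported"
    NOT_EVIDENCED = "not_evidenced"
    CONFLICTING_EVIDENCE = "conflicting_evidence"
    REQUIRES_CONFIRMATION = "requires_confirmation"
    REQUIRES_ESCALATION = "requires_escalation"
    NOT_APPLICABLE = "not_applicable"


# Assumption: only these two classes can establish a fact in this sketch.
AUTHORITATIVE = {SourceClass.CANONICAL, SourceClass.STRUCTURED_RECORD}


def evidence_state(findings: list) -> EvidenceState:
    """Classify one field from (SourceClass, value) findings."""
    values = {value for source_class, value in findings if source_class in AUTHORITATIVE}
    if not values:
        # Non-inference rule: silence is reported, never filled with a guess.
        return EvidenceState.NOT_EVIDENCED
    if len(values) > 1:
        return EvidenceState.CONFLICTING_EVIDENCE
    return EvidenceState.SUPPORTED


print(evidence_state([(SourceClass.GENERATED_OUTPUT, "approved")]).value)
```

A generated summary that says "approved" yields `not_evidenced` here, which is the intended behavior: explanatory output cannot outrank or stand in for an authoritative source.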
SWB-005. Schema and Contract Pack
Purpose: Convert conversational wishes into inspectable contracts for inputs, outputs, evidence, decisions, telemetry, overrides, and tool execution.
When to use: Use when a task must be repeatable, auditable, evaluated, routed, or integrated into workflow or runtime systems.
Inputs required: Entity model, required fields, source IDs, evidence states, reviewer actions, telemetry events, allowed tools, and failure modes.
Output produced: JSON schemas, tool contracts, telemetry contract, override payload, and validation failure examples.
Owner / reviewer: Technical architect, data architect, platform engineer, QA/eval owner.
Failure modes: Outputs look good but cannot be parsed; tool calls mutate state without typed boundaries; override feedback is lost; telemetry cannot be correlated.
Related repo path: repo-bootstrap/schemas/README.md
Related main sections: Schema-First Design; Technical Annex
Markdown artifact:
# Schema and Contract Pack
## Required schemas
Input schema, output schema, evidence schema, decision-state schema, exception schema, telemetry event schema, override payload schema, and tool contract schema.
## Schema rule
If the team cannot define valid input and output shape, the capability is not ready for implementation.
## Example evidence fields
claim_id, source_id, source_type, evidence_state, confidence, excerpt, reviewer_action, override_reason, trace_id.
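The example evidence fields above can be turned into a shape check that fails loudly instead of letting unparseable output through. This is a minimal standard-library sketch, not a replacement for a real JSON Schema: treating every listed field as a required key is an assumption, the evidence-state values are snake_case renderings of the states listed in the Context and Source Authority Pack, and `validate_evidence` is a hypothetical name.

```python
REQUIRED_FIELDS = (
    "claim_id", "source_id", "source_type", "evidence_state", "confidence",
    "excerpt", "reviewer_action", "override_reason", "trace_id",
)

EVIDENCE_STATES = {
    "supported", "not_evidenced", "conflicting_evidence",
    "requires_confirmation", "requires_escalation", "not_applicable",
}


def validate_evidence(record: dict) -> list:
    """Return a list of validation errors; an empty list means the shape is valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    state = record.get("evidence_state")
    if state is not None and state not in EVIDENCE_STATES:
        errors.append(f"unknown evidence_state: {state!r}")
    return errors


record = {
    "claim_id": "CLM-001",
    "source_id": "vercel-tool-pruning-blog-001",
    "source_type": "Public primary source",
    "evidence_state": "supported",
    "confidence": "High",
    "excerpt": "We removed 80% of our agent's tools",
    "reviewer_action": "accepted",
    "override_reason": None,  # key must be present even when there is no override
}
print(validate_evidence(record))
```

The sample record deliberately omits `trace_id`, so the check reports it; telemetry that cannot be correlated is one of the failure modes this pack names.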
SWB-006. Harness, Rubric, and Eval Pack
Purpose: Define how the model is constrained, how output quality is judged, and how the team knows the eval is measuring the right thing.
When to use: Use for any repeatable AI work product, assistant, reviewer, advisory workflow, or agentic task that must survive regression.
Inputs required: Task objective, source authority, allowed inference, output schema, risk classes, fixture set, reviewer rubric, and expected failure behavior.
Output produced: Harness template, rubric, fixture matrix, negative controls, eval validity checklist, calibration protocol, and validation receipt.
Owner / reviewer: Eval owner, domain reviewer, architecture/governance lead.
Failure modes: Eval scores formatting instead of correctness; rubric rewards fluency; no negative controls; reviewer disagreement is hidden; failure cases are averaged away.
Related repo path: repo-bootstrap/evals/fixture-matrix.md
Related main sections: Harnesses and Agent Instructions; Rubrics and Eval Validity
Markdown artifact:
# Harness, Rubric, and Eval Pack
## Harness minimum fields
Objective, audience, inputs, source hierarchy, constraints, non-goals, output contract, allowed inference, stop conditions, validation method, and completion report.
## Eval validity checks
Construct, criterion, coverage, regression, risk, operational, reviewer reliability, and negative-control validity.
## Fixture starter set
Golden case, missing-evidence case, conflicting-source case, adversarial overreach case, ambiguous request case, regression case, and unsafe-action request.
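The fixture starter set can double as a coverage check on an eval suite. The sketch below assumes each fixture is a dictionary with a `type` key; that shape, the type identifiers, and the function name `missing_fixture_types` are hypothetical.

```python
# Hypothetical identifiers for the seven starter fixture types listed above.
STARTER_FIXTURES = (
    "golden", "missing_evidence", "conflicting_source", "adversarial_overreach",
    "ambiguous_request", "regression", "unsafe_action_request",
)


def missing_fixture_types(fixtures: list) -> list:
    """Report starter fixture types with no fixture in the suite."""
    present = {fixture.get("type") for fixture in fixtures}
    return [kind for kind in STARTER_FIXTURES if kind not in present]


suite = [
    {"id": "fx-001", "type": "golden"},
    {"id": "fx-002", "type": "golden"},
    {"id": "fx-003", "type": "regression"},
]
print(missing_fixture_types(suite))
```

A suite made only of golden and regression cases passes happy-path checks while leaving every negative control uncovered, which is the gap this check surfaces.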
SWB-007. Practitioner Operations Pack
Purpose: Give hands-on builders safe operating habits for agentic tools without implying that every external tool is approved for enterprise work.
When to use: Use in practitioner workshops, repo-based pilots, synthetic labs, onboarding, and technical architecture reviews.
Inputs required: Synthetic repo or approved workspace, test commands, fixture set, permission tiers, feedback loops, and clear tool boundary.
Output produced: Operating sequence, permission checklist, feedback-loop plan, repo/IDE/CLI transition decision, synthetic lab entry point, and safety notes.
Owner / reviewer: Technical architect, builder, platform engineer, security reviewer.
Failure modes: Builder starts with broad edits instead of Q&A; agent has excessive tools; no tests; no isolation; external tool receives company data; parallel sessions corrupt state.
Related repo path: labs/synthetic-capability-lab/README.md
Related main sections: Practitioner Operating Patterns; Tool Pattern Appendix
Markdown artifact:
# Practitioner Operations Pack
## Safe operating sequence
1. Q&A first.
2. Ask for a plan.
3. Review scope.
4. Allow controlled edits.
5. Run tests or schema checks.
6. Review diffs.
7. Capture evidence.
8. Commit only after human approval.
## Minimum feedback loops
Unit test, schema validation, static analysis, screenshot or output comparison where relevant, security scan, cost/loop limit, and human review.
## Synthetic lab entry point
Run the synthetic lab at `labs/synthetic-capability-lab/` when you need a safe repo-shaped example with schemas, source authority, fixtures, expected outputs, tests, and a validation receipt.
## External-tool boundary
Use public or synthetic data only unless the enterprise tool and data path are explicitly approved.
SWB-008. Harness Lifecycle and Maintenance Pack
Purpose: Keep deployed AI capabilities from rotting as models improve, tools change, workflows drift, sources age, and business needs move.
When to use: Use on a cadence, after model upgrades, after source changes, after tool failures, when override rates spike, or when business process ownership changes.
Inputs required: Run history, tool usage, source freshness, model version, eval regressions, override trends, cost telemetry, incidents, and adoption data.
Output produced: Maintenance review, pruning decision, model-upgrade impact review, context freshness review, rebuild/retire decision, and next review date.
Owner / reviewer: Capability owner, platform owner, eval owner, source owner, governance reviewer.
Failure modes: Tool sprawl grows silently; stale context becomes truth; better models turn old permissions into risk; agent keeps producing work no one uses.
Related repo path: repo-bootstrap/operations/harness-maintenance-review.md
Related main sections: Runtime and Maintenance Discipline
Markdown artifact:
# Harness Lifecycle and Maintenance Pack
## Five maintenance checks
What is it eating? What can it reach? What is its job? What proof must it return? Is it still valuable?
## Pruning rule
Every tool increases action surface, ambiguity surface, permission surface, audit surface, cost surface, and maintenance burden. Tools must earn their place through observed value and controlled failure behavior.
## Model upgrade trigger
A stronger model can make old constraints too restrictive or old permissions too dangerous. Revalidate after model, tool, source, or workflow changes.
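The model upgrade trigger amounts to comparing what was last validated with what is running now. This is a minimal sketch under assumptions: `HarnessSnapshot`, its four fields, and `revalidation_triggers` are hypothetical names, and the version strings are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HarnessSnapshot:
    """What was in place when the capability was last validated."""
    model_version: str
    tools: frozenset
    source_revision: str
    workflow_revision: str


def revalidation_triggers(validated: HarnessSnapshot, current: HarnessSnapshot) -> list:
    """Name every dimension that changed since the last validation."""
    return [
        field.name for field in fields(HarnessSnapshot)
        if getattr(validated, field.name) != getattr(current, field.name)
    ]


validated = HarnessSnapshot("model-a", frozenset({"search", "read_record"}), "src-12", "wf-3")
current = HarnessSnapshot("model-b", frozenset({"search"}), "src-12", "wf-3")
print(revalidation_triggers(validated, current))
```

Any non-empty result means the last validation receipt no longer describes the running system. Note that a removed tool triggers revalidation just as an added one does, so pruning still gets re-checked.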
SWB-009. Worked Example Pack
Purpose: Provide reference examples so teams can see how the discipline looks when applied, not merely admire the terminology from a safe distance.
When to use: Use in workshops, onboarding, proposal reviews, supplier framing, and eval calibration.
Inputs required: Use case statement, source map, data boundary, schema, harness, rubric, eval fixtures, governance route, telemetry plan, and sustainment model.
Output produced: Examples A, B, C, and D, packaged as reusable traces with artifacts and failure checks.
Owner / reviewer: Architecture lead, product owner, governance reviewer, practitioner lead.
Failure modes: Example is mistaken for a universal template; teams copy without adapting data boundaries; Lifecycle Lens is treated as a first-class product pillar instead of a worked example.
Related repo path: repo-bootstrap/docs/worked-examples.md
Related main sections: Worked Examples and Field Tests
Markdown artifact:
# Worked Example Pack
## Example A
Bad prompt to governed harness. Shows how a vague request becomes objective, source authority, output contract, rubric, eval fixture, and receipt.
## Example B
Business-process capability trace. Shows intent to telemetry and human decision.
## Example C
Lifecycle Lens MVP. Shows advisory-only, Microsoft-native, source-priority, structured-state, no-mutation posture.
## Example D
Tool pruning and maintenance. Shows how to remove tools that add more risk than value.
SWB-010. Repo Bootstrap Pack
Purpose: Show how the field manual becomes a repository-shaped operating system for AI capability work.
When to use: Use when moving beyond chat into repeatable, reviewable, versioned artifacts.
Inputs required: Approved source inventory, templates, schemas, evals, rubrics, governance routing, operations controls, and validation receipts.
Output produced: Reference repo scaffold with README, AGENTS.md, docs, context, schemas, harnesses, rubrics, evals, tools, governance, operations, and receipts.
Owner / reviewer: Architecture lead, platform engineer, repo owner.
Failure modes: Everything remains trapped in chat; no versioning; no review; no artifact parity; no validation receipt; no one knows where the current source lives.
Related repo path: repo-bootstrap/README.md
Related main sections: Repo Bootstrap; Template Library and Source Workbench
Markdown artifact:
# Repo Bootstrap Pack
## Purpose
This scaffold is a reference structure for AI capability discipline artifacts. It is not a deployable product, final policy repository, or approved enterprise runtime.
## Top-level folders
docs, context, schemas, harnesses, rubrics, evals, tools, governance, operations, receipts.
## Rule
Every reusable artifact in the field manual should map to a file path or an explicit reason why it remains narrative-only.
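The mapping rule above is checkable if the artifact inventory is kept as data. The sketch below assumes a hypothetical inventory format with `name`, `repo_path`, and `narrative_only_reason` keys; the two sample entries are illustrative, with the first path taken from the Executive Reset Pack.

```python
def unmapped_artifacts(artifacts: list) -> list:
    """Return names of artifacts with neither a repo path nor a stated reason."""
    return [
        artifact["name"] for artifact in artifacts
        if not (artifact.get("repo_path") or artifact.get("narrative_only_reason"))
    ]


inventory = [
    {"name": "Executive Reset Pack", "repo_path": "repo-bootstrap/docs/executive-reset.md"},
    {"name": "Opening anecdote", "narrative_only_reason": "Context only; nothing to reuse"},
    {"name": "Pilot gate checklist"},  # hypothetical entry with no mapping yet
]
print(unmapped_artifacts(inventory))
```

An empty string counts as unmapped, so a blank `repo_path` cell cannot satisfy the rule.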
SWB-011. Source Provenance Pack
Purpose: Give skeptical readers a clear map of where claims came from, what was verified, what was transcript-derived, and what requires owner validation.
When to use: Use before sharing broadly, during governance review, when challenged on source quality, and when updating public or internal tool claims.
Inputs required: Claim list, source class, source reference, verification status, confidence, limitation, used-in sections, and owner validation status.
Output produced: Claim validation register, source confidence labels, transcript handling rule, eval provenance rule, and owner-validation checklist.
Owner / reviewer: Package owner, governance lead, source owner, reviewer.
Failure modes: Internal supplied context is mistaken for approved policy; transcript-derived operating lessons are treated as exact quotes; LLM evals are mistaken for factual validation.
Related repo path: repo-bootstrap/provenance/claim-validation-register.md
Related main sections: Source Provenance and Claim Confidence
Markdown artifact:
# Source Provenance Pack
## Provenance rule
Say what was confirmed, against what source, at what confidence level, and what still requires owner validation.
## Required labels
Verified public source, corroborated, transcript-derived, internal-source supplied, owner validation required, derived recommendation, illustrative example, do not treat as policy.
## Eval rule
Model evals validate artifact quality and gap coverage. They do not certify factual truth or internal policy.