# Workload-To-Model Routing Discipline

## Decision

Model choice is an architecture decision, not a convenience default.

AI capability teams should route work by task type, source boundary, risk, latency, cost, and required reasoning depth. They should not reflexively send every task to the most expensive frontier model.

Core principle:

```text
Use the weakest sufficient model.
Escalate only when task risk, ambiguity, novelty, or consequence requires it.
Separate source retrieval from reasoning.
Separate drafting from approval.
Separate cheap first-pass generation from premium final review.
Do not use a model when deterministic execution is available.
```

This is a doctrine and architecture note only. It does not implement runtime routing, model calls, provider configuration, backend services, benchmark harnesses, or local model execution.

## Routing Starts With Workload Classification

Before choosing a model, classify the work. A routing decision should describe what the task is, what source authority applies, what consequence attaches to a wrong answer, and what review is required before the output can be used.

| Workload class | Default routing posture | Escalation signal |
|---|---|---|
| Routing and classification | Rules, templates, structured transforms, or a small hosted model when the labels are stable. | Ambiguous labels, high-impact misclassification, or new taxonomy. |
| Summarization over approved source | Small or mid-tier model when source is bounded and citations are strong. | Conflicting source, missing evidence, regulated context, or high-consequence summary use. |
| Search/result explanation | Retrieval first, then small or mid-tier explanation over retrieved approved source. | Weak citations, cross-source conflict, or explanation that drives a decision. |
| Extraction and normalization | Deterministic parser, schema transform, or small model with validation. | Messy input, source ambiguity, low tolerance for field error, or downstream automation. |
| Transformation into structured output | Rules/templates where structure is known; small or mid-tier model for tolerant drafts. | Contract-critical schema, safety-sensitive routing, or low-error tolerance. |
| Style/tone rewrite | Small or cheap model when facts are already approved and no new claims are allowed. | Executive, legal, policy, or external communication with approval implications. |
| Simple code generation | Coding-specialized or mid-tier model for low-risk snippets with deterministic tests. | Security-sensitive code, production path, architecture choice, or hard debugging. |
| Complex code architecture or debugging | Coding-specialized model or frontier reasoning model with source, logs, tests, and human review. | Production impact, concurrency, security, data loss, or unclear root cause. |
| Design/UI ideation | Small, mid-tier, multimodal, or frontier model depending on novelty and audience. | Accessibility, regulated workflow, executive presentation, or product-critical design decision. |
| Proposal assessment against a known rubric | Deterministic rubric scaffold plus small or mid-tier model for first pass when evidence is clear. | Missing evidence, conflicting evidence, high-consequence finding, or nuanced governance judgment. |
| Open-ended reasoning | Mid-tier model for low-risk exploration; frontier reasoning model for novel synthesis. | Novel strategy, architecture tradeoff, unclear evidence, or material consequence. |
| High-risk policy, legal, medical, regulatory, GxP, security, or production-impacting judgment | Human-led review with premium reasoning only as decision support where permitted. | Any use that could imply approval, compliance, safety, security, patient, legal, or production authority. |
| External pulse-check or current-market scan | Tool-using/search-enabled model only in an explicitly labeled external mode. | Need to separate current external context from approved corpus guidance. |
| Deterministic rendering, validation, hashing, packaging, and release checks | Non-model deterministic execution. | Escalate only to explain failures, not to replace the deterministic check. |
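
As an illustration only, the default-then-escalate shape of the table can be sketched as a lookup. The workload keys, the `Tier` names, and the escalated tier chosen for each row are assumptions made for this example; the note does not prescribe a runtime router.

```python
from enum import Enum


class Tier(Enum):
    DETERMINISTIC = "deterministic"
    RULES = "rules"
    SMALL = "small"
    MID = "mid"
    FRONTIER = "frontier"
    HUMAN_LED = "human_led"


# Hypothetical subset of the table: workload class -> (default tier, escalated tier).
# The escalated tiers are illustrative assumptions, not prescribed by the note.
POSTURE = {
    "release_check": (Tier.DETERMINISTIC, Tier.DETERMINISTIC),
    "stable_label_classification": (Tier.RULES, Tier.MID),
    "bounded_summarization": (Tier.SMALL, Tier.FRONTIER),
    "open_ended_reasoning": (Tier.MID, Tier.FRONTIER),
    "high_risk_judgment": (Tier.HUMAN_LED, Tier.HUMAN_LED),
}


def route(workload_class: str, escalation_signals: list[str]) -> Tier:
    """Return the default tier, or the escalated tier if any signal is present."""
    if workload_class not in POSTURE:
        # Unclassified work is rejected rather than defaulted to a model.
        raise ValueError(f"unclassified workload: {workload_class!r}")
    default, escalated = POSTURE[workload_class]
    return escalated if escalation_signals else default
```

In this sketch, `route("bounded_summarization", [])` returns `Tier.SMALL`, and passing a non-empty signal list such as `["conflicting source"]` returns `Tier.FRONTIER`. Unclassified work raises an error instead of falling through to a model.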

## Model And Execution Tiers

Model tier is not the same as provider brand. The tier describes the execution shape and control posture.

| Tier | Typical use | Boundary |
|---|---|---|
| Deterministic/non-model path | Rendering, hashing, validation, schema checks, packaging, release checks, exact transformations. | Preferred whenever the result can be computed or verified deterministically. |
| Rules/templates/structured transforms | Known mappings, standard response shapes, document assembly, rubric scaffolds, form normalization. | Keep business rules visible and testable. |
| Small/cheap hosted model | Classification, bounded summarization, rewrite, extraction, first-pass drafts, low-risk explanation. | Requires data-use approval and output review appropriate to consequence. |
| Mid-tier general model | General drafting, explanation, synthesis over bounded source, moderate ambiguity. | Should not become the default when simpler tiers are sufficient. |
| Frontier reasoning model | Ambiguous evidence, high-consequence synthesis, architecture tradeoffs, complex proposal assessment. | Use deliberately because cost, latency, and authority risk are higher. |
| Coding-specialized model | Code generation, debugging, refactoring plans, test reasoning, implementation assistance. | Must remain bounded by tests, review, security posture, and repository invariants. |
| Local/open-weight model | Personal experiments, public-corpus work, lab evaluation, or approved enterprise use. | No specific model is enterprise-approved merely because weights are available. |
| Edge/offline model | Low-connectivity, privacy-sensitive, device-local, or offline use cases. | Requires hardware, update, monitoring, support, and failure-mode review. |
| Multimodal model | Image, diagram, UI, document-layout, audio, or video interpretation and generation. | Use only when the task genuinely requires non-text modality handling. |
| Tool-using/search-enabled model | Current external scan, governed retrieval, source lookup, and explicitly labeled pulse-checks. | Tool access is not approval to use external sources as source truth. |

## Routing Criteria

Every routing decision should record enough evidence for review:

| Criterion | Question |
|---|---|
| Task type | Is the work classification, retrieval, summarization, transformation, drafting, assessment, coding, design, or decision support? |
| Ambiguity | Are instructions, sources, or expected outputs clear enough for a cheaper tier? |
| Novelty | Has this task pattern been validated before, or is the team synthesizing something new? |
| Consequence | What happens if the output is wrong, incomplete, misleading, late, or overconfident? |
| Data sensitivity | What data classes, source boundaries, privacy limits, and hosting restrictions apply? |
| Source-grounding strength | Are sources approved, bounded, current, cited, and sufficient for the answer? |
| Latency target | Is response time part of the capability promise or only an operator convenience? |
| Cost tolerance | Is the cost acceptable for expected volume, retries, drafts, review loops, and failure cases? |
| Review requirement | Who reviews the output before it can influence a decision or user-facing artifact? |
| Auditability | Can the team reconstruct source inputs, routing choice, model tier, output, review, and final disposition? |
| Fallback requirement | What happens when the chosen tier refuses, times out, returns low confidence, or fails validation? |
| Approval boundary | Does the output imply tool, data, workflow, production, legal, security, or policy approval? |

## Premium Reasoning Escalation Criteria

Premium reasoning is justified when a cheaper tier cannot handle the risk or ambiguity with reviewable evidence.

Escalate to a frontier reasoning model, coding-specialized model, or stronger reviewer path when one or more of these conditions applies:

- ambiguous or conflicting source evidence
- high-consequence decision support
- architecture or production-impacting recommendation
- novel design or strategy synthesis
- high-risk regulatory, security, GxP, policy, legal, medical, or production context
- complex proposal assessment against multiple criteria
- unresolved source gaps requiring careful uncertainty handling
- complex code debugging where local tests, logs, and simpler analysis have not isolated the issue
- external pulse-check findings that may materially change a recommendation

Escalation should not erase the lower-tier evidence. The stronger reviewer should see the source set, the cheaper model's output (if one was used), validation results, stated uncertainty, and the requested decision boundary.
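
The hand-off described above can be sketched as an immutable packet that carries the lower-tier evidence forward. The condition identifiers paraphrase the list in this section, and `EscalationPacket` and `build_packet` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Condition labels paraphrase the list above; the identifiers are hypothetical.
ESCALATION_CONDITIONS = {
    "conflicting_evidence",
    "high_consequence",
    "production_impact",
    "novel_synthesis",
    "high_risk_context",
    "multi_criteria_assessment",
    "unresolved_source_gaps",
    "hard_debugging",
    "material_external_finding",
}


@dataclass(frozen=True)
class EscalationPacket:
    conditions: frozenset
    source_ids: tuple
    cheaper_output: Optional[str]  # None when no cheaper tier was used
    validation_results: tuple
    uncertainty: str
    decision_boundary: str


def build_packet(observed, source_ids, cheaper_output, validation_results,
                 uncertainty, decision_boundary) -> Optional[EscalationPacket]:
    """Return a packet for the stronger reviewer, or None if nothing justifies escalation."""
    matched = frozenset(observed) & ESCALATION_CONDITIONS
    if not matched:
        return None
    return EscalationPacket(matched, tuple(source_ids), cheaper_output,
                            tuple(validation_results), uncertainty, decision_boundary)
```

Returning `None` when no listed condition matches keeps premium escalation an explicit, justified step rather than a default.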

## When Cheaper Models Are Appropriate

Cheaper models are appropriate when the task is bounded, reviewable, and low enough in risk for the chosen tier.

Good candidates include:

- Find and Explain over an approved corpus where citations are strong
- summarization of bounded approved source
- classification or routing with stable labels
- formatting and style rewrite without new claims
- low-risk draft generation
- extraction and normalization with deterministic validation
- deterministic-context assessment scaffolding
- first-pass output that will be reviewed by a stronger model or a human
- internal ideation where output is clearly non-authoritative

Cheaper model use still requires source, data, and approval discipline. Low cost does not make a model safe, approved, or authoritative.

## Deterministic First

Do not use a model when deterministic execution is available.

Examples that should usually stay deterministic:

- rendering generated pages
- hashing files
- validating manifests
- checking schemas
- packaging release artifacts
- running tests
- formatting known templates
- calculating exact counts
- applying known mappings
- verifying required files or routes

A model may help explain a deterministic failure, draft a remediation plan, or summarize validation output. It should not replace the deterministic check that proves the artifact state.
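
A minimal sketch of that division of labor, using a file-hash check as the example: the deterministic function decides pass or fail, and a separate helper only prepares text that a model could be asked to explain. The function names and result shape are assumptions for illustration.

```python
import hashlib
from typing import Optional


def check_sha256(data: bytes, expected_hex: str) -> dict:
    """Deterministic check: this result, not a model, proves the artifact state."""
    expected = expected_hex.lower()
    actual = hashlib.sha256(data).hexdigest()
    return {"passed": actual == expected, "expected": expected, "actual": actual}


def explanation_prompt(result: dict) -> Optional[str]:
    """Build text a model could be asked to explain; None when the check passed."""
    if result["passed"]:
        return None
    return (
        "The SHA-256 check failed. "
        f"Expected {result['expected']}, got {result['actual']}. "
        "Suggest likely causes; do not restate the artifact as valid."
    )
```

Whatever a model says in response to the prompt, the `passed` field remains the only evidence of artifact state.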

## Separate Retrieval, Reasoning, Drafting, And Approval

Model routing should avoid bundling distinct responsibilities into one opaque call.

| Responsibility | Routing rule |
|---|---|
| Source retrieval | Prefer deterministic lookup, approved indexes, manifests, section maps, and explicit current-search mode when needed. |
| Reasoning | Use the weakest sufficient reasoning tier over the retrieved source set. |
| Drafting | Allow cheaper first drafts when the source basis and review boundary are clear. |
| Final review | Use human review, stronger model review, or both when consequence requires it. |
| Approval | Keep approval outside the model unless a governed approval system explicitly authorizes the workflow. |

This separation supports the source-grounded semantic explainer: Find can be retrieval-heavy, Explain can use a cheaper tier when citations are strong, and Assess may require premium reasoning when evidence is missing or conflicting, or when the finding is high-consequence.
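
The separation can be sketched as distinct, individually logged steps, with approval handled by a different function that no model stage calls. This is a simplified example: `Trace`, `run_stages`, and `approve` are hypothetical, the `reason` and `draft` callables stand in for whatever tier is routed, and final review is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Trace:
    stages: list = field(default_factory=list)  # (responsibility, tier, output)
    approved_by: Optional[str] = None           # never set by a model stage


def run_stages(index: dict, key: str,
               reason: Callable[[str], str],
               draft: Callable[[str], str]) -> Trace:
    """Run retrieval, reasoning, and drafting as separate, individually logged steps."""
    trace = Trace()
    source = index[key]  # deterministic lookup; a missing key raises KeyError
    trace.stages.append(("retrieval", "deterministic", source))
    finding = reason(source)
    trace.stages.append(("reasoning", "weakest sufficient tier", finding))
    text = draft(finding)
    trace.stages.append(("drafting", "cheaper tier", text))
    return trace  # approval is deliberately not part of this function


def approve(trace: Trace, reviewer: str) -> Trace:
    """Approval is a separate act by a named reviewer outside the model path."""
    if not reviewer.strip():
        raise ValueError("approval requires a named reviewer")
    trace.approved_by = reviewer
    return trace
```

Because each responsibility appends its own entry, a reviewer can see which tier produced which output before deciding whether to approve.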

## Open-Weight, Local, And Edge Posture

Local or open-weight models may be evaluated for personal, public-corpus, lab, or enterprise use. Evaluation is not endorsement.

Emerging open or open-weight classes such as GLM-class, Qwen-class, DeepSeek-class, Llama-class, Mistral-class, and similar future models may be useful candidates for experimentation or controlled evaluation. This note makes no benchmark claim and does not treat community hype, leaderboard movement, availability of weights, or local execution as evidence of enterprise approval.

Enterprise use must remain gated by:

- data classification
- security review
- legal and licensing review
- export/control review
- supplier-risk review
- model provenance
- operational support
- monitoring and fallback expectations
- allowed hosting and hardware posture
- logging, retention, and update process
- incident response and withdrawal path

Foreign-origin providers, externally hosted non-enterprise models, and externally sourced weights require explicit review of source/data classes allowed to leave the environment, supply-chain posture, licensing and redistribution terms, export/control limits, and model provenance. Availability is not permission.
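
The gating list above can be read as a checklist in which anything not explicitly recorded as passed stays open. A small sketch under that assumption follows; the gate identifiers are hypothetical shorthand for the listed reviews.

```python
# Gate names follow the list above; the identifiers themselves are hypothetical.
REQUIRED_GATES = (
    "data_classification",
    "security_review",
    "legal_licensing_review",
    "export_control_review",
    "supplier_risk_review",
    "model_provenance",
    "operational_support",
    "monitoring_and_fallback",
    "hosting_and_hardware",
    "logging_retention_update",
    "incident_response_withdrawal",
)


def open_gates(completed: dict) -> list:
    """Return gates that are not explicitly recorded as passed (True)."""
    return [g for g in REQUIRED_GATES if completed.get(g) is not True]


def enterprise_use_permitted(completed: dict) -> bool:
    # Missing or non-True entries count as open: availability is not permission.
    return not open_gates(completed)
```

Treating a missing or "pending" entry as open means a model cannot become permitted by omission.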

## Cost And Performance Discipline

Cost discipline should be designed before volume grows.

Teams should estimate:

- expected task volume
- retry rate
- draft and review loops
- prompt and source size
- context resend strategy
- latency target
- cacheability
- deterministic pre-processing options
- cheaper first-pass path
- premium review rate
- fallback path when the cheap tier fails

Do not treat subscription packaging, promotional pricing, current token rates, or platform credits as stable architecture facts. They can change. Tokenomics evidence is a planning signal, not a billing guarantee or permission to send source material to a model.
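
For planning purposes, several of the estimates above combine into simple arithmetic. The sketch below uses arbitrary cost units and caller-supplied rates, not real prices. It assumes that retries apply only to the cheap first pass and that premium review resends the same token count; both are simplifications made for the example.

```python
def estimate_cost(volume: int, retry_rate: float, tokens_per_call: int,
                  cheap_rate: float, premium_rate: float,
                  premium_review_rate: float) -> dict:
    """Planning estimate in arbitrary cost units; rates are per 1,000 tokens."""
    cheap_calls = volume * (1 + retry_rate)        # first pass plus retries
    premium_calls = volume * premium_review_rate   # share escalated to premium review
    kilo_tokens = tokens_per_call / 1000
    cheap = cheap_calls * kilo_tokens * cheap_rate
    premium = premium_calls * kilo_tokens * premium_rate
    return {"cheap": cheap, "premium": premium, "total": cheap + premium}
```

With made-up inputs of 1,000 tasks, a 10% retry rate, 2,000 tokens per call, a cheap rate of 0.5, a premium rate of 10, and a 5% premium review rate, the cheap path comes to about 1,100 units and premium review to about 1,000. A small escalated share can approach the cost of the whole first pass, which is why the premium review rate belongs in the estimate.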

## Evaluation And Fallback Expectations

Before promoting a cheaper model into a workflow, define:

- fixtures or representative examples
- expected outputs and unacceptable outputs
- citation requirements
- deterministic validators where possible
- human review threshold
- premium escalation threshold
- retry and refusal behavior
- fallback model or non-model path
- logging and audit evidence
- cost and latency budget
- rollback or disable path

The standard is not whether the output sounds fluent. The standard is whether the routed tier can reliably produce reviewable output for the task, source boundary, and consequence.
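
A fixture-based check of that standard might look like the following sketch. The `candidate` callable stands in for the tier under evaluation, each fixture pairs an input with a deterministic validator, and the names and promotion threshold are assumptions for illustration.

```python
from typing import Callable

# Each fixture pairs an input with a deterministic validator for the output.
Fixture = tuple[str, Callable[[str], bool]]


def evaluate(candidate: Callable[[str], str], fixtures: list[Fixture]) -> dict:
    """Run the candidate tier over fixtures and collect validator failures."""
    failures = []
    for text, is_acceptable in fixtures:
        output = candidate(text)
        if not is_acceptable(output):
            failures.append((text, output))
    passed = len(fixtures) - len(failures)
    rate = passed / len(fixtures) if fixtures else 0.0
    return {"pass_rate": rate, "failures": failures}


def may_promote(report: dict, threshold: float) -> bool:
    # Promotion depends on validated output, not on how fluent the output sounds.
    return report["pass_rate"] >= threshold
```

An empty fixture set yields a pass rate of zero here, so a tier cannot be promoted without evidence. The recorded failures give the reviewer concrete examples of unacceptable output.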

## Relationship To Existing Issues

| Issue | Relationship |
|---|---|
| #153 | Supports the source-grounded semantic explainer and proposal assessment path by separating Find, Explain, Assess, retrieval, reasoning, and premium escalation. |
| #155 | Complements the Microsoft-native internal pilot by defining how a pilot should choose model tiers without treating Microsoft-native access as blanket model approval. |
| #156 | Supports an external API-backed reference implementation by making provider calls a routed architecture choice with data, cost, and approval boundaries. |
| #79 | Keeps backend, graph, vector, and retrieval infrastructure deferred unless routing evidence shows that the flat package, manifest, and context-pack structures are insufficient. |
| #151 | Does not supersede DOCX idempotence, which remains a separate matter of deterministic artifact-generation hygiene. |
| Tokenomics/generation-strategy receipt pattern | Extends tokenomics posture from receipt evidence into practical workload classification, model-tier choice, cost-risk review, and escalation discipline. |

## Non-Goals

This note does not:

- implement model routing in code
- call external APIs
- add API keys or credentials
- endorse a specific provider or model for enterprise use
- add benchmark claims
- add local model runtime execution
- add provider configuration
- add backend services
- add a vector store or graph database
- add connector ingestion
- create durable memory
- create workflow integration
- alter published HTML behavior
- approve any tool, model, data class, workflow, production use, policy, legal conclusion, security posture, regulatory posture, or GxP use case
