feat(router): surface or optionally reject client models overwritten by managed-inference pinning #2063

### Problem Statement

Managed inference (`inference.local`) pins the model: for standard routes, `prepare_backend_request` overwrites the client's `model` with the route-configured model. This is intentional and correct for its purpose (a sandbox must not be able to pick a different upstream model than what `openshell inference set` configured).

The gap: there is **no signal** when a client requests a model that is *not* the route model. The request silently succeeds (HTTP 200) with a completion from the route model. A typo or a misconfigured agent model id is therefore invisible — the caller believes they used model X but received model Y, with no error, header, or log line.

This surfaced downstream as NVIDIA/NemoClaw#994 (NVBug 6128801): a request with `{"model":"this-model-does-not-exist"}` returned HTTP 200 plus a completion from the pinned route model. That is by-design pinning, but the total absence of any signal makes misconfiguration hard to detect.

### Proposed Design

Options, least invasive first — pinning behavior stays the default either way:

1. **Observability (recommended first, zero behavior change):** when the client-sent `model` differs from `route.model`, emit a gateway log line / metric (`model overwritten: requested=<x> served=<route.model>`). No response change.

2. **Opt-in strict mode:** `openshell inference set --reject-unknown-model` (default off). When enabled and the client model ≠ route.model (and not a known alias), return a 4xx instead of overwriting. Default-off preserves current pinning and backward compat.
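
To make the two options concrete, here is a minimal sketch of the decision they imply. It is not taken from the OpenShell source: `PinDecision`, `decide_pin`, `log_line`, and the `aliases` and `strict` parameters are hypothetical names, and treating a missing `model` field as "no mismatch" is an assumption. Only the log line format comes from option 1 above.

```rust
/// Hypothetical outcome of comparing the client-sent model with the pinned route model.
#[derive(Debug, PartialEq)]
enum PinDecision {
    /// Client model equals the route model, is a known alias, or was absent (assumption).
    Unchanged,
    /// Mismatch under default pinning: serve the route model and report it.
    Overwritten { requested: String, served: String },
    /// Mismatch under opt-in strict mode: the caller would map this to a 4xx.
    Rejected { requested: String, expected: String },
}

/// Decide what to do with the client model. `strict` stands in for the
/// proposed `--reject-unknown-model` setting (default off).
fn decide_pin(
    requested: Option<&str>,
    route_model: &str,
    aliases: &[&str],
    strict: bool,
) -> PinDecision {
    match requested {
        None => PinDecision::Unchanged,
        Some(m) if m == route_model || aliases.iter().any(|a| *a == m) => {
            PinDecision::Unchanged
        }
        Some(m) if strict => PinDecision::Rejected {
            requested: m.to_string(),
            expected: route_model.to_string(),
        },
        Some(m) => PinDecision::Overwritten {
            requested: m.to_string(),
            served: route_model.to_string(),
        },
    }
}

/// Render the observability line from option 1; only overwrites produce one.
fn log_line(decision: &PinDecision) -> Option<String> {
    match decision {
        PinDecision::Overwritten { requested, served } => Some(format!(
            "model overwritten: requested={requested} served={served}"
        )),
        _ => None,
    }
}
```

With `strict` set to `false` the function never returns `Rejected`, so the default path stays pure pinning plus a log line, which is the backward-compatibility property both options rely on.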

### Alternatives Considered

- **Always reject mismatches** — rejected: breaks the intentional pinning use-case where clients legitimately send a model ≠ route.model, and would break agents that hardcode a model id.
- **Catalog validation against the provider's `/v1/models`** — heavier; the router may not have catalog knowledge, and it is provider-dependent.

Relationship to #2039: same code path (`prepare_backend_request`). #2039 forwards the client model (passthrough); this surfaces/optionally rejects mismatches under pinning. Complementary, not duplicate.

### Agent Investigation

Investigated the OpenShell source directly at HEAD `a5161d0b`:

- `crates/openshell-router/src/backend.rs:193` `prepare_backend_request`: for standard routes (not Vertex rawPredict, not Bedrock) it runs `obj.insert("model", serde_json::Value::String(route.model.clone()))` (backend.rs:297–301) — an unconditional overwrite of the client `model`. Comments at :204 and :268 state this is intentional ("so a sandbox cannot pick a different upstream model than what `inference set` configured" / "regardless of what the client sent").
- `route.model` originates from `openshell inference set --model` (`crates/openshell-server/src/inference.rs:131`, via `resolve_provider_route`).
- No request-time validation of the client-supplied model exists in the router/server crates. The only model validators are config-time format-safety checks for Bedrock/Vertex route models (`validate_aws_bedrock_model_id`, `validate_vertex_model_id`); `model_not_found` appears only in test mocks.
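
One way the overwrite site could expose the mismatch with little extra work: map `insert` calls return the value they replace. The sketch below uses `std::collections::HashMap` as a stand-in for the JSON object (`serde_json::Map::insert` likewise returns the previous value); `pin_model` is a hypothetical helper, not a function in the router crate.

```rust
use std::collections::HashMap;

/// Overwrite `model` with the route model, as the pinned path does, and
/// return the client-sent value only when it differed from the route model.
/// `HashMap<String, String>` is a simplified stand-in for the request body.
fn pin_model(obj: &mut HashMap<String, String>, route_model: &str) -> Option<String> {
    // `insert` returns the previous value for the key, if one was present.
    let previous = obj.insert("model".to_string(), route_model.to_string());
    // Keep it only when it is a real mismatch worth logging or rejecting.
    previous.filter(|requested| requested.as_str() != route_model)
}
```

A caller could log or count whenever this returns `Some(..)`, leaving the overwrite itself unchanged.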

Note: I read the source directly; I did not run OpenShell's named skills (`debug-inference` / `openshell-cli`).

Refs: NVIDIA/NemoClaw#994 · NVBug 6128801 · related #2039

### Checklist

- [x] I've reviewed existing issues and the architecture docs
- [x] This is a design proposal, not a "please build this" request
