Problem Statement
Managed inference (inference.local) pins the model: for standard routes, prepare_backend_request overwrites the client's model with the route-configured model. This is intentional and correct for its purpose (a sandbox must not be able to pick a different upstream model than what openshell inference set configured).
The gap: there is no signal when a client requests a model that is not the route model. The request silently succeeds (HTTP 200) with a completion from the route model. A typo or a misconfigured agent model id is therefore invisible — the caller believes they used model X but received model Y, with no error, header, or log line.
This surfaced downstream as NVIDIA/NemoClaw#994 (NVBug 6128801): a request with {"model":"this-model-does-not-exist"} returned HTTP 200 plus a completion from the pinned route model. That is by-design pinning, but the total absence of any signal makes misconfiguration hard to detect.
Proposed Design
Options, least invasive first — pinning behavior stays the default either way:
- Observability (recommended first, zero behavior change): when the client-sent model differs from route.model, emit a gateway log line / metric (model overwritten: requested=<x> served=<route.model>). The response is unchanged.
- Opt-in strict mode: openshell inference set --reject-unknown-model (default off). When enabled and the client model ≠ route.model (and is not a known alias), return a 4xx instead of overwriting. Default-off preserves current pinning and backward compatibility.
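Both options reduce to one comparison made before the overwrite. The sketch below is hypothetical and is not OpenShell code. ModelCheck, check_model and the aliases list are illustrative names. The reject_unknown_model parameter stands in for the proposed, not yet existing, --reject-unknown-model setting. Treating an omitted model as a match is an assumption. Mapping Reject to a specific 4xx status is left to the handler.

```rust
/// Hypothetical result of comparing the client-sent `model` with the pinned
/// route model. None of these names exist in OpenShell today.
#[derive(Debug, PartialEq)]
enum ModelCheck {
    /// No `model` sent, or it equals the route model or a known alias.
    Match,
    /// Default mode: pinning still overwrites, but the mismatch is reported.
    Overwritten { requested: String, served: String },
    /// Strict mode enabled: the handler would return a 4xx instead.
    Reject { requested: String, served: String },
}

/// `reject_unknown_model` stands in for the proposed `--reject-unknown-model`
/// route setting; `aliases` is a hypothetical list of accepted alternate ids.
fn check_model(
    requested: Option<&str>,
    route_model: &str,
    aliases: &[&str],
    reject_unknown_model: bool,
) -> ModelCheck {
    match requested {
        // Assumption: an omitted `model` is not treated as a mismatch.
        None => ModelCheck::Match,
        Some(m) if m == route_model || aliases.iter().any(|a| *a == m) => ModelCheck::Match,
        Some(m) => {
            let requested = m.to_string();
            let served = route_model.to_string();
            if reject_unknown_model {
                ModelCheck::Reject { requested, served }
            } else {
                ModelCheck::Overwritten { requested, served }
            }
        }
    }
}

impl ModelCheck {
    /// Log line in the format proposed for the observability option.
    fn log_line(&self) -> Option<String> {
        match self {
            ModelCheck::Overwritten { requested, served } => Some(format!(
                "model overwritten: requested={requested} served={served}"
            )),
            _ => None,
        }
    }
}

fn main() {
    for strict in [false, true] {
        let check = check_model(Some("this-model-does-not-exist"), "route-model", &[], strict);
        match &check {
            // Strict mode: the handler would map this to a 4xx response.
            ModelCheck::Reject { requested, served } => {
                println!("reject: requested={requested} served={served}");
            }
            // Default mode: emit the log line and keep the normal response.
            _ => {
                if let Some(line) = check.log_line() {
                    println!("{line}");
                }
            }
        }
    }
}
```

Keeping the comparison separate from the overwrite means the default path stays a pure observation, because the body is still pinned afterwards. Strict mode only changes what the handler does with a Reject result.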
Alternatives Considered
- Always reject mismatches — rejected: breaks the intentional pinning use-case where clients legitimately send a model ≠ route.model, and would break agents that hardcode a model id.
- Catalog validation against the provider's /v1/models: heavier; the router may not have catalog knowledge, and it is provider-dependent.
Relationship to #2039: same code path (prepare_backend_request). #2039 forwards the client model (passthrough); this proposal surfaces, and optionally rejects, mismatches under pinning. The two are complementary, not duplicates.
Agent Investigation
Investigated the OpenShell source directly at HEAD a5161d0b:
- crates/openshell-router/src/backend.rs:193 prepare_backend_request: for standard routes (not Vertex rawPredict, not Bedrock) it runs obj.insert("model", serde_json::Value::String(route.model.clone())) (backend.rs:297–301), an unconditional overwrite of the client model. Comments at backend.rs:204 and :268 state this is intentional ("so a sandbox cannot pick a different upstream model than what inference set configured" / "regardless of what the client sent").
- route.model originates from openshell inference set --model (crates/openshell-server/src/inference.rs:131, via resolve_provider_route).
- No request-time validation of the client-supplied model exists in the router/server crates. The only model validators are config-time format-safety checks for Bedrock/Vertex route models (validate_aws_bedrock_model_id, validate_vertex_model_id); model_not_found appears only in test mocks.
Note: I read the source directly; I did not run OpenShell's named skills (debug-inference / openshell-cli).
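To make the current behavior concrete, the standard-route overwrite can be modeled with the standard library alone. This is a hypothetical simplification, not the OpenShell code. The request body is a flat BTreeMap<String, String> instead of a serde_json object, and pin_model is an illustrative name. Like serde_json's Map::insert, BTreeMap::insert returns the value it replaced. The client's requested model is therefore in hand at the moment it is discarded.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the JSON request body. The router works on a
// serde_json object; a flat string map keeps this sketch dependency-free.
type Body = BTreeMap<String, String>;

/// Simplified model of the standard-route branch of `prepare_backend_request`
/// (not the real function): the client's `model` is replaced unconditionally.
fn pin_model(body: &mut Body, route_model: &str) {
    // `insert` returns the previous value (what the client asked for),
    // but it is dropped here, so nothing records the mismatch.
    let _previous = body.insert("model".to_string(), route_model.to_string());
}

fn main() {
    let mut body = Body::new();
    body.insert("model".to_string(), "this-model-does-not-exist".to_string());
    pin_model(&mut body, "route-model");
    // The outgoing body now names the route model, and no error or log is produced.
    println!("{}", body["model"]);
}
```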
Refs: NVIDIA/NemoClaw#994 · NVBug 6128801 · related #2039
Checklist