Skip to content

feat(router): surface or optionally reject client models overwritten by managed-inference pinning #2063

Description

@hunglp6d

Problem Statement

Managed inference (inference.local) pins the model: for standard routes, prepare_backend_request overwrites the client's model with the route-configured model. This is intentional and correct for its purpose (a sandbox must not be able to pick a different upstream model than what openshell inference set configured).

The gap: there is no signal when a client requests a model that is not the route model. The request silently succeeds (HTTP 200) with a completion from the route model. A typo or a misconfigured agent model id is therefore invisible — the caller believes they used model X but received model Y, with no error, header, or log line.

This surfaced downstream as NVIDIA/NemoClaw#994 (NVBug 6128801): a request with {"model":"this-model-does-not-exist"} returned HTTP 200 plus a completion from the pinned route model. That is by-design pinning, but the total absence of any signal makes misconfiguration hard to detect.

Proposed Design

Options, least invasive first — pinning behavior stays the default either way:

  1. Observability (recommended first, zero behavior change): when the client-sent model differs from route.model, emit a gateway log line / metric (model overwritten: requested=<x> served=<route.model>). No response change.

  2. Opt-in strict mode: openshell inference set --reject-unknown-model (default off). When enabled and the client model ≠ route.model (and not a known alias), return a 4xx instead of overwriting. Default-off preserves current pinning and backward compat.

Alternatives Considered

  • Always reject mismatches — rejected: breaks the intentional pinning use-case where clients legitimately send a model ≠ route.model, and would break agents that hardcode a model id.
  • Catalog validation against the provider's /v1/models — heavier; the router may not have catalog knowledge, and it is provider-dependent.

Relationship to #2039: same code path (prepare_backend_request). #2039 forwards the client model (passthrough); this surfaces/optionally rejects mismatches under pinning. Complementary, not duplicate.

Agent Investigation

Investigated the OpenShell source directly at HEAD a5161d0b:

  • crates/openshell-router/src/backend.rs:193 prepare_backend_request: for standard routes (not Vertex rawPredict, not Bedrock) it runs obj.insert("model", serde_json::Value::String(route.model.clone())) (backend.rs:297–301) — an unconditional overwrite of the client model. Comments at :204 and :268 state this is intentional ("so a sandbox cannot pick a different upstream model than what inference set configured" / "regardless of what the client sent").
  • route.model originates from openshell inference set --model (crates/openshell-server/src/inference.rs:131, via resolve_provider_route`).
  • No request-time validation of the client-supplied model exists in the router/server crates. The only model validators are config-time format-safety checks for Bedrock/Vertex route models (validate_aws_bedrock_model_id, validate_vertex_model_id); model_not_found appears only in test mocks.

Note: I read the source directly; I did not run OpenShell's named skills (debug-inference / openshell-cli).

Refs: NVIDIA/NemoClaw#994 · NVBug 6128801 · related #2039

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request

Metadata

Metadata

Assignees

No one assigned

    Labels

    state:triage-neededOpened without agent diagnostics and needs triage

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions