[Harbor 2/4] eval sidecar: verifier, token auth, HTTP API (Mode A)#4
[Harbor 2/4] eval sidecar: verifier, token auth, HTTP API (Mode A)#4varunursekar wants to merge 3 commits into
Conversation
The evaluation engine, run in a sidecar container, with the trust boundary that makes an optimization run leaderboard-gradeable: - `EvaluationSidecar` (server.py): agent-facing handlers — commit transfer from the untrusted agent repo (git fetch, hooks off, object copy) and tier-gated write-routing of results across the agent-readable and admin volumes. - `Verifier` (verifier.py): commit selection (submit | auto_best) + hidden-split scoring. - Per-trial admin token (auth.py), written root:600 so the optimizer (de-privileged) cannot read it; only the verifier (root, shared mode) can. - FastAPI surface (app.py): /eval, /submit, /status for the agent (metered, redacted); /finalize for the verifier (token-gated). `vero harbor serve` (serve.py) assembles the engine + sidecar + verifier from a ServeConfig and runs it under uvicorn. - `vero harbor` CLI clients (cli.py): serve | eval | submit | status | finalize (build/run land with the compiler). HarborConfig + the Mode-B dataset partition helpers (config.py, dataset.py) are included so the harbor package imports cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
|
||
| # Mode A | ||
| task: str | None = None | ||
| task_project: str | None = None |
There was a problem hiding this comment.
Security, the headline one. In Mode A this defaults to None, so the Evaluator runs uv in the agent's own project and loads the @task.evaluation() scorer from the agent's committed code. Verifier.finalize → evaluate_admin then scores the hidden split with that agent-controlled scorer, so a committed scorer that returns 1.0 wins the leaderboard. The token/volume/object-copy machinery secures transport, not scorer provenance. For a gradeable boundary the verifier must score with a sidecar-baked task project: require task_project/inner_task for Mode A and reject configs where the scorer would resolve from the agent repo. This is the default posture and the single most important gap in the stack.
| tok = generate_token() | ||
| p = write_admin_token(tmp_path / "t", tok) | ||
| assert read_admin_token(p) == tok | ||
| assert (p.stat().st_mode & 0o777) == 0o600 |
There was a problem hiding this comment.
For the security-critical PR, the OS-level claims are asserted only as a 0o600 mode bit plus happy-path mocks. Please add adversarial coverage: (1) agent /eval on a no_access split → rejected end-to-end; (2) an integration test that finalize does not execute an agent-supplied scorer (the Mode A provenance gap); (3) a uid-drop test that agent.user cannot read the token file.
| workspace = await GitWorkspace.create(config.repo_path) | ||
|
|
||
| budget = BudgetLedger( | ||
| [SplitBudget(**b) for b in config.budgets], |
There was a problem hiding this comment.
build_components rebuilds the BudgetLedger from config.budgets on every start, and the ledger writes persist_path on each mutation but never reads it back. So a sidecar restart mid-trial silently resets spent budget — metering is not crash-safe despite the durable-ledger framing. Add a load-if-exists path that seeds from the persisted ledger when the file exists, and from config.budgets only when it does not.
| """Best candidate by recorded score on the selection split (excludes baseline).""" | ||
| if self.engine.db is None: | ||
| raise NoCandidateError("auto_best mode but no experiment database.") | ||
| df = self.engine.db.get_experiments_df(fill_score=default_minimum_score) |
There was a problem hiding this comment.
auto_best ranks candidates solely by the recorded mean_score from engine.db.get_experiments_df, but that DB is written by the agent's metered /eval calls and the verifier shares the same in-process engine/DB. The only admin-independent scoring at finalize is evaluate_admin on the test target — after selection, and only on the chosen winner. So a candidate that records an inflated selection score selects itself. (Mode B in #5 sharpens this: the recorded selection score is the reward a nested run of the candidate emitted, which the optimizer controls.) Before trusting the ranking, re-score under admin: re-run evaluate_admin on the selection split for the top-K and re-rank, or at minimum re-score the winner and fail closed on deviation.
| for sa in split_accesses: | ||
| if sa.split == split: | ||
| return sa.access | ||
| return SplitAccessLevel.viewable |
There was a problem hiding this comment.
This fails open, the wrong default for a trust boundary. tier_for_split is the gate _route_results uses to decide whether to write full per-sample SampleResults (labels included) to the agent volume; an author who budgets a split but forgets to add it to split_accesses lands here and gets viewable → ground truth on the agent-readable volume. Default to the most restrictive tier (no_access) and update the docstring, which currently advertises the fail-open as intended. If an unlisted split should ever be visible, make that explicit and logged.
…ers, budget reload Trust-boundary hardening for the eval sidecar (review findings on PR #4): - protocol: tier_for_split fails CLOSED (unlisted split -> no_access, with a warning), instead of fail-open to viewable. - serve: Mode A now requires task_project. build_components refuses to start a Mode A config without it, so the hidden-split/admin scorer is always loaded from the sidecar-baked task project, never the agent's committed repo (a committed scorer returning 1.0 can no longer win the reward). - verifier: auto_best no longer trusts the agent-influenced recorded score. It shortlists by recorded score, then re-runs evaluate_admin (the trusted scorer) on the selection split for the top-K and ranks by the admin score; fails closed. - serve: reload the persisted budget ledger on startup so a sidecar restart does not reset spent budget to full. - tests: adversarial coverage (no_access -> 400, finalize does not run an agent-supplied scorer via a cheating-agent repo, admin token 0o600/unreadable). Also folds in the valid Greptile findings on this PR: clear stale per-sample result files on re-eval, type reward_mode as Literal[submit, auto_best], and filter auto_best candidates by selection dataset id. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix(harbor): Mode A scorer provenance, auto_best admin re-score, fail-closed tiers [fixes 2/4 sidecar]
| async def evaluate(self, req: EvalRequest, *, admin: bool = False) -> EvalSummary: | ||
| sha = await self._transfer_commit(req.commit) | ||
| exp = await self.engine.evaluate(replace(req, commit=sha), admin=admin) | ||
| result_path = self._route_results(exp, admin=admin) |
There was a problem hiding this comment.
tier_for_split() now treats unlisted splits as no_access, and /status reports those budgeted-but-unlisted splits as not agent-evaluable. But this path still transfers and scores the commit before _route_results() suppresses only the result files. If a config includes a budget for a hidden or unlisted split, the agent can call /eval and receive mean_score, n_samples, and budget data in the returned EvalSummary. Please reject non-admin evals whose tier is no_access before calling engine.evaluate(), or ensure those splits cannot be present in the agent budget allowlist.
Artifacts
Repro: focused pytest exercising non-admin hidden split EvaluationSidecar.evaluate behavior
- Contains supporting evidence from the run (text/x-python; charset=utf-8).
Repro: verbose pytest output showing engine.evaluate call and leaked EvalSummary fields
- Keeps the command output available without making the summary code-heavy.
Ran code and verified through T-Rex
Prompt To Fix With AI
This is a comment left during a code review.
Path: vero/src/vero/harbor/server.py
Line: 70-73
Comment:
**Gate hidden evals**
`tier_for_split()` now treats unlisted splits as `no_access`, and `/status` reports those budgeted-but-unlisted splits as not agent-evaluable. But this path still transfers and scores the commit before `_route_results()` suppresses only the result files. If a config includes a budget for a hidden or unlisted split, the agent can call `/eval` and receive `mean_score`, `n_samples`, and budget data in the returned `EvalSummary`. Please reject non-admin evals whose tier is `no_access` before calling `engine.evaluate()`, or ensure those splits cannot be present in the agent budget allowlist.
How can I resolve this? If you propose a fix, please make it concise.
Draft · Stack 2 of 4 — targets
harbor-1-core(review [1/4] first; its diff is the base here).The evaluation engine run in a sidecar container, plus the trust boundary that makes a run leaderboard-gradeable. This is the security-critical PR — a focused/security review is worthwhile here.
EvaluationSidecar(server.py): commit transfer from the untrusted agent repo (git fetch, hooks off, object copy) + tier-gated result write-routing across the agent-readable and admin volumes.Verifier(verifier.py): commit selection (submit | auto_best) + hidden-split scoring.root:600— unreadable by the de-privileged optimizer, readable by the verifier (root, shared mode)./eval/submit/status(agent; metered, redacted) and/finalize(verifier; token-gated).vero harbor serveassembles engine+sidecar+verifier from aServeConfig.HarborConfig/partition helpers so the package imports cleanly (build/runlight up in [3/4]).Stack: [1/4] core → this → [3/4] compiler → [4/4] docs.
🤖 Generated with Claude Code
Greptile Summary
This PR adds the Harbor eval sidecar and verifier flow. The main changes are:
Confidence Score: 4/5
The Harbor sidecar flow is close, but the eval path needs a hidden-split access gate before merge.
The review covers the new API, sidecar, verifier, auth, CLI, and tests, and the remaining concern is a concrete access-control gap in the eval path.
vero/src/vero/harbor/server.py
What T-Rex did
Prompt To Fix All With AI
Reviews (2): Last reviewed commit: "Merge pull request #8 from scaleapi/harb..." | Re-trigger Greptile