docs(harbor): honest integrity guarantees, GAIA leak caveat, Mode A example [fixes 4/4 docs] by shehabyasser-scale · Pull Request #10 · scaleapi/vero

shehabyasser-scale · 2026-06-30T10:25:21Z

Stacks on #6 (harbor-4-docs). Documentation-accuracy fixes from the review of that PR. 4 of 4 fix PRs.

What this fixes

| # | Sev | Finding | Fix |
| --- | --- | --- | --- |
| D1 | 🟠 | The intro over-claims a hard guarantee ("the optimizer cannot read hidden labels, modify the scorer, or bypass its budget") | Softened to best-effort, OS/process-level language describing what is actually enforced |
| D2 | 🔴 | The GAIA `build.yaml` says the validation split "never reaches the optimizer", but `agent_repo: .` + a git-tracked `build.yaml` seeds the held-out task ids into the optimizer's repo | Corrected: only the scores are withheld (fine for a public benchmark), with a caveat + mitigations for secret-identity benchmarks |
| D3 | 🟠 | `gsm8k-agent` is cited as the Mode A example but ships no `build.yaml` | Repointed: `gaia-optimization` is the complete runnable example; `gsm8k-agent` is the Mode A agent reference, paired with the tutorial's Mode A `build.yaml` snippet |
| D4 | 🟠 | The 3-tier visibility section omits the current fail-open default for unlisted splits | Documented the fail-open default + "list every split", and noted it becomes fail-closed once the protocol fix lands |
| D5 | 🟠 | "The scorer is sidecar-only" holds only for Mode B | Split by mode: sidecar-only in Mode B; in Mode A the scorer is in the agent's editable repo until the serve.py fix bakes a sidecar task project |

Prose/YAML only; `build.yaml` still parses. These changes align the docs with the behavior after the core/sidecar fix PRs (#7, #8) land.
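To make finding D4 concrete, here is a minimal sketch of how a fail-open versus fail-closed default changes what an unlisted split resolves to. It is illustrative only: `resolve_access`, the `Access` enum, and the tier names `full` and `scores_only` are hypothetical (only `no_access` appears in this PR), and it assumes fail-open means an unlisted split gets the most permissive tier.

```python
from enum import Enum


class Access(Enum):
    # Only "no_access" is taken from the PR text. The other two tier names
    # are hypothetical placeholders for the 3-tier visibility model.
    FULL = "full"
    SCORES_ONLY = "scores_only"
    NO_ACCESS = "no_access"


def resolve_access(split: str, listed: dict, fail_closed: bool) -> Access:
    """Return the access tier for a split, applying a default when it is unlisted."""
    if split in listed:
        return listed[split]
    # Fail-open (assumed current default): an unlisted split stays visible.
    # Fail-closed (assumed behavior after the protocol fix): it is hidden.
    return Access.NO_ACCESS if fail_closed else Access.FULL


# Only "validation" is listed, so "test" falls through to the default.
listed = {"validation": Access.NO_ACCESS}
print(resolve_access("validation", listed, fail_closed=False).value)
print(resolve_access("test", listed, fail_closed=False).value)
print(resolve_access("test", listed, fail_closed=True).value)
```

Under the fail-open assumption, forgetting to list a split exposes it, which is why the docs now recommend listing every split explicitly.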

🤖 Generated with Claude Code

Greptile Summary

This PR updates Harbor documentation to describe the current integrity limits more accurately. The main changes are:

- Softens architecture wording around leaderboard guarantees.
- Documents fail-open split visibility for unlisted splits.
- Clarifies Mode A versus Mode B scorer isolation.
- Repoints the runnable example guidance toward `gaia-optimization`.
- Adds a GAIA `build.yaml` caveat about held-out task IDs being visible.

Confidence Score: 4/5

Merge is mostly safe after the README wording is aligned with the caveated integrity model documented elsewhere.

The changes are documentation-only and generally improve accuracy, but one prominent README paragraph still preserves an over-strong guarantee that can mislead users about the actual isolation properties.

vero/README.md

T-Rex Logs

What T-Rex did

- Reviewed the pre-change gaia-buildyaml-contract state by inspecting `trex-artifacts/gaia-buildyaml-contract-01-before.log` to verify base checkout, parse success, validation access: 'no_access', and the old caveat line.
- Reviewed the post-change gaia-buildyaml-contract state by inspecting `trex-artifacts/gaia-buildyaml-contract-02-after.log` to verify head checkout, parse success, validation access: 'no_access', extracted caveat lines 44-52, and that all corrected-caveat assertions pass.
- Saved the validation script `trex-artifacts/gaia_buildyaml_contract_check.py` that implements the checks used to verify the gaia `build.yaml` contract results.
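The contents of `gaia_buildyaml_contract_check.py` are not shown in this PR. As a rough sketch of what a check of this kind could look like, the function below takes an already-parsed config and the raw file text. The `splits -> validation -> access` layout and the name `check_contract` are assumptions made for illustration; only the `no_access` value and the old "never reaches the optimizer" wording come from the PR.

```python
OLD_CLAIM = "never reaches the optimizer"  # wording quoted in finding D2


def check_contract(config: dict, raw_text: str) -> list:
    """Return a list of failed checks; an empty list means every check passed."""
    failures = []
    # Hypothetical layout: splits -> validation -> access.
    access = config.get("splits", {}).get("validation", {}).get("access")
    if access != "no_access":
        failures.append(f"validation access is {access!r}, expected 'no_access'")
    # The corrected caveat must no longer carry the over-strong claim.
    if OLD_CLAIM in raw_text:
        failures.append("stale claim still present: " + OLD_CLAIM)
    return failures


# Illustrative inputs, not the real build.yaml.
config = {"splits": {"validation": {"access": "no_access"}}}
text = "# Only the per-sample scores are withheld from the optimizer.\n"
print(check_contract(config, text))  # []
```

Parsing the YAML itself is left to the caller, so the sketch stays independent of any particular YAML library.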

_Ran code and verified through T-Rex_

Comments Outside Diff (1)

vero/README.md, line 530

**Stale integrity guarantee** This README paragraph still gives the hard guarantee this PR is trying to remove: it says the optimizer can't read hidden labels, modify the scorer, or bypass its budget. The updated architecture docs now describe best-effort process-level limits, Mode A scorer exposure through `agent_repo` and `read_only_paths`, fail-open unlisted splits, and the GAIA identity caveat. A user who reads only the README can still rely on guarantees the implementation does not provide, so this paragraph should be softened or pointed at the caveated integrity model.

…xample, by-mode scorer

Documentation accuracy fixes (review findings on PR #6):

- architecture: soften the intro from a hard guarantee ("the optimizer cannot read hidden labels, modify the scorer, or bypass its budget") to best-effort, OS/process-level language describing what is actually enforced.
- gaia build.yaml: correct "never reaches the optimizer". Because agent_repo is "." and build.yaml is git-tracked, the validation task ids ARE seeded into the optimizer's repo; only the per-sample scores are withheld. Acceptable for a public benchmark, with a caveat + mitigations for secret-identity benchmarks.
- examples: gsm8k-agent is cited as the Mode A example but ships no build.yaml; repoint to gaia-optimization as the complete runnable example and pair gsm8k-agent with the tutorial's Mode A snippet.
- architecture: document the current fail-open default for unlisted splits (and that it becomes fail-closed once the protocol fix lands), and split the "scorer is sidecar-only" claim by mode (true for Mode B; Mode A keeps the scorer in the agent's editable repo until the serve.py fix).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
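The gaia build.yaml item in the commit message describes an identity leak: a git-tracked file inside the optimizer's repo names the held-out tasks. One way to spot this kind of exposure is sketched below. `files_exposing_ids` and the sample ids are hypothetical, and the caller is assumed to have already collected the text of each tracked file (for example from `git ls-files`).

```python
def files_exposing_ids(tracked_files: dict, held_out_ids: set) -> list:
    """Return tracked paths whose text mentions at least one held-out task id."""
    return sorted(
        path
        for path, text in tracked_files.items()
        if any(task_id in text for task_id in held_out_ids)
    )


# Hypothetical repo contents and task ids, for illustration only.
tracked = {
    "build.yaml": "validation:\n  task_ids: [task-017, task-042]\n",
    "agent/main.py": "print('hello')\n",
}
print(files_exposing_ids(tracked, {"task-017", "task-042"}))  # ['build.yaml']
```

A substring scan like this only detects ids stored verbatim. It says nothing about whether scores are withheld, which is the part the corrected caveat still relies on.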

varunursekar merged commit bb04d67 into harbor-4-docs Jul 1, 2026
3 checks passed

varunursekar merged 1 commit into harbor-4-docs from harbor-4-docs-fixes
