
docs(harbor): honest integrity guarantees, GAIA leak caveat, Mode A example [fixes 4/4 docs] #10

Merged: varunursekar merged 1 commit into harbor-4-docs from harbor-4-docs-fixes on Jul 1, 2026

Conversation

shehabyasser-scale (Collaborator) commented on Jun 30, 2026

Stacks on #6 (harbor-4-docs). Documentation-accuracy fixes from the review of that PR. 4 of 4 fix PRs.

What this fixes

| # | Sev | Finding | Fix |
|---|---|---|---|
| D1 | 🟠 | The intro over-claims a hard guarantee ("the optimizer cannot read hidden labels, modify the scorer, or bypass its budget") | Softened to best-effort, OS/process-level language describing what is actually enforced |
| D2 | 🔴 | The GAIA build.yaml says the validation split "never reaches the optimizer", but agent_repo: . + a git-tracked build.yaml seeds the held-out task ids into the optimizer's repo | Corrected: only the scores are withheld (fine for a public benchmark), with a caveat + mitigations for secret-identity benchmarks |
| D3 | 🟠 | gsm8k-agent is cited as the Mode A example but ships no build.yaml | Repointed: gaia-optimization is the complete runnable example; gsm8k-agent is the Mode A agent reference, paired with the tutorial's Mode A build.yaml snippet |
| D4 | 🟠 | The 3-tier visibility section omits the current fail-open default for unlisted splits | Documented the fail-open default + "list every split", and noted it becomes fail-closed once the protocol fix lands |
| D5 | 🟠 | "The scorer is sidecar-only" holds only for Mode B | Split by mode: sidecar-only in Mode B; in Mode A the scorer is in the agent's editable repo until the serve.py fix bakes a sidecar task project |
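
To make finding D4 concrete, here is a minimal sketch of the difference between a fail-open and a fail-closed default for unlisted splits. It is not Harbor's implementation: `Access`, `resolve_access`, the `full_access` tier name, and the `fail_closed` flag are hypothetical names chosen for illustration. Only the `no_access` value appears in this PR.

```python
from enum import Enum


class Access(Enum):
    # "no_access" is the value quoted in this PR; "full_access" is a
    # hypothetical stand-in for whatever the permissive tier is called.
    FULL = "full_access"
    NO_ACCESS = "no_access"


def resolve_access(split: str, declared: dict[str, Access], fail_closed: bool) -> Access:
    """Return the access level for a split, using a default when it is unlisted."""
    if split in declared:
        return declared[split]
    # Unlisted split: fail-open grants access, fail-closed withholds it.
    return Access.NO_ACCESS if fail_closed else Access.FULL


declared = {"train": Access.FULL, "validation": Access.NO_ACCESS}

# A split that was never listed is readable under the fail-open default ...
print(resolve_access("test", declared, fail_closed=False).value)  # full_access
# ... and withheld under a fail-closed default.
print(resolve_access("test", declared, fail_closed=True).value)   # no_access
```

Under a fail-open default, forgetting to list a split silently exposes it, which is why the docs now recommend listing every split explicitly.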

Prose/yaml only; build.yaml still parses. These align the docs with the behavior after the core/sidecar fix PRs (#7, #8) land.

🤖 Generated with Claude Code

Greptile Summary

This PR updates Harbor documentation to describe the current integrity limits more accurately. The main changes are:

  • Softens architecture wording around leaderboard guarantees.
  • Documents fail-open split visibility for unlisted splits.
  • Clarifies Mode A versus Mode B scorer isolation.
  • Repoints the runnable example guidance toward gaia-optimization.
  • Adds a GAIA build.yaml caveat about held-out task IDs being visible.
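
The held-out ID caveat can be checked mechanically. The sketch below assumes you have the held-out task IDs as plain strings and scans a checkout for files that mention them. `find_leaked_ids` and the example IDs are hypothetical, and the scan walks the working tree rather than asking git which files are tracked, which is a simplification.

```python
from pathlib import Path


def find_leaked_ids(repo_root: Path, held_out_ids: set[str]) -> dict[str, set[str]]:
    """Map each text file under repo_root to the held-out ids found in it."""
    leaks: dict[str, set[str]] = {}
    for path in sorted(repo_root.rglob("*")):
        relative = path.relative_to(repo_root)
        # Skip directories and git's own metadata.
        if not path.is_file() or ".git" in relative.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        found = {task_id for task_id in held_out_ids if task_id in text}
        if found:
            leaks[str(relative)] = found
    return leaks


if __name__ == "__main__":
    # Hypothetical ids; replace with the real held-out task ids.
    for name, ids in find_leaked_ids(Path("."), {"task-001", "task-002"}).items():
        print(f"{name}: {sorted(ids)}")
```

Matching is by plain substring, so short or generic IDs can produce false positives. For a public benchmark a non-empty result is expected and acceptable; for a secret-identity benchmark it indicates that the IDs need to move out of the optimizer's repo.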

Confidence Score: 4/5

Merge is mostly safe after the README wording is aligned with the caveated integrity model documented elsewhere.

The changes are documentation-only and generally improve accuracy, but one prominent README paragraph still preserves an over-strong guarantee that can mislead users about the actual isolation properties.

vero/README.md


What T-Rex did

  • Reviewed the pre-change gaia-buildyaml-contract state by inspecting trex-artifacts/gaia-buildyaml-contract-01-before.log to verify base checkout, parse success, validation access: 'no_access', and the old caveat line.
  • Reviewed the post-change gaia-buildyaml-contract state by inspecting trex-artifacts/gaia-buildyaml-contract-02-after.log to verify head checkout, parse success, validation access: 'no_access', extracted caveat lines 44-52, and that all corrected-caveat assertions pass.
  • Saved the validation script trex-artifacts/gaia_buildyaml_contract_check.py that implements the checks used to verify the gaia build.yaml contract results.
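
For readers who want a comparable check locally, the following is a rough sketch of the kind of assertion described above. It is not the saved `gaia_buildyaml_contract_check.py` script. The config is a plain dict standing in for a parsed build.yaml, and the `splits` and `access` keys are assumed names, not a confirmed schema.

```python
def check_validation_withheld(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems: list[str] = []
    splits = config.get("splits", {})  # hypothetical key name
    validation = splits.get("validation")
    if validation is None:
        # An unlisted split relies on the default, which is currently fail-open.
        problems.append("validation split is not listed")
    elif validation.get("access") != "no_access":
        problems.append(
            f"validation access is {validation.get('access')!r}, expected 'no_access'"
        )
    return problems


# Stand-in for a parsed build.yaml (assumed layout).
config = {"splits": {"validation": {"access": "no_access"}}}
print(check_validation_withheld(config))  # []
```

Returning a list of problems rather than raising on the first one makes it easier to report every violated expectation in a single run.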


Comments Outside Diff (1)

  1. vero/README.md, line 530

    P1: Stale integrity guarantee. This README paragraph still gives the hard guarantee this PR is trying to remove: it says the optimizer can't read hidden labels, modify the scorer, or bypass its budget. The updated architecture docs now describe best-effort process-level limits, Mode A scorer exposure through agent_repo and read_only_paths, fail-open unlisted splits, and the GAIA identity caveat. A user who reads only the README can still rely on guarantees the implementation does not provide, so this paragraph should be softened or pointed at the caveated integrity model.


Reviews (1): Last reviewed commit: "docs(harbor): honest integrity guarantee..."

…xample, by-mode scorer

Documentation accuracy fixes (review findings on PR #6):

- architecture: soften the intro from a hard guarantee ("the optimizer cannot
  read hidden labels, modify the scorer, or bypass its budget") to best-effort,
  OS/process-level language describing what is actually enforced.
- gaia build.yaml: correct "never reaches the optimizer". Because agent_repo is
  "." and build.yaml is git-tracked, the validation task ids ARE seeded into the
  optimizer's repo; only the per-sample scores are withheld. Acceptable for a
  public benchmark, with a caveat + mitigations for secret-identity benchmarks.
- examples: gsm8k-agent is cited as the Mode A example but ships no build.yaml;
  repoint to gaia-optimization as the complete runnable example and pair
  gsm8k-agent with the tutorial's Mode A snippet.
- architecture: document the current fail-open default for unlisted splits (and
  that it becomes fail-closed once the protocol fix lands), and split the
  "scorer is sidecar-only" claim by mode (true for Mode B; Mode A keeps the
  scorer in the agent's editable repo until the serve.py fix).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
varunursekar merged commit bb04d67 into harbor-4-docs on Jul 1, 2026
3 checks passed