Skip to content

fix(harbor): nested-run flags/dedup/resume, archive + secret guards [fixes 3/4 compiler]#9

Merged
varunursekar merged 1 commit into
harbor-3-compilerfrom
harbor-3-compiler-fixes
Jul 1, 2026
Merged

fix(harbor): nested-run flags/dedup/resume, archive + secret guards [fixes 3/4 compiler]#9
varunursekar merged 1 commit into
harbor-3-compilerfrom
harbor-3-compiler-fixes

Conversation

@shehabyasser-scale

@shehabyasser-scale shehabyasser-scale commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Stacks on #5 (harbor-3-compiler). Addresses the Mode B runner + build-compiler findings. 3 of 4 fix PRs.

What this fixes

# Sev Finding Fix
K1 🔴 HarborConfig.n_attempts / max_retries are typed fields _build_command never emits, so nested runs use harbor defaults Emit --n-attempts / --max-retries from the configured values
K2 🔴 _load_trials is last-write-wins over an unordered rglob; a failing retry can clobber a passing trial Deterministic best-trial-per-task selection (clean+rewards beats failure; ties to latest attempt)
K3 🟠 Resume treats a persisted error sample as done, permanently skipping a transiently-failed sample A sample is done only if its persisted result is not an error; errors are re-run on resume
K4 🟠 _prepare_baseline_repo discards git archive's exit code, so a truncated stream can bake a near-empty baseline Assert git archive exited 0 (capture stderr, reap the process); fail the build otherwise
K5 🟠 Declared compose secrets resolve from host env; an unset var silently becomes empty → credential-less sidecar Render the fail-fast ${VAR:?msg} compose form + a build-time presence check (opt-out VERO_SKIP_SECRET_CHECK)
K6 read_only_paths perms read as tamper-protection but the verifier checks out git blobs, not the working tree Document it as advisory only; scorer provenance is enforced sidecar-side (the Mode A task project)

Greptile finding on PR #5, also fixed here (reply posted on #5)

  • mkdtemp leak (compiler.py): the dataset-staging dir is now a tempfile.TemporaryDirectory() cleaned up on success and exception (datasets can be gigabytes).

Tests

18 passed (test_harbor_runner.py, test_harbor_build.py): attempts/retries flags emitted; a passing trial wins over a later failing retry; resume re-runs a persisted error sample; a failing git archive raises; a missing declared secret fails the build; the rendered seed.sh documents the advisory caveat.

🤖 Generated with Claude Code

Greptile Summary

This PR fixes six issues in the Harbor Mode-B runner and build compiler, covering correctness bugs around nested-run flag propagation, non-deterministic trial selection, error-sample resume skipping, git archive exit-code checking, silent empty-secret injection, and missing documentation on read_only_paths scope.

  • Runner (K1–K3): --n-attempts and --max-retries are now emitted in _build_command; _load_trials picks the best trial per task deterministically (clean+rewards beats failure, ties to latest attempt) rather than using undefined rglob order; _is_done re-runs a persisted error sample on resume instead of permanently skipping it.
  • Compiler (K4–K6 + mkdtemp leak): git archive exit code is now asserted after reaping the process; declared secrets are validated against the host environment at build time with an opt-out escape hatch (VERO_SKIP_SECRET_CHECK); the compose template uses ${VAR:?msg} fail-fast interpolation; seed.sh documents read_only_paths as advisory only; the dataset-staging temp dir is now a TemporaryDirectory context manager that cleans up on both success and exception.

Confidence Score: 4/5

Safe to merge; all six targeted defects are fixed with direct test coverage and the changes are tightly scoped to the harbor runner and compiler.

The secret-presence check is placed after the git archive and dataset-staging steps rather than at the top of compile_task. An operator with a missing secret waits through potentially slow operations before getting the error, and the output directory is left in a partial state. Everything else — flag emission, deterministic trial selection, error-sample resume, git archive exit assertion, compose fail-fast interpolation — looks correct and is covered by the new tests.

vero/src/vero/harbor/build/compiler.py — the ordering of the secret check relative to the expensive build steps.

Important Files Changed

Filename Overview
vero/src/vero/harbor/runner.py Adds --n-attempts/--max-retries to _build_command, replaces _existing with _is_done (errors are re-run on resume), and replaces last-write-wins _load_trials with deterministic best-trial selection via _trial_rank. Logic is correct and well-tested.
vero/src/vero/harbor/build/compiler.py Fixes mkdtemp leak (TemporaryDirectory), git archive exit-code check, and build-time secret presence validation. The secret check is placed after expensive git/dataset operations rather than at the top of compile_task, which wastes time and leaves a partial output dir on failure.
vero/src/vero/harbor/build/templates/docker-compose.yaml.j2 Upgrades compose secret interpolation from ${VAR} to ${VAR:?msg} fail-fast form; correct and safe.
vero/src/vero/harbor/build/templates/seed.sh.j2 Adds advisory-only note for read_only_paths, accurately documenting that sidecar-baked task_project is the real scorer-provenance enforcement.
vero/tests/test_harbor_build.py Adds tests for archive failure, missing secrets, and advisory read_only_paths text; existing fixture updated to skip the secret check via VERO_SKIP_SECRET_CHECK.
vero/tests/test_harbor_runner.py New TestReviewFixes class covers all four runner changes: flag emission, best-trial selection, latest-clean-trial tiebreaker, and error-sample re-run on resume.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant V as vero (HarborRunner)
    participant H as harbor run (nested)
    participant FS as jobs_dir filesystem
    participant DB as SampleResult store

    V->>DB: _is_done(sample_id)?
    alt result exists AND not is_error()
        DB-->>V: "done=True (skip)"
    else missing OR is_error()
        DB-->>V: "done=False (pending)"
        V->>H: harbor run --n-attempts N --max-retries M -i task_name
        H->>FS: "write jobs/<ts>/<trial>/result.json (per attempt)"
        H-->>V: exit (non-zero tolerated)
    end
    V->>FS: rglob result.json
    FS-->>V: all trial files (unordered)
    V->>V: "_trial_rank: pick best per task (clean+rewards > rewards > latest finished_at)"
    V->>DB: save_sample_result (score or error)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant V as vero (HarborRunner)
    participant H as harbor run (nested)
    participant FS as jobs_dir filesystem
    participant DB as SampleResult store

    V->>DB: _is_done(sample_id)?
    alt result exists AND not is_error()
        DB-->>V: "done=True (skip)"
    else missing OR is_error()
        DB-->>V: "done=False (pending)"
        V->>H: harbor run --n-attempts N --max-retries M -i task_name
        H->>FS: "write jobs/<ts>/<trial>/result.json (per attempt)"
        H-->>V: exit (non-zero tolerated)
    end
    V->>FS: rglob result.json
    FS-->>V: all trial files (unordered)
    V->>V: "_trial_rank: pick best per task (clean+rewards > rewards > latest finished_at)"
    V->>DB: save_sample_result (score or error)
Loading

Fix All in Cursor Fix All in Claude Code Fix All in Codex

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
vero/src/vero/harbor/build/compiler.py:246-258
**Secret check runs after expensive build steps**

The presence check for declared secrets (step 4b) executes after `_prepare_baseline_repo` (git archive of the agent repo) and the dataset staging block, both of which can take minutes for large repos or multi-gigabyte datasets. If a required secret is absent, the operator waits through those steps before learning about the missing credential. Moving this check to the top of `compile_task` — before step 1 — would fail fast and avoid the partial-state output directory that is left behind on failure.

Reviews (1): Last reviewed commit: "fix(harbor): nested-run flags/dedup/resu..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

…mkdtemp cleanup

Mode B runner + build-compiler robustness (review findings on PR #5):

- runner: emit --n-attempts / --max-retries (the typed HarborConfig fields were
  silently dropped); pick the best trial per task deterministically instead of
  last-write-wins over an unordered rglob; treat a persisted error sample as
  not-done so a transient failure is re-run on resume.
- compiler: assert git archive exited 0 (and reap it) so a truncated stream
  cannot bake a near-empty baseline; validate declared secrets at build time and
  render compose secrets with a fail-fast guard so an unset host var fails loudly
  instead of producing a credential-less sidecar.
- seed.sh: document that read_only_paths is advisory only and scorer provenance
  is enforced sidecar-side.
- compiler: stage the dataset in a cleaned-up TemporaryDirectory (Greptile:
  the mkdtemp scratch dir was leaking, datasets can be gigabytes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines +246 to +258
# 4b. Fail early if a declared secret is missing from the host env, so the
# operator finds out at build time rather than via a credential-less
# sidecar. The compose ${VAR:?} guard is the run-time backstop.
import os

if not os.environ.get("VERO_SKIP_SECRET_CHECK"):
missing = [s for s in config.secrets if not os.environ.get(s)]
if missing:
raise ValueError(
"Declared secrets missing from the host environment: "
f"{', '.join(missing)}. Set them, or set VERO_SKIP_SECRET_CHECK=1 "
"to defer to the run-time compose check."
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Secret check runs after expensive build steps

The presence check for declared secrets (step 4b) executes after _prepare_baseline_repo (git archive of the agent repo) and the dataset staging block, both of which can take minutes for large repos or multi-gigabyte datasets. If a required secret is absent, the operator waits through those steps before learning about the missing credential. Moving this check to the top of compile_task — before step 1 — would fail fast and avoid the partial-state output directory that is left behind on failure.

Prompt To Fix With AI
This is a comment left during a code review.
Path: vero/src/vero/harbor/build/compiler.py
Line: 246-258

Comment:
**Secret check runs after expensive build steps**

The presence check for declared secrets (step 4b) executes after `_prepare_baseline_repo` (git archive of the agent repo) and the dataset staging block, both of which can take minutes for large repos or multi-gigabyte datasets. If a required secret is absent, the operator waits through those steps before learning about the missing credential. Moving this check to the top of `compile_task` — before step 1 — would fail fast and avoid the partial-state output directory that is left behind on failure.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Cursor Fix in Claude Code Fix in Codex

@varunursekar varunursekar merged commit 05cdeea into harbor-3-compiler Jul 1, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants