
CI/infra hardening from the package audit: cache-poisoning fix, two-sided ratchet, MSPLIM, pin upgrades (PR B)#26

Merged
tap merged 1 commit into main from claude/infra-hardening
Jun 12, 2026


@tap tap commented Jun 12, 2026


Audit fix series, part B — the CI/infrastructure findings. Composes cleanly with #25 (part A): this PR deliberately avoids the hexagon ctest -E line and the bare-metal filter string that #25 touches.

The two Highs

  1. Hexagon toolchain cache-poisoning hole closed. The icount-ratchet and compare.yml jobs downloaded the toolchain unverified yet saved it under the same -verified- cache key the hexagon-qemu job trusts, so the first unverified writer could poison the cache for every job. Now: the SHA256 hard pin is verified in all three download paths, and the cache key is the pinned hash (hexagon-toolchain-<sha256>-1), so an unverified artifact can never occupy a trusted key.
  2. The icount ratchet is now two-sided. Improvement beyond tolerance fails with "run icount.py --update and commit baselines.json" — unclaimed improvements can no longer create a stale-baseline dead zone that absorbs future regressions. Demonstrated end-to-end: inflated baseline → exit 1 with the message; --update now also prunes stale scenario keys (verified bit-identical values, zero README diff).
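
For readers who have not opened icount.py, the two-sided check can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `check_ratchet` and the 1% default tolerance are assumptions, and only the "run icount.py --update and commit baselines.json" instruction is taken from the description above.

```python
def check_ratchet(measured, baseline, tolerance=0.01):
    """Two-sided ratchet sketch: fail on regression AND on unclaimed improvement.

    `tolerance` is a relative bound; the 1% default is an assumed value,
    not the one icount.py uses.
    Returns (ok, message).
    """
    if baseline <= 0:
        # Zero-baseline guard: a relative delta is undefined here.
        return False, "baseline is zero or negative; cannot compute a relative delta"

    delta = (measured - baseline) / baseline
    if delta > tolerance:
        return False, f"regression: {delta:+.2%} vs baseline"
    if delta < -tolerance:
        # An unclaimed improvement leaves slack that a later regression
        # could hide in, so it is treated as a failure as well.
        return False, (
            f"improvement {delta:+.2%} beyond tolerance: "
            "run icount.py --update and commit baselines.json"
        )
    return True, f"within tolerance ({delta:+.2%})"
```

With a one-sided check, a baseline of 1000 and a measured count of 900 would pass, and a later regression back to 990 would pass too. The symmetric bound forces the baseline to follow the improvement first.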

The rest

  • Bare-metal empty-run guard: a filter typo matching zero tests now fails instead of passing green. One spec deviation, empirically forced: gtest applies filters inside RUN_ALL_TESTS() (test_to_run_count() reads 0 before it), so the guard sits after the run — proven by building the as-specified version and watching it fail on target.
  • MSPLIM armed on M33 and M55 (first instruction of Reset_Handler; __stack_limit symbols in both linker scripts, verified by nm: M33 0x383f0000, M55 0x20000000) plus a dedicated HardFault handler — stack overflow now faults instead of silently corrupting the heap. M33 one-shot suite passed under QEMU with MSPLIM armed (78.9 s). Baselines deliberately unchanged (+2..26 insns one-time, +0.00%).
  • compare-smoke job: per-push build-only smoke of srt_bench_compare (host) and cmp_icount_lsr_medium (M55 cross) — the comparison infrastructure can no longer bit-rot invisibly between manual dispatches.
  • ci-arm64 failures get an audience: a step gated on if: failure() opens or updates a "ci-arm64 weekly run failing" issue; stale header comment corrected (macos-latest already covers per-push arm64; this workflow's unique value is TSan-on-arm64).
  • Pin upgrades: qemu-plugin.h fetched by commit SHA + SHA256 check (self-tested: plugin built from the pinned URL); googletest/benchmark FetchContent moved from movable tags to commit SHAs (fresh configure verified to clone the pinned SHA).
  • Script guards: icount.py per-binary 600 s timeout naming the binary, zero-baseline guard, corrected usage; update_perf_docs.py refuses to write an empty table (both failure cases unit-tested). clang-format gate extended to bench/compare, the QEMU plugin, and platform/*.c (reformat included; minimal churn).
  • Known-debt ledger in PERFORMANCE.md (MSVC /W4 triage, missing tail-latency bench) referenced from the MSVC gate comment.
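
The pinning pattern behind both the toolchain fix and the qemu-plugin.h pin can be sketched as below. This is a hedged illustration in Python rather than the workflow YAML: `cache_key` and `verify_artifact` are hypothetical helper names, and only the key shape hexagon-toolchain-<sha256>-1 comes from this PR.

```python
import hashlib


def cache_key(pinned_sha256, revision=1):
    """Build a cache key from the pinned digest itself.

    Because the key contains the expected hash, an artifact with a
    different digest belongs under a different key by construction.
    """
    return f"hexagon-toolchain-{pinned_sha256}-{revision}"


def verify_artifact(data, pinned_sha256):
    """Raise ValueError unless `data` hashes to the pinned SHA256.

    Meant to run on every download path before the artifact is saved
    to the cache or used.
    """
    actual = hashlib.sha256(data).hexdigest()
    if actual != pinned_sha256.lower():
        raise ValueError(
            f"SHA256 mismatch: expected {pinned_sha256}, got {actual}"
        )
```

The two pieces only protect the cache together: every writer verifies before saving, and every reader restores from a key that names the digest it expects.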

Verified locally: all workflows YAML-parse; host build + fast ctest green on the new pins; M55 icount all 7 scenarios pass vs committed baselines; M33 QEMU suite green with MSPLIM. Not verifiable here: the Hexagon legs, actual cache/issue-creation behavior, MSVC/macOS — first CI run on this PR covers most of that.

https://claude.ai/code/session_01HuAFfoeD5a5Xe5aGNA16M9


Generated by Claude Code

- Hexagon toolchain cache poisoning: icount-ratchet (ci.yml) and Measure
  Hexagon (compare.yml) downloaded the toolchain unverified on cache miss
  while sharing the hexagon-qemu job's trusted cache key. Both paths now
  verify the existing HEXAGON_TOOLCHAIN_SHA256 hard pin, and all three
  cache keys are keyed on the pinned digest itself.
- Two-sided ratchet: icount.py now fails on improvement beyond tolerance
  (stale slack would hide later regressions) with instructions to run
  --update and commit baselines.json; docstring and PERFORMANCE.md
  updated. Also: zero-baseline guard, 600 s per-binary QEMU timeout with
  a named-binary error, usage line lists all three targets, and --update
  rewrites the target entry to exactly the measured scenarios (prunes
  stale keys). Committed baselines unchanged; README regen is diff-free.
- Bare-metal empty-run guard: bare_metal_main.cpp fails with
  SRT_TESTS_COMPLETE rc=1 if fewer than 15 tests were selected, so a
  filter typo cannot pass green. Checked after RUN_ALL_TESTS because
  gtest applies the filter inside it (the count reads 0 beforehand —
  verified on target). Filter string itself untouched.
- MSPLIM: __stack_limit added to both linker scripts (M55: DTCM base, the
  stack owns the region; M33: __heap_end__) and written to msplim first
  thing in Reset_Handler (Armv8-M Mainline only; both targets are).
  Dedicated HardFault_Handler (bkpt + park) replaces the Default_Handler
  alias. Verified: M33 one-shot suite passes under QEMU; M55 icount
  workloads still complete with counts within 0.01% of baselines.
- compare-smoke job: per-push build-only check of srt_bench_compare
  (host) and cmp_icount_lsr_medium (M55 cross) so compare.yml's
  manual-only paths cannot bit-rot.
- ci-arm64.yml: on failure, opens or comments on a "ci-arm64 weekly run
  failing" issue (scheduled runs have no PR audience); header comment
  reworded — macos-latest already covers arm64 per push, this workflow's
  unique value is TSan-on-arm64.
- qemu-plugin.h pinned to the commit v8.2.2 points at
  (11aa0b1ff115b86160c4d37e7c37e6a6b13b77ea) with sha256 verification in
  both workflows' plugin-build steps.
- FetchContent pins: googletest f8d7d77c (v1.14.0), benchmark c58e6d07
  (v1.9.1) — commit SHAs instead of movable tags.
- update_perf_docs.py exits nonzero on benchmark output that is empty or
  lacks items_per_second; clang-format gate extended to bench/compare,
  tools/qemu_insn_plugin and platform C sources (only churn: comment
  realignment in armv8m_startup.c's vector table).
- Known-debt ledger in PERFORMANCE.md (MSVC /W4 triage, missing
  tail-latency benchmark); MSVC matrix comment references it.
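
The --update pruning behavior noted above (the target entry is rewritten to exactly the measured scenarios) might look roughly like this. It is a sketch under assumptions: the baselines.json layout shown, target name mapping to scenario counts, is a guess, and `update_target` is a hypothetical name.

```python
def update_target(baselines, target, measured):
    """Return new baselines with `target` replaced by exactly `measured`.

    Replacing the whole entry, rather than merging into it, is what
    drops scenario keys that are no longer measured.
    Assumed layout: {target: {scenario: instruction_count}}.
    """
    updated = dict(baselines)        # shallow copy; other targets untouched
    updated[target] = dict(measured)
    return updated


old = {
    "m55": {"lsr_medium": 1200, "removed_scenario": 999},
    "m33": {"lsr_medium": 1500},
}
new = update_target(old, "m55", {"lsr_medium": 1200})
```

A merge (`old["m55"].update(measured)`) would keep `removed_scenario` forever. Replacing the entry keeps unchanged values identical while removing the stale key.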

https://claude.ai/code/session_01HuAFfoeD5a5Xe5aGNA16M9
@tap tap merged commit 14a9329 into main Jun 12, 2026
26 checks passed