#313 P1+P3: replace the boot-time facet-index full scan with a tiny trusted manifest by rdhyee · Pull Request #317 · isamplesorg/isamplesorg.github.io

rdhyee · 2026-07-01T23:48:41Z

What this fixes

Part of #313 (Explorer slowness on slow connections). Live repro today: a URL with a preset facet filter at continental zoom took ~45-50 seconds to fully resolve on current production. The delay reproduces independently of any recent changes, so this is a pre-existing latency issue, not a regression.

facetIndexReady in explorer.qmd currently does two expensive things against the live sample_facet_index.parquet (9.68 MB, ~6M rows) on every page load, blocking multi-filter count readiness:

1. `SELECT DISTINCT build_id, schema_version FROM read_parquet(index_url)`, which touches the build_id/schema_version columns across every row group of the 9.68 MB file.
2. A coverage check, `SELECT source, COUNT(*) FROM read_parquet(index_url) GROUP BY source`, compared against facet_summaries: a full 6M-row scan.

This PR eliminates both, per the joint Claude+Codex mitigation plan from the original 2026-06-26 investigation (P0, the "Loading…" honesty-state fix, already shipped as #316).

P1 — trusted build-time manifest

- New sample_facet_index_meta.parquet artifact (scripts/build_frontend_derived.py): a tiny (~1 KB) per-source histogram + build_id/schema_version/total_rows, computed directly from samp_geo. That is the same authoritative table sample_facet_index itself derives from; the manifest is not read back from the index (independence is the point: a buggy index build could carry self-consistent-but-wrong metadata).
- Independent validation gate (scripts/validate_frontend_derived.py): reads the actual on-disk sample_facet_index.parquet (a full scan, which is fine at build/CI time and never on the browser critical path) and asserts it matches the manifest.
- explorer.qmd's facetIndexReady now reads the tiny manifest instead of scanning the big index. Same checks (schema version, node_bits generation match, coverage vs facet_summaries), same data, just a cheaper source. The big index is now touched only lazily, when a user's actual multi-filter query runs.
- Escape hatch: `--only sample_facet_index_meta` builds just the meta file without forcing a full index rebuild. It exists for pairing a new meta file with an already-deployed index built from the same input (see deployment note below).
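The manifest idea can be sketched in a few lines. This is only an illustration, assuming samp_geo can be read as rows carrying a `source` field; the function name `build_manifest`, the row layout, and the sample values are hypothetical, and the real build_sample_facet_index_meta() in scripts/build_frontend_derived.py may be organized differently.

```python
from collections import Counter


def build_manifest(samp_geo_rows, build_id, schema_version):
    """Return one manifest row per source, each carrying the build metadata.

    Hypothetical layout. The counts come from the authoritative rows,
    never from the index that the manifest describes.
    """
    histogram = Counter(row["source"] for row in samp_geo_rows)
    total_rows = sum(histogram.values())
    return [
        {
            "source": source,
            "n": count,
            "build_id": build_id,
            "schema_version": schema_version,
            "total_rows": total_rows,
        }
        for source, count in sorted(histogram.items())
    ]


# Made-up stand-in for samp_geo; real rows carry many more columns.
rows = [{"source": "source_a"}, {"source": "source_a"}, {"source": "source_b"}]
manifest = build_manifest(rows, build_id="example-build", schema_version=1)
```

Because the histogram is derived from the input table rather than from the index, a later comparison between manifest and index is a genuine cross-check and not a tautology.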

P3 — decouple the masks scan from the readiness gate

facetIndexReady previously waited on the entire nodeBitsReady cell, including a 9.67 MB masks scan it doesn't actually need (only __nodeBitsBuild, set after a 2 KB fetch). Split into nodeBitsCoreReady (fast) + nodeBitsReady (masks scan, now sequenced to run after facetIndexReady settles so the two don't contend for the single DuckDB-WASM connection).
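The ordering described above can be modelled outside the browser. The sketch below is a Python asyncio analogue, not the Observable/JavaScript code in explorer.qmd; the coroutine names mirror the cells for readability, and the two events are hypothetical stand-ins for the cell dependencies.

```python
import asyncio


async def main():
    events = []                      # records the order in which steps run
    core_ready = asyncio.Event()     # stands in for nodeBitsCoreReady
    facet_settled = asyncio.Event()  # set when the facet check is ready OR failed

    async def node_bits_core():
        events.append("core: small node_bits fetch")
        core_ready.set()

    async def facet_index_ready():
        await core_ready.wait()      # depends on the fast step only
        try:
            events.append("facet: read manifest")
        finally:
            facet_settled.set()      # settle on success and on failure alike

    async def node_bits_masks():
        await facet_settled.wait()   # never overlaps the facet check
        events.append("masks: large scan")

    # Start order is deliberately scrambled; the events enforce the sequence.
    await asyncio.gather(node_bits_masks(), facet_index_ready(), node_bits_core())
    return events


order = asyncio.run(main())
```

Setting `facet_settled` in a `finally` block reflects the point that the masks scan should proceed whether the readiness check succeeds or fails.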

P6 (targeted) — Firefox regression spec

Narrow firefox-facet-index-meta Playwright project, scoped to one new spec proving the pending→failed→ready UI contract and that a held/blocked manifest fetch never produces a permanent-looking stuck state.

Verification

- 46/46 JS unit tests, 39/39 Python pipeline tests, explorer-smoke (chromium), and the new Firefox spec (3/3 clean runs) all pass.
- Confirmed the new manifest pairs with the already-deployed index with zero risk: built it locally from a wide.parquet whose sha256 byte-matches what produced the currently-live sample_facet_index.parquet, and verified via the live public URL that the build_ids are identical.
- Ran the new independent validator locally against the real deployed index; all new checks pass.
- Two rounds of Codex review: design-level (conditional LGTM, 3 required corrections, all applied) and a final code-review pass on this diff (clean LGTM, no blocking issues).
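The sha256 pairing check mentioned in the list above can be reproduced with the standard library. This is a generic sketch, not the command actually used; the helper name `sha256_of` and the temporary files are illustrative only.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large parquet files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration with two throwaway files that hold identical bytes.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.bin")
    second = os.path.join(tmp, "second.bin")
    for path in (first, second):
        with open(path, "wb") as handle:
            handle.write(b"same input bytes")
    first_digest = sha256_of(first)
    same_input = first_digest == sha256_of(second)
```

Matching digests show that two builds started from byte-identical input, which is the precondition for pairing a freshly built manifest with an index that is already deployed.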

⚠️ Deployment note — one manual step required

No R2 write access was available while building this, so the new sample_facet_index_meta.parquet file has not been uploaded. Built locally at /Users/raymondyee/Data/iSample/pqg_refining/staged_202608/p1_meta_local/isamples_202608_sample_facet_index_meta.parquet (also verified reproducible independently, sha256-matched inputs).

This is safe to merge before the upload happens. Today, facetIndexReady always ends in 'failed' (no index at all reachable in a useful way for this check). After merging but before uploading the new file, it will still end in 'failed', just via a fast 404 instead of a slow scan. That is a net improvement to the failure path with zero behavior change to the success path (which simply isn't reachable yet either way). It becomes fully active (fast 'ready' path) the moment isamples_202608_sample_facet_index_meta.parquet is uploaded to R2 (isamples-ry bucket) alongside the existing isamples_202608_sample_facet_index.parquet, which has the same build_id, confirmed paired above.
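The two outcomes described here can be summarized as a small decision function. It is a sketch under stated assumptions: a missing manifest (for example after a 404) is modelled as `None`, the "node_bits generation match" is assumed to be a build_id comparison, and every name and value below is hypothetical rather than taken from explorer.qmd.

```python
def evaluate_readiness(manifest, expected_schema_version, node_bits_build, summary_counts):
    """Return 'ready' or 'failed' for the facet-index readiness gate.

    manifest is None when the fetch failed, e.g. the file is not uploaded yet.
    """
    if manifest is None:
        return "failed"          # fast failure, no scan of the big index
    if manifest["schema_version"] != expected_schema_version:
        return "failed"
    if manifest["build_id"] != node_bits_build:
        return "failed"          # assumed meaning of the generation match
    if manifest["per_source"] != summary_counts:
        return "failed"          # coverage disagrees with facet_summaries
    return "ready"


# Made-up values for illustration.
example_manifest = {
    "schema_version": 1,
    "build_id": "example-build",
    "per_source": {"source_a": 2, "source_b": 1},
}
before_upload = evaluate_readiness(None, 1, "example-build", {"source_a": 2, "source_b": 1})
after_upload = evaluate_readiness(example_manifest, 1, "example-build", {"source_a": 2, "source_b": 1})
stale_build = evaluate_readiness(example_manifest, 1, "other-build", {"source_a": 2, "source_b": 1})
```

Under these assumptions, the state before the upload and the state after it differ only in the first branch, which is why merging ahead of the upload does not change the success path.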

Relates to #313 (not closing — P1+P3 shipped; P2 DuckDB-WASM upgrade and P4/P5 remain deferred per the original review).

🤖 Generated with Claude Code

…st, derived from samp_geo)

New build_sample_facet_index_meta() computes the per-source histogram directly from samp_geo (the same authoritative located-universe table build_sample_facet_index/build_facet_summaries already derive from), NOT by reading back sample_facet_index.parquet itself -- independence is the point, per Codex's 2026-07-01 review: an independent validator can then read the actual on-disk index and prove meta/index/facet_summaries agree.

Registered in ARTIFACTS/HIER_ARTIFACTS, deliberately excluded from force_deps so `--only sample_facet_index_meta` alone builds just the meta file -- the escape hatch for pairing a new meta with an already-deployed index built from the same wide input.

Part of isamplesorg#313 P1+P3 (facetIndexReady latency fix); validator + explorer.qmd wiring + P3 decoupling + P6 targeted test to follow in this branch.

…ainst the real index

New --index-meta gate in validate_frontend_derived.py: schema/shape checks, then (given --index) a FRESH full scan of the actual on-disk sample_facet_index recomputes the per-source histogram/build_id/schema_version/row_count and diffs it against the manifest via symmetric EXCEPT (relational content, not byte identity) -- this is the independence Codex's review required: the validator does not trust meta's self-reported numbers or read meta back to derive its own expectation.

Also cross-checks meta against facet_summaries' source facet, mirroring the comparison the explorer runtime performs.

Continues isamplesorg#313 P1+P3 (see prior commit).
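A symmetric EXCEPT diff of this kind can be sketched with Python's built-in sqlite3 module. SQLite is swapped in here only so the example is self-contained; the table names `facet_index` and `meta`, the column `n`, and the sample rows are hypothetical and are not taken from validate_frontend_derived.py.

```python
import sqlite3


def histogram_mismatches(conn):
    """Return rows found in exactly one of: the manifest, a fresh scan of the index."""
    query = """
        WITH fresh AS (
            SELECT source, COUNT(*) AS n FROM facet_index GROUP BY source
        )
        SELECT 'meta_only' AS side, source, n
          FROM (SELECT source, n FROM meta EXCEPT SELECT source, n FROM fresh)
        UNION ALL
        SELECT 'index_only' AS side, source, n
          FROM (SELECT source, n FROM fresh EXCEPT SELECT source, n FROM meta)
        ORDER BY side, source
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facet_index (pid TEXT, source TEXT)")
conn.execute("CREATE TABLE meta (source TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO facet_index VALUES (?, ?)",
    [("p1", "source_a"), ("p2", "source_a"), ("p3", "source_b")],
)
conn.executemany("INSERT INTO meta VALUES (?, ?)", [("source_a", 2), ("source_b", 1)])

agree = histogram_mismatches(conn)   # manifest and index describe the same rows

# Simulate a manifest that drifted away from the index.
conn.execute("UPDATE meta SET n = 5 WHERE source = 'source_b'")
drift = histogram_mismatches(conn)
```

An empty result means the two relations hold the same rows regardless of file layout, which is the "relational content, not byte identity" property; any non-empty result names the side and the source that disagree.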

…ntract

Adds SERIALIZATIONS.md §4.13 and a DATA_PROVENANCE.md summary line for the new manifest artifact: independence from sample_facet_index (built from samp_geo, not read back), the --only escape hatch, and the R2 same-build_id pairing requirement.

…decouple masks scan

P1: facetIndexReady now reads index_meta_url (a few KB, built at compile time from samp_geo and independently validated against the real index) instead of scanning the 9.68MB sample_facet_index.parquet directly. Same checks (schema version, node_bits generation match, per-source coverage vs facet_summaries), same data, just sourced from the cheap pre-verified manifest. The big index file is now touched only lazily, when a user's actual multi-filter count query runs -- never during the readiness check.

P3: split nodeBitsReady into nodeBitsCoreReady (step 1, node_bits fetch, publishes __nodeBitsMap/__nodeBitsBuild) and a thinner nodeBitsReady (step 2, the 9.67MB masks scan). facetIndexReady now depends on nodeBitsCoreReady only -- previously it depended on the whole nodeBitsReady cell, which meant it couldn't even start until the masks scan finished, even though the values it needs are published synchronously before that scan begins. nodeBitsReady itself now awaits facetIndexReady's settlement (ready or failed, either is fine) before starting the masks scan, so the two don't race for the single DuckDB-WASM connection -- same discipline as whenConnectionIdle elsewhere in this file.

Completes the explorer.qmd side of isamplesorg#313 P1+P3 (see prior two commits for the data-pipeline side: build_frontend_derived.py + validate_frontend_derived.py).

…ding/failed race

Adds a narrow firefox-facet-index-meta Playwright project scoped to ONE new spec (tests/playwright/facet-index-meta-pending.spec.js), not a broad Firefox enable.

Test 1 uses page.route() to hold/release the sample_facet_index_meta fetch and proves window.__facetIndexStatus stays 'pending' while held and settles (ready/failed) once released. Test 2 exercises the exact UI contract for 2 active Material filters at global view across pending -> failed -> ready, reusing the real production handleFacetFilterChange/updateCrossFilteredCounts code path.

Empirical finding baked into the design (documented in the spec's header): DuckDB-WASM's non-threaded worker serializes queries, so holding the meta fetch open also starves the Material facet's own independent query -- a real held request and "Material checkboxes interactive" can't coexist in a single fresh page load. Test 2 therefore drives window.__facetIndexStatus directly (the same global the real preflight sets) after a normal boot, which lets it assert the pending/failed contract deterministically and still trigger a REAL count query for the 'ready' step (sample_facet_index and facet_node_bits are already live on R2; only the new meta manifest isn't). That real query was confirmed to genuinely start against production but did not resolve within the spec's window in this sandboxed environment (a large, network-bound full-file read) -- so the 'ready' step is a best-effort/soft check, not a hard CI assertion, with the reasoning documented inline.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XEtSoXjsKtnYWQ7yS8mGRo

…ntract spec

Test 2 (pending -> failed -> ready UI contract) failed on repeat local runs: the DOM was still showing the "(Loading…)" pending state when the test expected "(—)" failed, well past the original 45s poll window.

Tried and reverted: blocking the real sample_facet_index_meta fetch to "neutralize" the real boot-time preflight racing the test's manual window.__facetIndexStatus injections. That reintroduces the exact FIFO single-worker starvation the spec's own DESIGN NOTE documents -- Material's facet_tree_summaries query gets stuck behind the held route on the same DuckDB-WASM worker, so the checkboxes this test needs never render at all.

Root cause is more likely general single-worker query-queue congestion in this sandbox's network path to data.isamples.org (the same Firefox slowness already documented for the 'ready' step) occasionally delaying the pending->failed repaint past 45s, not a status race -- the real preflight resolves to 'failed' quickly (a 404, not a large download) well before this test's manual steps run.

Fix: generous-but-bounded timeouts (45s -> 90s) on both the pending and failed polls, test.setTimeout 180s -> 300s to give them room. Verified 3/3 clean runs locally after the change (previously flaked on run 2 of 2). Also verified independently: 46/46 unit tests, 39/39 python pipeline tests, explorer-smoke (chromium) all still pass.
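The "generous-but-bounded" polling pattern from this commit can be illustrated independently of Playwright. The helper below is a generic Python sketch, not the spec's code; `poll_until`, the predicate, and the short timeouts are hypothetical and chosen so the example runs quickly.

```python
import time


def poll_until(predicate, timeout_s, interval_s=0.05):
    """Poll predicate until it returns True or timeout_s elapses.

    Returns True on success and False on timeout, so a slow repaint
    delays the caller but can never hang it forever.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for "the failed label has been painted": true on the third check.
calls = {"n": 0}


def failed_label_visible():
    calls["n"] += 1
    return calls["n"] >= 3


settled = poll_until(failed_label_visible, timeout_s=1.0, interval_s=0.01)
timed_out = poll_until(lambda: False, timeout_s=0.05, interval_s=0.01)
```

Raising the timeout only widens the window in which a late state change is accepted; the bound itself remains, so a state that never arrives still fails the check.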

rdhyee and others added 6 commits July 1, 2026 15:43

rdhyee merged commit 97a7fb5 into isamplesorg:main Jul 2, 2026
3 checks passed

rdhyee merged 6 commits into isamplesorg:main from rdhyee:fix/313-facet-index-manifest-p1-p3
