Add collection facet to explorer (e.g. OpenContext PKAP) (#243) #244
rdhyee wants to merge 1 commit into
Force-pushed from 99d10ed to c3a34e5.
…samplesorg#243)

Additive 'collection' dimension: filter the explorer to a named SamplingSite label (e.g. OpenContext 'PKAP Survey Area'). Precomputes site membership via the wide-parquet Sample->Event->Site traversal into two new R2 files; touches none of the existing facet files.

Rebased onto main so it sits cleanly on top of the merged isamplesorg#242 heatmap work (disjoint regions, no conflict).

- scripts/build_collections.py: builds collections.parquet + sample_collections.parquet. Unnests BOTH relationship arrays (multi-event/multi-site safe), counts DISTINCT pids, and orders membership by collection_id for row-group pruning. PKAP = 15,446 verified; both files live on data.isamples.org.
- explorer.qmd: dual-UX collection facet (top-N checkboxes + search-the-tail), with the ?collection= URL param wired through the existing facet lifecycle and the facetFilterSQL() chokepoint (a second subquery against sample_collections.parquet).
- collections.qmd: the Featured Collections page uses identity-based &collection= links.
- EXPLORER_STATE.md, data.qmd: document the new param and files.
- tests/test_collections.py: page + facet-DOM checks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
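The membership step in the commit message (unnest both relationship arrays, keep DISTINCT pids, order by collection) can be sketched in plain Python. This is a minimal illustration, not the contents of `scripts/build_collections.py`: it assumes hypothetical in-memory dicts standing in for the wide parquet (`pid` → `produced_by` event ids, event id → `sampling_site` ids, site id → `(source, label)`), and the function name `build_membership` is made up for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (source, label): one "collection"


def build_membership(
    samples: Dict[str, List[str]],  # hypothetical: pid -> produced_by event ids
    events: Dict[str, List[str]],   # hypothetical: event id -> sampling_site ids
    sites: Dict[str, Key],          # hypothetical: site id -> (source, label)
) -> Tuple[List[Tuple[Key, str]], Dict[Key, int]]:
    """Return (membership rows sorted by collection, distinct-pid count per collection)."""
    pairs = set()
    for pid, event_ids in samples.items():
        for event_id in event_ids:                    # unnest array 1: multi-event samples
            for site_id in events.get(event_id, []):  # unnest array 2: multi-site events
                if site_id in sites:
                    # the set keeps (collection, pid) DISTINCT, so a sample reached
                    # through several events or sites is counted once
                    pairs.add((sites[site_id], pid))
    # sort by collection first so each collection's rows are contiguous,
    # which is what lets a reader skip row groups by collection_id
    membership = sorted(pairs)
    counts = Counter(key for key, _pid in membership)
    return membership, dict(counts)
```

With two site rows that share the label "PKAP Survey Area", a sample linked to both through different events still yields a single membership row, which is why the count is over distinct pids rather than over joined rows.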
Force-pushed from c3a34e5 to 219b400.
Rebased onto upstream/main (was 41 commits behind) to resolve conflicts with the #305 facet-count stack. All conflicts were in
No EXPLORER_STATE.md/ Tests: Still draft: leaving as-is since the R2 upload of
Resolves #243.
Adds a first-class `collection` dimension to the explorer: filter to a named `SamplingSite` label (e.g. the OpenContext project "PKAP Survey Area") and layer the existing material / context / object_type facets on top.
Why this design (additive)
"Collection" identity lives on `SamplingSite` entities, reached only by the `MaterialSampleRecord → produced_by → SamplingEvent → sampling_site → SamplingSite` traversal, never on the sample rows the explorer renders. Doing that array-join live in DuckDB-WASM is the documented in-browser bottleneck, so membership is precomputed. The current `sample_facets_v2` / `facet_summaries` / `facet_cross_filter` build pipeline isn't in any repo, so rather than risk regenerating those, this feature is strictly additive, adding two new files that touch nothing existing:
- `collections.parquet`: dimension (`collection_id`, `label`, `source`, `n_samples`, `centroid_lat/lng`, `bbox`). 61,695 rows, ~3 MB. Powers the top-N checkboxes, the search box, and the Featured Collections preset cameras.
- `sample_collections.parquet`: membership (`pid` → `collection_id`). ~13 MB.

The filter appends a second `pid IN (SELECT … )` subquery in `facetFilterSQL()`, exactly parallel to the existing facet predicate.

A "collection" is a `SamplingSite` label (≈1,336 site rows share "PKAP Survey Area"), keyed by a stable hash of (source, label). Verified: PKAP = 15,446 samples.
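Keying a collection by a stable hash of (source, label) means every site row sharing a label maps to one ID, and the ID survives rebuilds, so `&collection=<id>` links keep working. The PR does not say which hash or key encoding `build_collections.py` uses; the sketch below assumes SHA-256 over a separator-joined key, truncated to 16 hex characters to match the shape of `dd74c71982da0e21`, and it is not expected to reproduce that exact ID. The function name `collection_id` is hypothetical.

```python
import hashlib


def collection_id(source: str, label: str, length: int = 16) -> str:
    """Deterministic ID for a (source, label) collection (assumed scheme, not the PR's)."""
    # The ASCII unit separator keeps ("a|b", "c") and ("a", "b|c") from colliding.
    key = f"{source}\x1f{label}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:length]
```

Because the ID depends only on (source, label), the ≈1,336 PKAP site rows all collapse to a single collection, and regenerating the files from a new snapshot yields the same ID for the same label.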
What's in the PR
- `scripts/build_collections.py`: builds both files from `/current/wide.parquet`.
- `explorer.qmd`: dual-UX `collection` facet (top-N checkboxes + search-the-tail for the ~60K long tail), with the `?collection=` URL param wired through the existing facet lifecycle (`applyQueryToFacetFilters` / `writeQueryState` / `handleFacetFilterChange`) and the `facetFilterSQL()` chokepoint.
- `collections.qmd`: Featured Collections page upgraded to identity-based `&collection=<id>` links + camera fly.
- `EXPLORER_STATE.md`, `data.qmd`: document the new param and files.
- `tests/test_collections.py`: Collections page + explorer facet-DOM checks.

The facet is inert until the two files are live on `data.isamples.org`:

1. `python scripts/build_collections.py --out-dir <dir> --snapshot 202604`
2. Upload `isamples_202604_collections.parquet` + `isamples_202604_sample_collections.parquet` to R2 (behind the data.isamples.org Worker)
3. Open `explorer.html?collection=dd74c71982da0e21` → PKAP samples; layer a material facet to confirm it narrows
4. Run `tests/test_collections.py` against the deployed site

Known limitations (v1)
- cross-filtered against other facets (no cross_filter cache for collections). The dots and table do respect the filter. Documented in `EXPLORER_STATE.md`.
- not to zoomed-out H3 clusters (same `#facetNote` caveat).

🤖 Generated with Claude Code