Search by PID in the Interactive Explorer #314

Open
rdhyee wants to merge 2 commits into isamplesorg:main from rdhyee:feat/278-search-by-pid

rdhyee commented Jun 30, 2026

Contributor

What changes for users

Typing a plain word into the Explorer search box still does a regular text search across labels, place names, and descriptions — nothing changes there. But now, if you paste in a persistent identifier (PID) — an ARK, DOI, IGSN, handle, or a resolver URL like https://n2t.net/ark:/... — the search detects it and does an exact match against that sample's identifier instead of a fuzzy text search. There's also an explicit escape hatch: prefixing your query with pid: (e.g. pid:k2000027w) runs a scheme-agnostic substring match against the identifier, useful when you only have a fragment of the PID or aren't sure which scheme it uses.

Also resolves #26

This PR also fixes the ARK classic/modern collapse from #26: classic-form ARKs (ark:/...) and modern-form ARKs (ark:...) are now canonicalized before matching, so searching with either form finds the same record. Per triage, #26 falls within the scope of #278, so this PR addresses both.

Implementation

  • assets/js/sql-builders.js: adds canonicalizePid, looksLikePid, and pidSearchWhere, which detect PID-shaped input, normalize across ARK/IGSN/DOI/handle schemes and resolver-URL prefixes, and build an injection-safe SQL WHERE clause (exact match for detected PIDs, ILIKE substring match for the pid: escape hatch).
  • explorer.qmd: wires this into buildSearchFilter so PID detection runs ahead of the existing text-search path, without disturbing it.
  • tests/unit/sql-builders-pid.test.mjs: 29 new unit tests covering canonicalization, detection, and WHERE-clause generation (including SQL-injection-safety cases for both quote and LIKE-metacharacter escaping). Full unit suite is 42/42 passing after rebase onto current main.
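
As an illustration of the detection and normalization step, here is a minimal sketch. It is not the code in assets/js/sql-builders.js: the function names and resolver hosts come from this PR, but the regular expressions and the exact order of operations are assumptions.

```javascript
// Hypothetical sketch; the real helpers may differ in detail.

// Resolver hosts named in this PR's commit message.
const RESOLVER_PREFIX =
  /^https?:\/\/(?:n2t\.net|doi\.org|arks\.org|hdl\.handle\.net)\//;

function canonicalizePid(value) {
  return String(value)
    .trim()
    .toLowerCase()
    .replace(RESOLVER_PREFIX, "") // strip resolver-URL prefix
    .replace(/^ark:\//, "ark:");  // classic ark:/ -> modern ark:
}

function looksLikePid(term) {
  const t = String(term).trim().toLowerCase();
  return (
    /^(?:ark|igsn|doi):/.test(t) || // scheme-prefixed identifier
    /^10\./.test(t) ||              // bare DOI
    RESOLVER_PREFIX.test(t)         // resolver URL
  );
}

console.log(canonicalizePid("https://n2t.net/ark:/28722/k2000027w"));
// ark:28722/k2000027w
console.log(looksLikePid("basalt")); // false
```

Under these assumptions, the classic and modern spellings of an ARK reduce to the same string, which is what lets either form match the same record.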

This branch was rebased onto upstream/main to pick up the squash-merged #300/#302 filtered-clusters work; only the two PID-search commits are new relative to main.

Closes #278, closes #26

🤖 Generated with Claude Code

rdhyee and others added 2 commits June 30, 2026 14:35
…esorg#26)

Adds two-sided normalised PID matching so samples are findable by their
persistent identifier (ARK, IGSN, DOI) even though those values never
appear in label/description/place_name.

New helpers in assets/js/sql-builders.js:
- canonicalizePid(value): lowercase + strip resolver-URL prefix (n2t.net,
  doi.org, arks.org, hdl.handle.net) + collapse classic ARK `ark:/` →
  modern `ark:` (closes isamplesorg#26).
- looksLikePid(term): heuristic that returns true when the term starts
  with `ark:`, `igsn:`, or `doi:`, begins with "10." (bare DOI), or is a
  resolver URL. Plain-text terms are never routed through PID matching;
  the hot path is unchanged for queries like "pottery" or "basalt".
- pidSearchWhere(rawTerm): SQL fragment: LOWER(REPLACE(pid,'ark:/','ark:'))
  = '<canonical>' OR pid ILIKE '%<localpart>%'. Both sides normalised so
  stored format (classic/modern ARK, uppercase IGSN) doesn't matter.
  All user input passed through escSql/escapeIlikePattern — no raw
  interpolation.
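
A rough sketch of the two-armed predicate described above. The helper names escSql and escapeIlikePattern appear in this commit, but their bodies here are stand-ins. The definition of "local part" (everything after the last "/" or ":") and the ESCAPE clause on the ILIKE arm are also assumptions.

```javascript
// Stand-ins for the escaping helpers named in the commit message.
const escSql = (s) => String(s).replace(/'/g, "''");
const escapeIlikePattern = (s) => String(s).replace(/[\\%_]/g, "\\$&");

// Hypothetical: takes an already-canonicalized PID string.
function pidWhereSketch(canonical) {
  // Assumed definition of the local part.
  const localPart = canonical.split(/[/:]/).pop();
  const exact =
    `LOWER(REPLACE(pid,'ark:/','ark:')) = '${escSql(canonical)}'`;
  const fuzzy =
    `pid ILIKE '%${escSql(escapeIlikePattern(localPart))}%' ESCAPE '\\'`;
  return `(${exact} OR ${fuzzy})`;
}

console.log(pidWhereSketch("ark:28722/k2000027w"));
// (LOWER(REPLACE(pid,'ark:/','ark:')) = 'ark:28722/k2000027w' OR pid ILIKE '%k2000027w%' ESCAPE '\')
```

The stored column is normalized inside the SQL expression and the user's term is normalized in JavaScript, so neither side depends on how the other was originally written.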

Wire-up in explorer.qmd buildSearchFilter: import the three new helpers;
when any search term looksLikePid, OR its pidSearchWhere into the existing
fullWhere clause. Non-PID terms are unaffected (fullWhere === searchWhere).
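
The wire-up can be pictured roughly as follows. This is a sketch, not the code in explorer.qmd: combineWithPidClauses is a hypothetical name, and the two helpers are passed in as parameters (with toy stand-ins below) so the example stays self-contained.

```javascript
// Hypothetical combiner: OR each PID clause into the text-search clause.
function combineWithPidClauses(searchWhere, terms, looksLikePid, pidSearchWhere) {
  const pidClauses = terms
    .filter((t) => looksLikePid(t))
    .map((t) => pidSearchWhere(t));
  // No PID-shaped term: the text-search clause is returned untouched.
  if (pidClauses.length === 0) return searchWhere;
  return `(${[searchWhere, ...pidClauses].join(" OR ")})`;
}

// Toy stand-ins for the real helpers, for demonstration only.
const isPid = (t) => t.startsWith("ark:");
const pidClause = (t) => `pid = '${t}'`;

console.log(combineWithPidClauses("label ILIKE '%pottery%'", ["pottery"], isPid, pidClause));
// label ILIKE '%pottery%'
console.log(combineWithPidClauses("label ILIKE '%ark:123%'", ["ark:123"], isPid, pidClause));
// (label ILIKE '%ark:123%' OR pid = 'ark:123')
```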

New test file tests/unit/sql-builders-pid.test.mjs: 22 tests covering
canonicalizePid (ARK, IGSN, DOI, resolver URLs, whitespace trim),
looksLikePid (true/false cases), and pidSearchWhere (SQL shape,
injection safety). All 27 unit tests (5 existing + 22 new) pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a `pid:` query prefix so users can find a sample by a bare local
identifier fragment without knowing the scheme (IGSN, ARK, DOI).

Typing `pid:IEGIL000C` or `pid:k2000027w` in the search box emits:
  pid ILIKE '%<fragment>%' ESCAPE '\'
DuckDB ILIKE is case-insensitive, so no LOWER is needed; the canonical
exact-match arm is intentionally skipped — the substring already spans
all scheme variants. The prefix itself is stripped case-insensitively
(`PID:`, `Pid:`, `pid:` all work).
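
A minimal sketch of this fast path, assuming the escaping helpers behave as their names suggest. pidPrefixWhere is a hypothetical name, and the helper bodies are stand-ins rather than the code in sql-builders.js.

```javascript
// Stand-ins for the escaping helpers named in this PR.
const escapeQuotes = (s) => String(s).replace(/'/g, "''");
const escapeLikeMeta = (s) => String(s).replace(/[\\%_]/g, "\\$&");

// Hypothetical fast path for terms that start with "pid:".
function pidPrefixWhere(rawTerm) {
  // Strip the prefix case-insensitively: pid:, PID:, Pid: all work.
  const fragment = String(rawTerm).trim().replace(/^pid:/i, "");
  return `pid ILIKE '%${escapeQuotes(escapeLikeMeta(fragment))}%' ESCAPE '\\'`;
}

console.log(pidPrefixWhere("PID:IEGIL000C"));
// pid ILIKE '%IEGIL000C%' ESCAPE '\'
```

The fragment keeps its original case because ILIKE already ignores case. Only quotes and LIKE metacharacters are rewritten.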

Changes to assets/js/sql-builders.js:
- looksLikePid: adds `pid:` to the list of recognised prefixes.
- pidSearchWhere: new fast path for `pid:` terms — returns a single bare
  ILIKE predicate instead of the canonical exact-match + localpart pair.
  All other (scheme-bearing) terms keep existing behaviour unchanged.

New tests in sql-builders-pid.test.mjs (7 additional; 29 in this file,
34 across the unit suite):
- looksLikePid recognises pid: in multiple cases
- pidSearchWhere emits correct bare ILIKE for pid:IEGIL000C and pid:k2000027w
- Case-insensitive prefix strip (PID:, Pid:)
- Injection safety: single-quote doubling and LIKE metachar escaping
- Explicit Option-A confirmation: bare words without scheme not routed via PID

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

  • Adding objects that should be searchable by their PIDs
  • Make sure iSamples handles modern ARKs as well as classic ARKs
