Add saliency: spectral-residual visual saliency (where to look) #408

Merged: JE-Chen merged 1 commit into dev from feat/saliency-batch on Jun 24, 2026

JE-Chen (Member) commented Jun 24, 2026

Why

When there's no template, no known colour and no text to OCR, an agent still needs a cue for where to look — the region that stands out (a popup, a badge, a highlighted row). saliency computes the spectral-residual saliency map (Hou & Zhang 2007 — log amplitude minus its local average, reconstructed through the phase) and turns it into ranked salient boxes.

  • saliency_map — the normalised (0–1) saliency map as an ndarray
  • salient_regions — ranked salient boxes {x, y, width, height, center, score} in source pixel coordinates
  • most_salient — the single most salient region (the first place to look)
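To make the transform concrete, here is a minimal numpy-only sketch of a spectral-residual map as described above (log amplitude minus its local average, reconstructed through the phase). It is not the code from this PR: the names `box_blur` and `spectral_residual_saliency` are hypothetical, the input is assumed to be an already-downscaled 2-D grayscale array, and the Gaussian smoothing of the original paper is swapped for a simple box blur to keep the example short.

```python
import numpy as np


def box_blur(a: np.ndarray, k: int = 3) -> np.ndarray:
    """Mean filter with a k x k window (k odd); edges are replicated."""
    pad = k // 2
    padded = np.pad(a, pad, mode="edge")
    h, w = a.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the k*k shifted views of the padded array, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def spectral_residual_saliency(gray: np.ndarray) -> np.ndarray:
    """Return a saliency map normalised to [0, 1], same shape as `gray`."""
    spectrum = np.fft.fft2(gray.astype(np.float64))
    log_amplitude = np.log(np.abs(spectrum) + 1e-9)  # epsilon avoids log(0)
    phase = np.angle(spectrum)
    # Spectral residual: log amplitude minus its local average.
    residual = log_amplitude - box_blur(log_amplitude, 3)
    # Reconstruct through the original phase and take the squared magnitude.
    recon = np.fft.ifft2(np.exp(residual + 1j * phase))
    saliency = box_blur(np.abs(recon) ** 2, 5)  # box blur stands in for a Gaussian
    lo, hi = saliency.min(), saliency.max()
    if hi - lo < 1e-12:
        return np.zeros_like(saliency)  # flat map: nothing stands out
    return (saliency - lo) / (hi - lo)


# Usage: a dark frame with one bright block.
frame = np.zeros((64, 64))
frame[20:28, 30:40] = 255.0
saliency_demo = spectral_residual_saliency(frame)
```

The result is a float64 array with the same shape as the input, which is the form the region extraction step consumes.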

Design

  • The transform is a pure numpy FFT: cv2.saliency lives in the forbidden opencv-contrib package, so it's re-implemented over base opencv only.
  • Reuses visual_match._haystack_gray (any ndarray / path / PIL image, or the live screen) and cv2_utils.blobs.connected_boxes for region extraction. cv2/numpy lazily imported.
  • Regions threshold at mean + 2·std of the saliency map by default (scale-invariant; pass threshold to override), then scale back to source pixel coordinates. Saliency is a coarse attention cue, documented as such — it narrows where a template / OCR pass then looks.
  • 5 layers wired: core → facade __all__ → AC_salient_regions / AC_most_salient → read-only ac_* MCP tools → Script Builder (Image). Qt-free verified.
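The thresholding and region step can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `salient_boxes` is a hypothetical name, a small breadth-first flood fill stands in for cv2_utils.blobs.connected_boxes, and the score is assumed to be the peak saliency inside each region (the PR does not spell out how score is computed).

```python
from collections import deque

import numpy as np


def salient_boxes(saliency, source_shape, threshold=None):
    """Rank boxes of above-threshold pixels, in source pixel coordinates."""
    if threshold is None:
        # Default described in the PR: mean + 2 * std of the saliency map.
        threshold = float(saliency.mean() + 2.0 * saliency.std())
    mask = saliency > threshold
    h, w = mask.shape
    sy, sx = source_shape[0] / h, source_shape[1] / w  # map -> source scale
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        # Flood-fill one 4-connected component (stand-in for connected_boxes).
        queue = deque([(y0, x0)])
        seen[y0, x0] = True
        ys, xs = [], []
        while queue:
            y, x = queue.popleft()
            ys.append(y)
            xs.append(x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        # Scale the bounding box back to source pixel coordinates.
        bx, by = int(min(xs) * sx), int(min(ys) * sy)
        bw = int(np.ceil((max(xs) + 1) * sx)) - bx
        bh = int(np.ceil((max(ys) + 1) * sy)) - by
        boxes.append({
            "x": bx, "y": by, "width": bw, "height": bh,
            "center": (bx + bw // 2, by + bh // 2),
            "score": float(saliency[ys, xs].max()),  # assumed: peak saliency
        })
    boxes.sort(key=lambda b: b["score"], reverse=True)
    return boxes


# Usage: a 10x10 map with two blobs, scaled back to a 20x20 source frame.
demo_map = np.zeros((10, 10))
demo_map[2:4, 3:6] = 1.0
demo_map[7, 7] = 0.8
demo_boxes = salient_boxes(demo_map, (20, 20))
```

Because the default threshold is derived from the map's own statistics, it adapts to each frame. Passing a threshold above the map's maximum yields an empty list, matching the []/None path mentioned under Tests.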

Tests

test/unit_test/headless/test_saliency_batch.py (cv2 via importorskip) — map shape/dtype/range, size param, salient regions in-bounds + ranked + scores in [0,1] on a 3-block frame, most_salient matches the top region, the high-threshold []/None path, the pure executor path, and 5-layer wiring. 23 passed with the vision siblings. This completes the vision lane HIGH items (image_quality / scale_detect / saliency).

When there's no template, colour or text to key on, an agent still
needs a cue for where to look. Compute the spectral-residual saliency
map (Hou & Zhang 2007) and rank salient boxes in source coordinates.
Pure numpy FFT (cv2.saliency is opencv-contrib, forbidden), reusing
visual_match's grayscale loader and cv2_utils.blobs.connected_boxes;
regions threshold at mean+2*std by default. A coarse attention cue to
narrow where a template / OCR pass then looks.
@codacy-production

Up to standards ✅

🟢 Issues: 0 new issues
🟢 Metrics: complexity 30 · duplication 0


@JE-Chen JE-Chen merged commit c3a4f1a into dev Jun 24, 2026
16 checks passed
@JE-Chen JE-Chen deleted the feat/saliency-batch branch June 24, 2026 06:55