Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions WHATS_NEW.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
# What's New — AutoControl

## What's new (2026-06-24) — Visual Saliency (where to look — spectral-residual)

Find the region that stands out, with no template / colour / text. Full reference: [`docs/source/Eng/doc/new_features/v190_features_doc.rst`](docs/source/Eng/doc/new_features/v190_features_doc.rst).

- **`saliency_map` / `salient_regions` / `most_salient`** (`AC_salient_regions`, `AC_most_salient`): when there's no template, colour or text to key on, an agent still needs a cue for *where to look*. This computes the spectral-residual saliency map (Hou & Zhang 2007 — log amplitude minus its local average, reconstructed through the phase) and turns it into ranked salient boxes in source pixel coordinates. The transform is a pure numpy FFT (`cv2.saliency` is in the forbidden opencv-contrib package, so it's re-implemented over base opencv); it reuses `visual_match`'s grayscale loader and `cv2_utils.blobs.connected_boxes`. Regions threshold at `mean + 2·std` by default. A coarse attention cue to *narrow* where a template / OCR pass then looks. No `PySide6`.

## What's new (2026-06-24) — Display-Scale / Visual-DPI Detection

Infer which display scale (DPI) a template renders at — and how confidently. Full reference: [`docs/source/Eng/doc/new_features/v189_features_doc.rst`](docs/source/Eng/doc/new_features/v189_features_doc.rst).
Expand Down
49 changes: 49 additions & 0 deletions docs/source/Eng/doc/new_features/v190_features_doc.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
Visual Saliency (where to look — spectral-residual)
===================================================

When there is no template, no known colour and no text to OCR, an agent still
needs a cue for *where to look* — the region that stands out from its
surroundings (a popup, a badge, a highlighted row). ``saliency`` computes the
spectral-residual saliency map (Hou & Zhang 2007) — ``log`` amplitude minus its
local average, reconstructed through the phase — and turns it into ranked salient
boxes.

* :func:`saliency_map` — the normalised (0–1) saliency map as an ndarray,
* :func:`salient_regions` — ranked salient boxes ``{x, y, width, height, center,
score}`` in source pixel coordinates,
* :func:`most_salient` — the single most salient region (the first place to look).

The transform is a pure ``numpy`` FFT — ``cv2.saliency`` lives in the forbidden
opencv-contrib package, so it is re-implemented over base opencv only. It reuses
``visual_match``'s grayscale loader (any ndarray / path / PIL image, or the live
screen) and ``cv2_utils.blobs.connected_boxes`` for region extraction. cv2 /
numpy are lazily imported. Imports no ``PySide6``.

Headless API
------------

.. code-block:: python

from je_auto_control import saliency_map, salient_regions, most_salient

most_salient("screen.png")
# {"x": 612, "y": 40, "width": 180, "height": 36, "center": [702, 58],
# "score": 0.82}

for region in salient_regions("screen.png"): # most-salient first
...

sal = saliency_map("screen.png") # (64, 64) float32 in 0..1

Regions are thresholded at ``mean + 2·std`` of the saliency map by default (pass
``threshold`` to override), extracted with ``connected_boxes`` and scaled back to
the source's pixel coordinates. ``size`` is the (small) resolution the saliency is
computed at. Saliency is a coarse attention cue, not a precise detector — use it
to *narrow* where a template / OCR pass then looks.

Executor commands
-----------------

``AC_salient_regions`` and ``AC_most_salient`` (``source`` / ``region`` / ``size``
/ ``threshold`` / ``min_area``). They are exposed as read-only ``ac_*`` MCP tools
and as Script Builder commands under **Image**.
42 changes: 42 additions & 0 deletions docs/source/Zh/doc/new_features/v190_features_doc.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
視覺顯著度(該看哪裡——spectral-residual)
==========================================

當沒有模板、沒有已知顏色、也沒有文字可 OCR 時,agent 仍需要一個*該看哪裡*的線索——也就是從
周遭凸顯出來的區域(彈出視窗、徽章、被反白的列)。``saliency`` 計算 spectral-residual 顯著度圖
(Hou & Zhang 2007)——``log`` 振幅減去其區域平均,再透過相位重建——並轉成排序後的顯著方框。

* :func:`saliency_map` ——正規化(0–1)的顯著度圖(ndarray),
* :func:`salient_regions` ——排序後的顯著方框 ``{x, y, width, height, center, score}``
(以來源像素座標表示),
* :func:`most_salient` ——單一最顯著的區域(第一個該看的地方)。

此轉換為純 ``numpy`` FFT——``cv2.saliency`` 位於被禁用的 opencv-contrib 套件,故在 base opencv
上重新實作。它重用 ``visual_match`` 的灰階載入器(任何 ndarray / 路徑 / PIL 影像,或存活螢幕)與
``cv2_utils.blobs.connected_boxes`` 做區域擷取。cv2 / numpy 為延遲匯入。不匯入 ``PySide6``。

無頭 API
--------

.. code-block:: python

from je_auto_control import saliency_map, salient_regions, most_salient

most_salient("screen.png")
# {"x": 612, "y": 40, "width": 180, "height": 36, "center": [702, 58],
# "score": 0.82}

for region in salient_regions("screen.png"): # 最顯著者在前
...

sal = saliency_map("screen.png") # (64, 64) float32,範圍 0..1

區域預設以顯著度圖的 ``mean + 2·std`` 為門檻(可傳 ``threshold`` 覆寫),以 ``connected_boxes``
擷取,並縮放回來源的像素座標。``size`` 是計算顯著度所用的(較小)解析度。顯著度是粗略的注意力
線索,而非精確偵測器——用它來*縮小*接著由模板 / OCR 比對的範圍。

執行器指令
----------

``AC_salient_regions`` 與 ``AC_most_salient``(``source`` / ``region`` / ``size`` /
``threshold`` / ``min_area``)。皆以唯讀 ``ac_*`` MCP 工具及 Script Builder 指令(位於 **Image**
分類下)形式提供。
5 changes: 5 additions & 0 deletions je_auto_control/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,10 @@
)
# Display-scale / visual-DPI detection (per-scale match profile)
from je_auto_control.utils.scale_detect import detect_scale, scale_sweep
# Spectral-residual visual saliency (where to look — map + salient regions)
from je_auto_control.utils.saliency import (
most_salient, salient_regions, saliency_map,
)
# VLM element locator (headless)
from je_auto_control.utils.vision import (
VLMNotAvailableError, click_by_description, locate_by_description,
Expand Down Expand Up @@ -1660,6 +1664,7 @@ def start_autocontrol_gui(*args, **kwargs):
"plan_file_drop", "drop_files",
"image_quality", "is_blurry", "quality_gate",
"detect_scale", "scale_sweep",
"saliency_map", "salient_regions", "most_salient",
# VLM locator
"VLMNotAvailableError", "locate_by_description", "click_by_description",
"verify_description",
Expand Down
18 changes: 18 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -787,6 +787,24 @@ def _add_image_specs(specs: List[CommandSpec]) -> None:
),
description="Per-scale match-score profile of a template.",
))
saliency_fields = (
FieldSpec("source", FieldType.FILE_PATH, optional=True),
FieldSpec("region", FieldType.STRING, optional=True,
placeholder=_REGION_PLACEHOLDER),
FieldSpec("size", FieldType.INT, optional=True, default=64),
FieldSpec("threshold", FieldType.FLOAT, optional=True),
FieldSpec("min_area", FieldType.INT, optional=True, default=4),
)
specs.append(CommandSpec(
"AC_salient_regions", "Image", "Salient Regions",
fields=saliency_fields,
description="Visually salient regions (spectral-residual; where to look).",
))
specs.append(CommandSpec(
"AC_most_salient", "Image", "Most Salient Region",
fields=saliency_fields,
description="The single most visually salient region of an image/screen.",
))
specs.append(CommandSpec(
"AC_changed_regions", "Image", "Changed Regions (motion)",
fields=(
Expand Down
23 changes: 23 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
Original file line number Diff line number Diff line change
Expand Up @@ -4327,6 +4327,27 @@ def _scale_sweep(template: Any, haystack: Any = None, region: Any = None,
method=str(method))}


def _salient_regions(source: Any = None, region: Any = None, size: Any = 64,
threshold: Any = None, min_area: Any = 4) -> Dict[str, Any]:
"""Adapter: ranked visually-salient regions of an image / the screen."""
from je_auto_control.utils.saliency import salient_regions
cut = float(threshold) if threshold not in (None, "") else None
regions = salient_regions(source, region=_coerce_region(region),
size=int(size), threshold=cut,
min_area=int(min_area))
return {"regions": regions, "count": len(regions)}


def _most_salient(source: Any = None, region: Any = None, size: Any = 64,
threshold: Any = None, min_area: Any = 4) -> Dict[str, Any]:
"""Adapter: the single most visually-salient region (where to look)."""
from je_auto_control.utils.saliency import most_salient
cut = float(threshold) if threshold not in (None, "") else None
result = most_salient(source, region=_coerce_region(region),
size=int(size), threshold=cut, min_area=int(min_area))
return {"found": result is not None, "region": result}


def _image_histogram(source: Any = None, bins: Any = 32, space: str = "hsv",
region: Any = None) -> Dict[str, Any]:
"""Adapter: per-channel colour histogram of an image / the screen."""
Expand Down Expand Up @@ -6553,6 +6574,8 @@ def __init__(self):
"AC_quality_gate": _quality_gate,
"AC_detect_scale": _detect_scale,
"AC_scale_sweep": _scale_sweep,
"AC_salient_regions": _salient_regions,
"AC_most_salient": _most_salient,
"AC_image_histogram": _image_histogram,
"AC_histogram_changed": _histogram_changed,
"AC_changed_regions": _changed_regions,
Expand Down
29 changes: 29 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_factories.py
Original file line number Diff line number Diff line change
Expand Up @@ -3443,6 +3443,35 @@ def img_histogram_tools() -> List[MCPTool]:
handler=h.scale_sweep,
annotations=READ_ONLY,
),
MCPTool(
name="ac_salient_regions",
description=("Visually salient regions of 'source' (image path; "
"default screen grab of 'region') via spectral-residual "
"saliency — where to look with no template/text. Returns "
"{regions:[{x,y,width,height,center,score}], count}."),
input_schema=schema({
"source": {"type": "string"},
"region": {"type": "array", "items": {"type": "integer"}},
"size": {"type": "integer"},
"threshold": {"type": "number"},
"min_area": {"type": "integer"}}),
handler=h.salient_regions,
annotations=READ_ONLY,
),
MCPTool(
name="ac_most_salient",
description=("The single most visually salient region of 'source' "
"(default screen): {found, region:{x,y,width,height,"
"center,score}}. The first place to look."),
input_schema=schema({
"source": {"type": "string"},
"region": {"type": "array", "items": {"type": "integer"}},
"size": {"type": "integer"},
"threshold": {"type": "number"},
"min_area": {"type": "integer"}}),
handler=h.most_salient,
annotations=READ_ONLY,
),
]


Expand Down
11 changes: 11 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_handlers.py
Original file line number Diff line number Diff line change
Expand Up @@ -2532,6 +2532,17 @@ def scale_sweep(template, haystack=None, region=None, scales=None,
return _scale_sweep(template, haystack, region, scales, method)


def salient_regions(source=None, region=None, size=64, threshold=None,
min_area=4):
from je_auto_control.utils.executor.action_executor import _salient_regions
return _salient_regions(source, region, size, threshold, min_area)


def most_salient(source=None, region=None, size=64, threshold=None, min_area=4):
from je_auto_control.utils.executor.action_executor import _most_salient
return _most_salient(source, region, size, threshold, min_area)


def image_histogram(source=None, bins=32, space="hsv", region=None):
from je_auto_control.utils.executor.action_executor import _image_histogram
return _image_histogram(source, bins, space, region)
Expand Down
6 changes: 6 additions & 0 deletions je_auto_control/utils/saliency/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
"""Spectral-residual visual saliency: map + ranked salient regions (numpy FFT)."""
from je_auto_control.utils.saliency.saliency import (
most_salient, salient_regions, saliency_map,
)

__all__ = ["saliency_map", "salient_regions", "most_salient"]
101 changes: 101 additions & 0 deletions je_auto_control/utils/saliency/saliency.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
"""Find the visually salient regions of a frame (spectral-residual saliency).

When there is no template, no known colour and no text to OCR, an agent still
needs a cue for *where to look* — the region that stands out from its
surroundings (a popup, a badge, a highlighted row). ``saliency`` computes the
spectral-residual saliency map (Hou & Zhang 2007) — ``log`` amplitude minus its
local average, reconstructed through the phase — and turns it into ranked salient
boxes.

The transform is a pure ``numpy`` FFT (``cv2.saliency`` lives in the forbidden
opencv-contrib package, so it is re-implemented here over base opencv only). It
reuses ``visual_match``'s grayscale loader for the source (any ndarray / path /
PIL image, or the live screen) and ``cv2_utils.blobs.connected_boxes`` for the
region extraction. cv2 / numpy are lazily imported. Imports no ``PySide6``.
"""
from typing import Any, Dict, List, Optional, Sequence, Tuple

ImageSource = Any


def _gray(source: Optional[ImageSource], region: Optional[Sequence[int]]):
from je_auto_control.utils.visual_match.visual_match import _haystack_gray
return _haystack_gray(source, region)


def _saliency_from_gray(gray, size: int):
import cv2
import numpy as np
small = cv2.resize(gray, (size, size),
interpolation=cv2.INTER_AREA).astype(np.float32)
fft = np.fft.fft2(small)
log_amplitude = np.log(np.abs(fft) + 1e-8)
residual = log_amplitude - cv2.blur(log_amplitude, (3, 3))
recon = np.fft.ifft2(np.exp(residual + 1j * np.angle(fft)))
smoothed = cv2.GaussianBlur(np.abs(recon) ** 2, (0, 0), sigmaX=3.0)
peak = float(smoothed.max())
if peak > 0:
smoothed = smoothed / peak
return smoothed.astype(np.float32)


def saliency_map(source: Optional[ImageSource] = None, *,
region: Optional[Sequence[int]] = None, size: int = 64):
"""Return the normalised (0–1) spectral-residual saliency map as an ndarray.

The map is computed at ``size`` x ``size`` (the algorithm's native low
resolution); higher = more salient.
"""
return _saliency_from_gray(_gray(source, region), int(size))


def _regions_from_saliency(saliency, orig_shape: Tuple[int, int],
threshold: Optional[float], min_area: int,
size: int) -> List[Dict[str, Any]]:
from je_auto_control.utils.cv2_utils.blobs import connected_boxes
if threshold is not None:
cut = float(threshold)
else: # scale-invariant: regions standing 2 std above the mean saliency
cut = float(saliency.mean()) + 2.0 * float(saliency.std())
mask = (saliency >= cut).astype("uint8") * 255
orig_height, orig_width = int(orig_shape[0]), int(orig_shape[1])
scale_x, scale_y = orig_width / float(size), orig_height / float(size)
regions: List[Dict[str, Any]] = []
for box in connected_boxes(mask, min_area=min_area):
x, y = int(box["x"] * scale_x), int(box["y"] * scale_y)
width = max(1, int(box["width"] * scale_x))
height = max(1, int(box["height"] * scale_y))
patch = saliency[box["y"]:box["y"] + box["height"],
box["x"]:box["x"] + box["width"]]
score = float(patch.mean()) if patch.size else 0.0
regions.append({"x": x, "y": y, "width": width, "height": height,
"center": [x + width // 2, y + height // 2],
"score": score})
regions.sort(key=lambda region: region["score"], reverse=True)
return regions


def salient_regions(source: Optional[ImageSource] = None, *,
region: Optional[Sequence[int]] = None, size: int = 64,
threshold: Optional[float] = None,
min_area: int = 4) -> List[Dict[str, Any]]:
"""Return salient regions as ``[{x, y, width, height, center, score}]``.

Boxes are thresholded from the saliency map (default cut = 3x the mean,
per Hou & Zhang), extracted with ``connected_boxes`` and scaled back to the
source's pixel coordinates, ranked most-salient first.
"""
gray = _gray(source, region)
saliency = _saliency_from_gray(gray, int(size))
return _regions_from_saliency(saliency, gray.shape[:2], threshold,
int(min_area), int(size))


def most_salient(source: Optional[ImageSource] = None, *,
region: Optional[Sequence[int]] = None, size: int = 64,
threshold: Optional[float] = None,
min_area: int = 4) -> Optional[Dict[str, Any]]:
"""Return the single most salient region, or ``None`` if none stand out."""
regions = salient_regions(source, region=region, size=size,
threshold=threshold, min_area=min_area)
return regions[0] if regions else None
Loading
Loading