18 changes: 18 additions & 0 deletions WHATS_NEW.md
@@ -1,5 +1,23 @@
# What's New — AutoControl

## What's new (2026-06-24) — Visual Saliency (where to look — spectral-residual)

Find the region that stands out, with no template / colour / text. Full reference: [`docs/source/Eng/doc/new_features/v190_features_doc.rst`](docs/source/Eng/doc/new_features/v190_features_doc.rst).

- **`saliency_map` / `salient_regions` / `most_salient`** (`AC_salient_regions`, `AC_most_salient`): when there's no template, colour or text to key on, an agent still needs a cue for *where to look*. This computes the spectral-residual saliency map (Hou & Zhang 2007: log amplitude minus its local average, reconstructed through the phase) and turns it into ranked salient boxes in source pixel coordinates. The transform is a pure numpy FFT (`cv2.saliency` lives in the forbidden opencv-contrib package, so it is re-implemented over base opencv); it reuses `visual_match`'s grayscale loader and `cv2_utils.blobs.connected_boxes`. Regions are thresholded at `mean + 2·std` by default. Treat it as a coarse attention cue that *narrows* where a template / OCR pass then looks. No `PySide6`.

## What's new (2026-06-24) — Display-Scale / Visual-DPI Detection

Infer which display scale (DPI) a template renders at — and how confidently. Full reference: [`docs/source/Eng/doc/new_features/v189_features_doc.rst`](docs/source/Eng/doc/new_features/v189_features_doc.rst).

- **`detect_scale` / `scale_sweep`** (`AC_detect_scale`, `AC_scale_sweep`): a template cropped at 100% scale won't match on a 150%-DPI machine, and `match_template` returns only the single best match — discarding the per-scale scores. This keeps the whole profile: `scale_sweep` scores the template at every scale, and `detect_scale` reports the winning scale as a DPI inference (`scale_percent`) with a confidence `margin` (how far it beats the runner-up). Reuses `visual_match._score_map` per scale; source is any ndarray / path / PIL image (or the live screen); scales default to the common Windows values. cv2/numpy lazily imported. No `PySide6`.

## What's new (2026-06-24) — Image Quality Scoring (sharpness / contrast / brightness gate)

Refuse to OCR a blurry or washed-out frame — score quality and gate before recognition. Full reference: [`docs/source/Eng/doc/new_features/v188_features_doc.rst`](docs/source/Eng/doc/new_features/v188_features_doc.rst).

- **`image_quality` / `is_blurry` / `quality_gate`** (`AC_image_quality`, `AC_quality_gate`): OCR and template matching quietly fail on a blurry, washed-out or too-dark capture, and the caller can't tell a *missing* element from an *unreadable* one. This measures sharpness (variance of the Laplacian), contrast (grayscale stddev) and brightness (mean 0–255); `quality_gate` turns them into `{passed, issues}` flagging `blurry` / `low_contrast` / `too_dark` / `too_bright` so a script can pre-process or re-capture before OCR. Reuses `visual_match`'s grayscale loader (any ndarray / path / PIL image, or the live screen); cv2/numpy lazily imported. No `PySide6`.

## What's new (2026-06-24) — Drop Files onto a Window (WM_DROPFILES)

Complete a drag-and-drop programmatically — drop files onto a target window. Full reference: [`docs/source/Eng/doc/new_features/v187_features_doc.rst`](docs/source/Eng/doc/new_features/v187_features_doc.rst).
47 changes: 47 additions & 0 deletions docs/source/Eng/doc/new_features/v188_features_doc.rst
@@ -0,0 +1,47 @@
Image Quality Scoring (sharpness / contrast / brightness gate)
==============================================================

OCR and template matching quietly fail on a blurry, washed-out or too-dark
capture — the locate returns nothing and the caller can't tell a *missing*
element from an *unreadable* one. ``image_quality`` measures the three things
that wreck recognition and gates on them:

* **sharpness** — variance of the Laplacian (low = blurry / out of focus),
* **contrast** — standard deviation of the grayscale (low = washed out),
* **brightness** — mean grayscale 0–255 (too low = dark, too high = blown out).

:func:`image_quality` returns the raw metrics, :func:`is_blurry` is the common
one-liner, and :func:`quality_gate` turns the metrics into a pass / fail verdict
with named issues, so a script can refuse to OCR a bad frame (or pre-process it
first). It reuses ``visual_match``'s grayscale loader, so the source is any
ndarray / path / PIL image (or the live screen when omitted); cv2 / numpy are
lazily imported. Imports no ``PySide6``.
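
The three metrics can be sketched in plain ``numpy``. This is a minimal illustration, not the module's code: ``laplacian_variance`` and ``quality_metrics`` are hypothetical names, and a 4-neighbour Laplacian over the interior pixels stands in for the cv2 call, so absolute sharpness values may differ from what ``image_quality`` reports.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian over the interior pixels."""
    g = gray.astype(np.float64)  # avoid uint8 wrap-around
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def quality_metrics(gray: np.ndarray) -> dict:
    """Hypothetical stand-in for the three metrics described above."""
    return {
        "sharpness": laplacian_variance(gray),   # low = blurry
        "contrast": float(gray.std()),           # low = washed out
        "brightness": float(gray.mean()),        # 0..255 for uint8 input
    }


# A noisy frame has strong edges everywhere; a flat grey frame has none.
rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
flat = np.full((64, 64), 128, dtype=np.uint8)

noisy_metrics = quality_metrics(noisy)
flat_metrics = quality_metrics(flat)
```

The flat frame scores zero sharpness and zero contrast, which is the signature of an unreadable capture rather than a missing element.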

Headless API
------------

.. code-block:: python

    from je_auto_control import image_quality, is_blurry, quality_gate

    image_quality("frame.png")
    # {"sharpness": 842.1, "contrast": 58.3, "brightness": 131.0}

    if is_blurry("frame.png", threshold=100):
        ...  # capture again / sharpen before OCR

    # Illustrative result for a blurry, under-exposed capture:
    gate = quality_gate("frame.png", min_sharpness=100, min_contrast=12)
    # {"sharpness": .., "contrast": .., "brightness": .., "passed": False,
    #  "issues": ["blurry", "too_dark"]}

``quality_gate`` flags ``blurry`` / ``low_contrast`` / ``too_dark`` /
``too_bright``; ``passed`` is True only when no issue fires. ``region`` applies to
a live-screen grab (omit ``source`` to grade the screen). Thresholds are tunable;
the defaults suit typical UI screenshots.
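
The gating step itself is a handful of threshold comparisons. The sketch below assumes a hypothetical ``gate`` helper; ``min_sharpness`` and ``min_contrast`` mirror the parameters shown above, while ``min_brightness`` / ``max_brightness`` and every default value here are assumptions made for illustration, not the library's documented defaults.

```python
def gate(metrics: dict,
         min_sharpness: float = 100.0,
         min_contrast: float = 12.0,
         min_brightness: float = 40.0,    # assumed bound, not documented
         max_brightness: float = 220.0):  # assumed bound, not documented
    """Turn raw metrics into a pass / fail verdict with named issues."""
    issues = []
    if metrics["sharpness"] < min_sharpness:
        issues.append("blurry")
    if metrics["contrast"] < min_contrast:
        issues.append("low_contrast")
    if metrics["brightness"] < min_brightness:
        issues.append("too_dark")
    if metrics["brightness"] > max_brightness:
        issues.append("too_bright")
    # passed is True only when no issue fired
    return {**metrics, "passed": not issues, "issues": issues}


good = gate({"sharpness": 842.1, "contrast": 58.3, "brightness": 131.0})
bad = gate({"sharpness": 20.0, "contrast": 30.0, "brightness": 10.0})
```

Returning the metrics alongside the verdict lets a caller log why a frame was rejected and pick the matching remedy (re-capture, sharpen, adjust exposure).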

Executor commands
-----------------

``AC_image_quality`` (``source`` / ``region``) and ``AC_quality_gate`` (plus
``min_sharpness`` / ``min_contrast``). They are exposed as read-only ``ac_*`` MCP
tools and as Script Builder commands under **Image**.
47 changes: 47 additions & 0 deletions docs/source/Eng/doc/new_features/v189_features_doc.rst
@@ -0,0 +1,47 @@
Display-Scale / Visual-DPI Detection
====================================

A template cropped at 100% display scale will not match pixel-for-pixel on a
machine running at 150% DPI, where everything is 1.5x bigger.
``visual_match.match_template`` *can* sweep scales, but it returns only the
single best match's location and throws the per-scale scores away.
``scale_detect`` keeps the whole profile: it scores the template against the
haystack at a range of scales and reports **which scale wins, and by how much**,
so an automation can infer the effective UI scale / DPI and how confident that
inference is.

* :func:`scale_sweep`: the per-scale score profile (every scale's best match),
* :func:`detect_scale`: the winning scale as a DPI inference with a confidence
  margin.

It reuses ``visual_match._score_map`` (the full ``matchTemplate`` surface,
oriented higher = better) for each scale, so the source is any ndarray / path /
PIL image (or the live screen). cv2 / numpy are lazily imported. Imports no
``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import detect_scale, scale_sweep

    detect_scale("button.png", "screen.png")
    # {"scale": 1.5, "scale_percent": 150, "score": 0.98, "center": [...],
    #  "margin": 0.62, "candidates": [...]}

    scale_sweep("button.png", scales=[1.0, 1.25, 1.5, 1.75, 2.0])
    # [{"scale": 1.0, "score": .., "center": [..]}, {"scale": 1.25, ...}, ...]

``scales`` defaults to the common Windows display scales
``(1.0, 1.25, 1.5, 1.75, 2.0)``. ``margin`` is how far the winning scale beats the
runner-up — a low margin means the inference is ambiguous. Scales at which the
template is larger than the haystack are skipped; ``detect_scale`` returns
``None`` when none fit. Omit ``haystack`` to match against the live screen
(``region`` applies to that grab).
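
How a per-scale profile turns into a winner and a margin can be sketched without any image code. ``pick_scale`` is a hypothetical helper and the scores are made-up numbers; treating the runner-up as ``0.0`` when only one scale fits is an assumption, not documented behaviour.

```python
def pick_scale(candidates):
    """Pick the best-scoring scale and report how far it beats the runner-up."""
    if not candidates:
        return None  # no scale fit inside the haystack
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    best = ranked[0]
    # Assumption: with a single candidate the margin is measured against 0.0.
    runner_up = ranked[1]["score"] if len(ranked) > 1 else 0.0
    return {
        "scale": best["scale"],
        "scale_percent": int(round(best["scale"] * 100)),
        "score": best["score"],
        "margin": best["score"] - runner_up,
        "candidates": ranked,
    }


# Made-up profile shaped like a scale_sweep result (centers omitted).
profile = [
    {"scale": 1.0, "score": 0.36},
    {"scale": 1.25, "score": 0.31},
    {"scale": 1.5, "score": 0.98},
    {"scale": 1.75, "score": 0.29},
    {"scale": 2.0, "score": 0.22},
]
result = pick_scale(profile)
```

With these numbers the 150% scale wins by roughly 0.62, a wide margin. Had two scales scored 0.98 and 0.95, the margin of 0.03 would signal an ambiguous inference.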

Executor commands
-----------------

``AC_detect_scale`` and ``AC_scale_sweep`` (``template`` / ``haystack`` /
``region`` / ``scales`` / ``method``). They are exposed as read-only ``ac_*`` MCP
tools and as Script Builder commands under **Image**.
49 changes: 49 additions & 0 deletions docs/source/Eng/doc/new_features/v190_features_doc.rst
@@ -0,0 +1,49 @@
Visual Saliency (where to look — spectral-residual)
===================================================

When there is no template, no known colour and no text to OCR, an agent still
needs a cue for *where to look* — the region that stands out from its
surroundings (a popup, a badge, a highlighted row). ``saliency`` computes the
spectral-residual saliency map (Hou & Zhang 2007) — ``log`` amplitude minus its
local average, reconstructed through the phase — and turns it into ranked salient
boxes.

* :func:`saliency_map`: the normalised (0–1) saliency map as an ndarray,
* :func:`salient_regions`: ranked salient boxes ``{x, y, width, height, center,
  score}`` in source pixel coordinates,
* :func:`most_salient`: the single most salient region (the first place to look).

The transform is a pure ``numpy`` FFT — ``cv2.saliency`` lives in the forbidden
opencv-contrib package, so it is re-implemented over base opencv only. It reuses
``visual_match``'s grayscale loader (any ndarray / path / PIL image, or the live
screen) and ``cv2_utils.blobs.connected_boxes`` for region extraction. cv2 /
numpy are lazily imported. Imports no ``PySide6``.
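
The spectral-residual transform described above fits in a few lines of ``numpy``. This is a sketch under stated assumptions rather than the module's implementation: ``box_blur3`` and ``spectral_residual`` are hypothetical names, a 3x3 box filter stands in for both the local average and the final smoothing (the paper uses a Gaussian for the latter), and the downscale to the working ``size`` is omitted.

```python
import numpy as np


def box_blur3(a: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge replication (the 'local average')."""
    h, w = a.shape
    padded = np.pad(a, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def spectral_residual(gray: np.ndarray) -> np.ndarray:
    """Saliency = |IFFT(exp(residual + i * phase))|^2, normalised to 0..1."""
    spectrum = np.fft.fft2(gray.astype(np.float64))
    log_amp = np.log(np.abs(spectrum) + 1e-9)   # log amplitude
    phase = np.angle(spectrum)                  # phase is kept untouched
    residual = log_amp - box_blur3(log_amp)     # minus its local average
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = box_blur3(sal)                        # light smoothing
    sal -= sal.min()
    peak = sal.max()
    if peak > 0:
        sal /= peak
    return sal.astype(np.float32)


# Toy input: one bright pixel on a black frame.
frame = np.zeros((64, 64), dtype=np.uint8)
frame[40, 20] = 255
sal = spectral_residual(frame)
peak_y, peak_x = np.unravel_index(int(sal.argmax()), sal.shape)
```

On this toy frame the saliency peak lands on (or next to) the lone bright pixel. Real screenshots are first reduced to a small working resolution, which is why the resulting map is only a coarse cue.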

Headless API
------------

.. code-block:: python

    from je_auto_control import saliency_map, salient_regions, most_salient

    most_salient("screen.png")
    # {"x": 612, "y": 40, "width": 180, "height": 36, "center": [702, 58],
    #  "score": 0.82}

    for region in salient_regions("screen.png"):  # most-salient first
        ...

    sal = saliency_map("screen.png")  # (64, 64) float32 in 0..1

Regions are thresholded at ``mean + 2·std`` of the saliency map by default (pass
``threshold`` to override), extracted with ``connected_boxes`` and scaled back to
the source's pixel coordinates. ``size`` is the (small) resolution the saliency is
computed at. Saliency is a coarse attention cue, not a precise detector — use it
to *narrow* where a template / OCR pass then looks.
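
The thresholding and rescaling steps might look like the sketch below. It does not use ``connected_boxes``: a small breadth-first flood fill stands in for it, ``boxes_above`` and ``to_source`` are hypothetical names, and scoring a region by its mean saliency is an assumption about how the ranking is produced.

```python
from collections import deque

import numpy as np


def boxes_above(sal, threshold=None, min_area=1):
    """Bounding boxes of 4-connected regions above the threshold (map coords)."""
    if threshold is None:
        threshold = float(sal.mean() + 2.0 * sal.std())  # default cut-off
    mask = sal > threshold
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        queue, ys, xs = deque([(int(y0), int(x0))]), [], []
        while queue:  # flood-fill one component
            y, x = queue.popleft()
            ys.append(y)
            xs.append(x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(ys) >= min_area:
            boxes.append({
                "x": min(xs), "y": min(ys),
                "width": max(xs) - min(xs) + 1,
                "height": max(ys) - min(ys) + 1,
                "score": float(sal[ys, xs].mean()),  # assumed ranking score
            })
    return sorted(boxes, key=lambda b: b["score"], reverse=True)


def to_source(box, map_shape, source_shape):
    """Scale a map-space box back to source pixel coordinates."""
    sy = source_shape[0] / map_shape[0]
    sx = source_shape[1] / map_shape[1]
    x, y = int(round(box["x"] * sx)), int(round(box["y"] * sy))
    w = max(1, int(round(box["width"] * sx)))
    h = max(1, int(round(box["height"] * sy)))
    return {"x": x, "y": y, "width": w, "height": h,
            "center": [x + w // 2, y + h // 2], "score": box["score"]}


# Synthetic 64x64 map with one hot block, mapped onto a 1280x640 source.
sal = np.zeros((64, 64), dtype=np.float32)
sal[10:14, 20:26] = 1.0
map_boxes = boxes_above(sal)
region = to_source(map_boxes[0], sal.shape, (640, 1280))
```

Because the map is much smaller than the source, each map pixel covers a block of source pixels, so the returned box is only as precise as that block size.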

Executor commands
-----------------

``AC_salient_regions`` and ``AC_most_salient`` (``source`` / ``region`` / ``size``
/ ``threshold`` / ``min_area``). They are exposed as read-only ``ac_*`` MCP tools
and as Script Builder commands under **Image**.
42 changes: 42 additions & 0 deletions docs/source/Zh/doc/new_features/v188_features_doc.rst
@@ -0,0 +1,42 @@
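Image Quality Scoring (sharpness / contrast / brightness gate)
==============================================================

OCR and template matching fail silently on a blurry, washed-out or too-dark
capture: the locate returns nothing, and the caller cannot tell whether the
element is *missing* or the frame is *unreadable*. ``image_quality`` measures
three metrics that wreck recognition and gates on them:

* **sharpness**: variance of the Laplacian (low = blurry / out of focus),
* **contrast**: standard deviation of the grayscale (low = washed out),
* **brightness**: mean grayscale 0–255 (too low = too dark, too high = overexposed).

:func:`image_quality` returns the raw metrics, :func:`is_blurry` is the common
one-liner, and :func:`quality_gate` turns the metrics into a pass / fail verdict
with named issues, so a script can refuse to OCR a bad frame (or pre-process it
first). It reuses ``visual_match``'s grayscale loader, so the source can be any
ndarray / path / PIL image (or the live screen when omitted); cv2 / numpy are
lazily imported. Imports no ``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import image_quality, is_blurry, quality_gate

    image_quality("frame.png")
    # {"sharpness": 842.1, "contrast": 58.3, "brightness": 131.0}

    if is_blurry("frame.png", threshold=100):
        ...  # re-capture / sharpen before OCR

    gate = quality_gate("frame.png", min_sharpness=100, min_contrast=12)
    # {"sharpness": .., "contrast": .., "brightness": .., "passed": False,
    #  "issues": ["blurry", "too_dark"]}

``quality_gate`` flags ``blurry`` / ``low_contrast`` / ``too_dark`` /
``too_bright``; ``passed`` is True only when no issue fires. ``region`` applies to
a live-screen grab (omit ``source`` to grade the screen). Thresholds are tunable;
the defaults suit typical UI screenshots.

Executor commands
-----------------

``AC_image_quality`` (``source`` / ``region``) and ``AC_quality_gate`` (plus
``min_sharpness`` / ``min_contrast``). Both are exposed as read-only ``ac_*`` MCP
tools and as Script Builder commands under **Image**.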
40 changes: 40 additions & 0 deletions docs/source/Zh/doc/new_features/v189_features_doc.rst
@@ -0,0 +1,40 @@
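Display-Scale / Visual-DPI Detection
====================================

A template cropped at 100% display scale will not match pixel-for-pixel on a
machine running at 150% DPI, where everything is 1.5x larger.
``visual_match.match_template`` *can* sweep several scales, but it returns only
the location of the single best match and discards the per-scale scores.
``scale_detect`` keeps the whole profile: it scores the template against the
haystack over a range of scales and reports **which scale wins, and by how
much**, so an automation can infer the effective UI scale / DPI and how
confident that inference is.

* :func:`scale_sweep`: the per-scale score profile (every scale's best match),
* :func:`detect_scale`: the winning scale as a DPI inference, with a confidence
  margin.

It reuses ``visual_match._score_map`` (the full ``matchTemplate`` surface,
oriented higher = better) for each scale, so the source can be any ndarray /
path / PIL image (or the live screen). cv2 / numpy are lazily imported. Imports
no ``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import detect_scale, scale_sweep

    detect_scale("button.png", "screen.png")
    # {"scale": 1.5, "scale_percent": 150, "score": 0.98, "center": [...],
    #  "margin": 0.62, "candidates": [...]}

    scale_sweep("button.png", scales=[1.0, 1.25, 1.5, 1.75, 2.0])
    # [{"scale": 1.0, "score": .., "center": [..]}, {"scale": 1.25, ...}, ...]

``scales`` defaults to the common Windows display scales
``(1.0, 1.25, 1.5, 1.75, 2.0)``. ``margin`` is how far the winning scale leads
the runner-up; a low margin means the inference is ambiguous. Scales at which
the template is larger than the haystack are skipped; ``detect_scale`` returns
``None`` when no scale fits. Omit ``haystack`` to match against the live screen
(``region`` applies to that grab).

Executor commands
-----------------

``AC_detect_scale`` and ``AC_scale_sweep`` (``template`` / ``haystack`` /
``region`` / ``scales`` / ``method``). Both are exposed as read-only ``ac_*`` MCP
tools and as Script Builder commands under **Image**.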
42 changes: 42 additions & 0 deletions docs/source/Zh/doc/new_features/v190_features_doc.rst
@@ -0,0 +1,42 @@
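Visual Saliency (where to look: spectral-residual)
==================================================

When there is no template, no known colour and no text to OCR, an agent still
needs a cue for *where to look*, that is, the region that stands out from its
surroundings (a popup, a badge, a highlighted row). ``saliency`` computes the
spectral-residual saliency map (Hou & Zhang 2007), which is the ``log``
amplitude minus its local average, reconstructed through the phase, and turns
it into ranked salient boxes.

* :func:`saliency_map`: the normalised (0–1) saliency map (ndarray),
* :func:`salient_regions`: ranked salient boxes ``{x, y, width, height, center,
  score}`` in source pixel coordinates,
* :func:`most_salient`: the single most salient region (the first place to look).

The transform is a pure ``numpy`` FFT: ``cv2.saliency`` lives in the forbidden
opencv-contrib package, so it is re-implemented over base opencv. It reuses
``visual_match``'s grayscale loader (any ndarray / path / PIL image, or the live
screen) and ``cv2_utils.blobs.connected_boxes`` for region extraction. cv2 /
numpy are lazily imported. Imports no ``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import saliency_map, salient_regions, most_salient

    most_salient("screen.png")
    # {"x": 612, "y": 40, "width": 180, "height": 36, "center": [702, 58],
    #  "score": 0.82}

    for region in salient_regions("screen.png"):  # most-salient first
        ...

    sal = saliency_map("screen.png")  # (64, 64) float32 in 0..1

Regions are thresholded at ``mean + 2·std`` of the saliency map by default (pass
``threshold`` to override), extracted with ``connected_boxes`` and scaled back to
the source's pixel coordinates. ``size`` is the (smaller) resolution the saliency
is computed at. Saliency is a coarse attention cue, not a precise detector; use
it to *narrow* the area that a template / OCR pass then searches.

Executor commands
-----------------

``AC_salient_regions`` and ``AC_most_salient`` (``source`` / ``region`` / ``size``
/ ``threshold`` / ``min_area``). Both are exposed as read-only ``ac_*`` MCP tools
and as Script Builder commands under **Image**.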
13 changes: 13 additions & 0 deletions je_auto_control/__init__.py
@@ -78,6 +78,16 @@
)
# Drop files onto a window (WM_DROPFILES sender)
from je_auto_control.utils.file_drop import drop_files, plan_file_drop
# Image quality scoring (sharpness / contrast / brightness gate before OCR)
from je_auto_control.utils.image_quality import (
    image_quality, is_blurry, quality_gate,
)
# Display-scale / visual-DPI detection (per-scale match profile)
from je_auto_control.utils.scale_detect import detect_scale, scale_sweep
# Spectral-residual visual saliency (where to look — map + salient regions)
from je_auto_control.utils.saliency import (
    most_salient, salient_regions, saliency_map,
)
# VLM element locator (headless)
from je_auto_control.utils.vision import (
    VLMNotAvailableError, click_by_description, locate_by_description,
@@ -1652,6 +1662,9 @@ def start_autocontrol_gui(*args, **kwargs):
    "classify_format", "classify_formats", "diff_formats",
    "list_clipboard_formats", "clipboard_formats",
    "plan_file_drop", "drop_files",
    "image_quality", "is_blurry", "quality_gate",
    "detect_scale", "scale_sweep",
    "saliency_map", "salient_regions", "most_salient",
    # VLM locator
    "VLMNotAvailableError", "locate_by_description", "click_by_description",
    "verify_description",