Typed operations & engines: spine, 6 engines, plans, models, facades (#689) by tony · Pull Request #690 · tmux-python/libtmux

tony · 2026-06-21T14:02:37Z

Summary

Implements the typed operations + engines architecture under libtmux.experimental.{ops,engines,models,facade} — an inert, statically-typed operation spine; a family of interchangeable engines (subprocess, concrete, control-mode, async-subprocess, async-control, and the native imsg easter-egg); lazy/async-lazy plans with ;-folding chainability; pure object-graph snapshots; a typed read surface; engine-typed facades; and a docs catalog generated from the registry.

Operationalizes #688 (architecture) per the plan in #689. Touches no existing public API — everything is additive under libtmux.experimental (explicitly outside the versioning policy). Nothing is generated at runtime; everything is statically typed and mypy-strict clean.

What's delivered

The spine — libtmux.experimental.ops (pure, no tmux):

Operation[ResultT]: frozen, keyword-only, class-vars as the single source of truth (kind/command/scope/result_cls/effects/safety/chainable/version gates). Pure render() with declarative version gating; build_result() adapts raw output to a typed result (version-threaded so read parsing matches the gated render).
Typed Result hierarchy with opt-in raise_for_status(): AckResult (no-output commands — success/failure only), SplitWindowResult/CreateResult (captured ids), CapturePaneResult (lines), ListPanes/Windows/SessionsResult (snapshot-deriving rows).
Closed Target sum, fail-closed OperationRegistry, stdlib serialization, and catalog() (registry-derived docs data).
LazyPlan (record → resolve SlotRef forward refs → execute) with chainability: >> / OpChain composition and execute(fold=True) folding chainable runs into one tmux a ; b dispatch, attributing per-op status (success → all complete; failure → first failed, rest skipped, matching tmux's cmdq_remove_group).
Read seam: ListPanes/ListWindows/ListSessions ops render the same -F template neo uses (imported, not copied) and parse into models snapshots — a typed read surface parallel to neo, leaving the ORM untouched.
57 operations across client/pane/window/session/server scopes.

Engines — libtmux.experimental.engines (all behind TmuxEngine/AsyncTmuxEngine, all returning the same CommandResult):

| Family | Sync | Async |
| --- | --- | --- |
| Subprocess (classic) | `SubprocessEngine` | `AsyncSubprocessEngine` |
| Concrete (in-memory) | `ConcreteEngine` | `AsyncConcreteEngine` |
| Control mode (`tmux -C`) | `ControlModeEngine` | `AsyncControlModeEngine` (event stream via `subscribe()`) |
| Native imsg (binary protocol) | `ImsgEngine` (opt-in easter egg) | — |

Control engines use an I/O-free bytes ControlModeParser with FIFO/skip correlation (startup-ACK consumed up front; unsolicited hook blocks skipped). The imsg engine speaks tmux's binary peer protocol directly (AF_UNIX + SCM_RIGHTS, PROTOCOL_VERSION 8) and has a live parity test against the subprocess engine, which the prototype never had.

Models — libtmux.experimental.models: frozen Pane/Window/Session/ServerSnapshot (typed core + raw field tail), from_pane_rows() builds the whole tree from one list-panes -a query, round-trips to plain dicts — neo-like but decoupled and serializable.

Facades — libtmux.experimental.facade ("mode lives in the type"): eager Server→Session→Window→Pane navigation, LazyWindow/LazyPane, AsyncWindow/AsyncPane — all over the same ops; control mode is just an engine choice.

Docs: an in-repo tmuxop-catalog Sphinx directive renders catalog() into the operation reference (exercised by the docs gate), so the reference can't drift from the code.

Testing

~240 experimental tests + doctests; the pure spine/models/concrete tests need no tmux, while classic/control/async/imsg engines and the facades are validated against a real tmux server via the libtmux fixtures.
Cross-engine contract suite: same typed result across engines; serialization round-trips.
Full repo gate green: ruff, ruff format, mypy --strict, pytest (1501 passed, 2 skipped), build-docs. (The occasional test_retry.py timing flake is pre-existing and unrelated — passes in isolation.)

Design notes

Revises "Design typed operations and engines" (#688): execution mode lives in the facade type, not a runtime-bound engine attribute (return types differ by mode).
Per-engine error policy: classic reproduces today's behavior; newer engines return typed results with opt-in raise_for_status(). Same result shape across engines.
Core is stdlib-dataclass-only; an OTel/MCP edge can sit behind an extra.
imsg is opt-in and non-default: it depends on tmux's internal protocol (v8), is POSIX-only, and cannot host attach (which falls back to a local spawn).

Refs #688, #689.

why: Operationalizes the typed-operations/engines architecture (issues 688, 689) with the pure substrate that was absent from every prototype branch: an inert, statically-typed operation value that renders tmux commands, carries its result type, and serializes without a live tmux server. Engines stay transport-agnostic over it. None of this touches or changes existing public APIs.
what:
- Add libtmux.experimental.{ops,engines} packages (experimental, not under the versioning policy)
- ops: frozen Operation[ResultT] with class-level metadata as the single source of truth; pure render() with declarative version gating (LooseVersion); build_result() adapting raw output to typed results
- ops: typed Result base + raise_for_status() (CPython/requests precedent), SplitWindowResult/CapturePaneResult payloads
- ops: closed Target sum (PaneId/WindowId/SessionId/ClientName/NameRef/IndexRef/Special/SlotRef) with fail-closed validation
- ops: fail-closed OperationRegistry keyed by kind, with OpSpec views and predicate listing; stdlib dict serialization with round-trips
- ops: four seed operations (split-window, capture-pane, send-keys, select-layout) registered via @register
- engines: TmuxEngine/AsyncTmuxEngine protocols, CommandRequest/CommandResult, EngineSpec; run()/arun() execute bridge sharing one render/build path (sync vs await is the only divergence)
- tests: 111 pure, fixture-parametrizable unit tests + doctests, all runnable without a tmux server

codecov · 2026-06-21T14:03:52Z

Codecov Report

❌ Patch coverage is 78.68582% with 1291 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.42%. Comparing base (42cf219) to head (4229265).

Files with missing lines	Patch %	Lines
scripts/mcp_swap.py	26.72%	314 Missing and 15 partials ⚠️
src/libtmux/experimental/engines/imsg/base.py	51.59%	163 Missing and 34 partials ⚠️
src/libtmux/experimental/engines/control_mode.py	65.43%	70 Missing and 33 partials ⚠️
...libtmux/experimental/engines/async_control_mode.py	71.74%	42 Missing and 21 partials ⚠️
src/libtmux/experimental/engines/imsg/v8.py	75.29%	46 Missing and 17 partials ⚠️
...rc/libtmux/experimental/mcp/vocabulary/_resolve.py	60.95%	46 Missing and 11 partials ⚠️
docs/_ext/tmuxop.py	18.18%	36 Missing ⚠️
src/libtmux/experimental/mcp/vocabulary/pane.py	75.34%	32 Missing and 4 partials ⚠️
src/libtmux/experimental/mcp/__init__.py	46.77%	28 Missing and 5 partials ⚠️
src/libtmux/experimental/workspace/runner.py	60.86%	19 Missing and 8 partials ⚠️
... and 61 more

Additional details and impacted files

@@             Coverage Diff             @@
##           master     #690       +/-   ##
===========================================
+ Coverage   51.30%   72.42%   +21.12%     
===========================================
  Files          25      192      +167     
  Lines        3487    10873     +7386     
  Branches      686     1431      +745     
===========================================
+ Hits         1789     7875     +6086     
- Misses       1403     2414     +1011     
- Partials      295      584      +289

why: Proves the operation/result contract is transport-agnostic -- the same typed result whether produced by a real tmux subprocess or an in-memory simulator -- and provides the offline engine that lets ops doctests and tests run without a tmux server (issue 689 phases 2-3).
what:
- engines.subprocess: classic SubprocessEngine mirroring tmux_cmd (has-session stderr fold, backslashreplace, trailing-blank strip; tmux failure returned as data, only missing binary raises), with for_server() deriving -L/-S/-f/-2 flags from a live Server
- engines.concrete: deterministic in-memory engine (fabricated pane/window/session ids, canned capture lines) for tests and docs
- engines.registry: name-keyed engine registry (register/create/available), seeded with subprocess + concrete
- tests/experimental/contract: engine-agnostic operation contract run offline via concrete, plus classic-vs-concrete parity against a real tmux server (same result type + argv, payload may differ)

why: Completes the sync/async-symmetric execution story plus the deferred-execution and documentation mechanisms from issue 689 (phase 5 + docs), still without touching any existing API.
what:
- engines.asyncio: real AsyncSubprocessEngine on create_subprocess_exec (terminates the child on cancellation; not a thread wrapper), mirroring the classic engine's output handling so it returns the same typed result
- ops.plan: LazyPlan records operations without touching tmux and resolves SlotRef forward refs at execute time via a sans-I/O generator; sync execute() and async aexecute() share one resolution core (run vs await arun is the only divergence); whole-plan serialization round-trips
- ops.catalog: registry-driven CatalogEntry list (scope, version gates, effects, safety, result type, summary) -- the single source a docs domain renders, so runtime and docs cannot drift
- tests: lazy resolution sync+async, plan serialization, catalog coverage, async-vs-sync classic parity against a real tmux server
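The "one resolution core, two drivers" idea in this commit can be sketched with a plain generator. This is an illustration under assumptions, not the PR's LazyPlan: `SlotRef` here is a toy class holding only a slot index, and `run`/`arun` are hypothetical callables that return the id a step created (or None).

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotRef:
    """Forward reference to the id created by an earlier step (toy version)."""

    slot: int


def drive(steps):
    """Sans-I/O core: yield resolved argv, receive the created id (or None)."""
    bindings = {}
    for slot, argv in enumerate(steps):
        resolved = tuple(
            bindings[arg.slot] if isinstance(arg, SlotRef) else arg for arg in argv
        )
        created = yield resolved
        if created is not None:
            bindings[slot] = created
    return bindings


def execute(steps, run):
    """Sync driver: the only I/O is the call to run()."""
    gen = drive(steps)
    try:
        argv = next(gen)
        while True:
            argv = gen.send(run(argv))
    except StopIteration as stop:
        return stop.value


async def aexecute(steps, arun):
    """Async driver: identical except for the await."""
    gen = drive(steps)
    try:
        argv = next(gen)
        while True:
            argv = gen.send(await arun(argv))
    except StopIteration as stop:
        return stop.value
```

Keeping resolution inside the generator means the sync and async paths cannot drift apart: each driver only decides how a single argv is dispatched.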

why: Proves control mode is just another engine returning the same typed result (issue 689 phase 4) -- an operation run over a persistent tmux -C connection is indistinguishable, at the result level, from one run via fork-per-call subprocess.
what:
- engines.control_mode: ControlModeEngine over one persistent tmux -C connection; run_batch pipelines commands and parses each command's %begin/%end/%error block into a CommandResult; selectors-based nonblocking reads with timeout; startup-ACK discard; lifecycle via close()/context manager (lock-guarded teardown)
- engines.control_mode: I/O-free ControlModeParser, unit-testable without tmux, adapted from the chain runner + protocol-engines parser
- register control_mode in the engine registry and export it
- tests: pure parser tests + real-tmux contract (split creates a real pane, batched commands, control-vs-concrete parity)
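A rough sketch of the block framing this commit describes, assuming tmux's documented control-mode guard lines (`%begin <time> <number> <flags>` closed by a matching `%end` or `%error`). The `BlockParser` and `Block` names are hypothetical, and this toy works on decoded lines rather than the raw bytes the PR's ControlModeParser consumes.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    number: int
    ok: bool
    lines: list[str] = field(default_factory=list)


class BlockParser:
    """I/O-free: callers feed decoded lines; nothing here reads a socket."""

    def __init__(self) -> None:
        self.blocks: list[Block] = []
        self.notifications: list[str] = []
        self._open = None  # the block currently being collected, if any

    def feed_line(self, line: str) -> None:
        if self._open is None:
            if line.startswith("%begin "):
                self._open = Block(number=int(line.split()[2]), ok=True)
            elif line.startswith("%"):
                # Bare %-lines outside a block are notifications.
                self.notifications.append(line)
            return
        parts = line.split()
        closes = (
            len(parts) >= 3
            and parts[0] in ("%end", "%error")
            and parts[2] == str(self._open.number)
        )
        if closes:
            self._open.ok = parts[0] == "%end"
            self.blocks.append(self._open)
            self._open = None
        else:
            # Output inside a block may itself start with '%' (e.g. pane ids).
            self._open.lines.append(line)
```

Matching the closing guard on the command number is what lets output such as a pane id (`%2`) pass through as payload instead of being mistaken for a notification.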

why: Demonstrates the "mode lives in the type" model from issue 689 -- EagerPane.split() returns a live EagerPane while LazyPane.split() returns a deferred LazyPane, each a single statically-known return type, both backed by the same SplitWindow operation. One Pane class with a runtime-bound engine could not type these return values distinctly.
what:
- facade.pane.EagerPane: executes immediately, returns live handles (split -> EagerPane), typed results for capture/send_keys
- facade.pane.LazyPane: records into a LazyPlan, returns deferred handles (split -> LazyPane bound to the new pane's SlotRef), chainable
- seed of the wider Server/Session/Window/Pane/Client x mode matrix
- tests: eager live handles, lazy deferral + forward-ref resolution, and same-operation-backs-both-facades parity

why: Closes the two async gaps from issue 689: control mode and concrete had no async sibling. The async control engine is the one async engine that earns its place -- it adds an event stream subprocess cannot -- and prior libtmux/mux control-mode work (surfaced across agent histories via agentgrep, plus the asyncio-2 branches) shaped its correlation design.
what:
- engines.async_control_mode: AsyncControlModeEngine over a persistent tmux -C (create_subprocess_exec + one reader task). FIFO future correlation with skip-when-empty so unsolicited %begin blocks (hook-triggered commands and the startup ACK) never desync results; the startup ACK is consumed synchronously in start() to close the correlation race our whole-block parser would otherwise have. DEAD state fails pending commands on reader EOF/error. Cancellation via asyncio.wait_for (3.10 floor: no asyncio.timeout/TaskGroup). Bounded subscribe() notification stream with drop-counting. for_server() helper
- engines.control_mode: ControlModeParser now surfaces bare %-notification lines via notifications() (additive; the sync engine ignores them)
- engines.concrete: AsyncConcreteEngine sibling over shared simulation; removes the async test shim
- ControlNotification typed event value
- tests: parser notification/drain; async control vs real tmux (split, pipelined batch, concrete parity, live event stream, lifecycle)

why: Many tmux commands print nothing (rename-window, kill-pane, select-window, ...). tmux returns CMD_RETURN_NORMAL on success or calls cmdq_error on failure, framed in control mode as %end vs %error (see tmux cmd-queue.c) -- they never cmdq_print. They still need a typed result that records success/failure without inventing a payload.
what:
- results.AckResult: a typed acknowledgement (no payload) whose raise_for_status() still surfaces the error path; documents the tmux success/error mapping
- retarget send-keys and select-layout to AckResult (both print nothing)
- add no-output ops: rename-window (mutating), kill-window and kill-pane (destructive) -- exercising AckResult across scopes and safety tiers
- export AckResult and the new ops; refresh the catalog doctest
- tests: render + AckResult success/failure across the no-output ops and destructive safety metadata; update classic/control parity assertions

why: A neo-like read model is useful, but neo.Obj is one flat ~200-field class fused to the query/dispatch pipeline. The experimental namespace lets us try a decoupled, immutable, serializable snapshot layer without any risk to the shipped ORM APIs.
what:
- libtmux.experimental.models: frozen PaneSnapshot / WindowSnapshot / SessionSnapshot / ServerSnapshot, each a typed core plus the full raw tmux-format tail in .fields (nothing tmux reported is lost)
- from_format() builds one node from a format mapping; ServerSnapshot.from_pane_rows() groups a flat "list-panes -a -F" row set into an ordered session/window/pane tree
- to_dict()/from_dict() round-trip the whole tree as plain data, with no live objects
- pure tests (no tmux): value coercion, tree grouping/order, round-trip
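The grouping step behind from_pane_rows() can be approximated in a few lines. This is a sketch, not the PR's implementation: `group_pane_rows` is a hypothetical helper, and it keeps only ids (using tmux's `session_id`, `window_id`, and `pane_id` format names) instead of building frozen snapshot objects.

```python
def group_pane_rows(rows):
    """Group flat pane rows into an ordered session -> window -> panes mapping.

    Each row is a mapping as parsed from one line of a flat pane listing.
    Plain dicts preserve insertion order, so first-seen order is kept.
    """
    tree: dict[str, dict[str, list[str]]] = {}
    for row in rows:
        windows = tree.setdefault(row["session_id"], {})
        windows.setdefault(row["window_id"], []).append(row["pane_id"])
    return tree
```

Because the result is built from plain dicts, lists, and strings, it round-trips through JSON without special handling, which is the property the snapshot layer relies on.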

why: The list/show read commands overlap neo's reader. Rather than touch the ORM, add a parallel typed read surface in experimental.ops that yields immutable models snapshots. The render version must thread into result parsing first, because the -F template is version-gated and the parser must split against the same fields it was rendered with.
what:
- operation: thread `version` through build_result -> _make_result so payload parsing matches the version-gated render (backward compatible; existing overrides accept and ignore it); execute.run/arun pass it
- ops._read: re-export neo.get_output_format / parse_output and formats.FORMAT_SEPARATOR as the single source of truth (no copies)
- list-panes / list-windows / list-sessions ops (readonly, chainable=False) render the same -F template neo builds and parse rows into models snapshots
- ListPanesResult/.../ store JSON-friendly rows and derive typed views (.panes/.server/.windows/.sessions) via properties, so results serialize and round-trip with no special-casing
- tests: -F parity with neo, snapshot-tree build, serialize round-trip, and live list-panes/sessions/windows against a real tmux server

why: The operation catalog is registry-derived data, so rendering it in docs keeps the operation reference from drifting from the code -- and the docs gate then exercises catalog() on every build.
what:
- docs/_ext/tmuxop.py: an in-repo Sphinx directive `tmuxop-catalog` that walks libtmux.experimental.ops.catalog() and emits a table, with :scope:/:safety:/:primitive-only: filters; warns (not raises) on empty
- conf.py: add docs/_ext to sys.path and 'tmuxop' to extra_extensions
- docs/experimental.md: an experimental ops/engines overview embedding the catalog (full + readonly + destructive views), in the index toctree

why: The sync control engine skipped tmux's startup ACK with a fragile one-shot flags==0 heuristic and had no defense against hook-emitted %begin/%end blocks, so a stray block could desync request->result alignment. The async engine already handles this; backport the approach.
what:
- consume the startup ACK synchronously at connect (_consume_startup), dropping the one-shot _startup_ack_pending heuristic, so the startup block can never be conflated with a command's result block
- drain buffered unsolicited blocks before each batch (_drain_unsolicited), so a hook-triggered command's block left over from a prior call is not mis-attributed to the next command
- drain notifications during reads to keep the parser buffer bounded
- regression test: many sequential commands stay aligned (first result is real; each call drains before reading its own block)
A hook firing mid-pipelined-batch still needs per-command number correlation to disambiguate; single-command run() is robust.

why: The chainable-commands prototype folds independent commands into one "tmux a ; b" dispatch. Our typed-op model is a better host for it -- the Operation already carries a `chainable` classvar and the result Status already reserves `skipped` for exactly the chain-drop case. So yes, lazy mode can adopt the prototype's chainability.
what:
- mark output/creation ops non-chainable (capture-pane, split-window; list-* already were) so a fold never drops captured data or an id
- ops._chain: render_chain (join chainable ops with standalone ';', escaping a trailing-';' arg), ensure_chainable (fail closed), and attribute -- splitting one merged ';'-chain result into a typed result per op (success -> all complete; failure -> first failed, rest skipped, matching tmux cmd-queue.c cmdq_remove_group); plus OpChain with >>/then
- Operation.__rshift__/then compose into an OpChain; result_with_status() builds a result with an explicit status (skipped/failed attribution)
- LazyPlan.execute/aexecute gain fold=False (opt-in): maximal runs of chainable, resolved ops dispatch once via engine.run; the sans-I/O _drive yields _Single or _Chain so sync and async share the core; add_chain() records an OpChain
- tests: >> composition, render_chain, fold=one dispatch, fold-off=N dispatches, failure attribution, creators stay unfolded, add_chain
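The ';'-folding step can be sketched as follows. This `render_chain` is a hypothetical stand-in that takes already-rendered argv lists rather than Operation objects, and it assumes tmux's rule that an argument ending in an unescaped ';' acts as a command separator unless written as `\;`. It does not handle an argument that already ends in an escaped `\;`.

```python
def render_chain(commands):
    """Join rendered commands into one argv, separated by a standalone ';'."""
    argv = []
    for index, command in enumerate(commands):
        if index:
            argv.append(";")  # standalone separator between commands
        for arg in command:
            # Escape a literal trailing ';' so tmux does not split on it.
            argv.append(arg[:-1] + "\\;" if arg.endswith(";") else arg)
    return argv
```

The folded argv is handed to a single dispatch, so N chainable operations cost one process spawn (or one control-mode round trip) instead of N.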

why: Extend the mode-in-the-type facades beyond the pane seed so a typed return value distinguishes eager/lazy/async across scopes -- and add the few creation ops the cross-scope navigation needs.
what:
- ops: NewWindow / NewSession (CreateResult, capture the new id), KillSession, RenameSession; generalize binding capture via Result.created_id (base None; SplitWindowResult -> new_pane_id; CreateResult -> new_id) so lazy plans bind window/session creations too
- facade: eager Server -> Session -> Window -> Pane navigation (EagerServer/EagerSession/EagerWindow); LazyWindow (records into a plan); AsyncPane / AsyncWindow (await arun) -- all over the same ops. Control mode stays an engine choice, not a separate facade family
- EagerServer.for_server() binds the classic engine to a live Server
- tests: offline navigation across scopes/modes (concrete engine), and a live eager Server -> Session -> Window -> Pane build against real tmux with cleanup

why: The native binary peer-protocol engine is the strongest proof the operation/result contract is transport-agnostic -- the same typed CommandResult whether produced by a subprocess, tmux -C, or by speaking tmux's imsg protocol directly. Research confirmed it is pure-stdlib and CI-verifiable; the prototype it is ported from only ever tested against a fake socketpair server, never real tmux.
what:
- port engines/imsg/{types,v8,base}.py from libtmux-protocol-engines: ImsgEngine over AF_UNIX + sendmsg/recvmsg + SCM_RIGHTS fd-passing, and ProtocolV8Codec (=IIII header, IMSG_FD_MARK high bit of len, peerid=PROTOCOL_VERSION 8, IDENTIFY -> COMMAND -> WRITE_* -> EXIT handshake); posix_spawn local fallback for attach / start-server / no-server-running
- adapt to the experimental tuple CommandResult (drop the process field); add imsg.exc (ImsgError / ImsgProtocolError / UnsupportedProtocolVersion) and select the v8 codec directly; keep the version-mismatch retry
- register as the opt-in "imsg" engine; import-safe everywhere (AF_UNIX is only touched at runtime; tests skip without it)
- tests: v8 codec round-trip + MSG_COMMAND framing (no tmux), plus the live parity test the prototype lacked -- ImsgEngine vs SubprocessEngine return identical stdout/returncode for read-only commands against a real tmux server (runs across the CI tmux matrix)
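The header layout named in this commit (`=IIII`, IMSG_FD_MARK in the high bit of len, peerid carrying PROTOCOL_VERSION 8) can be sketched with the stdlib `struct` module. Several details are assumptions made for illustration and are not confirmed by the commit: the field order (type, len, peerid, pid), that len counts the 16-byte header, the mask value `0x80000000`, and the message type number used in the usage note. `pack_frame` and `unpack_header` are hypothetical names.

```python
import struct

HEADER = struct.Struct("=IIII")  # four native-order uint32 fields, no padding
IMSG_FD_MARK = 0x80000000        # assumed: high bit of len flags an attached fd
PROTOCOL_VERSION = 8


def pack_frame(msg_type: int, payload: bytes, *, pid: int, has_fd: bool = False) -> bytes:
    """Build one frame: header followed by the payload bytes."""
    length = HEADER.size + len(payload)  # assumed: len includes the header
    if has_fd:
        length |= IMSG_FD_MARK
    return HEADER.pack(msg_type, length, PROTOCOL_VERSION, pid) + payload


def unpack_header(data: bytes):
    """Return (type, length without the fd mark, has_fd, peerid, pid)."""
    msg_type, length, peerid, pid = HEADER.unpack_from(data)
    return msg_type, length & ~IMSG_FD_MARK, bool(length & IMSG_FD_MARK), peerid, pid
```

For example, `pack_frame(200, b"ls\0", pid=1234)` yields a 19-byte frame; the file descriptor itself would travel separately as SCM_RIGHTS ancillary data on the same `sendmsg` call, which this sketch leaves out.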

why: Finish the mode-in-the-type matrix so every tmux scope has eager/lazy/async facades, and add the client-scoped ops a Client facade needs. The matrix is now 5 scopes x 3 modes, all over the shared spine.
what:
- ops: detach-client, refresh-client, switch-client (AckResult, client scope; switch-client renders -c/-t rather than the generic target)
- facade: LazyServer/AsyncServer, LazySession/AsyncSession, and the new client scope (EagerClient/LazyClient/AsyncClient); AsyncServer.for_server binds the async engine to a live Server
- tests: a lazy full Server->Session->Window->pane plan, async navigation, and eager/lazy/async client methods

why: The pre-commit gate now runs `uv run ty check`, so ty must be a configured dev tool. Brings the ty setup from the add-ty-type-checker branch and makes the experimental tree ty-clean.
what:
- add `ty` to the dev dependency group (uv.lock updated)
- add [tool.ty] (environment py3.10, src=src/tests) with the documented rule ignores for known ty false positives, ported verbatim
- fix the issues ty surfaced in experimental: Target is now a real union (ty rejects an implicit two-string type alias); OperationRegistry.list -> select so the `-> list[OpSpec]` return annotation is not shadowed by the method name

why: Make lazy-plan dispatch strategy pluggable and A/B-testable, and add the chainable-commands {marked} lone-pane single-dispatch optimization the plain ;-fold lacked.
what:
- ops.planner: Planner Protocol + PlanStep; SequentialPlanner (one dispatch per op), FoldingPlanner (;-fold maximal chainable runs), MarkedPlanner (fold a pane creation + the chainable ops decorating its slot into one "split -P -F ; select-pane -m ; ... -t {marked} ; select-pane -M" dispatch)
- _chain: render_marked / attribute_marked
- LazyPlan.execute/aexecute take planner= (default SequentialPlanner), replacing fold=bool; _drive consumes the planner's PlanStep units and stays sans-I/O so sync and async share it
- tests (NamedTuple + test_id): planner dispatch counts 3/2/1 with an identical PlanResult, marked single-dispatch rendering + fallback, and a live {marked} fold against a real tmux server

why: The read seam only covered the list-* family, leaving common queries (existence, format evaluation, option dumps, attached clients) outside the typed operation/result model.
what:
- Add has-session, display-message, show-options, list-clients ops, each rendering inert argv and parsing tmux output into a typed result
- Add HasSessionResult.exists, DisplayMessageResult.text, ShowOptionsResult.options, ListClientsResult.clients result types
- Add ClientSnapshot model (a leaf view, not part of the tree)
- has-session maps rc != 0 to exists=False (a valid answer, not failure)
- Wire ops/results/snapshot exports; update enumerating doctests/tests
- Add test_read_breadth.py (NamedTuple + test_id render/parse/round-trip cases plus live tmux coverage)

why: The operation surface lacked the pane verbs the ORM relies on (select/resize/swap/break/join/move/respawn/pipe/clear-history), blocking pane-level parity for engine-driven callers.
what:
- Add select-pane, last-pane, resize-pane, respawn-pane, pipe-pane, clear-history (single-target) ops
- Add swap-pane, join-pane, move-pane (dual-target) and break-pane (creates a window, captures #{window_id} into CreateResult)
- Add src_target field + src_args() helper on Operation for the -s source of dual-target commands; serialize handles src_target like target
- Wire ops/exports; extend the catalog kind-enumeration doctest
- Add test_pane_ops.py (NamedTuple + test_id render/round-trip cases plus live tmux coverage)

why: Window-level parity was missing the verbs the ORM uses to navigate and rearrange windows, so engine-driven callers could not select, move, or relink windows.
what:
- Add select-window, last-window, next-window, previous-window, resize-window, rotate-window, respawn-window, unlink-window
- Add swap-window, move-window, link-window (dual-target, via -s src_target)
- Wire ops/exports; extend the catalog kind-enumeration doctest
- Add test_window_ops.py (NamedTuple + test_id render/round-trip cases plus live navigation/swap/move/unlink coverage)

why: Engine-driven callers had no typed way to drive the tmux server lifecycle or write options, environment, and hooks -- the write side of the options surface that show-options already read.
what:
- Add start-server, kill-server, run-shell, source-file, suspend-client lifecycle ops
- Add set-option, set-window-option (the write counterpart to show-options), set-environment, set-hook
- Wire ops/exports; extend the catalog kind-enumeration doctest
- Add test_lifecycle_ops.py (NamedTuple + test_id render/round-trip cases plus live option/env/hook/run-shell/source-file coverage)

why: The paste-buffer family the ORM uses for clipboard interchange had no typed operations, leaving buffer set/load/save/paste outside the engine-driven surface.
what:
- Add set-buffer, delete-buffer, load-buffer, save-buffer, paste-buffer ops
- Add show-buffer read op + ShowBufferResult.text (buffer contents)
- Wire ops/results/exports; extend the catalog kind-enumeration and registry readonly doctests
- Add test_buffer_ops.py (NamedTuple + test_id render/round-trip cases plus a live set/show/save/delete and load/paste round-trip)

why: The experimental page described operations and the catalog but not how to run them or compose multi-step plans, leaving the engine choice and planner A/B story undocumented.
what:
- Add "Running an operation" (run/arun, raise_for_status policy)
- Add "Choosing an engine" (engine table, create_engine, async peers)
- Add "Lazy plans and planners" (LazyPlan slot refs, >> chaining, Sequential/Folding/Marked planners)
- All examples are executable doctests via the in-memory ConcreteEngine

why: Record the experimental operations/engines layer for the upcoming release so the unreleased section tracks what landed.
what:
- Add a "What's new" deliverable under the unreleased 0.59.x section for the experimental operations and engines layer (#690)
- Defer the release lead paragraph until the version is cut

why: An adversarial review of the new ops against tmux's command grammar found two defects: move-window could not request its kill-on-collision behavior, and paste-buffer's -r flag was documented as a space replacement it never performs.
what:
- MoveWindow: add kill (-k) field; tmux move-window's option string is "abdkrs:t:" and -k replaces any window already at the destination index
- PasteBuffer: rename no_format to no_replace and fix the docstring; -r keeps linefeeds instead of converting them to the default carriage-return separator (it has nothing to do with spaces)
- Add render cases for move-window -k/-r and paste-buffer -r

why: SendKeys(literal=True, enter=True) rendered 'send-keys -l <keys> Enter', but tmux's -l sends every arg literally, so "Enter" was typed as five characters and the line was never submitted.
what:
- __post_init__ raises ValueError on literal+enter (fail closed); the correct pattern is two operations
- Document the constraint on the enter parameter
- Add parametrized test_send_keys_literal_enter_guard

why: DisplayMessageResult.text took only stdout[0], silently dropping all but the first line of a multi-line display-message format.
what:
- Join all stdout lines into .text (matching ShowBuffer); single-line output is unchanged
- Add a multi-line parse case to test_read_breadth

why: subprocess/asyncio/imsg each folded has-session's stderr into stdout but control mode did not, so HasSession's result diverged by engine. The fold is a has-session concern, not an engine concern.
what:
- HasSession._make_result surfaces stderr[0] in stdout when stdout is empty, so every engine yields a consistent result
- Remove the per-engine `"has-session" in cmd` fold from subprocess, asyncio, imsg; soften the subprocess/asyncio docstrings accordingly
- Add test_has_session_folds_stderr_to_stdout

why: ConcreteEngine is stateless, so has-session (and other existence queries) always report success -- HasSession.exists is always True through it. That surprise should be documented.
what:
- Add a Notes section to ConcreteEngine documenting the stateless simulation and that queries like has-session need a live engine

why: imsg _connect created the socket inside the try whose except calls sock.close(); if socket() itself failed (e.g. fd exhaustion), sock was unbound and the handler raised UnboundLocalError, masking the real OSError.
what:
- Create the socket before the try so the except only runs once sock exists
- Add test_imsg_connect_socket_failure_raises_oserror (monkeypatched socket.socket)
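The shape of this fix, sketched with the stdlib only. `connect` is a hypothetical standalone function, not the engine's `_connect` method, and the timeout parameter is an assumption added for the example.

```python
import socket


def connect(path: str, *, timeout: float = 1.0) -> socket.socket:
    """Connect to a UNIX socket, closing it on failure without masking errors."""
    # Created outside the try: if socket() itself fails, the OSError
    # propagates as-is and no handler touches an unbound name.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.connect(path)
    except OSError:
        sock.close()  # safe: sock is guaranteed to exist here
        raise
    return sock
```

With the socket created inside the `try`, a failure in `socket.socket()` would reach `sock.close()` before `sock` was ever assigned, replacing the informative OSError with an UnboundLocalError.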

why: If the tmux server closed the socket right after MSG_EXIT (before MSG_EXITED), recv_frame raised ImsgProtocolError, which run() did not catch -- so a normal command exit became an exception, diverging from the subprocess engine.
what:
- Catch ImsgProtocolError around recv_frame; once seen_exit is set, treat a clean close as the end and return the computed exit result

why: _run_socket_command duplicated stdin/stdout fds for SCM_RIGHTS transfer, but if building the identify frames or opening the transport raised before send_frames ran, those descriptors leaked (send_frames only closes the fds once it owns the frames).
what:
- Close the dup'd fds if codec.identify_messages or the transport constructor raises; once send_frames runs it owns/closes them

tony · 2026-06-22T01:30:30Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

why: The v8 identify burst sent MSG_IDENTIFY_LONGFLAGS twice -- a byte-identical copy-paste in the initial codec. A real tmux client sends it once; the duplicate is harmless (the server sets the flags idempotently) but is redundant wire traffic.
what:
- Drop the duplicate MSG_IDENTIFY_LONGFLAGS frame in ProtocolV8Codec.identify_messages
- Add a parametrized regression test asserting each identify frame type is emitted the expected number of times (LONGFLAGS once)

tony · 2026-06-22T01:56:26Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

why: The async control-mode tests only dispatched single-command operations on happy paths, leaving the reader's multi-block correlation, batch pipelining, lifecycle short-circuits, and the error-as-data policy uncovered.
what:
- Cover %output and no-% notification parsing
- Test run_batch([]) short-circuit and aclose-before-start no-op
- Assert for_server threads the live socket into server_args
- Pipeline two requests through one run_batch call
- Fold a ; chain via FoldingPlanner (exercises expected>1 blocks)
- Confirm a rejected command returns a failed result, no raise

why: Declarative workspace builders (tmuxp-style) need injected setup commands kept out of shell history. tmux has no native flag; the convention is a leading space honored by HISTCONTROL=ignorespace.
what:
- Add suppress_history field (default False, opt-in, no behavior change)
- Prepend a space to the keys when set and not literal
- Document the convention and add a render doctest

why: A declarative WorkspaceBuilder must target the first pane of a created window (e.g. to focus it) without the caller handling ids. The implicit first pane had no captured id, so it could only be addressed as the active pane -- which moves once the window is split.
what:
- NewSession.capture_panes / NewWindow.capture_pane (opt-in): emit a multi-id -F so the result also carries first_window_id/first_pane_id
- CreateResult gains first_window_id/first_pane_id + created_subids
- SlotRef.part ("self"/"window"/"pane") + .window/.pane sub-refs; the plan binds created_subids so a sub-ref resolves to its captured id
- ConcreteEngine fabricates one id per #{*_id} token (single-token formats unchanged, preserving existing fabricated-id sequences)
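
To illustrate the multi-id capture, here is a sketch of parsing one output line produced by a format that prints several ids. The format string and `parse_created_ids` are assumptions for illustration; the template and parser the ops actually use are not shown in this PR text. The sketch relies only on tmux's id prefixes (`$` for sessions, `@` for windows, `%` for panes).

```python
# Assumed multi-id format: one id per #{*_id} token, space separated.
CREATE_FORMAT = "#{session_id} #{window_id} #{pane_id}"

_PREFIX_TO_KIND = {"$": "session", "@": "window", "%": "pane"}


def parse_created_ids(line: str) -> dict[str, str]:
    """Map each whitespace-separated tmux id to its kind by prefix."""
    ids: dict[str, str] = {}
    for token in line.split():
        kind = _PREFIX_TO_KIND.get(token[0])
        if kind is None:
            raise ValueError(f"not a tmux id: {token!r}")
        ids[kind] = token
    return ids
```

With the window and pane ids captured at creation time, a later step can address the first pane by id even after the window has been split and the active pane has moved.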

why: tmuxp-style workspace creation needs a structural, declarative object language (à la SQLAlchemy Declarative on Core) -- declare the shape of a session/windows/panes and let a compiler lower it to Core operations, engine- and sync/async-neutral, instead of hand-driving ops.
what:
- New experimental.workspace package: analyzer (tmuxp YAML/dict -> IR), ir (Workspace/Window/Pane specs), compiler (spec -> Core LazyPlan, wiring first-pane sub-refs so the user never handles an id), runner (build/abuild over any engine + host steps + idempotent replace), confirm (live structure diff)
- Workspace.compile()/build()/abuild(); on_exists error|replace|reuse
- Robust QA: offline (op order, plan serialize round-trip, 3-way planner equivalence) + live (rich 3-window build over subprocess and async-control: names/order, pane counts, focus, options, env, cwd, commands)

why: The MCP tier needs to round-trip a plan's forward-ref bindings through JSON (tuple keys are not JSON-native) and to dry-run a plan's argv without an engine.
what:
- serialize.bindings_to_dict / bindings_from_dict ((slot, part) <-> "slot:part")
- LazyPlan.preview(): render each op's argv, None for unresolved SlotRefs
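
The `(slot, part) <-> "slot:part"` mapping can be sketched as below. The function names come from the commit message, but the signatures are assumptions: this sketch treats the slot as an integer index and the bound value as a tmux id string.

```python
import json

Bindings = dict[tuple[int, str], str]


def bindings_to_dict(bindings: Bindings) -> dict[str, str]:
    """Flatten tuple keys into JSON-safe "slot:part" strings."""
    return {f"{slot}:{part}": value for (slot, part), value in bindings.items()}


def bindings_from_dict(data: dict[str, str]) -> Bindings:
    """Rebuild the tuple keys from their "slot:part" form."""
    out: Bindings = {}
    for key, value in data.items():
        slot, _, part = key.partition(":")
        out[(int(slot), part)] = value
    return out


# Round trip through JSON, which cannot represent tuple keys directly.
original: Bindings = {(0, "self"): "$1", (0, "pane"): "%3"}
restored = bindings_from_dict(json.loads(json.dumps(bindings_to_dict(original))))
```

Splitting on the first `:` keeps the decoding unambiguous as long as the slot itself never contains a colon, which holds for integer slots.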

why: Expose the typed-operations Core + Declarative tiers as a typed, chained, toolable command surface for agents, without coupling the library to any MCP framework.
what:
- experimental.mcp: ToolDescriptor/ParamDescriptor + OperationToolRegistry (per-op descriptors generated from the registry), an optional-pydantic schema builder, and TargetResolver (string/dict -> typed Target)
- plan tools: preview_plan (dry-run), execute_plan (+ bindings), result_schema introspection, build_workspace
- curated vocabulary: intuitive named tools mirroring libtmux's ORM (create_session/split_pane/send_input/list_*/kill_*/...) -> typed results
- pure (ConcreteEngine) + live tmux tests; no fastmcp dependency

why: Prove the projection drives a real MCP server and give downstream servers a one-call binding -- behind an optional extra so the core stays dependency-free.
what:
- fastmcp_adapter.build_server(engine): register the curated vocabulary as typed FastMCP tools (engine bound out of the schema, safety -> ToolAnnotations)
- add fastmcp to a new [project.optional-dependencies] mcp extra
- in-process tests (offline + live) call the tools via fastmcp's Client

why: The fastmcp adapter only exposed the curated vocabulary -- agents had no access to the full operation set or to plan composition, and the server could not be launched on its own.
what:
- Register one op_<kind> tool per operation via a dynamic-schema Tool subclass (engine + descriptor on PrivateAttr, explicit parameters schema), re-injecting target/src_target at the adapter edge; tagged per-op and hidden by default (expose_operations=True reveals them)
- Register plan tools (preview_plan/execute_plan/result_schema/build_workspace) taking serialized operations + a planner name
- Add build_server flags: include_operations/expose_operations/include_plan_tools
- Make the server runnable: main()/default_server(), __main__.py, fastmcp.json, and a libtmux-engine-mcp console script
- Extend adapter tests: offline per-op/plan/workspace, default-server, --help exit, live plan execution

why: To test and dogfood the MCP server in local agent CLIs, we need a way to point Claude/Codex/Cursor/Gemini at this checkout instead of a pinned release.
what:
- Port scripts/mcp_swap.py from libtmux-mcp (PEP 723 uv-script, tomlkit): detect/status/use-local/revert with timestamped backups, dry-run, and Claude user/project scopes
- Derive the server slug from the [project.scripts] entry (libtmux-engine-mcp -> libtmux-engine) instead of project.name, so it stays distinct from a sibling libtmux server; a strict generalization (libtmux-mcp still resolves to libtmux)
- Namespace the swap state dir libtmux-engine-mcp-dev
- Add tests: console-script registration (always-on) + slug/local-spec derivation (tomlkit-gated)

why: fastmcp is only the optional `mcp` extra, so a plain `uv sync` prunes it -- which silently turns `uv run mypy` red and makes every fastmcp adapter test importorskip away. The committed adapter's green gate depended on fastmcp happening to be installed.
what:
- Add fastmcp + tomlkit to the dev and testing dependency-groups so the standard gate type-checks and runs the adapter + mcp_swap tests (fastmcp also stays the `mcp` extra for end users)
- Add --ignore=docs/_build to pytest addopts: `docs` is a testpath, so a stale built-HTML tree poisons collection (the gate's rm docs/_build first-step was the only guard)
- Reformat ops/plan.py (pre-existing blank-line drift surfaced by ruff)
- uv.lock: add tomlkit (no other version churn)

why: The Declarative WorkspaceBuilder tier had thin coverage -- a single analyzer shorthand case, and nothing exercising the compiler's host-step schedule or the runner's on_exists preflight policy. Lock in those behaviors so a regression in the Declarative-to-Core lowering or the host-side orchestration is caught.
what:
- Analyzer/IR (offline): dimensions in both [x, y] and {width, height} forms; shell_command shorthand (string / list / {cmd} items); the None-pane and unsupported-pane TypeError paths; non-mapping-YAML rejection; session-field passthrough; per-pane orchestration fields; and Pane.commands run-form normalization
- Compiler (offline): dimensions threaded into new-session -x/-y; env/option/window-option ops emitted with their values; the before_script and pane sleep_before/sleep_after host-step schedule asserted off the pure op spine (anchored by send-keys position, not literal index); first-window reuse vs create-the-rest; and Workspace.compile() == compile_full().plan
- Runner/confirm (live tmux): before_script runs as a host step in start_directory; on_exists='reuse' short-circuits to an empty-but-ok result leaving the session untouched while 'error' raises FileExistsError; and confirm() flags a structural mismatch

why: The swap tool covered Claude / Codex / Cursor / Gemini but not the Grok or Antigravity (agy) CLIs, so a local-checkout swap could not reach two installed agents. Extending the registry lets one use-local repoint the tmux MCP across all six.
what:
- Register grok (~/.grok/config.toml, TOML "mcp_servers" table, same shape as codex) and agy/Antigravity (~/.gemini/antigravity/mcp_config.json, JSON "mcpServers", same shape as cursor/gemini) in CLIName / ALL_CLIS / CLIS
- Route grok through the existing codex branch and agy through the cursor/gemini branch in get_server / set_server / delete_server
- Tolerate an empty JSON config in load_config so the swap can seed Antigravity's initially-empty mcp_config.json instead of raising
- Note in the docstring that the Antigravity IDE and the agy CLI may read different profiles; only the documented profile path is written
- Tests: grok (TOML) and agy (JSON) set/get/delete round-trips, the empty-JSON tolerance, and the registry shapes
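
The empty-config tolerance might be implemented along these lines. This is a sketch under assumptions: `parse_config_text` and `load_json_config` are hypothetical names, and the real `load_config` in `scripts/mcp_swap.py` also handles TOML, which is left out here.

```python
import json
import pathlib
from typing import Any


def parse_config_text(text: str) -> dict[str, Any]:
    """Treat an empty or whitespace-only file as an empty config."""
    if not text.strip():
        return {}
    return json.loads(text)


def load_json_config(path: pathlib.Path) -> dict[str, Any]:
    """Read a JSON config, seeding from {} when the file is missing or blank."""
    if not path.exists():
        return {}
    return parse_config_text(path.read_text(encoding="utf-8"))
```

Returning `{}` for a blank file lets the caller insert its `mcpServers` entry and write the file back, instead of failing on `json.loads("")`.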

why: The experimental MCP exposed only a thin synchronous projection. Agents driving tmux need an intuitive, non-blocking surface that knows which pane they are calling from, resolves "the pane relative to me" in one call, and never silently targets the wrong pane.
what:
- Refactor the curated vocabulary into an async-first package (session/window/pane/buffer/option/server): each tool is one async def over arun, with a derived sync twin via a sans-I/O trampoline (_bridge.synced) -- a single source of truth per tool.
- Expand the lean curated set with high-value verbs and conveniences (grep_pane, capture_active_pane, geometry-resolved relative/corner pane tools, directional select_pane) plus a guarded run_tmux hatch.
- Add build_async_server (default AsyncControlModeEngine) awaited on FastMCP's loop; build_server stays the sync wrapper; main() and fastmcp.json go async-first.
- Add a live event stream (events.py): a push watch_events tool and a pull tmux://events ring buffer, selected by LIBTMUX_MCP_EVENTS.
- Make the surface caller-aware: server name 'tmux' plus steering instructions (when/anti-triggers/concrete-id rule); CallerContext reads TMUX_PANE/TMUX from the server's own env, socket-scoped; get_caller_context anchor; is_caller on list_panes/search_panes rows; the relative tools default to and require the caller pane origin; capture_relative_pane/grep_relative_pane/search_panes.
- Reject relative special targets ({up-of}/{down-of}/...) on capture, grep, send, and destructive pane tools with a hint pointing to the relative tools; anchor specials ({marked}/{last}) pass through.
- Cover with experimental tests + doctests (no pytest-asyncio).
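
One way a sans-I/O trampoline can derive a sync twin from an `async def` is sketched below. How `_bridge.synced` is actually written is not shown in this PR text, so treat this as an assumption: it works only when every awaited call completes without suspending, for example when `arun` is backed by a blocking engine. `InlineEngine` and `list_panes` are hypothetical.

```python
import functools
from typing import Any, Callable, Coroutine, Sequence


def synced(async_fn: Callable[..., Coroutine[Any, Any, Any]]) -> Callable[..., Any]:
    """Drive a coroutine to completion without an event loop."""

    @functools.wraps(async_fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        coro = async_fn(*args, **kwargs)
        try:
            # A coroutine that never really suspends finishes on the first step.
            coro.send(None)
        except StopIteration as stop:
            return stop.value
        coro.close()
        raise RuntimeError("coroutine suspended; it needs a real event loop")

    return wrapper


class InlineEngine:
    """Hypothetical engine whose arun returns immediately."""

    async def arun(self, argv: Sequence[str]) -> str:
        return "ran: " + " ".join(argv)


async def list_panes(engine: InlineEngine, target: str) -> str:
    return await engine.arun(("list-panes", "-t", target))


# The sync twin shares the async body, so there is one source of truth.
list_panes_sync = synced(list_panes)
```

The `RuntimeError` branch makes a misuse visible: if a tool awaits something that truly yields to a loop, the trampoline refuses rather than returning a half-finished result.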

why: The round-2 caller-awareness read only the MCP's own environment, which real launchers (agent -> uv -> python child) strip -- so the caller pane was undiscoverable and the whole surface went inert.
what:
- Add CallerContext.discover(): process-env -> explicit override (LIBTMUX_MCP_CALLER_PANE/TMUX) -> a bounded, same-uid Linux /proc parent-process env walk (vocabulary/_proc.py). Fail-closed (never raises), env-minimised to TMUX/TMUX_PANE, depth-capped, and injectable so it is unit-testable without /proc; records the discovery source.
- Bind the default engine to the discovered caller's -S socket when no explicit override (--socket-path/--socket-name/--no-caller-socket, $LIBTMUX_SOCKET*), so a stripped-env MCP still drives the user's own tmux server instead of a fresh default one.
- Add a conservative socket comparator (socket_could_match / is_conservative_caller) alongside the strict one: the strict stays on the is_caller annotation and origin resolution; the conservative, fail-safe one guards destructive ops.
- Refuse self-kill: kill_pane / respawn_pane / kill_window / kill_session (and the others=True siblings) decline the pane, window, or session running this MCP, with a hint to act manually.
- Thread the discovered context to the tool bodies by stashing it on the engine (read by caller_of); SyncToAsyncEngine delegates server_args / _caller_context so the sync surface sees the same identity.
- Cover with /proc parser, discovery precedence, comparator, and self-kill tests (no pytest-asyncio); fix the stale resolve_relative_pane active-pane-fallback docstring.
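
The bounded parent-process walk can be sketched with injected readers, which is what makes it testable without `/proc`. All names here (`discover_caller`, `parse_environ`, `parse_ppid`) are hypothetical and the same-uid check is omitted; the sketch relies only on the Linux conventions that `/proc/<pid>/environ` is NUL-separated and that the parent pid is the second field after the parenthesised command name in `/proc/<pid>/stat`.

```python
from typing import Callable, Optional

KEEP = ("TMUX", "TMUX_PANE")


def parse_environ(raw: bytes) -> dict[str, str]:
    """Keep only the tmux variables from a NUL-separated environ blob."""
    env: dict[str, str] = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        name = key.decode(errors="replace")
        if sep and name in KEEP:
            env[name] = value.decode(errors="replace")
    return env


def parse_ppid(stat: str) -> int:
    """Extract the parent pid; the command name may contain spaces or ')'."""
    return int(stat.rpartition(")")[2].split()[1])


def discover_caller(
    start_pid: int,
    read_environ: Callable[[int], bytes],
    read_stat: Callable[[int], str],
    max_depth: int = 8,
) -> Optional[dict[str, str]]:
    """Walk up the process tree until a TMUX_PANE is found; never raises."""
    pid = start_pid
    for _ in range(max_depth):
        if pid <= 1:
            return None
        try:
            env = parse_environ(read_environ(pid))
            if "TMUX_PANE" in env:
                return env
            pid = parse_ppid(read_stat(pid))
        except (OSError, ValueError, LookupError):
            # Fail closed: an unreadable or malformed entry ends the walk.
            return None
    return None
```

On a real Linux host the readers would open `/proc/<pid>/environ` and `/proc/<pid>/stat`; in tests they can be plain dictionary lookups.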

why: An adversarial review of the caller-discovery work surfaced fail-unsafe and over-broad edges in the new guards.
what:
- guard_self_kill and the others=True sibling guards now resolve the caller's own pane to its window/session through a fail-safe helper: a caller pane absent from the engine's server is not a self-kill, so a kill on an unrelated target no longer raises a raw tmux error.
- Scope the ambient (socket=None) branch of both comparators to a process-env caller, so a parent-walked caller is not matched to an unbound default engine (which would mis-target resolve_origin reads and over-refuse kills under --no-caller-socket).
- Wrap socket_matches' realpath comparisons (a $TMUX-controlled path can raise) so the read-only tools degrade instead of crashing.
- Guard the op_* per-operation kill/respawn surface too, closing the bypass when --operations is enabled.
- Name the caller pane in the others=True refusal hints; correct the get_caller_context docstring; document the socket/caller precedence in --help; make the adapter tests hermetic (no host /proc walk).
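
The realpath wrapping for the strict comparator could be sketched like this. `socket_paths_match` is a hypothetical name and the real `socket_matches` takes more context; the sketch only shows the degrade-instead-of-crash behaviour for a path taken from `$TMUX`.

```python
import os


def socket_paths_match(left: str, right: str) -> bool:
    """Compare two socket paths by real path, treating errors as no match."""
    try:
        return os.path.realpath(left) == os.path.realpath(right)
    except (OSError, ValueError):
        # An environment-controlled path may be malformed (for example an
        # embedded NUL); report "not the same socket" rather than raising.
        return False
```

Answering `False` on error suits the read-only annotation path described above; the destructive guards use the separate conservative comparator instead.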

why: Agents need to know when a command in a pane finishes without hard-coding a needle (regex/sentinel) the tool must guess, and without blocking the server. tmux stops emitting %output the instant a pane goes quiet, so idle-since-last-%output is a structural signal the agent interprets via the captured chunk plus pane metadata.
what:
- Add _settle.py pure core: decode_output (tmux octal), output_payload (per-pane filter, split not join so inner whitespace survives), and accumulate_until_settle (settle/byte/time/end fold over an injected async stream + clock). All doctested; no I/O, no fastmcp.
- Add wait_for_output edge tool in events.py: folds decoded %output to a frozen MonitorResult, reads DoneMetadata (pane_dead/status, pane_current_command) so the agent disambiguates finished vs blocked. ctx.info for live partials; aclosing for cancellation safety; each call runs in its own task.
- _ensure_attached: a bare tmux -C client emits no %output until attach-session, so attach (sticky per engine) before folding; raise on a failed attach so a stale session never yields a silent capture.
- Tests: pure settle unit + cancellation (test_settle.py); offline integration + attach/dropped/done coverage (test_events.py); live end-to-end against real tmux (test_monitor_live.py).
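
The two pure helpers named in this commit might look like the sketch below. The function names come from the commit message, but the signatures and bodies are assumptions. The sketch relies on tmux's documented control-mode behaviour: a `%output` line carries the pane id followed by the data, with characters below ASCII 32 and the backslash written as three-digit octal escapes.

```python
import re
from typing import Optional

# Three octal digits after a backslash, limited to byte range (000-377).
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def output_payload(line: bytes, pane_id: bytes) -> Optional[bytes]:
    """Return the raw payload of a %output line for one pane, else None."""
    # Split at most twice so whitespace inside the payload is preserved.
    parts = line.split(b" ", 2)
    if len(parts) < 3 or parts[0] != b"%output" or parts[1] != pane_id:
        return None
    return parts[2]


def decode_output(payload: bytes) -> bytes:
    """Turn tmux's octal escapes back into the original bytes."""
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), payload)
```

Splitting with a maximum of two splits, rather than splitting on all whitespace and joining again, is what keeps runs of spaces in the pane output intact.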

why: A capable agent asked to run tests in a pane and wait fell back to sleep + capture_pane polling -- it never found wait_for_output because the tool's surface said "watch pane output / settles / needle-free", not the agent's intent "run a command and wait for it to finish". The capability shipped in round 4; this surfaces it.
what:
- _instructions(): add a run-a-command-and-wait paragraph naming the split_pane/send_input -> wait_for_output workflow, the test/build use case, the "prefer over sleep + capture_pane polling" steer, and the "settled is not success" caveat. Gate it (events_enabled) so the sync server never names a tool it does not register.
- wait_for_output: enrich description= with discovery vocabulary (completion, exit/return code, success/failed) and add a NumPy Parameters section -- FastMCP parses it into per-param schema descriptions even when description= is overridden.
- docs: rename the colliding sync polling helper wait_for_output -> wait_for_text and point agents at the event-backed tool.
- tests: lock the discoverable wording -- instructions name the tool + workflow + anti-polling steer; tool metadata carries the vocab + per-param descriptions; events=off omits the live-output guidance.

why: The self-kill guards left two known gaps: the per-op (op_*) kill surface skipped the others=True sibling case, and the conservative socket match relied on path reconstruction that diverges on macOS.
what:
- Add an authoritative conservative_socket() that queries the engine's #{socket_path} for a -L name or ambient socket, so the guard's socket scoping survives a macOS $TMUX_TMPDIR divergence; an explicit -S path is used as-is.
- Lift the others=True guards (guard_kill_other_panes / _windows) into _resolve so the curated tools and guard_destructive_op share them; the per-op op_kill_pane/op_kill_window now refuse killing the caller's sibling pane/window too.
- Cover with offline (conservative_socket) and live (curated + per-op others=True) regression tests.

why: An adversarial review found the needle-free monitor could falsely report a command "settled" and could raise or mislead when the watched pane died -- the exact cases the design targets.
what:
- Make AsyncControlModeEngine.subscribe() a true broadcast: each subscriber gets its own queue, so wait_for_output, watch_events, and poll_events no longer steal each other's %output frames (which caused a premature settle with truncated text).
- Make wait_for_output fail-safe when the watched pane is gone: _read_done no longer raises or fabricates pane_dead=False on a blank or fallback-pane probe (pane_dead becomes Optional/unknown, keyed on #{pane_id}), and the settle snapshot capture is guarded; the result is preserved.
- Add a derived exit_code to MonitorResult.
- Supervise the pull ring drainer (aclosing + recorded error surfaced via poll_events) so a reader failure cannot silently freeze it.
- Regression test: two concurrent subscribers each see every event.
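
The difference between a shared queue and a true broadcast can be sketched with one queue per subscriber. `Broadcaster` is a hypothetical class for illustration, not the engine's `subscribe()` implementation, and it leaves out back-pressure and shutdown.

```python
import asyncio


class Broadcaster:
    """Fan each published event out to every subscriber's own queue."""

    def __init__(self) -> None:
        self._queues: set[asyncio.Queue[str]] = set()

    def subscribe(self) -> asyncio.Queue[str]:
        queue: asyncio.Queue[str] = asyncio.Queue()
        self._queues.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue[str]) -> None:
        self._queues.discard(queue)

    def publish(self, event: str) -> None:
        # Every subscriber receives its own copy, so no consumer can
        # take a frame away from another.
        for queue in self._queues:
            queue.put_nowait(event)


async def demo() -> tuple[list[str], list[str]]:
    hub = Broadcaster()
    first, second = hub.subscribe(), hub.subscribe()
    for event in ("%output %1 x", "%output %1 y"):
        hub.publish(event)
    return (
        [first.get_nowait(), first.get_nowait()],
        [second.get_nowait(), second.get_nowait()],
    )


seen_first, seen_second = asyncio.run(demo())
```

With a single shared queue the two consumers would split the events between them, which is the frame-stealing behaviour the commit describes; here both lists contain every event in order.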

tony added the enhancement label Jun 21, 2026

tony added 4 commits June 21, 2026 09:08

tony changed the title ~~Typed operations and engines: inert op spine (#689)~~ Typed operations and engines: spine + 4 engines + facades (#689) Jun 21, 2026

tony changed the title ~~Typed operations and engines: spine + 4 engines + facades (#689)~~ Typed operations and engines Jun 21, 2026

tony added 9 commits June 21, 2026 09:57

tony changed the title ~~Typed operations and engines~~ Typed operations & engines: spine, 6 engines, plans, models, facades (#689) Jun 21, 2026

tony added 11 commits June 21, 2026 12:01

tony added 7 commits June 21, 2026 18:49

tony added 10 commits June 22, 2026 17:12

tony force-pushed the engine-ops branch from 49768fd to 2904003 Compare June 23, 2026 02:48

tony added 9 commits June 23, 2026 17:33

tony wants to merge 73 commits into master from engine-ops
codecov Bot commented Jun 21, 2026

Codecov Report