Typed operations & engines: spine, 6 engines, plans, models, facades (#689) #690
Open
tony wants to merge 73 commits into
why: Operationalizes the typed-operations/engines architecture
(issues 688, 689) with the pure substrate that was absent from every
prototype branch: an inert, statically-typed operation value that
renders tmux commands, carries its result type, and serializes without
a live tmux server. Engines stay transport-agnostic over it. None of
this touches or changes existing public APIs.
what:
- Add libtmux.experimental.{ops,engines} packages (experimental, not
under the versioning policy)
- ops: frozen Operation[ResultT] with class-level metadata as the
single source of truth; pure render() with declarative version gating
(LooseVersion); build_result() adapting raw output to typed results
- ops: typed Result base + raise_for_status() (CPython/requests
precedent), SplitWindowResult/CapturePaneResult payloads
- ops: closed Target sum (PaneId/WindowId/SessionId/ClientName/NameRef/
IndexRef/Special/SlotRef) with fail-closed validation
- ops: fail-closed OperationRegistry keyed by kind, with OpSpec views
and predicate listing; stdlib dict serialization with round-trips
- ops: four seed operations (split-window, capture-pane, send-keys,
select-layout) registered via @register
- engines: TmuxEngine/AsyncTmuxEngine protocols, CommandRequest/
CommandResult, EngineSpec; run()/arun() execute bridge sharing one
render/build path (sync vs await is the only divergence)
- tests: 111 pure, fixture-parametrizable unit tests + doctests, all
runnable without a tmux server
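A minimal sketch of the "inert operation value" idea described above, assuming hypothetical class shapes (none of these definitions are copied from the PR; field names and signatures are illustrative). It shows a frozen operation that renders argv, carries its result type, and adapts raw output without ever calling tmux:

```python
from dataclasses import dataclass
from typing import ClassVar, Generic, TypeVar


@dataclass(frozen=True)
class Result:
    """Typed result base: raw output plus exit status (illustrative shape)."""

    returncode: int
    stdout: tuple[str, ...] = ()
    stderr: tuple[str, ...] = ()

    def raise_for_status(self) -> None:
        """Failure is plain data until the caller opts in to an exception."""
        if self.returncode != 0:
            raise RuntimeError("; ".join(self.stderr) or "tmux command failed")


@dataclass(frozen=True)
class CapturePaneResult(Result):
    """Payload-bearing result: the captured lines."""

    lines: tuple[str, ...] = ()


ResultT = TypeVar("ResultT", bound=Result)


@dataclass(frozen=True)
class Operation(Generic[ResultT]):
    """Inert value: renders argv and adapts output, never runs tmux."""

    kind: ClassVar[str]  # class-level metadata as the single source of truth
    target: str

    def render(self) -> tuple[str, ...]:
        raise NotImplementedError

    def build_result(self, returncode, stdout, stderr) -> ResultT:
        raise NotImplementedError


@dataclass(frozen=True)
class CapturePane(Operation[CapturePaneResult]):
    kind: ClassVar[str] = "capture-pane"

    def render(self) -> tuple[str, ...]:
        return ("capture-pane", "-p", "-t", self.target)

    def build_result(self, returncode, stdout, stderr) -> CapturePaneResult:
        out = tuple(stdout)
        return CapturePaneResult(returncode, out, tuple(stderr), lines=out)


op = CapturePane(target="%1")
argv = op.render()  # pure: no tmux server involved
result = op.build_result(0, ["hello", "world"], [])
```

Any engine only has to run `argv` and hand the raw output back to `build_result()`, which is what keeps the engines transport-agnostic in this sketch.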
Codecov Report
@@             Coverage Diff             @@
##           master     #690       +/-   ##
===========================================
+ Coverage   51.30%   72.42%   +21.12%
===========================================
  Files          25      192      +167
  Lines        3487    10873     +7386
  Branches      686     1431      +745
===========================================
+ Hits         1789     7875     +6086
- Misses       1403     2414     +1011
- Partials      295      584      +289
why: Proves the operation/result contract is transport-agnostic -- the same typed result whether produced by a real tmux subprocess or an in-memory simulator -- and provides the offline engine that lets ops doctests and tests run without a tmux server (issue 689 phases 2-3).
what:
- engines.subprocess: classic SubprocessEngine mirroring tmux_cmd (has-session stderr fold, backslashreplace, trailing-blank strip; tmux failure returned as data, only missing binary raises), with for_server() deriving -L/-S/-f/-2 flags from a live Server
- engines.concrete: deterministic in-memory engine (fabricated pane/window/session ids, canned capture lines) for tests and docs
- engines.registry: name-keyed engine registry (register/create/available), seeded with subprocess + concrete
- tests/experimental/contract: engine-agnostic operation contract run offline via concrete, plus classic-vs-concrete parity against a real tmux server (same result type + argv, payload may differ)
why: Completes the sync/async-symmetric execution story plus the deferred-execution and documentation mechanisms from issue 689 (phase 5 + docs), still without touching any existing API.
what:
- engines.asyncio: real AsyncSubprocessEngine on create_subprocess_exec (terminates the child on cancellation; not a thread wrapper), mirroring the classic engine's output handling so it returns the same typed result
- ops.plan: LazyPlan records operations without touching tmux and resolves SlotRef forward refs at execute time via a sans-I/O generator; sync execute() and async aexecute() share one resolution core (run vs await arun is the only divergence); whole-plan serialization round-trips
- ops.catalog: registry-driven CatalogEntry list (scope, version gates, effects, safety, result type, summary) -- the single source a docs domain renders, so runtime and docs cannot drift
- tests: lazy resolution sync+async, plan serialization, catalog coverage, async-vs-sync classic parity against a real tmux server
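The sans-I/O resolution core can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `Op`, the `creates` flag, and the `run`/`arun` callables are hypothetical stand-ins. The generator yields argv and receives the created id, so the sync and async drivers differ only in `run(...)` versus `await arun(...)`:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Generator


@dataclass(frozen=True)
class SlotRef:
    """Forward reference to the id created by an earlier plan step."""

    slot: int


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for a typed operation."""

    name: str
    target: object  # a concrete id string or a SlotRef
    creates: bool = False


@dataclass
class LazyPlan:
    ops: list = field(default_factory=list)

    def add(self, op: Op) -> SlotRef:
        """Record an operation; nothing is sent to tmux here."""
        self.ops.append(op)
        return SlotRef(len(self.ops) - 1)

    def _drive(self) -> Generator[tuple, str, list]:
        """Sans-I/O core: yield argv, receive the engine's reply."""
        bound = {}
        results = []
        for slot, op in enumerate(self.ops):
            target = op.target
            if isinstance(target, SlotRef):
                target = bound[target.slot]  # resolve at execute time
            reply = yield (op.name, "-t", target)
            if op.creates:
                bound[slot] = reply
            results.append(reply)
        return results

    def execute(self, run) -> list:
        gen = self._drive()
        try:
            argv = next(gen)
            while True:
                argv = gen.send(run(argv))
        except StopIteration as stop:
            return stop.value

    async def aexecute(self, arun) -> list:
        gen = self._drive()
        try:
            argv = next(gen)
            while True:
                argv = gen.send(await arun(argv))
        except StopIteration as stop:
            return stop.value


calls = []


def fake_run(argv):
    """Fake engine: records argv and fabricates an id per call."""
    calls.append(argv)
    return f"%{len(calls)}"


plan = LazyPlan()
new_pane = plan.add(Op("split-window", "%0", creates=True))
plan.add(Op("send-keys", new_pane))
sync_results = plan.execute(fake_run)
```

The second operation targets a pane that does not exist when the plan is recorded; its `SlotRef` is only replaced with the fabricated id once the first step has run.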
why: Proves control mode is just another engine returning the same typed result (issue 689 phase 4) -- an operation run over a persistent tmux -C connection is indistinguishable, at the result level, from one run via fork-per-call subprocess.
what:
- engines.control_mode: ControlModeEngine over one persistent tmux -C connection; run_batch pipelines commands and parses each command's %begin/%end/%error block into a CommandResult; selectors-based nonblocking reads with timeout; startup-ACK discard; lifecycle via close()/context manager (lock-guarded teardown)
- engines.control_mode: I/O-free ControlModeParser, unit-testable without tmux, adapted from the chain runner + protocol-engines parser
- register control_mode in the engine registry and export it
- tests: pure parser tests + real-tmux contract (split creates a real pane, batched commands, control-vs-concrete parity)
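For readers unfamiliar with the control-mode framing, here is a small I/O-free sketch of block parsing. It assumes the `%begin <time> <number> <flags>` ... `%end`/`%error <time> <number> <flags>` layout from the tmux manual; `Block` and `parse_blocks` are hypothetical names, not the PR's ControlModeParser:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One command's framed output."""

    number: int
    ok: bool  # True for %end, False for %error
    lines: tuple


def parse_blocks(lines):
    """Split control-mode lines into command blocks and bare notifications."""
    blocks, notifications = [], []
    current = None  # (command number, body lines) while inside a block
    for line in lines:
        if current is None:
            if line.startswith("%begin "):
                current = (int(line.split()[2]), [])
            elif line.startswith("%"):
                notifications.append(line)
            continue
        head = line.split()
        closes = (
            len(head) >= 3
            and head[0] in ("%end", "%error")
            and head[2] == str(current[0])  # match the opening command number
        )
        if closes:
            blocks.append(Block(current[0], head[0] == "%end", tuple(current[1])))
            current = None
        else:
            current[1].append(line)  # body line, even if it starts with '%'
    return blocks, notifications


sample = [
    "%begin 1700000000 1 0",
    "%end 1700000000 1 0",
    "%session-changed $0 main",
    "%begin 1700000001 2 1",
    "%3",
    "%end 1700000001 2 1",
    "%begin 1700000002 3 1",
    "no such window",
    "%error 1700000002 3 1",
]
blocks, notifications = parse_blocks(sample)
```

Because the function only consumes strings, it can be unit-tested without a tmux server, which is the property the commit calls out for the real parser.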
why: Demonstrates the "mode lives in the type" model from issue 689 -- EagerPane.split() returns a live EagerPane while LazyPane.split() returns a deferred LazyPane, each a single statically-known return type, both backed by the same SplitWindow operation. One Pane class with a runtime-bound engine could not type these return values distinctly.
what:
- facade.pane.EagerPane: executes immediately, returns live handles (split -> EagerPane), typed results for capture/send_keys
- facade.pane.LazyPane: records into a LazyPlan, returns deferred handles (split -> LazyPane bound to the new pane's SlotRef), chainable
- seed of the wider Server/Session/Window/Pane/Client x mode matrix
- tests: eager live handles, lazy deferral + forward-ref resolution, and same-operation-backs-both-facades parity
why: Closes the two async gaps from issue 689: control mode and concrete had no async sibling. The async control engine is the one async engine that earns its place -- it adds an event stream subprocess cannot -- and prior libtmux/mux control-mode work (surfaced across agent histories via agentgrep, plus the asyncio-2 branches) shaped its correlation design.
what:
- engines.async_control_mode: AsyncControlModeEngine over a persistent tmux -C (create_subprocess_exec + one reader task). FIFO future correlation with skip-when-empty so unsolicited %begin blocks (hook-triggered commands and the startup ACK) never desync results; the startup ACK is consumed synchronously in start() to close the correlation race our whole-block parser would otherwise have. DEAD state fails pending commands on reader EOF/error. Cancellation via asyncio.wait_for (3.10 floor: no asyncio.timeout/TaskGroup). Bounded subscribe() notification stream with drop-counting. for_server() helper
- engines.control_mode: ControlModeParser now surfaces bare %-notification lines via notifications() (additive; the sync engine ignores them)
- engines.concrete: AsyncConcreteEngine sibling over shared simulation; removes the async test shim
- ControlNotification typed event value
- tests: parser notification/drain; async control vs real tmux (split, pipelined batch, concrete parity, live event stream, lifecycle)
why: Many tmux commands print nothing (rename-window, kill-pane, select-window, ...). tmux returns CMD_RETURN_NORMAL on success or calls cmdq_error on failure, framed in control mode as %end vs %error (see tmux cmd-queue.c) -- they never cmdq_print. They still need a typed result that records success/failure without inventing a payload.
what:
- results.AckResult: a typed acknowledgement (no payload) whose raise_for_status() still surfaces the error path; documents the tmux success/error mapping
- retarget send-keys and select-layout to AckResult (both print nothing)
- add no-output ops: rename-window (mutating), kill-window and kill-pane (destructive) -- exercising AckResult across scopes and safety tiers
- export AckResult and the new ops; refresh the catalog doctest
- tests: render + AckResult success/failure across the no-output ops and destructive safety metadata; update classic/control parity assertions
why: A neo-like read model is useful, but neo.Obj is one flat ~200-field class fused to the query/dispatch pipeline. The experimental namespace lets us try a decoupled, immutable, serializable snapshot layer without any risk to the shipped ORM APIs.
what:
- libtmux.experimental.models: frozen PaneSnapshot / WindowSnapshot / SessionSnapshot / ServerSnapshot, each a typed core plus the full raw tmux-format tail in .fields (nothing tmux reported is lost)
- from_format() builds one node from a format mapping; ServerSnapshot.from_pane_rows() groups a flat "list-panes -a -F" row set into an ordered session/window/pane tree
- to_dict()/from_dict() round-trip the whole tree as plain data, with no live objects
- pure tests (no tmux): value coercion, tree grouping/order, round-trip
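As an illustration of the "typed core plus raw tail" and row-grouping ideas, a rough sketch follows. `PaneSnap` and `group_pane_rows` are hypothetical names, and the row keys assume the usual tmux format variables (`session_id`, `window_id`, `pane_id`, `pane_active`); the PR's actual snapshot classes may differ:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaneSnap:
    """Typed core fields plus the untouched raw format mapping."""

    pane_id: str
    active: bool
    fields: dict

    @classmethod
    def from_format(cls, row):
        # Coerce the "1"/"0" flag to bool; keep every raw field as reported.
        return cls(row["pane_id"], row.get("pane_active") == "1", dict(row))


def group_pane_rows(rows):
    """Group flat per-pane rows into an ordered session -> window -> panes tree."""
    tree = {}
    for row in rows:
        windows = tree.setdefault(row["session_id"], {})
        windows.setdefault(row["window_id"], []).append(PaneSnap.from_format(row))
    return tree


rows = [
    {"session_id": "$0", "window_id": "@0", "pane_id": "%0", "pane_active": "1"},
    {"session_id": "$0", "window_id": "@0", "pane_id": "%1", "pane_active": "0"},
    {"session_id": "$1", "window_id": "@1", "pane_id": "%2", "pane_active": "1"},
]
tree = group_pane_rows(rows)
```

Plain dicts preserve insertion order, so the tree keeps the order in which tmux reported the rows.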
why: The list/show read commands overlap neo's reader. Rather than touch the ORM, add a parallel typed read surface in experimental.ops that yields immutable models snapshots. The render version must thread into result parsing first, because the -F template is version-gated and the parser must split against the same fields it was rendered with.
what:
- operation: thread `version` through build_result -> _make_result so payload parsing matches the version-gated render (backward compatible; existing overrides accept and ignore it); execute.run/arun pass it
- ops._read: re-export neo.get_output_format / parse_output and formats.FORMAT_SEPARATOR as the single source of truth (no copies)
- list-panes / list-windows / list-sessions ops (readonly, chainable=False) render the same -F template neo builds and parse rows into models snapshots
- ListPanesResult/.../ store JSON-friendly rows and derive typed views (.panes/.server/.windows/.sessions) via properties, so results serialize and round-trip with no special-casing
- tests: -F parity with neo, snapshot-tree build, serialize round-trip, and live list-panes/sessions/windows against a real tmux server
why: The operation catalog is registry-derived data, so rendering it in docs keeps the operation reference from drifting from the code -- and the docs gate then exercises catalog() on every build.
what:
- docs/_ext/tmuxop.py: an in-repo Sphinx directive `tmuxop-catalog` that walks libtmux.experimental.ops.catalog() and emits a table, with :scope:/:safety:/:primitive-only: filters; warns (not raises) on empty
- conf.py: add docs/_ext to sys.path and 'tmuxop' to extra_extensions
- docs/experimental.md: an experimental ops/engines overview embedding the catalog (full + readonly + destructive views), in the index toctree
why: The sync control engine skipped tmux's startup ACK with a fragile one-shot flags==0 heuristic and had no defense against hook-emitted %begin/%end blocks, so a stray block could desync request->result alignment. The async engine already handles this; backport the approach.
what:
- consume the startup ACK synchronously at connect (_consume_startup), dropping the one-shot _startup_ack_pending heuristic, so the startup block can never be conflated with a command's result block
- drain buffered unsolicited blocks before each batch (_drain_unsolicited), so a hook-triggered command's block left over from a prior call is not mis-attributed to the next command
- drain notifications during reads to keep the parser buffer bounded
- regression test: many sequential commands stay aligned (first result is real; each call drains before reading its own block)

A hook firing mid-pipelined-batch still needs per-command number correlation to disambiguate; single-command run() is robust.
why: The chainable-commands prototype folds independent commands into one "tmux a ; b" dispatch. Our typed-op model is a better host for it -- the Operation already carries a `chainable` classvar and the result Status already reserves `skipped` for exactly the chain-drop case. So yes, lazy mode can adopt the prototype's chainability.
what:
- mark output/creation ops non-chainable (capture-pane, split-window; list-* already were) so a fold never drops captured data or an id
- ops._chain: render_chain (join chainable ops with standalone ';', escaping a trailing-';' arg), ensure_chainable (fail closed), and attribute -- splitting one merged ';'-chain result into a typed result per op (success -> all complete; failure -> first failed, rest skipped, matching tmux cmd-queue.c cmdq_remove_group); plus OpChain with >>/then
- Operation.__rshift__/then compose into an OpChain; result_with_status() builds a result with an explicit status (skipped/failed attribution)
- LazyPlan.execute/aexecute gain fold=False (opt-in): maximal runs of chainable, resolved ops dispatch once via engine.run; the sans-I/O _drive yields _Single or _Chain so sync and async share the core; add_chain() records an OpChain
- tests: >> composition, render_chain, fold=one dispatch, fold-off=N dispatches, failure attribution, creators stay unfolded, add_chain
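A rough sketch of the fold and the attribution rule described above. The function names echo the commit message, but the signatures and the string statuses are assumptions made for illustration:

```python
def render_chain(argvs):
    """Join several argv lists into one argv separated by standalone ';'."""
    out = []
    for index, argv in enumerate(argvs):
        if index:
            out.append(";")  # standalone separator between commands
        for arg in argv:
            # tmux reads an argument ending in ';' as a separator, so a
            # literal trailing ';' has to be escaped as '\;'.
            out.append(arg[:-1] + "\\;" if arg.endswith(";") else arg)
    return out


def attribute(count, returncode):
    """Split one merged chain status into a per-operation status list."""
    if count == 0:
        return []
    if returncode == 0:
        return ["complete"] * count
    # Mirrors the rule stated in the commit: first failed, rest skipped.
    return ["failed"] + ["skipped"] * (count - 1)


chain = render_chain(
    [
        ["rename-window", "-t", "@1", "build;"],
        ["select-layout", "-t", "@1", "tiled"],
    ]
)
statuses = attribute(2, returncode=1)
```

Output-producing and id-creating operations are kept out of such a chain in the PR, since a merged result cannot say which command printed what.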
why: Extend the mode-in-the-type facades beyond the pane seed so a typed return value distinguishes eager/lazy/async across scopes -- and add the few creation ops the cross-scope navigation needs.
what:
- ops: NewWindow / NewSession (CreateResult, capture the new id), KillSession, RenameSession; generalize binding capture via Result.created_id (base None; SplitWindowResult -> new_pane_id; CreateResult -> new_id) so lazy plans bind window/session creations too
- facade: eager Server -> Session -> Window -> Pane navigation (EagerServer/EagerSession/EagerWindow); LazyWindow (records into a plan); AsyncPane / AsyncWindow (await arun) -- all over the same ops. Control mode stays an engine choice, not a separate facade family
- EagerServer.for_server() binds the classic engine to a live Server
- tests: offline navigation across scopes/modes (concrete engine), and a live eager Server -> Session -> Window -> Pane build against real tmux with cleanup
why: The native binary peer-protocol engine is the strongest proof the
operation/result contract is transport-agnostic -- the same typed
CommandResult whether produced by a subprocess, tmux -C, or by speaking
tmux's imsg protocol directly. Research confirmed it is pure-stdlib and
CI-verifiable; the prototype it is ported from only ever tested against a
fake socketpair server, never real tmux.
what:
- port engines/imsg/{types,v8,base}.py from libtmux-protocol-engines:
ImsgEngine over AF_UNIX + sendmsg/recvmsg + SCM_RIGHTS fd-passing, and
ProtocolV8Codec (=IIII header, IMSG_FD_MARK high bit of len,
peerid=PROTOCOL_VERSION 8, IDENTIFY -> COMMAND -> WRITE_* -> EXIT
handshake); posix_spawn local fallback for attach / start-server /
no-server-running
- adapt to the experimental tuple CommandResult (drop the process field);
add imsg.exc (ImsgError / ImsgProtocolError / UnsupportedProtocolVersion)
and select the v8 codec directly; keep the version-mismatch retry
- register as the opt-in "imsg" engine; import-safe everywhere (AF_UNIX
is only touched at runtime; tests skip without it)
- tests: v8 codec round-trip + MSG_COMMAND framing (no tmux), plus the
live parity test the prototype lacked -- ImsgEngine vs SubprocessEngine
return identical stdout/returncode for read-only commands against a
real tmux server (runs across the CI tmux matrix)
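To make the frame layout concrete, here is a hedged sketch of a header codec built on the `=IIII` layout named above. The field order (type, len, peerid, pid), the assumption that `len` counts the header plus payload, the `0x80000000` value for the high-bit fd mark, and the message type number are illustrative assumptions rather than values taken from the PR:

```python
import struct

HEADER = struct.Struct("=IIII")  # assumed order: type, len, peerid, pid
IMSG_FD_MARK = 0x80000000  # assumption: high bit of len flags a passed fd
PROTOCOL_VERSION = 8  # carried in the peerid field, per the commit message


def pack_frame(msg_type, payload=b"", *, pid=0, has_fd=False):
    """Build one frame: 16-byte header followed by the payload."""
    length = HEADER.size + len(payload)  # assumed to include the header
    if has_fd:
        length |= IMSG_FD_MARK
    return HEADER.pack(msg_type, length, PROTOCOL_VERSION, pid) + payload


def unpack_frame(data):
    """Decode one frame, separating the fd mark from the real length."""
    msg_type, raw_len, peerid, pid = HEADER.unpack_from(data)
    has_fd = bool(raw_len & IMSG_FD_MARK)
    length = raw_len & ~IMSG_FD_MARK
    payload = data[HEADER.size:length]
    return msg_type, peerid, pid, has_fd, payload


MSG_EXAMPLE = 200  # illustrative type value only
frame = pack_frame(MSG_EXAMPLE, b"list-sessions\0", pid=1234, has_fd=True)
decoded = unpack_frame(frame)
```

A real engine would additionally pass the descriptor itself out of band with `sendmsg` and `SCM_RIGHTS`; the mark only tells the receiver to expect one.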
why: Finish the mode-in-the-type matrix so every tmux scope has eager/lazy/async facades, and add the client-scoped ops a Client facade needs. The matrix is now 5 scopes x 3 modes, all over the shared spine.
what:
- ops: detach-client, refresh-client, switch-client (AckResult, client scope; switch-client renders -c/-t rather than the generic target)
- facade: LazyServer/AsyncServer, LazySession/AsyncSession, and the new client scope (EagerClient/LazyClient/AsyncClient); AsyncServer.for_server binds the async engine to a live Server
- tests: a lazy full Server->Session->Window->pane plan, async navigation, and eager/lazy/async client methods
why: The pre-commit gate now runs `uv run ty check`, so ty must be a configured dev tool. Brings the ty setup from the add-ty-type-checker branch and makes the experimental tree ty-clean.
what:
- add `ty` to the dev dependency group (uv.lock updated)
- add [tool.ty] (environment py3.10, src=src/tests) with the documented rule ignores for known ty false positives, ported verbatim
- fix the issues ty surfaced in experimental: Target is now a real union (ty rejects an implicit two-string type alias); rename OperationRegistry.list -> select so the `-> list[OpSpec]` return annotation is not shadowed by the method name
why: Make lazy-plan dispatch strategy pluggable and A/B-testable, and add
the chainable-commands {marked} lone-pane single-dispatch optimization
the plain ;-fold lacked.
what:
- ops.planner: Planner Protocol + PlanStep; SequentialPlanner (one
dispatch per op), FoldingPlanner (;-fold maximal chainable runs),
MarkedPlanner (fold a pane creation + the chainable ops decorating its
slot into one "split -P -F ; select-pane -m ; ... -t {marked} ;
select-pane -M" dispatch)
- _chain: render_marked / attribute_marked
- LazyPlan.execute/aexecute take planner= (default SequentialPlanner),
replacing fold=bool; _drive consumes the planner's PlanStep units and
stays sans-I/O so sync and async share it
- tests (NamedTuple + test_id): planner dispatch counts 3/2/1 with an
identical PlanResult, marked single-dispatch rendering + fallback, and
a live {marked} fold against a real tmux server
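The pluggable-planner idea can be sketched as below, covering only the sequential and folding strategies. `Op`, the `plan()` method name, and the list-of-lists step shape are hypothetical; the PR's Planner protocol and PlanStep type are not reproduced here:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Op:
    """Hypothetical operation stand-in carrying only a chainable flag."""

    name: str
    chainable: bool = True


class Planner(Protocol):
    def plan(self, ops: list) -> list:
        """Return dispatch steps; each step is a list of ops sent together."""
        ...


class SequentialPlanner:
    def plan(self, ops):
        return [[op] for op in ops]  # one dispatch per operation


class FoldingPlanner:
    def plan(self, ops):
        steps = []
        for op in ops:
            # Extend the previous step only while the run stays chainable.
            if op.chainable and steps and all(o.chainable for o in steps[-1]):
                steps[-1].append(op)
            else:
                steps.append([op])
        return steps


ops = [
    Op("split-window", chainable=False),  # creates an id, so never folded
    Op("send-keys"),
    Op("select-layout"),
]
sequential_steps = SequentialPlanner().plan(ops)
folded_steps = FoldingPlanner().plan(ops)
```

With this three-operation plan the sequential planner dispatches three times and the folding planner twice; the marked strategy described above would fold the pane creation in as well.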
why: The read seam only covered the list-* family, leaving common queries (existence, format evaluation, option dumps, attached clients) outside the typed operation/result model.
what:
- Add has-session, display-message, show-options, list-clients ops, each rendering inert argv and parsing tmux output into a typed result
- Add HasSessionResult.exists, DisplayMessageResult.text, ShowOptionsResult.options, ListClientsResult.clients result types
- Add ClientSnapshot model (a leaf view, not part of the tree)
- has-session maps rc != 0 to exists=False (a valid answer, not failure)
- Wire ops/results/snapshot exports; update enumerating doctests/tests
- Add test_read_breadth.py (NamedTuple + test_id render/parse/round-trip cases plus live tmux coverage)
why: The operation surface lacked the pane verbs the ORM relies on
(select/resize/swap/break/join/move/respawn/pipe/clear-history),
blocking pane-level parity for engine-driven callers.
what:
- Add select-pane, last-pane, resize-pane, respawn-pane, pipe-pane,
clear-history (single-target) ops
- Add swap-pane, join-pane, move-pane (dual-target) and break-pane
(creates a window, captures #{window_id} into CreateResult)
- Add src_target field + src_args() helper on Operation for the -s
source of dual-target commands; serialize handles src_target like
target
- Wire ops/exports; extend the catalog kind-enumeration doctest
- Add test_pane_ops.py (NamedTuple + test_id render/round-trip cases
plus live tmux coverage)
why: Window-level parity was missing the verbs the ORM uses to navigate and rearrange windows, so engine-driven callers could not select, move, or relink windows.
what:
- Add select-window, last-window, next-window, previous-window, resize-window, rotate-window, respawn-window, unlink-window
- Add swap-window, move-window, link-window (dual-target, via -s src_target)
- Wire ops/exports; extend the catalog kind-enumeration doctest
- Add test_window_ops.py (NamedTuple + test_id render/round-trip cases plus live navigation/swap/move/unlink coverage)
why: Engine-driven callers had no typed way to drive the tmux server lifecycle or write options, environment, and hooks -- the write side of the options surface that show-options already read.
what:
- Add start-server, kill-server, run-shell, source-file, suspend-client lifecycle ops
- Add set-option, set-window-option (the write counterpart to show-options), set-environment, set-hook
- Wire ops/exports; extend the catalog kind-enumeration doctest
- Add test_lifecycle_ops.py (NamedTuple + test_id render/round-trip cases plus live option/env/hook/run-shell/source-file coverage)
why: The paste-buffer family the ORM uses for clipboard interchange had no typed operations, leaving buffer set/load/save/paste outside the engine-driven surface.
what:
- Add set-buffer, delete-buffer, load-buffer, save-buffer, paste-buffer ops
- Add show-buffer read op + ShowBufferResult.text (buffer contents)
- Wire ops/results/exports; extend the catalog kind-enumeration and registry readonly doctests
- Add test_buffer_ops.py (NamedTuple + test_id render/round-trip cases plus a live set/show/save/delete and load/paste round-trip)
why: The experimental page described operations and the catalog but not how to run them or compose multi-step plans, leaving the engine choice and planner A/B story undocumented.
what:
- Add "Running an operation" (run/arun, raise_for_status policy)
- Add "Choosing an engine" (engine table, create_engine, async peers)
- Add "Lazy plans and planners" (LazyPlan slot refs, >> chaining, Sequential/Folding/Marked planners)
- All examples are executable doctests via the in-memory ConcreteEngine
why: Record the experimental operations/engines layer for the upcoming release so the unreleased section tracks what landed.
what:
- Add a "What's new" deliverable under the unreleased 0.59.x section for the experimental operations and engines layer (#690)
- Defer the release lead paragraph until the version is cut
why: An adversarial review of the new ops against tmux's command grammar found two defects: move-window could not request its kill-on-collision behavior, and paste-buffer's -r flag was documented as a space replacement it never performs.
what:
- MoveWindow: add kill (-k) field; tmux move-window's option string is "abdkrs:t:" and -k replaces any window already at the destination index
- PasteBuffer: rename no_format to no_replace and fix the docstring; -r keeps linefeeds instead of converting them to the default carriage-return separator (it has nothing to do with spaces)
- Add render cases for move-window -k/-r and paste-buffer -r
why: SendKeys(literal=True, enter=True) rendered 'send-keys -l <keys> Enter', but tmux's -l sends every arg literally, so "Enter" was typed as five characters and the line was never submitted.
what:
- __post_init__ raises ValueError on literal+enter (fail closed); the correct pattern is two operations
- Document the constraint on the enter parameter
- Add parametrized test_send_keys_literal_enter_guard
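A minimal sketch of the fail-closed guard, assuming a simplified `SendKeys` shape (the field names follow the commit message, but the class body and `render()` output here are illustrative only):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SendKeys:
    """Simplified stand-in for the send-keys operation."""

    target: str
    keys: str
    literal: bool = False
    enter: bool = False

    def __post_init__(self):
        # With -l every argument is typed literally, so a trailing "Enter"
        # would be sent as text instead of submitting the line.
        if self.literal and self.enter:
            raise ValueError("literal=True cannot be combined with enter=True")

    def render(self):
        argv = ["send-keys", "-t", self.target]
        if self.literal:
            argv.append("-l")
        argv.append(self.keys)
        if self.enter:
            argv.append("Enter")
        return tuple(argv)


# The two-operation pattern: type the text literally, then press Enter.
typed = SendKeys("%1", "echo $HOME", literal=True)
submit = SendKeys("%1", "Enter")
```

Rejecting the combination at construction time means the bad argv can never be rendered, serialized, or recorded into a plan.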
why: DisplayMessageResult.text took only stdout[0], silently dropping all but the first line of a multi-line display-message format.
what:
- Join all stdout lines into .text (matching ShowBuffer); single-line output is unchanged
- Add a multi-line parse case to test_read_breadth
why: subprocess/asyncio/imsg each folded has-session's stderr into stdout but control mode did not, so HasSession's result diverged by engine. The fold is a has-session concern, not an engine concern.
what:
- HasSession._make_result surfaces stderr[0] in stdout when stdout is empty, so every engine yields a consistent result
- Remove the per-engine `"has-session" in cmd` fold from subprocess, asyncio, imsg; soften the subprocess/asyncio docstrings accordingly
- Add test_has_session_folds_stderr_to_stdout
why: ConcreteEngine is stateless, so has-session (and other existence queries) always report success -- HasSession.exists is always True through it. That surprise should be documented.
what:
- Add a Notes section to ConcreteEngine documenting the stateless simulation and that queries like has-session need a live engine
why: imsg _connect created the socket inside the try whose except calls sock.close(); if socket() itself failed (e.g. fd exhaustion), sock was unbound and the handler raised UnboundLocalError, masking the real OSError.
what:
- Create the socket before the try so the except only runs once sock exists
- Add test_imsg_connect_socket_failure_raises_oserror (monkeypatched socket.socket)
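The fix pattern can be shown in isolation. This is a sketch with a hypothetical `connect()` helper, not the PR's `_connect`; it only demonstrates why the socket is created before the `try`:

```python
import socket


def connect(path, timeout=1.0):
    """Open a Unix-domain stream socket to ``path`` (hypothetical helper)."""
    # Created outside the try: if socket() itself fails, the OSError
    # propagates directly and no handler touches an unbound name.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.connect(path)
    except OSError:
        sock.close()  # safe: sock is always bound by the time we get here
        raise
    return sock
```

Had the `socket.socket(...)` call lived inside the `try`, a failure there would reach `sock.close()` with `sock` undefined and replace the real error with an `UnboundLocalError`.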
why: If the tmux server closed the socket right after MSG_EXIT (before MSG_EXITED), recv_frame raised ImsgProtocolError, which run() did not catch -- so a normal command exit became an exception, diverging from the subprocess engine.
what:
- Catch ImsgProtocolError around recv_frame; once seen_exit is set, treat a clean close as the end and return the computed exit result
why: _run_socket_command duplicated stdin/stdout fds for SCM_RIGHTS transfer, but if building the identify frames or opening the transport raised before send_frames ran, those descriptors leaked (send_frames only closes the fds once it owns the frames).
what:
- Close the dup'd fds if codec.identify_messages or the transport constructor raises; once send_frames runs it owns/closes them
Code review: No issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code
why: The v8 identify burst sent MSG_IDENTIFY_LONGFLAGS twice -- a byte-identical copy-paste in the initial codec. A real tmux client sends it once; the duplicate is harmless (the server sets the flags idempotently) but is redundant wire traffic.
what:
- Drop the duplicate MSG_IDENTIFY_LONGFLAGS frame in ProtocolV8Codec.identify_messages
- Add a parametrized regression test asserting each identify frame type is emitted the expected number of times (LONGFLAGS once)
Code review: No issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code
why: The async control-mode tests only dispatched single-command operations on happy paths, leaving the reader's multi-block correlation, batch pipelining, lifecycle short-circuits, and the error-as-data policy uncovered.
what:
- Cover %output and no-% notification parsing
- Test run_batch([]) short-circuit and aclose-before-start no-op
- Assert for_server threads the live socket into server_args
- Pipeline two requests through one run_batch call
- Fold a ; chain via FoldingPlanner (exercises expected>1 blocks)
- Confirm a rejected command returns a failed result, no raise
why: Declarative workspace builders (tmuxp-style) need injected setup commands kept out of shell history. tmux has no native flag; the convention is a leading space honored by HISTCONTROL=ignorespace.
what:
- Add suppress_history field (default False, opt-in, no behavior change)
- Prepend a space to the keys when set and not literal
- Document the convention and add a render doctest
why: A declarative WorkspaceBuilder must target the first pane of a
created window (e.g. to focus it) without the caller handling ids. The
implicit first pane had no captured id, so it could only be addressed as
the active pane -- which moves once the window is split.
what:
- NewSession.capture_panes / NewWindow.capture_pane (opt-in): emit a
multi-id -F so the result also carries first_window_id/first_pane_id
- CreateResult gains first_window_id/first_pane_id + created_subids
- SlotRef.part ("self"/"window"/"pane") + .window/.pane sub-refs; the
plan binds created_subids so a sub-ref resolves to its captured id
- ConcreteEngine fabricates one id per #{*_id} token (single-token
formats unchanged, preserving existing fabricated-id sequences)
why: tmuxp-style workspace creation needs a structural, declarative object language (à la SQLAlchemy Declarative on Core) -- declare the shape of a session/windows/panes and let a compiler lower it to Core operations, engine- and sync/async-neutral, instead of hand-driving ops.
what:
- New experimental.workspace package: analyzer (tmuxp YAML/dict -> IR), ir (Workspace/Window/Pane specs), compiler (spec -> Core LazyPlan, wiring first-pane sub-refs so the user never handles an id), runner (build/abuild over any engine + host steps + idempotent replace), confirm (live structure diff)
- Workspace.compile()/build()/abuild(); on_exists error|replace|reuse
- Robust QA: offline (op order, plan serialize round-trip, 3-way planner equivalence) + live (rich 3-window build over subprocess and async-control: names/order, pane counts, focus, options, env, cwd, commands)
why: The MCP tier needs to round-trip a plan's forward-ref bindings through JSON (tuple keys are not JSON-native) and to dry-run a plan's argv without an engine.
what:
- serialize.bindings_to_dict / bindings_from_dict ((slot, part) <-> "slot:part")
- LazyPlan.preview(): render each op's argv, None for unresolved SlotRefs
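The tuple-key round trip can be sketched as follows. The function names match the commit message, but the bodies are a guess: in particular, treating `slot` as an integer and `part` as one of "self"/"window"/"pane" is an assumption made for this example:

```python
import json


def bindings_to_dict(bindings):
    """Flatten (slot, part) tuple keys into JSON-safe "slot:part" strings."""
    return {f"{slot}:{part}": value for (slot, part), value in bindings.items()}


def bindings_from_dict(data):
    """Rebuild (slot, part) tuple keys from "slot:part" strings."""
    restored = {}
    for key, value in data.items():
        slot, _, part = key.partition(":")
        restored[(int(slot), part)] = value  # assumes integer slots
    return restored


bindings = {(0, "self"): "$1", (0, "window"): "@3", (0, "pane"): "%7"}
wire = json.dumps(bindings_to_dict(bindings))
restored = bindings_from_dict(json.loads(wire))
```

Passing the tuple-keyed dict straight to `json.dumps` would raise a `TypeError`, which is the gap the string form closes.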
why: Expose the typed-operations Core + Declarative tiers as a typed, chained, toolable command surface for agents, without coupling the library to any MCP framework.
what:
- experimental.mcp: ToolDescriptor/ParamDescriptor + OperationToolRegistry (per-op descriptors generated from the registry), an optional-pydantic schema builder, and TargetResolver (string/dict -> typed Target)
- plan tools: preview_plan (dry-run), execute_plan (+ bindings), result_schema introspection, build_workspace
- curated vocabulary: intuitive named tools mirroring libtmux's ORM (create_session/split_pane/send_input/list_*/kill_*/...) -> typed results
- pure (ConcreteEngine) + live tmux tests; no fastmcp dependency
why: Prove the projection drives a real MCP server and give downstream servers a one-call binding -- behind an optional extra so the core stays dependency-free.
what:
- fastmcp_adapter.build_server(engine): register the curated vocabulary as typed FastMCP tools (engine bound out of the schema, safety -> ToolAnnotations)
- add fastmcp to a new [project.optional-dependencies] mcp extra
- in-process tests (offline + live) call the tools via fastmcp's Client
why: The fastmcp adapter only exposed the curated vocabulary -- agents had no access to the full operation set or to plan composition, and the server could not be launched on its own.
what:
- Register one op_<kind> tool per operation via a dynamic-schema Tool subclass (engine + descriptor on PrivateAttr, explicit parameters schema), re-injecting target/src_target at the adapter edge; tagged per-op and hidden by default (expose_operations=True reveals them)
- Register plan tools (preview_plan/execute_plan/result_schema/build_workspace) taking serialized operations + a planner name
- Add build_server flags: include_operations/expose_operations/include_plan_tools
- Make the server runnable: main()/default_server(), __main__.py, fastmcp.json, and a libtmux-engine-mcp console script
- Extend adapter tests: offline per-op/plan/workspace, default-server, --help exit, live plan execution
why: To test and dogfood the MCP server in local agent CLIs, we need a way to point Claude/Codex/Cursor/Gemini at this checkout instead of a pinned release.
what:
- Port scripts/mcp_swap.py from libtmux-mcp (PEP 723 uv-script, tomlkit): detect/status/use-local/revert with timestamped backups, dry-run, and Claude user/project scopes
- Derive the server slug from the [project.scripts] entry (libtmux-engine-mcp -> libtmux-engine) instead of project.name, so it stays distinct from a sibling libtmux server; a strict generalization (libtmux-mcp still resolves to libtmux)
- Namespace the swap state dir libtmux-engine-mcp-dev
- Add tests: console-script registration (always-on) + slug/local-spec derivation (tomlkit-gated)
why: fastmcp is only the optional `mcp` extra, so a plain `uv sync` prunes it -- which silently turns `uv run mypy` red and makes every fastmcp adapter test importorskip away. The committed adapter's green gate depended on fastmcp happening to be installed.
what:
- Add fastmcp + tomlkit to the dev and testing dependency-groups so the standard gate type-checks and runs the adapter + mcp_swap tests (fastmcp also stays the `mcp` extra for end users)
- Add --ignore=docs/_build to pytest addopts: `docs` is a testpath, so a stale built-HTML tree poisons collection (the gate's rm docs/_build first-step was the only guard)
- Reformat ops/plan.py (pre-existing blank-line drift surfaced by ruff)
- uv.lock: add tomlkit (no other version churn)
why: The Declarative WorkspaceBuilder tier had thin coverage -- a
single analyzer shorthand case, and nothing exercising the compiler's
host-step schedule or the runner's on_exists preflight policy. Lock in
those behaviors so a regression in the Declarative-to-Core lowering or
the host-side orchestration is caught.
what:
- Analyzer/IR (offline): dimensions in both [x, y] and {width, height}
forms; shell_command shorthand (string / list / {cmd} items); the
None-pane and unsupported-pane TypeError paths; non-mapping-YAML
rejection; session-field passthrough; per-pane orchestration fields;
and Pane.commands run-form normalization
- Compiler (offline): dimensions threaded into new-session -x/-y;
env/option/window-option ops emitted with their values; the
before_script and pane sleep_before/sleep_after host-step schedule
asserted off the pure op spine (anchored by send-keys position, not
literal index); first-window reuse vs create-the-rest; and
Workspace.compile() == compile_full().plan
- Runner/confirm (live tmux): before_script runs as a host step in
start_directory; on_exists='reuse' short-circuits to an empty-but-ok
result leaving the session untouched while 'error' raises
FileExistsError; and confirm() flags a structural mismatch
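As a rough illustration of the two dimension forms the analyzer tests cover, the sketch below normalizes both `[x, y]` and `{width, height}` into the `-x`/`-y` arguments of tmux's new-session. The function names are hypothetical and the real analyzer/compiler split is not reproduced here.

```python
from typing import Any, List, Tuple


def normalize_dimensions(value: Any) -> Tuple[int, int]:
    """Accept [x, y] or {"width": ..., "height": ...}; return (width, height)."""
    if isinstance(value, dict):
        return int(value["width"]), int(value["height"])
    if isinstance(value, (list, tuple)) and len(value) == 2:
        return int(value[0]), int(value[1])
    raise TypeError(f"unsupported dimensions: {value!r}")


def new_session_size_args(value: Any) -> List[str]:
    """Thread the normalized size into new-session's -x (width) / -y (height)."""
    width, height = normalize_dimensions(value)
    return ["-x", str(width), "-y", str(height)]


print(new_session_size_args([200, 50]))
print(new_session_size_args({"width": 200, "height": 50}))
```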
why: The swap tool covered Claude / Codex / Cursor / Gemini but not the
Grok or Antigravity (agy) CLIs, so a local-checkout swap could not reach
two installed agents. Extending the registry lets one use-local repoint
the tmux MCP across all six.
what:
- Register grok (~/.grok/config.toml, TOML "mcp_servers" table, same
shape as codex) and agy/Antigravity
(~/.gemini/antigravity/mcp_config.json, JSON "mcpServers", same shape
as cursor/gemini) in CLIName / ALL_CLIS / CLIS
- Route grok through the existing codex branch and agy through the
cursor/gemini branch in get_server / set_server / delete_server
- Tolerate an empty JSON config in load_config so the swap can seed
Antigravity's initially-empty mcp_config.json instead of raising
- Note in the docstring that the Antigravity IDE and the agy CLI may
read different profiles; only the documented profile path is written
- Tests: grok (TOML) and agy (JSON) set/get/delete round-trips, the
empty-JSON tolerance, and the registry shapes
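The empty-JSON tolerance can be pictured with the sketch below: an empty (or whitespace-only) file is treated as an empty object, so a server entry can then be seeded under "mcpServers". `load_json_config` and `seed_server` are hypothetical names, not the functions in scripts/mcp_swap.py, and the server spec shown is made up for the demo.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_json_config(path: Path) -> Dict[str, Any]:
    """Load a JSON config, treating an empty file as {} instead of raising."""
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        return {}
    return json.loads(text)


def seed_server(path: Path, name: str, spec: Dict[str, Any]) -> Dict[str, Any]:
    """Add one entry under "mcpServers" and write the config back."""
    config = load_json_config(path)
    config.setdefault("mcpServers", {})[name] = spec
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "mcp_config.json"
    cfg.write_text("", encoding="utf-8")  # initially empty, as described above
    print(seed_server(cfg, "tmux", {"command": "example-command"}))
```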
why: The experimental MCP exposed only a thin synchronous projection.
Agents driving tmux need an intuitive, non-blocking surface that knows
which pane they are calling from, resolves "the pane relative to me" in
one call, and never silently targets the wrong pane.
what:
- Refactor the curated vocabulary into an async-first package
(session/window/pane/buffer/option/server): each tool is one async
def over arun, with a derived sync twin via a sans-I/O trampoline
(_bridge.synced) -- a single source of truth per tool.
- Expand the lean curated set with high-value verbs and conveniences
(grep_pane, capture_active_pane, geometry-resolved relative/corner
pane tools, directional select_pane) plus a guarded run_tmux hatch.
- Add build_async_server (default AsyncControlModeEngine) awaited on
FastMCP's loop; build_server stays the sync wrapper; main() and
fastmcp.json go async-first.
- Add a live event stream (events.py): a push watch_events tool and a
pull tmux://events ring buffer, selected by LIBTMUX_MCP_EVENTS.
- Make the surface caller-aware: server name 'tmux' plus steering
instructions (when/anti-triggers/concrete-id rule); CallerContext
reads TMUX_PANE/TMUX from the server's own env, socket-scoped;
get_caller_context anchor; is_caller on list_panes/search_panes rows;
the relative tools default to and require the caller pane origin;
capture_relative_pane/grep_relative_pane/search_panes.
- Reject relative special targets ({up-of}/{down-of}/...) on capture,
grep, send, and destructive pane tools with a hint pointing to the
relative tools; anchor specials ({marked}/{last}) pass through.
- Cover with experimental tests + doctests (no pytest-asyncio).
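The relative-versus-anchor distinction in the last bullets can be sketched as a small fail-closed check. The token sets below follow tmux's documented special pane tokens; `check_target`, the choice of `ValueError`, and the hint wording are assumptions for illustration only.

```python
# Relative tokens resolve against "the current pane", which is ambiguous for
# an MCP server; anchor tokens name one specific pane regardless of caller.
RELATIVE_SPECIALS = frozenset({"{up-of}", "{down-of}", "{left-of}", "{right-of}"})
ANCHOR_SPECIALS = frozenset({"{marked}", "{last}"})


def check_target(target: str) -> str:
    """Reject relative special targets; pass everything else through."""
    if target in RELATIVE_SPECIALS:
        raise ValueError(
            f"relative target {target!r} is not accepted here; "
            "use one of the relative pane tools instead"
        )
    return target


for candidate in ("%3", "{marked}", "{up-of}"):
    try:
        print("ok:", check_target(candidate))
    except ValueError as exc:
        print("rejected:", exc)
```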
why: The round-2 caller-awareness read only the MCP's own environment,
which real launchers (agent -> uv -> python child) strip -- so the
caller pane was undiscoverable and the whole surface went inert.
what:
- Add CallerContext.discover(): process-env -> explicit override
(LIBTMUX_MCP_CALLER_PANE/TMUX) -> a bounded, same-uid Linux /proc
parent-process env walk (vocabulary/_proc.py). Fail-closed (never
raises), env-minimised to TMUX/TMUX_PANE, depth-capped, and injectable
so it is unit-testable without /proc; records the discovery source.
- Bind the default engine to the discovered caller's -S socket when no
explicit override (--socket-path/--socket-name/--no-caller-socket,
$LIBTMUX_SOCKET*), so a stripped-env MCP still drives the user's own
tmux server instead of a fresh default one.
- Add a conservative socket comparator (socket_could_match /
is_conservative_caller) alongside the strict one: the strict stays on
the is_caller annotation and origin resolution; the conservative,
fail-safe one guards destructive ops.
- Refuse self-kill: kill_pane / respawn_pane / kill_window /
kill_session (and the others=True siblings) decline the pane, window,
or session running this MCP, with a hint to act manually.
- Thread the discovered context to the tool bodies by stashing it on the
engine (read by caller_of); SyncToAsyncEngine delegates server_args /
_caller_context so the sync surface sees the same identity.
- Cover with /proc parser, discovery precedence, comparator, and
self-kill tests (no pytest-asyncio); fix the stale
resolve_relative_pane active-pane-fallback docstring.
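A minimal sketch of the bounded parent-process walk, assuming the NUL-separated KEY=VALUE layout of Linux's /proc/<pid>/environ. The reader and parent lookup are injected so the walk runs without /proc; all names here are hypothetical, and the same-uid check and discovery-source recording from the commit are omitted.

```python
from typing import Callable, Dict, Optional

KEEP = ("TMUX", "TMUX_PANE")


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse NUL-separated KEY=VALUE bytes, keeping only TMUX/TMUX_PANE."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if not sep:
            continue  # skip empty or malformed entries
        name = key.decode("utf-8", "replace")
        if name in KEEP:
            env[name] = value.decode("utf-8", "replace")
    return env


def walk_for_pane(
    pid: int,
    read_environ: Callable[[int], bytes],
    parent_of: Callable[[int], Optional[int]],
    max_depth: int = 8,
) -> Dict[str, str]:
    """Walk up at most max_depth processes; return {} rather than raise."""
    current: Optional[int] = pid
    for _ in range(max_depth):
        if current is None or current <= 1:
            break
        try:
            env = parse_environ(read_environ(current))
        except OSError:
            return {}  # fail closed on unreadable processes
        if "TMUX_PANE" in env:
            return env
        current = parent_of(current)
    return {}


# Fake process table standing in for /proc: agent(10) -> uv(20) -> python(30).
environs = {
    30: b"PATH=/bin\0",
    20: b"PATH=/bin\0",
    10: b"HOME=/home/u\0TMUX=/tmp/s,1,0\0TMUX_PANE=%3\0",
}
parents = {30: 20, 20: 10, 10: 1}
print(walk_for_pane(30, environs.__getitem__, parents.get))
```

Because both callables are injected, the depth cap and fail-closed paths can be exercised in plain unit tests.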
why: An adversarial review of the caller-discovery work surfaced
fail-unsafe and over-broad edges in the new guards.
what:
- guard_self_kill and the others=True sibling guards now resolve the
caller's own pane to its window/session through a fail-safe helper: a
caller pane absent from the engine's server is not a self-kill, so a
kill on an unrelated target no longer raises a raw tmux error.
- Scope the ambient (socket=None) branch of both comparators to a
process-env caller, so a parent-walked caller is not matched to an
unbound default engine (which would mis-target resolve_origin reads and
over-refuse kills under --no-caller-socket).
- Wrap socket_matches' realpath comparisons (a $TMUX-controlled path can
raise) so the read-only tools degrade instead of crashing.
- Guard the op_* per-operation kill/respawn surface too, closing the
bypass when --operations is enabled.
- Name the caller pane in the others=True refusal hints; correct the
get_caller_context docstring; document the socket/caller precedence in
--help; make the adapter tests hermetic (no host /proc walk).
why: Agents need to know when a command in a pane finishes without
hard-coding a needle (regex/sentinel) the tool must guess, and without
blocking the server. tmux stops emitting %output the instant a pane goes
quiet, so idle-since-last-%output is a structural signal the agent
interprets via the captured chunk plus pane metadata.
what:
- Add _settle.py pure core: decode_output (tmux octal), output_payload
(per-pane filter, split not join so inner whitespace survives), and
accumulate_until_settle (settle/byte/time/end fold over an injected
async stream + clock). All doctested; no I/O, no fastmcp.
- Add wait_for_output edge tool in events.py: folds decoded %output to a
frozen MonitorResult, reads DoneMetadata (pane_dead/status,
pane_current_command) so the agent disambiguates finished vs blocked.
ctx.info for live partials; aclosing for cancellation safety; each call
runs in its own task.
- _ensure_attached: a bare tmux -C client emits no %output until
attach-session, so attach (sticky per engine) before folding; raise on a
failed attach so a stale session never yields a silent capture.
- Tests: pure settle unit + cancellation (test_settle.py); offline
integration + attach/dropped/done coverage (test_events.py); live
end-to-end against real tmux (test_monitor_live.py).
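The settle idea can be sketched in two pure pieces: undoing tmux's octal escaping of %output payloads (control mode replaces bytes below ASCII 32 and the backslash with `\ooo`), and folding chunks until nothing arrives for a quiet interval. This is a simplified stand-in: it reads from an `asyncio.Queue` and uses real timeouts, whereas the commit describes an injected async stream and clock, and it omits the byte/time caps. The function bodies are assumptions, not the code in _settle.py.

```python
import asyncio
import re
from typing import List, Optional, Tuple

# Backslash followed by three octal digits, limited to byte range (000-377).
_OCTAL = re.compile(rb"\\([0-3][0-7]{2})")


def decode_output(payload: bytes) -> bytes:
    """Turn tmux's \\ooo escapes back into raw bytes."""
    return _OCTAL.sub(lambda m: bytes([int(m.group(1).decode("ascii"), 8)]), payload)


async def accumulate_until_settle(
    queue: "asyncio.Queue[Optional[bytes]]", settle_s: float
) -> Tuple[bytes, str]:
    """Collect chunks until the queue stays quiet for settle_s seconds.

    A None item marks end of stream. Returns (text, reason).
    """
    chunks: List[bytes] = []
    while True:
        try:
            chunk = await asyncio.wait_for(queue.get(), timeout=settle_s)
        except asyncio.TimeoutError:
            return b"".join(chunks), "settled"
        if chunk is None:
            return b"".join(chunks), "ended"
        chunks.append(decode_output(chunk))


async def demo() -> Tuple[bytes, str]:
    queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
    queue.put_nowait(rb"tests passed\015\012")
    queue.put_nowait(b"$ ")
    return await accumulate_until_settle(queue, settle_s=0.05)


print(asyncio.run(demo()))
```

As the later commit notes, "settled" only means the pane went quiet; it says nothing about whether the command succeeded.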
why: A capable agent asked to run tests in a pane and wait fell back to
sleep + capture_pane polling -- it never found wait_for_output because
the tool's surface said "watch pane output / settles / needle-free", not
the agent's intent "run a command and wait for it to finish". The
capability shipped in round 4; this surfaces it.
what:
- _instructions(): add a run-a-command-and-wait paragraph naming the
split_pane/send_input -> wait_for_output workflow, the test/build use
case, the "prefer over sleep + capture_pane polling" steer, and the
"settled is not success" caveat. Gate it (events_enabled) so the sync
server never names a tool it does not register.
- wait_for_output: enrich description= with discovery vocabulary
(completion, exit/return code, success/failed) and add a NumPy
Parameters section -- FastMCP parses it into per-param schema
descriptions even when description= is overridden.
- docs: rename the colliding sync polling helper wait_for_output ->
wait_for_text and point agents at the event-backed tool.
- tests: lock the discoverable wording -- instructions name the tool +
workflow + anti-polling steer; tool metadata carries the vocab +
per-param descriptions; events=off omits the live-output guidance.
why: The self-kill guards left two known gaps: the per-op (op_*) kill
surface skipped the others=True sibling case, and the conservative
socket match relied on path reconstruction that diverges on macOS.
what:
- Add an authoritative conservative_socket() that queries the engine's
#{socket_path} for a -L name or ambient socket, so the guard's socket
scoping survives a macOS $TMUX_TMPDIR divergence; an explicit -S path
is used as-is.
- Lift the others=True guards (guard_kill_other_panes / _windows) into
_resolve so the curated tools and guard_destructive_op share them; the
per-op op_kill_pane/op_kill_window now refuse killing the caller's
sibling pane/window too.
- Cover with offline (conservative_socket) and live (curated + per-op
others=True) regression tests.
why: An adversarial review found the needle-free monitor could falsely
report a command "settled" and could raise or mislead when the watched
pane died -- the exact cases the design targets.
what:
- Make AsyncControlModeEngine.subscribe() a true broadcast: each
subscriber gets its own queue, so wait_for_output, watch_events, and
poll_events no longer steal each other's %output frames (which caused
a premature settle with truncated text).
- Make wait_for_output fail-safe when the watched pane is gone:
_read_done no longer raises or fabricates pane_dead=False on a blank
or fallback-pane probe (pane_dead becomes Optional/unknown, keyed on
#{pane_id}), and the settle snapshot capture is guarded; the result
is preserved.
- Add a derived exit_code to MonitorResult.
- Supervise the pull ring drainer (aclosing + recorded error surfaced
via poll_events) so a reader failure cannot silently freeze it.
- Regression test: two concurrent subscribers each see every event.
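The broadcast fix can be illustrated with a small fan-out hub in which every subscriber owns its queue, so one consumer reading a frame never removes it for another. `Broadcast` is a hypothetical stand-in; it is not AsyncControlModeEngine.subscribe() and leaves out backpressure and reader supervision.

```python
import asyncio
from typing import List, Set, Tuple


class Broadcast:
    """Fan each published event out to every subscriber's own queue."""

    def __init__(self) -> None:
        self._queues: Set["asyncio.Queue[str]"] = set()

    def subscribe(self) -> "asyncio.Queue[str]":
        queue: "asyncio.Queue[str]" = asyncio.Queue()
        self._queues.add(queue)
        return queue

    def unsubscribe(self, queue: "asyncio.Queue[str]") -> None:
        self._queues.discard(queue)

    def publish(self, event: str) -> None:
        # Every live subscriber gets its own copy of the event.
        for queue in self._queues:
            queue.put_nowait(event)


async def demo() -> Tuple[List[str], List[str]]:
    hub = Broadcast()
    first, second = hub.subscribe(), hub.subscribe()
    for frame in ("%output %1 one", "%output %1 two"):
        hub.publish(frame)
    seen_first = [first.get_nowait(), first.get_nowait()]
    seen_second = [second.get_nowait(), second.get_nowait()]
    return seen_first, seen_second


print(asyncio.run(demo()))
```

With a single shared queue, the second reader would have found it empty, which is the frame-stealing behaviour the commit describes.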
Summary

Implements the typed operations + engines architecture under `libtmux.experimental.{ops,engines,models,facade}`: an inert, statically-typed operation spine; a family of interchangeable engines (subprocess, concrete, control-mode, async-subprocess, async-control, and the native imsg easter-egg); lazy/async-lazy plans with `;`-folding chainability; pure object-graph snapshots; a typed read surface; engine-typed facades; and a docs catalog generated from the registry.

Operationalizes #688 (architecture) per the plan in #689. Touches no existing public API: everything is additive under `libtmux.experimental` (explicitly outside the versioning policy). Nothing is generated at runtime; everything is statically typed and mypy-strict clean.

What's delivered

The spine, `libtmux.experimental.ops` (pure, no tmux):
- `Operation[ResultT]`: frozen, keyword-only, class-vars as the single source of truth (kind/command/scope/result_cls/effects/safety/chainable/version gates). Pure `render()` with declarative version gating; `build_result()` adapts raw output to a typed result (version-threaded so read parsing matches the gated render).
- `Result` hierarchy with opt-in `raise_for_status()`: `AckResult` (no-output commands; success/failure only), `SplitWindowResult`/`CreateResult` (captured ids), `CapturePaneResult` (lines), `ListPanes/Windows/SessionsResult` (snapshot-deriving rows).
- `Target` sum, fail-closed `OperationRegistry`, stdlib serialization, and `catalog()` (registry-derived docs data).
- `LazyPlan` (record → resolve `SlotRef` forward refs → execute) with chainability: `>>`/`OpChain` composition and `execute(fold=True)` folding chainable runs into one `tmux a ; b` dispatch, attributing per-op status (success → all complete; failure → first failed, rest skipped, matching tmux's `cmdq_remove_group`).
- `ListPanes`/`ListWindows`/`ListSessions` ops render the same `-F` template neo uses (imported, not copied) and parse into `models` snapshots: a typed read surface parallel to neo, leaving the ORM untouched.

Engines, `libtmux.experimental.engines` (all behind `TmuxEngine`/`AsyncTmuxEngine`, all returning the same `CommandResult`):
- `SubprocessEngine` and `AsyncSubprocessEngine`
- `ConcreteEngine` and `AsyncConcreteEngine`
- `ControlModeEngine` and `AsyncControlModeEngine` (event stream via `subscribe()`), both over `tmux -C`
- `ImsgEngine` (opt-in easter egg)

Control engines use an I/O-free bytes `ControlModeParser` with FIFO/skip correlation (startup-ACK consumed up front; unsolicited hook blocks skipped). The imsg engine speaks tmux's binary peer protocol directly (`AF_UNIX` + `SCM_RIGHTS`, `PROTOCOL_VERSION` 8) and has a live parity test vs the subprocess engine that the prototype never had.

Models, `libtmux.experimental.models`: frozen `Pane`/`Window`/`Session`/`ServerSnapshot` (typed core + raw field tail); `from_pane_rows()` builds the whole tree from one `list-panes -a` query and round-trips to plain dicts. Neo-like, but decoupled and serializable.

Facades, `libtmux.experimental.facade` ("mode lives in the type"): eager `Server`→`Session`→`Window`→`Pane` navigation, `LazyWindow`/`LazyPane`, `AsyncWindow`/`AsyncPane`, all over the same ops; control mode is just an engine choice.

Docs: an in-repo `tmuxop-catalog` Sphinx directive renders `catalog()` into the operation reference (exercised by the docs gate), so the reference can't drift from the code.

Testing

`ruff`, `ruff format`, `mypy --strict`, `pytest` (1501 passed, 2 skipped), `build-docs`. (The occasional `test_retry.py` timing flake is pre-existing and unrelated; it passes in isolation.)

Design notes

- `raise_for_status()`. Same result shape across engines.
- `attach` (which falls back to a local spawn).

Refs #688, #689.