
Typed operations & engines: spine, 6 engines, plans, models, facades (#689) #690

Open
tony wants to merge 73 commits into master from engine-ops


@tony tony commented Jun 21, 2026

Member

Summary

Implements the typed operations + engines architecture under libtmux.experimental.{ops,engines,models,facade} — an inert, statically-typed operation spine; a family of interchangeable engines (subprocess, concrete, control-mode, async-subprocess, async-control, and the native imsg easter-egg); lazy/async-lazy plans with ;-folding chainability; pure object-graph snapshots; a typed read surface; engine-typed facades; and a docs catalog generated from the registry.

Operationalizes #688 (architecture) per the plan in #689. Touches no existing public API — everything is additive under libtmux.experimental (explicitly outside the versioning policy). Nothing is generated at runtime; everything is statically typed and mypy-strict clean.

What's delivered

The spine — libtmux.experimental.ops (pure, no tmux):

  • Operation[ResultT]: frozen, keyword-only, class-vars as the single source of truth (kind/command/scope/result_cls/effects/safety/chainable/version gates). Pure render() with declarative version gating; build_result() adapts raw output to a typed result (version-threaded so read parsing matches the gated render).
  • Typed Result hierarchy with opt-in raise_for_status(): AckResult (no-output commands — success/failure only), SplitWindowResult/CreateResult (captured ids), CapturePaneResult (lines), ListPanes/Windows/SessionsResult (snapshot-deriving rows).
  • Closed Target sum, fail-closed OperationRegistry, stdlib serialization, and catalog() (registry-derived docs data).
  • LazyPlan (record → resolve SlotRef forward refs → execute) with chainability: >> / OpChain composition and execute(fold=True) folding chainable runs into one tmux a ; b dispatch, attributing per-op status (success → all complete; failure → first failed, rest skipped, matching tmux's cmdq_remove_group).
  • Read seam: ListPanes/ListWindows/ListSessions ops render the same -F template neo uses (imported, not copied) and parse into models snapshots — a typed read surface parallel to neo, leaving the ORM untouched.
  • 57 operations across client/pane/window/session/server scopes.

Engines — libtmux.experimental.engines (all behind TmuxEngine/AsyncTmuxEngine, all returning the same CommandResult):

| Family | Sync | Async |
| --- | --- | --- |
| Subprocess (classic) | SubprocessEngine | AsyncSubprocessEngine |
| Concrete (in-memory) | ConcreteEngine | AsyncConcreteEngine |
| Control mode (tmux -C) | ControlModeEngine | AsyncControlModeEngine (event stream via subscribe()) |
| Native imsg (binary protocol) | ImsgEngine (opt-in easter egg) | |

Control engines use an I/O-free bytes ControlModeParser with FIFO/skip correlation (startup-ACK consumed up front; unsolicited hook blocks skipped). The imsg engine speaks tmux's binary peer protocol directly (AF_UNIX + SCM_RIGHTS, PROTOCOL_VERSION 8) and has a live parity test against the subprocess engine, which the prototype never had.
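
To make "I/O-free parser" concrete, here is a minimal sketch of a bytes-in, blocks-out parser for tmux control-mode framing (`%begin` / `%end` / `%error`, plus bare `%` notification lines). It is an illustration under assumptions, not the PR's `ControlModeParser`: the class and attribute names are hypothetical, and it does not implement startup-ACK handling or FIFO/skip correlation.

```python
from dataclasses import dataclass


@dataclass
class Block:
    number: int   # command number from the %begin line
    ok: bool      # True for %end, False for %error
    lines: list   # output lines between the guards


class ControlParserSketch:
    """Accepts arbitrary byte chunks; never reads or writes a socket."""

    def __init__(self) -> None:
        self._buf = b""
        self._open = None  # (number, lines) while inside a block
        self.blocks = []
        self.notifications = []

    def feed(self, data: bytes) -> None:
        self._buf += data
        # Only complete lines are parsed; a partial tail stays buffered.
        while b"\n" in self._buf:
            raw, self._buf = self._buf.split(b"\n", 1)
            self._line(raw.decode("utf-8", "backslashreplace"))

    def _line(self, line: str) -> None:
        if self._open is None:
            if line.startswith("%begin "):
                # Guard format: %begin <time> <number> <flags>
                self._open = (int(line.split()[2]), [])
            elif line.startswith("%"):
                self.notifications.append(line)
            return
        number, lines = self._open
        word = line.split(" ", 1)[0]
        # A closing guard must carry the same command number as %begin.
        if word in ("%end", "%error") and line.split()[2:3] == [str(number)]:
            self.blocks.append(Block(number, word == "%end", lines))
            self._open = None
        else:
            lines.append(line)


parser = ControlParserSketch()
parser.feed(b"%begin 1 7 1\n%1\n%e")          # chunk boundary mid-line
parser.feed(b"nd 1 7 1\n%window-add @3\n")
print(parser.blocks, parser.notifications)
```

Keeping the parser free of I/O is what lets the same logic sit behind both a selectors-based sync reader and an asyncio reader task, and be unit-tested without a tmux server.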

Models — libtmux.experimental.models: frozen Pane/Window/Session/ServerSnapshot (typed core + raw field tail), from_pane_rows() builds the whole tree from one list-panes -a query, round-trips to plain dicts — neo-like but decoupled and serializable.

Facades — libtmux.experimental.facade ("mode lives in the type"): eager Server→Session→Window→Pane navigation, LazyWindow/LazyPane, AsyncWindow/AsyncPane — all over the same ops; control mode is just an engine choice.

Docs: an in-repo tmuxop-catalog Sphinx directive renders catalog() into the operation reference (exercised by the docs gate), so the reference can't drift from the code.

Testing

  • ~240 experimental tests + doctests; the pure spine/models/concrete tests need no tmux, while classic/control/async/imsg engines and the facades are validated against a real tmux server via the libtmux fixtures.
  • Cross-engine contract suite: same typed result across engines; serialization round-trips.
  • Full repo gate green: ruff, ruff format, mypy --strict, pytest (1501 passed, 2 skipped), build-docs. (The occasional test_retry.py timing flake is pre-existing and unrelated — passes in isolation.)

Design notes

  • Revises #688 (Design typed operations and engines): execution mode lives in the facade type, not a runtime-bound engine attribute (return types differ by mode).
  • Per-engine error policy: classic reproduces today's behavior; newer engines return typed results with opt-in raise_for_status(). Same result shape across engines.
  • Core is stdlib-dataclass-only; an OTel/MCP edge can sit behind an extra.
  • imsg is opt-in and non-default: it depends on tmux's internal protocol (v8), is POSIX-only, and cannot host attach (which falls back to a local spawn).

Refs #688, #689.

why: Operationalizes the typed-operations/engines architecture
(issues 688, 689) with the pure substrate that was absent from every
prototype branch: an inert, statically-typed operation value that
renders tmux commands, carries its result type, and serializes without
a live tmux server. Engines stay transport-agnostic over it. None of
this touches or changes existing public APIs.

what:
- Add libtmux.experimental.{ops,engines} packages (experimental, not
  under the versioning policy)
- ops: frozen Operation[ResultT] with class-level metadata as the
  single source of truth; pure render() with declarative version gating
  (LooseVersion); build_result() adapting raw output to typed results
- ops: typed Result base + raise_for_status() (CPython/requests
  precedent), SplitWindowResult/CapturePaneResult payloads
- ops: closed Target sum (PaneId/WindowId/SessionId/ClientName/NameRef/
  IndexRef/Special/SlotRef) with fail-closed validation
- ops: fail-closed OperationRegistry keyed by kind, with OpSpec views
  and predicate listing; stdlib dict serialization with round-trips
- ops: four seed operations (split-window, capture-pane, send-keys,
  select-layout) registered via @register
- engines: TmuxEngine/AsyncTmuxEngine protocols, CommandRequest/
  CommandResult, EngineSpec; run()/arun() execute bridge sharing one
  render/build path (sync vs await is the only divergence)
- tests: 111 pure, fixture-parametrizable unit tests + doctests, all
  runnable without a tmux server

codecov Bot commented Jun 21, 2026


Codecov Report

❌ Patch coverage is 78.68582% with 1291 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.42%. Comparing base (42cf219) to head (4229265).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| scripts/mcp_swap.py | 26.72% | 314 Missing and 15 partials ⚠️ |
| src/libtmux/experimental/engines/imsg/base.py | 51.59% | 163 Missing and 34 partials ⚠️ |
| src/libtmux/experimental/engines/control_mode.py | 65.43% | 70 Missing and 33 partials ⚠️ |
| ...libtmux/experimental/engines/async_control_mode.py | 71.74% | 42 Missing and 21 partials ⚠️ |
| src/libtmux/experimental/engines/imsg/v8.py | 75.29% | 46 Missing and 17 partials ⚠️ |
| ...rc/libtmux/experimental/mcp/vocabulary/_resolve.py | 60.95% | 46 Missing and 11 partials ⚠️ |
| docs/_ext/tmuxop.py | 18.18% | 36 Missing ⚠️ |
| src/libtmux/experimental/mcp/vocabulary/pane.py | 75.34% | 32 Missing and 4 partials ⚠️ |
| src/libtmux/experimental/mcp/__init__.py | 46.77% | 28 Missing and 5 partials ⚠️ |
| src/libtmux/experimental/workspace/runner.py | 60.86% | 19 Missing and 8 partials ⚠️ |
| ... and 61 more | | |
Additional details and impacted files
@@             Coverage Diff             @@
##           master     #690       +/-   ##
===========================================
+ Coverage   51.30%   72.42%   +21.12%     
===========================================
  Files          25      192      +167     
  Lines        3487    10873     +7386     
  Branches      686     1431      +745     
===========================================
+ Hits         1789     7875     +6086     
- Misses       1403     2414     +1011     
- Partials      295      584      +289     


tony added 4 commits June 21, 2026 09:08
why: Proves the operation/result contract is transport-agnostic -- the
same typed result whether produced by a real tmux subprocess or an
in-memory simulator -- and provides the offline engine that lets ops
doctests and tests run without a tmux server (issue 689 phases 2-3).

what:
- engines.subprocess: classic SubprocessEngine mirroring tmux_cmd
  (has-session stderr fold, backslashreplace, trailing-blank strip;
  tmux failure returned as data, only missing binary raises), with
  for_server() deriving -L/-S/-f/-2 flags from a live Server
- engines.concrete: deterministic in-memory engine (fabricated pane/
  window/session ids, canned capture lines) for tests and docs
- engines.registry: name-keyed engine registry (register/create/
  available), seeded with subprocess + concrete
- tests/experimental/contract: engine-agnostic operation contract run
  offline via concrete, plus classic-vs-concrete parity against a real
  tmux server (same result type + argv, payload may differ)
why: Completes the sync/async-symmetric execution story plus the
deferred-execution and documentation mechanisms from issue 689
(phase 5 + docs), still without touching any existing API.

what:
- engines.asyncio: real AsyncSubprocessEngine on
  create_subprocess_exec (terminates the child on cancellation; not a
  thread wrapper), mirroring the classic engine's output handling so it
  returns the same typed result
- ops.plan: LazyPlan records operations without touching tmux and
  resolves SlotRef forward refs at execute time via a sans-I/O
  generator; sync execute() and async aexecute() share one resolution
  core (run vs await arun is the only divergence); whole-plan
  serialization round-trips
- ops.catalog: registry-driven CatalogEntry list (scope, version
  gates, effects, safety, result type, summary) -- the single source a
  docs domain renders, so runtime and docs cannot drift
- tests: lazy resolution sync+async, plan serialization, catalog
  coverage, async-vs-sync classic parity against a real tmux server
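
The "sans-I/O generator shared by sync and async" idea can be sketched as follows. This is a simplified illustration, not the PR's `LazyPlan`: `SlotRef` here is a bare index, operations are plain argv tuples, and `drive`, `execute`, `aexecute`, and the fake engines are hypothetical names.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotRef:
    """Forward reference to the id created by an earlier plan step."""

    index: int


def drive(steps):
    """Sans-I/O core: yield a resolved argv, receive that step's created id."""
    created = []
    for argv in steps:
        resolved = tuple(
            created[arg.index] if isinstance(arg, SlotRef) else arg for arg in argv
        )
        new_id = yield resolved
        created.append(new_id)
    return created


def execute(steps, run):
    """Sync driver: the only difference from aexecute is run vs await arun."""
    gen = drive(steps)
    try:
        request = next(gen)
        while True:
            request = gen.send(run(request))
    except StopIteration as stop:
        return stop.value


async def aexecute(steps, arun):
    gen = drive(steps)
    try:
        request = next(gen)
        while True:
            request = gen.send(await arun(request))
    except StopIteration as stop:
        return stop.value


def fake_run(argv):
    # Stand-in engine: only split-window "creates" an id.
    return "%99" if argv[0] == "split-window" else None


async def fake_arun(argv):
    return fake_run(argv)


plan = [
    ("split-window", "-t", "%0"),
    ("send-keys", "-t", SlotRef(0), "ls", "Enter"),
]
print(execute(plan, fake_run))
print(asyncio.run(aexecute(plan, fake_arun)))
```

Because all resolution logic lives in `drive`, a fix to forward-ref handling applies to both execution modes at once.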
why: Proves control mode is just another engine returning the same
typed result (issue 689 phase 4) -- an operation run over a persistent
tmux -C connection is indistinguishable, at the result level, from one
run via fork-per-call subprocess.

what:
- engines.control_mode: ControlModeEngine over one persistent tmux -C
  connection; run_batch pipelines commands and parses each command's
  %begin/%end/%error block into a CommandResult; selectors-based
  nonblocking reads with timeout; startup-ACK discard; lifecycle via
  close()/context manager (lock-guarded teardown)
- engines.control_mode: I/O-free ControlModeParser, unit-testable
  without tmux, adapted from the chain runner + protocol-engines parser
- register control_mode in the engine registry and export it
- tests: pure parser tests + real-tmux contract (split creates a real
  pane, batched commands, control-vs-concrete parity)
why: Demonstrates the "mode lives in the type" model from issue 689 --
EagerPane.split() returns a live EagerPane while LazyPane.split() returns
a deferred LazyPane, each a single statically-known return type, both
backed by the same SplitWindow operation. One Pane class with a
runtime-bound engine could not type these return values distinctly.

what:
- facade.pane.EagerPane: executes immediately, returns live handles
  (split -> EagerPane), typed results for capture/send_keys
- facade.pane.LazyPane: records into a LazyPlan, returns deferred handles
  (split -> LazyPane bound to the new pane's SlotRef), chainable
- seed of the wider Server/Session/Window/Pane/Client x mode matrix
- tests: eager live handles, lazy deferral + forward-ref resolution,
  and same-operation-backs-both-facades parity
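
A small sketch of why "mode lives in the type" matters for static typing: each facade's `split()` has exactly one return type, which a single class with a runtime-bound engine could not express. The class names, the `SlotSketch` placeholder, and the `run` callable are hypothetical; the tmux flags `-P -F '#{pane_id}'` are real.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class SlotSketch:
    """Placeholder for an id that only exists once the plan executes."""

    index: int


@dataclass(frozen=True)
class EagerPaneSketch:
    pane_id: str
    run: Callable[[tuple], str]  # hypothetical engine call returning the new id

    def split(self) -> "EagerPaneSketch":
        # Executes immediately and returns a live handle.
        new_id = self.run(
            ("split-window", "-t", self.pane_id, "-P", "-F", "#{pane_id}")
        )
        return EagerPaneSketch(new_id, self.run)


@dataclass(frozen=True)
class LazyPaneSketch:
    target: Union[str, SlotSketch]
    plan: list  # shared recording of deferred argv tuples

    def split(self) -> "LazyPaneSketch":
        # Records only; the returned handle targets the future pane's slot.
        self.plan.append(("split-window", "-t", self.target))
        return LazyPaneSketch(SlotSketch(len(self.plan) - 1), self.plan)


recorded = []
deferred = LazyPaneSketch("%0", recorded).split().split()
print(recorded, deferred.target)
```

A type checker can therefore tell that chaining off `LazyPaneSketch.split()` never touches tmux, while chaining off `EagerPaneSketch.split()` always does.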
tony changed the title from "Typed operations and engines: inert op spine (#689)" to "Typed operations and engines: spine + 4 engines + facades (#689)" on Jun 21, 2026
tony changed the title from "Typed operations and engines: spine + 4 engines + facades (#689)" to "Typed operations and engines" on Jun 21, 2026
tony added 9 commits June 21, 2026 09:57
why: Closes the two async gaps from issue 689: control mode and concrete
had no async sibling. The async control engine is the one async engine
that earns its place -- it adds an event stream subprocess cannot -- and
prior libtmux/mux control-mode work (surfaced across agent histories via
agentgrep, plus the asyncio-2 branches) shaped its correlation design.

what:
- engines.async_control_mode: AsyncControlModeEngine over a persistent
  tmux -C (create_subprocess_exec + one reader task). FIFO future
  correlation with skip-when-empty so unsolicited %begin blocks (hook-
  triggered commands and the startup ACK) never desync results; the
  startup ACK is consumed synchronously in start() to close the
  correlation race our whole-block parser would otherwise have. DEAD
  state fails pending commands on reader EOF/error. Cancellation via
  asyncio.wait_for (3.10 floor: no asyncio.timeout/TaskGroup). Bounded
  subscribe() notification stream with drop-counting. for_server() helper
- engines.control_mode: ControlModeParser now surfaces bare %-notification
  lines via notifications() (additive; the sync engine ignores them)
- engines.concrete: AsyncConcreteEngine sibling over shared simulation;
  removes the async test shim
- ControlNotification typed event value
- tests: parser notification/drain; async control vs real tmux (split,
  pipelined batch, concrete parity, live event stream, lifecycle)
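
The FIFO correlation with skip-when-empty described above can be sketched with plain asyncio futures. This is an assumption-laden illustration, not the engine's code: `CorrelatorSketch` and its methods are hypothetical, blocks are plain strings, and (as the PR itself notes for pipelined batches) a hook firing while commands are pending would still need command-number correlation.

```python
import asyncio
from collections import deque


class CorrelatorSketch:
    def __init__(self) -> None:
        self._pending = deque()  # futures in the order commands were sent
        self.skipped = 0

    def expect(self) -> asyncio.Future:
        """Register one in-flight command and return its result future."""
        fut = asyncio.get_running_loop().create_future()
        self._pending.append(fut)
        return fut

    def on_block(self, block) -> None:
        """Called by the reader for every completed %begin block."""
        if not self._pending:
            # Unsolicited block (e.g. hook-triggered): nobody is waiting.
            self.skipped += 1
            return
        fut = self._pending.popleft()
        if not fut.done():  # the waiter may already have been cancelled
            fut.set_result(block)

    def on_eof(self) -> None:
        """Reader hit EOF or an error: fail everything still waiting."""
        while self._pending:
            fut = self._pending.popleft()
            if not fut.done():
                fut.set_exception(ConnectionError("control connection closed"))


async def demo():
    correlator = CorrelatorSketch()
    correlator.on_block("hook output")  # arrives with nothing pending
    first, second = correlator.expect(), correlator.expect()
    correlator.on_block("A")
    correlator.on_eof()
    dead = False
    try:
        await second
    except ConnectionError:
        dead = True
    return await first, dead, correlator.skipped


print(asyncio.run(demo()))
```
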
why: Many tmux commands print nothing (rename-window, kill-pane,
select-window, ...). tmux returns CMD_RETURN_NORMAL on success or calls
cmdq_error on failure, framed in control mode as %end vs %error (see
tmux cmd-queue.c) -- they never cmdq_print. They still need a typed
result that records success/failure without inventing a payload.

what:
- results.AckResult: a typed acknowledgement (no payload) whose
  raise_for_status() still surfaces the error path; documents the tmux
  success/error mapping
- retarget send-keys and select-layout to AckResult (both print nothing)
- add no-output ops: rename-window (mutating), kill-window and kill-pane
  (destructive) -- exercising AckResult across scopes and safety tiers
- export AckResult and the new ops; refresh the catalog doctest
- tests: render + AckResult success/failure across the no-output ops and
  destructive safety metadata; update classic/control parity assertions
why: A neo-like read model is useful, but neo.Obj is one flat ~200-field
class fused to the query/dispatch pipeline. The experimental namespace
lets us try a decoupled, immutable, serializable snapshot layer without
any risk to the shipped ORM APIs.

what:
- libtmux.experimental.models: frozen PaneSnapshot / WindowSnapshot /
  SessionSnapshot / ServerSnapshot, each a typed core plus the full raw
  tmux-format tail in .fields (nothing tmux reported is lost)
- from_format() builds one node from a format mapping;
  ServerSnapshot.from_pane_rows() groups a flat "list-panes -a -F" row
  set into an ordered session/window/pane tree
- to_dict()/from_dict() round-trip the whole tree as plain data, with no
  live objects
- pure tests (no tmux): value coercion, tree grouping/order, round-trip
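
Grouping a flat `list-panes -a` row set into an ordered tree, as `from_pane_rows()` is described as doing, might look like the sketch below. The snapshot classes are deliberately reduced to ids only and are hypothetical; the real snapshots carry a typed core plus the raw field tail. The row keys (`session_id`, `window_id`, `pane_id`) are tmux format variable names.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WindowSnap:
    window_id: str
    pane_ids: tuple


@dataclass(frozen=True)
class SessionSnap:
    session_id: str
    windows: tuple


def from_pane_rows(rows):
    """Group flat pane rows into sessions -> windows -> panes, keeping order."""
    grouped = {}
    for row in rows:
        windows = grouped.setdefault(row["session_id"], {})
        windows.setdefault(row["window_id"], []).append(row["pane_id"])
    # dicts preserve insertion order, so first-seen order is kept.
    return tuple(
        SessionSnap(
            sid,
            tuple(WindowSnap(wid, tuple(panes)) for wid, panes in windows.items()),
        )
        for sid, windows in grouped.items()
    )


rows = [
    {"session_id": "$0", "window_id": "@0", "pane_id": "%0"},
    {"session_id": "$0", "window_id": "@0", "pane_id": "%1"},
    {"session_id": "$0", "window_id": "@1", "pane_id": "%2"},
    {"session_id": "$1", "window_id": "@2", "pane_id": "%3"},
]
tree = from_pane_rows(rows)
print([asdict(session) for session in tree])
```
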
why: The list/show read commands overlap neo's reader. Rather than
touch the ORM, add a parallel typed read surface in experimental.ops
that yields immutable models snapshots. The render version must thread
into result parsing first, because the -F template is version-gated and
the parser must split against the same fields it was rendered with.

what:
- operation: thread `version` through build_result -> _make_result so
  payload parsing matches the version-gated render (backward compatible;
  existing overrides accept and ignore it); execute.run/arun pass it
- ops._read: re-export neo.get_output_format / parse_output and
  formats.FORMAT_SEPARATOR as the single source of truth (no copies)
- list-panes / list-windows / list-sessions ops (readonly,
  chainable=False) render the same -F template neo builds and parse rows
  into models snapshots
- ListPanesResult/.../ store JSON-friendly rows and derive typed views
  (.panes/.server/.windows/.sessions) via properties, so results
  serialize and round-trip with no special-casing
- tests: -F parity with neo, snapshot-tree build, serialize round-trip,
  and live list-panes/sessions/windows against a real tmux server
why: The operation catalog is registry-derived data, so rendering it in
docs keeps the operation reference from drifting from the code -- and the
docs gate then exercises catalog() on every build.

what:
- docs/_ext/tmuxop.py: an in-repo Sphinx directive `tmuxop-catalog` that
  walks libtmux.experimental.ops.catalog() and emits a table, with
  :scope:/:safety:/:primitive-only: filters; warns (not raises) on empty
- conf.py: add docs/_ext to sys.path and 'tmuxop' to extra_extensions
- docs/experimental.md: an experimental ops/engines overview embedding
  the catalog (full + readonly + destructive views), in the index toctree
why: The sync control engine skipped tmux's startup ACK with a fragile
one-shot flags==0 heuristic and had no defense against hook-emitted
%begin/%end blocks, so a stray block could desync request->result
alignment. The async engine already handles this; backport the approach.

what:
- consume the startup ACK synchronously at connect (_consume_startup),
  dropping the one-shot _startup_ack_pending heuristic, so the startup
  block can never be conflated with a command's result block
- drain buffered unsolicited blocks before each batch
  (_drain_unsolicited), so a hook-triggered command's block left over
  from a prior call is not mis-attributed to the next command
- drain notifications during reads to keep the parser buffer bounded
- regression test: many sequential commands stay aligned (first result
  is real; each call drains before reading its own block)

A hook firing mid-pipelined-batch still needs per-command number
correlation to disambiguate; single-command run() is robust.
why: The chainable-commands prototype folds independent commands into one
"tmux a ; b" dispatch. Our typed-op model is a better host for it -- the
Operation already carries a `chainable` classvar and the result Status
already reserves `skipped` for exactly the chain-drop case. So yes, lazy
mode can adopt the prototype's chainability.

what:
- mark output/creation ops non-chainable (capture-pane, split-window;
  list-* already were) so a fold never drops captured data or an id
- ops._chain: render_chain (join chainable ops with standalone ';',
  escaping a trailing-';' arg), ensure_chainable (fail closed), and
  attribute -- splitting one merged ';'-chain result into a typed result
  per op (success -> all complete; failure -> first failed, rest skipped,
  matching tmux cmd-queue.c cmdq_remove_group); plus OpChain with >>/then
- Operation.__rshift__/then compose into an OpChain; result_with_status()
  builds a result with an explicit status (skipped/failed attribution)
- LazyPlan.execute/aexecute gain fold=False (opt-in): maximal runs of
  chainable, resolved ops dispatch once via engine.run; the sans-I/O
  _drive yields _Single or _Chain so sync and async share the core;
  add_chain() records an OpChain
- tests: >> composition, render_chain, fold=one dispatch, fold-off=N
  dispatches, failure attribution, creators stay unfolded, add_chain
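
The two halves of folding, joining argv with a standalone `;` and then attributing one merged outcome back to each op, can be sketched like this. Function names mirror the commit message but the bodies are illustrative guesses; in particular the trailing-`;` escaping rule shown here is an assumption about how such an argument should be protected, not verified against the PR.

```python
def render_chain(argvs):
    """Join several argv tuples into one 'a ; b' dispatch."""
    out = []
    for index, argv in enumerate(argvs):
        if index:
            out.append(";")  # standalone separator token between commands
        for arg in argv:
            # Assumed rule: an argument ending in ';' would be read as a
            # separator, so its trailing ';' is backslash-escaped.
            out.append(arg[:-1] + "\\;" if arg.endswith(";") else arg)
    return tuple(out)


def attribute(count, returncode):
    """Split one merged result into a per-op status list."""
    if count == 0:
        return []
    if returncode == 0:
        return ["complete"] * count
    # On failure the first op is reported failed and the rest skipped,
    # following the attribution described in the commit message.
    return ["failed"] + ["skipped"] * (count - 1)


chain = [("rename-window", "build"), ("select-layout", "tiled")]
print(render_chain(chain))
print(attribute(len(chain), 1))
```
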
why: Extend the mode-in-the-type facades beyond the pane seed so a typed
return value distinguishes eager/lazy/async across scopes -- and add the
few creation ops the cross-scope navigation needs.

what:
- ops: NewWindow / NewSession (CreateResult, capture the new id),
  KillSession, RenameSession; generalize binding capture via
  Result.created_id (base None; SplitWindowResult -> new_pane_id;
  CreateResult -> new_id) so lazy plans bind window/session creations too
- facade: eager Server -> Session -> Window -> Pane navigation
  (EagerServer/EagerSession/EagerWindow); LazyWindow (records into a
  plan); AsyncPane / AsyncWindow (await arun) -- all over the same ops.
  Control mode stays an engine choice, not a separate facade family
- EagerServer.for_server() binds the classic engine to a live Server
- tests: offline navigation across scopes/modes (concrete engine), and a
  live eager Server -> Session -> Window -> Pane build against real tmux
  with cleanup
why: The native binary peer-protocol engine is the strongest proof the
operation/result contract is transport-agnostic -- the same typed
CommandResult whether produced by a subprocess, tmux -C, or by speaking
tmux's imsg protocol directly. Research confirmed it is pure-stdlib and
CI-verifiable; the prototype it is ported from only ever tested against a
fake socketpair server, never real tmux.

what:
- port engines/imsg/{types,v8,base}.py from libtmux-protocol-engines:
  ImsgEngine over AF_UNIX + sendmsg/recvmsg + SCM_RIGHTS fd-passing, and
  ProtocolV8Codec (=IIII header, IMSG_FD_MARK high bit of len,
  peerid=PROTOCOL_VERSION 8, IDENTIFY -> COMMAND -> WRITE_* -> EXIT
  handshake); posix_spawn local fallback for attach / start-server /
  no-server-running
- adapt to the experimental tuple CommandResult (drop the process field);
  add imsg.exc (ImsgError / ImsgProtocolError / UnsupportedProtocolVersion)
  and select the v8 codec directly; keep the version-mismatch retry
- register as the opt-in "imsg" engine; import-safe everywhere (AF_UNIX
  is only touched at runtime; tests skip without it)
- tests: v8 codec round-trip + MSG_COMMAND framing (no tmux), plus the
  live parity test the prototype lacked -- ImsgEngine vs SubprocessEngine
  return identical stdout/returncode for read-only commands against a
  real tmux server (runs across the CI tmux matrix)
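
For readers unfamiliar with the framing, the `=IIII` header with the fd mark in the high bit of the length can be sketched with `struct`. The field order (type, len, peerid, pid), the convention that `len` includes the 16-byte header, and the example message type are assumptions made for illustration; the helper names are hypothetical and this is not the PR's `ProtocolV8Codec`.

```python
import struct

HEADER = struct.Struct("=IIII")   # four native-order uint32 fields, no padding
IMSG_FD_MARK = 0x80000000         # high bit of len flags an attached fd
PROTOCOL_VERSION = 8              # carried in the peerid field
EXAMPLE_TYPE = 200                # arbitrary message type for the demo


def pack_frame(msg_type, payload=b"", *, pid=0, has_fd=False):
    """Build header + payload; len is assumed to include the header."""
    length = HEADER.size + len(payload)
    if has_fd:
        length |= IMSG_FD_MARK
    return HEADER.pack(msg_type, length, PROTOCOL_VERSION, pid) + payload


def unpack_header(data):
    """Return (type, length without the mark, has_fd, peerid, pid)."""
    msg_type, length, peerid, pid = HEADER.unpack_from(data)
    return msg_type, length & ~IMSG_FD_MARK, bool(length & IMSG_FD_MARK), peerid, pid


frame = pack_frame(EXAMPLE_TYPE, b"abc", pid=1234, has_fd=True)
print(unpack_header(frame))
```

The descriptor itself would travel out of band as SCM_RIGHTS ancillary data on the AF_UNIX socket; the mark only tells the receiver to expect one.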
tony changed the title from "Typed operations and engines" to "Typed operations & engines: spine, 6 engines, plans, models, facades (#689)" on Jun 21, 2026
tony added 11 commits June 21, 2026 12:01
why: Finish the mode-in-the-type matrix so every tmux scope has
eager/lazy/async facades, and add the client-scoped ops a Client facade
needs. The matrix is now 5 scopes x 3 modes, all over the shared spine.

what:
- ops: detach-client, refresh-client, switch-client (AckResult, client
  scope; switch-client renders -c/-t rather than the generic target)
- facade: LazyServer/AsyncServer, LazySession/AsyncSession, and the new
  client scope (EagerClient/LazyClient/AsyncClient); AsyncServer.for_server
  binds the async engine to a live Server
- tests: a lazy full Server->Session->Window->pane plan, async navigation,
  and eager/lazy/async client methods
why: The pre-commit gate now runs `uv run ty check`, so ty must be a
configured dev tool. Brings the ty setup from the add-ty-type-checker
branch and makes the experimental tree ty-clean.

what:
- add `ty` to the dev dependency group (uv.lock updated)
- add [tool.ty] (environment py3.10, src=src/tests) with the documented
  rule ignores for known ty false positives, ported verbatim
- fix issues ty surfaced in experimental: Target is now a real union (ty
  rejects an implicit two-string type alias); OperationRegistry.list ->
  select so the `-> list[OpSpec]` return annotation is not shadowed by
  the method name
why: Make lazy-plan dispatch strategy pluggable and A/B-testable, and add
the chainable-commands {marked} lone-pane single-dispatch optimization
the plain ;-fold lacked.

what:
- ops.planner: Planner Protocol + PlanStep; SequentialPlanner (one
  dispatch per op), FoldingPlanner (;-fold maximal chainable runs),
  MarkedPlanner (fold a pane creation + the chainable ops decorating its
  slot into one "split -P -F ; select-pane -m ; ... -t {marked} ;
  select-pane -M" dispatch)
- _chain: render_marked / attribute_marked
- LazyPlan.execute/aexecute take planner= (default SequentialPlanner),
  replacing fold=bool; _drive consumes the planner's PlanStep units and
  stays sans-I/O so sync and async share it
- tests (NamedTuple + test_id): planner dispatch counts 3/2/1 with an
  identical PlanResult, marked single-dispatch rendering + fallback, and
  a live {marked} fold against a real tmux server
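
The difference in dispatch counts between planners can be shown with a toy model in which a planner simply partitions ops into dispatch steps. `OpSketch` and both functions are hypothetical; the marked planner is omitted because it depends on `{marked}` rendering details not reproduced here.

```python
from typing import NamedTuple


class OpSketch(NamedTuple):
    name: str
    chainable: bool


def sequential(ops):
    """One dispatch per operation."""
    return [[op] for op in ops]


def folding(ops):
    """Merge maximal runs of chainable ops into a single dispatch."""
    steps = []
    for op in ops:
        can_extend = (
            op.chainable and steps and all(prev.chainable for prev in steps[-1])
        )
        if can_extend:
            steps[-1].append(op)
        else:
            steps.append([op])
    return steps


ops = [
    OpSketch("split-window", chainable=False),   # creator: never folded
    OpSketch("send-keys", chainable=True),
    OpSketch("select-layout", chainable=True),
]
print(len(sequential(ops)), len(folding(ops)))
```

With a creator followed by two chainable ops, this toy model gives three dispatches sequentially and two when folding; the marked strategy described above is what brings the same plan down to one.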
why: The read seam only covered the list-* family, leaving common
queries (existence, format evaluation, option dumps, attached
clients) outside the typed operation/result model.

what:
- Add has-session, display-message, show-options, list-clients ops,
  each rendering inert argv and parsing tmux output into a typed result
- Add HasSessionResult.exists, DisplayMessageResult.text,
  ShowOptionsResult.options, ListClientsResult.clients result types
- Add ClientSnapshot model (a leaf view, not part of the tree)
- has-session maps rc != 0 to exists=False (a valid answer, not failure)
- Wire ops/results/snapshot exports; update enumerating doctests/tests
- Add test_read_breadth.py (NamedTuple + test_id render/parse/round-trip
  cases plus live tmux coverage)
why: The operation surface lacked the pane verbs the ORM relies on
(select/resize/swap/break/join/move/respawn/pipe/clear-history),
blocking pane-level parity for engine-driven callers.

what:
- Add select-pane, last-pane, resize-pane, respawn-pane, pipe-pane,
  clear-history (single-target) ops
- Add swap-pane, join-pane, move-pane (dual-target) and break-pane
  (creates a window, captures #{window_id} into CreateResult)
- Add src_target field + src_args() helper on Operation for the -s
  source of dual-target commands; serialize handles src_target like
  target
- Wire ops/exports; extend the catalog kind-enumeration doctest
- Add test_pane_ops.py (NamedTuple + test_id render/round-trip cases
  plus live tmux coverage)
why: Window-level parity was missing the verbs the ORM uses to
navigate and rearrange windows, so engine-driven callers could not
select, move, or relink windows.

what:
- Add select-window, last-window, next-window, previous-window,
  resize-window, rotate-window, respawn-window, unlink-window
- Add swap-window, move-window, link-window (dual-target, via -s
  src_target)
- Wire ops/exports; extend the catalog kind-enumeration doctest
- Add test_window_ops.py (NamedTuple + test_id render/round-trip
  cases plus live navigation/swap/move/unlink coverage)
why: Engine-driven callers had no typed way to drive the tmux server
lifecycle or write options, environment, and hooks -- the write side
of the options surface that show-options already read.

what:
- Add start-server, kill-server, run-shell, source-file,
  suspend-client lifecycle ops
- Add set-option, set-window-option (the write counterpart to
  show-options), set-environment, set-hook
- Wire ops/exports; extend the catalog kind-enumeration doctest
- Add test_lifecycle_ops.py (NamedTuple + test_id render/round-trip
  cases plus live option/env/hook/run-shell/source-file coverage)
why: The paste-buffer family the ORM uses for clipboard interchange
had no typed operations, leaving buffer set/load/save/paste outside
the engine-driven surface.

what:
- Add set-buffer, delete-buffer, load-buffer, save-buffer,
  paste-buffer ops
- Add show-buffer read op + ShowBufferResult.text (buffer contents)
- Wire ops/results/exports; extend the catalog kind-enumeration and
  registry readonly doctests
- Add test_buffer_ops.py (NamedTuple + test_id render/round-trip
  cases plus a live set/show/save/delete and load/paste round-trip)
why: The experimental page described operations and the catalog but
not how to run them or compose multi-step plans, leaving the engine
choice and planner A/B story undocumented.

what:
- Add "Running an operation" (run/arun, raise_for_status policy)
- Add "Choosing an engine" (engine table, create_engine, async peers)
- Add "Lazy plans and planners" (LazyPlan slot refs, >> chaining,
  Sequential/Folding/Marked planners)
- All examples are executable doctests via the in-memory ConcreteEngine
why: Record the experimental operations/engines layer for the
upcoming release so the unreleased section tracks what landed.

what:
- Add a "What's new" deliverable under the unreleased 0.59.x section
  for the experimental operations and engines layer (#690)
- Defer the release lead paragraph until the version is cut
why: An adversarial review of the new ops against tmux's command
grammar found two defects: move-window could not request its
kill-on-collision behavior, and paste-buffer's -r flag was
documented as a space replacement it never performs.

what:
- MoveWindow: add kill (-k) field; tmux move-window's option string
  is "abdkrs:t:" and -k replaces any window already at the
  destination index
- PasteBuffer: rename no_format to no_replace and fix the docstring;
  -r keeps linefeeds instead of converting them to the default
  carriage-return separator (it has nothing to do with spaces)
- Add render cases for move-window -k/-r and paste-buffer -r
tony added 7 commits June 21, 2026 18:49
why: SendKeys(literal=True, enter=True) rendered 'send-keys -l <keys>
Enter', but tmux's -l sends every arg literally, so "Enter" was typed
as five characters and the line was never submitted.

what:
- __post_init__ raises ValueError on literal+enter (fail closed); the
  correct pattern is two operations
- Document the constraint on the enter parameter
- Add parametrized test_send_keys_literal_enter_guard
why: DisplayMessageResult.text took only stdout[0], silently dropping
all but the first line of a multi-line display-message format.

what:
- Join all stdout lines into .text (matching ShowBuffer); single-line
  output is unchanged
- Add a multi-line parse case to test_read_breadth
why: subprocess/asyncio/imsg each folded has-session's stderr into
stdout but control mode did not, so HasSession's result diverged by
engine. The fold is a has-session concern, not an engine concern.

what:
- HasSession._make_result surfaces stderr[0] in stdout when stdout is
  empty, so every engine yields a consistent result
- Remove the per-engine `"has-session" in cmd` fold from subprocess,
  asyncio, imsg; soften the subprocess/asyncio docstrings accordingly
- Add test_has_session_folds_stderr_to_stdout
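
Moving the fold into result construction means every engine feeds the same function. A minimal sketch, with a hypothetical result class and helper name, combining the stderr fold with the "non-zero return code is a valid answer" rule from the earlier has-session commit:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HasSessionSketch:
    exists: bool
    stdout: tuple


def make_has_session_result(returncode, stdout, stderr):
    """Build the typed answer from raw engine output, engine-agnostically."""
    if not stdout and stderr:
        # Surface the first stderr line so all engines report the same text.
        stdout = [stderr[0]]
    # rc != 0 means "no such session", which is an answer, not a failure.
    return HasSessionSketch(exists=returncode == 0, stdout=tuple(stdout))


missing = make_has_session_result(1, [], ["can't find session: demo"])
present = make_has_session_result(0, [], [])
print(missing, present)
```
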
why: ConcreteEngine is stateless, so has-session (and other existence
queries) always report success -- HasSession.exists is always True
through it. That surprise should be documented.

what:
- Add a Notes section to ConcreteEngine documenting the stateless
  simulation and that queries like has-session need a live engine
why: imsg _connect created the socket inside the try whose except calls
sock.close(); if socket() itself failed (e.g. fd exhaustion), sock was
unbound and the handler raised UnboundLocalError, masking the real
OSError.

what:
- Create the socket before the try so the except only runs once sock
  exists
- Add test_imsg_connect_socket_failure_raises_oserror (monkeypatched
  socket.socket)
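
The bug pattern and its fix are easy to reproduce in isolation. In this sketch the function names and the injectable `factory` parameter are hypothetical (the PR's test monkeypatches `socket.socket` instead), and it assumes a platform where `socket.AF_UNIX` exists.

```python
import socket


def connect_buggy(path, factory=socket.socket):
    try:
        sock = factory(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(path)
    except OSError:
        # If factory() itself raised, `sock` was never bound, so this line
        # raises UnboundLocalError and hides the original OSError.
        sock.close()
        raise
    return sock


def connect_fixed(path, factory=socket.socket):
    # Create the socket outside the try: a failure here propagates as-is.
    sock = factory(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError:
        sock.close()  # safe: sock is guaranteed to exist here
        raise
    return sock


def exhausted(*_args):
    """Stand-in for socket() failing, e.g. on fd exhaustion."""
    raise OSError("too many open files")


for connect in (connect_buggy, connect_fixed):
    try:
        connect("/nonexistent", factory=exhausted)
    except Exception as exc:
        print(connect.__name__, type(exc).__name__)
```
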
why: If the tmux server closed the socket right after MSG_EXIT (before
MSG_EXITED), recv_frame raised ImsgProtocolError, which run() did not
catch -- so a normal command exit became an exception, diverging from
the subprocess engine.

what:
- Catch ImsgProtocolError around recv_frame; once seen_exit is set,
  treat a clean close as the end and return the computed exit result
why: _run_socket_command duplicated stdin/stdout fds for SCM_RIGHTS
transfer, but if building the identify frames or opening the transport
raised before send_frames ran, those descriptors leaked (send_frames
only closes the fds once it owns the frames).

what:
- Close the dup'd fds if codec.identify_messages or the transport
  constructor raises; once send_frames runs it owns/closes them
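
The ownership rule here (close the duplicated descriptors yourself until something else takes ownership) can be sketched with `os.dup`. The helper name and the `build_frames` callback are hypothetical stand-ins for the identify-frame construction and transport setup.

```python
import os


def dup_for_transfer(fds, build_frames):
    """Duplicate fds for SCM_RIGHTS transfer without leaking on failure."""
    dups = []
    try:
        for fd in fds:
            dups.append(os.dup(fd))
        frames = build_frames(dups)
    except BaseException:
        # Nothing has taken ownership yet, so the duplicates are ours to close.
        for fd in dups:
            os.close(fd)
        raise
    # From here on the caller (or the sender) owns and closes the duplicates.
    return frames, dups


read_end, write_end = os.pipe()
try:
    frames, owned = dup_for_transfer([read_end, write_end], lambda d: list(d))
    for fd in owned:
        os.close(fd)
finally:
    os.close(read_end)
    os.close(write_end)
```
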
tony commented Jun 22, 2026

Member Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

why: The v8 identify burst sent MSG_IDENTIFY_LONGFLAGS twice -- a
byte-identical copy-paste in the initial codec. A real tmux client
sends it once; the duplicate is harmless (the server sets the flags
idempotently) but is redundant wire traffic.

what:
- Drop the duplicate MSG_IDENTIFY_LONGFLAGS frame in
  ProtocolV8Codec.identify_messages
- Add a parametrized regression test asserting each identify frame
  type is emitted the expected number of times (LONGFLAGS once)
@tony

tony commented Jun 22, 2026

Member Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

tony added 10 commits June 22, 2026 17:12
why: The async control-mode tests only dispatched single-command
operations on happy paths, leaving the reader's multi-block
correlation, batch pipelining, lifecycle short-circuits, and the
error-as-data policy uncovered.

what:
- Cover %output and no-% notification parsing
- Test run_batch([]) short-circuit and aclose-before-start no-op
- Assert for_server threads the live socket into server_args
- Pipeline two requests through one run_batch call
- Fold a ; chain via FoldingPlanner (exercises expected>1 blocks)
- Confirm a rejected command returns a failed result, no raise
why: Declarative workspace builders (tmuxp-style) need injected setup
commands kept out of shell history. tmux has no native flag; the
convention is a leading space honored by HISTCONTROL=ignorespace.

what:
- Add suppress_history field (default False, opt-in, no behavior change)
- Prepend a space to the keys when set and not literal
- Document the convention and add a render doctest
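
A minimal sketch of the leading-space convention, assuming the keys are a single string; `render_keys` is a hypothetical helper, not the operation's actual `render()`.

```python
def render_keys(keys: str, *, suppress_history: bool = False, literal: bool = False) -> str:
    """Prefix ``keys`` with a space when history suppression is requested.

    Shells configured with HISTCONTROL=ignorespace (or ignoreboth) skip
    commands that start with a space. Literal sends are left untouched.
    """
    if suppress_history and not literal:
        return " " + keys
    return keys
```

The flag is opt-in, so the default render is unchanged; whether the command is actually dropped from history still depends on the shell's own configuration.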
why: A declarative WorkspaceBuilder must target the first pane of a
created window (e.g. to focus it) without the caller handling ids. The
implicit first pane had no captured id, so it could only be addressed as
the active pane -- which moves once the window is split.

what:
- NewSession.capture_panes / NewWindow.capture_pane (opt-in): emit a
  multi-id -F so the result also carries first_window_id/first_pane_id
- CreateResult gains first_window_id/first_pane_id + created_subids
- SlotRef.part ("self"/"window"/"pane") + .window/.pane sub-refs; the
  plan binds created_subids so a sub-ref resolves to its captured id
- ConcreteEngine fabricates one id per #{*_id} token (single-token
  formats unchanged, preserving existing fabricated-id sequences)
why: tmuxp-style workspace creation needs a structural, declarative
object language (à la SQLAlchemy Declarative on Core) -- declare the
shape of a session/windows/panes and let a compiler lower it to Core
operations, engine- and sync/async-neutral, instead of hand-driving
ops.

what:
- New experimental.workspace package: analyzer (tmuxp YAML/dict -> IR),
  ir (Workspace/Window/Pane specs), compiler (spec -> Core LazyPlan,
  wiring first-pane sub-refs so the user never handles an id), runner
  (build/abuild over any engine + host steps + idempotent replace),
  confirm (live structure diff)
- Workspace.compile()/build()/abuild(); on_exists error|replace|reuse
- Robust QA: offline (op order, plan serialize round-trip, 3-way
  planner equivalence) + live (rich 3-window build over subprocess and
  async-control: names/order, pane counts, focus, options, env, cwd,
  commands)
why: The MCP tier needs to round-trip a plan's forward-ref bindings
through JSON (tuple keys are not JSON-native) and to dry-run a plan's
argv without an engine.

what:
- serialize.bindings_to_dict / bindings_from_dict ((slot, part) <-> "slot:part")
- LazyPlan.preview(): render each op's argv, None for unresolved SlotRefs
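
The `(slot, part) <-> "slot:part"` mapping can be sketched as below. The function names follow the commit message, but the signatures are assumptions: slots are taken to be integers, parts to be colon-free strings, and bound values to be tmux id strings.

```python
from __future__ import annotations

import json

Bindings = dict[tuple[int, str], str]


def bindings_to_dict(bindings: Bindings) -> dict[str, str]:
    """Flatten tuple keys into JSON-safe "slot:part" strings."""
    return {f"{slot}:{part}": value for (slot, part), value in bindings.items()}


def bindings_from_dict(data: dict[str, str]) -> Bindings:
    """Rebuild (slot, part) tuple keys from "slot:part" strings."""
    out: Bindings = {}
    for key, value in data.items():
        slot, _, part = key.partition(":")
        out[(int(slot), part)] = value
    return out


if __name__ == "__main__":
    original = {(0, "self"): "$1", (0, "pane"): "%3"}
    wire = json.dumps(bindings_to_dict(original))
    assert bindings_from_dict(json.loads(wire)) == original
```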
why: Expose the typed-operations Core + Declarative tiers as a typed,
chained, toolable command surface for agents, without coupling the
library to any MCP framework.

what:
- experimental.mcp: ToolDescriptor/ParamDescriptor + OperationToolRegistry
  (per-op descriptors generated from the registry), an optional-pydantic
  schema builder, and TargetResolver (string/dict -> typed Target)
- plan tools: preview_plan (dry-run), execute_plan (+ bindings),
  result_schema introspection, build_workspace
- curated vocabulary: intuitive named tools mirroring libtmux's ORM
  (create_session/split_pane/send_input/list_*/kill_*/...) -> typed results
- pure (ConcreteEngine) + live tmux tests; no fastmcp dependency
why: Prove the projection drives a real MCP server and give downstream
servers a one-call binding -- behind an optional extra so the core stays
dependency-free.

what:
- fastmcp_adapter.build_server(engine): register the curated vocabulary as
  typed FastMCP tools (engine bound out of the schema, safety ->
  ToolAnnotations)
- add fastmcp to a new [project.optional-dependencies] mcp extra
- in-process tests (offline + live) call the tools via fastmcp's Client
why: The fastmcp adapter only exposed the curated vocabulary -- agents
had no access to the full operation set or to plan composition, and the
server could not be launched on its own.

what:
- Register one op_<kind> tool per operation via a dynamic-schema Tool
  subclass (engine + descriptor on PrivateAttr, explicit parameters
  schema), re-injecting target/src_target at the adapter edge; tagged
  per-op and hidden by default (expose_operations=True reveals them)
- Register plan tools (preview_plan/execute_plan/result_schema/
  build_workspace) taking serialized operations + a planner name
- Add build_server flags: include_operations/expose_operations/
  include_plan_tools
- Make the server runnable: main()/default_server(), __main__.py,
  fastmcp.json, and a libtmux-engine-mcp console script
- Extend adapter tests: offline per-op/plan/workspace, default-server,
  --help exit, live plan execution
why: To test and dogfood the MCP server in local agent CLIs, we need a
way to point Claude/Codex/Cursor/Gemini at this checkout instead of a
pinned release.

what:
- Port scripts/mcp_swap.py from libtmux-mcp (PEP 723 uv-script, tomlkit):
  detect/status/use-local/revert with timestamped backups, dry-run, and
  Claude user/project scopes
- Derive the server slug from the [project.scripts] entry
  (libtmux-engine-mcp -> libtmux-engine) instead of project.name, so it
  stays distinct from a sibling libtmux server; a strict generalization
  (libtmux-mcp still resolves to libtmux)
- Namespace the swap state dir libtmux-engine-mcp-dev
- Add tests: console-script registration (always-on) + slug/local-spec
  derivation (tomlkit-gated)
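
A sketch of the slug derivation, assuming the `[project.scripts]` table has already been parsed into a dict. `server_slug` is a hypothetical name, and the fallback to `project.name` when no script is declared is an assumption, not something the commit states.

```python
from __future__ import annotations


def server_slug(scripts: dict[str, str], project_name: str) -> str:
    """Derive an MCP server slug from the first console-script name."""
    # Prefer the console script; fall back to the project name (assumed).
    name = next(iter(scripts), project_name)
    # Drop the trailing "-mcp" so libtmux-engine-mcp becomes libtmux-engine.
    return name.removesuffix("-mcp")
```

Both examples from the commit message hold under this rule: `libtmux-engine-mcp` maps to `libtmux-engine`, and `libtmux-mcp` still maps to `libtmux`.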
why: fastmcp is only the optional `mcp` extra, so a plain `uv sync`
prunes it -- which silently turns `uv run mypy` red and makes every
fastmcp adapter test importorskip away. The committed adapter's green
gate depended on fastmcp happening to be installed.

what:
- Add fastmcp + tomlkit to the dev and testing dependency-groups so the
  standard gate type-checks and runs the adapter + mcp_swap tests
  (fastmcp also stays the `mcp` extra for end users)
- Add --ignore=docs/_build to pytest addopts: `docs` is a testpath, so a
  stale built-HTML tree poisons collection (the gate's rm docs/_build
  first-step was the only guard)
- Reformat ops/plan.py (pre-existing blank-line drift surfaced by ruff)
- uv.lock: add tomlkit (no other version churn)
tony added 9 commits June 23, 2026 17:33
why: The Declarative WorkspaceBuilder tier had thin coverage -- a
single analyzer shorthand case, and nothing exercising the compiler's
host-step schedule or the runner's on_exists preflight policy. Lock in
those behaviors so a regression in the Declarative-to-Core lowering or
the host-side orchestration is caught.

what:
- Analyzer/IR (offline): dimensions in both [x, y] and {width, height}
  forms; shell_command shorthand (string / list / {cmd} items); the
  None-pane and unsupported-pane TypeError paths; non-mapping-YAML
  rejection; session-field passthrough; per-pane orchestration fields;
  and Pane.commands run-form normalization
- Compiler (offline): dimensions threaded into new-session -x/-y;
  env/option/window-option ops emitted with their values; the
  before_script and pane sleep_before/sleep_after host-step schedule
  asserted off the pure op spine (anchored by send-keys position, not
  literal index); first-window reuse vs create-the-rest; and
  Workspace.compile() == compile_full().plan
- Runner/confirm (live tmux): before_script runs as a host step in
  start_directory; on_exists='reuse' short-circuits to an empty-but-ok
  result leaving the session untouched while 'error' raises
  FileExistsError; and confirm() flags a structural mismatch
why: The swap tool covered Claude / Codex / Cursor / Gemini but not
the Grok or Antigravity (agy) CLIs, so a local-checkout swap could not
reach two installed agents. Extending the registry lets one use-local
repoint the tmux MCP across all six.

what:
- Register grok (~/.grok/config.toml, TOML "mcp_servers" table, same
  shape as codex) and agy/Antigravity
  (~/.gemini/antigravity/mcp_config.json, JSON "mcpServers", same shape
  as cursor/gemini) in CLIName / ALL_CLIS / CLIS
- Route grok through the existing codex branch and agy through the
  cursor/gemini branch in get_server / set_server / delete_server
- Tolerate an empty JSON config in load_config so the swap can seed
  Antigravity's initially-empty mcp_config.json instead of raising
- Note in the docstring that the Antigravity IDE and the agy CLI may
  read different profiles; only the documented profile path is written
- Tests: grok (TOML) and agy (JSON) set/get/delete round-trips, the
  empty-JSON tolerance, and the registry shapes
why: The experimental MCP exposed only a thin synchronous projection.
Agents driving tmux need an intuitive, non-blocking surface that knows
which pane they are calling from, resolves "the pane relative to me" in
one call, and never silently targets the wrong pane.

what:
- Refactor the curated vocabulary into an async-first package
  (session/window/pane/buffer/option/server): each tool is one async
  def over arun, with a derived sync twin via a sans-I/O trampoline
  (_bridge.synced) -- a single source of truth per tool.
- Expand the lean curated set with high-value verbs and conveniences
  (grep_pane, capture_active_pane, geometry-resolved relative/corner
  pane tools, directional select_pane) plus a guarded run_tmux hatch.
- Add build_async_server (default AsyncControlModeEngine) awaited on
  FastMCP's loop; build_server stays the sync wrapper; main() and
  fastmcp.json go async-first.
- Add a live event stream (events.py): a push watch_events tool and a
  pull tmux://events ring buffer, selected by LIBTMUX_MCP_EVENTS.
- Make the surface caller-aware: server name 'tmux' plus steering
  instructions (when/anti-triggers/concrete-id rule); CallerContext
  reads TMUX_PANE/TMUX from the server's own env, socket-scoped;
  get_caller_context anchor; is_caller on list_panes/search_panes rows;
  the relative tools default to and require the caller pane origin;
  capture_relative_pane/grep_relative_pane/search_panes.
- Reject relative special targets ({up-of}/{down-of}/...) on capture,
  grep, send, and destructive pane tools with a hint pointing to the
  relative tools; anchor specials ({marked}/{last}) pass through.
- Cover with experimental tests + doctests (no pytest-asyncio).
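
The relative-target rejection can be sketched as a small validator. The token set is an assumption: the commit names `{up-of}` and `{down-of}` and elides the rest, so `{left-of}` and `{right-of}` are added here only because tmux defines them as pane target tokens. `check_pane_target` is a hypothetical name.

```python
RELATIVE_TOKENS = ("{up-of}", "{down-of}", "{left-of}", "{right-of}")  # assumed set


def check_pane_target(target: str) -> str:
    """Return ``target`` unchanged, or raise if it is a relative special.

    Anchor specials such as {marked} and {last} pass through.
    """
    for token in RELATIVE_TOKENS:
        if token in target:
            raise ValueError(
                f"{token} is relative to tmux's active pane, not the caller; "
                "use a relative tool such as capture_relative_pane instead"
            )
    return target
```

Failing loudly here is what keeps a tool from silently acting on the wrong pane.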
why: The round-2 caller-awareness logic read only the MCP server's own
environment, which real launchers (agent -> uv -> python child) strip --
so the caller pane was undiscoverable and the whole surface went inert.

what:
- Add CallerContext.discover(): process-env -> explicit override
  (LIBTMUX_MCP_CALLER_PANE/TMUX) -> a bounded, same-uid Linux /proc
  parent-process env walk (vocabulary/_proc.py). Fail-closed (never
  raises), env-minimised to TMUX/TMUX_PANE, depth-capped, and injectable
  so it is unit-testable without /proc; records the discovery source.
- Bind the default engine to the discovered caller's -S socket when no
  explicit override (--socket-path/--socket-name/--no-caller-socket,
  $LIBTMUX_SOCKET*), so a stripped-env MCP still drives the user's own
  tmux server instead of a fresh default one.
- Add a conservative socket comparator (socket_could_match /
  is_conservative_caller) alongside the strict one: the strict comparator
  stays on the is_caller annotation and origin resolution; the
  conservative, fail-safe comparator guards destructive ops.
- Refuse self-kill: kill_pane / respawn_pane / kill_window /
  kill_session (and the others=True siblings) decline the pane, window,
  or session running this MCP, with a hint to act manually.
- Thread the discovered context to the tool bodies by stashing it on the
  engine (read by caller_of); SyncToAsyncEngine delegates server_args /
  _caller_context so the sync surface sees the same identity.
- Cover with /proc parser, discovery precedence, comparator, and
  self-kill tests (no pytest-asyncio); fix the stale
  resolve_relative_pane active-pane-fallback docstring.
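
A minimal sketch of the bounded, injectable parent walk. All names are hypothetical (`parse_environ`, `walk_parents`, the `read_environ` and `parent_of` callbacks), and the same-uid check is left out for brevity.

```python
from __future__ import annotations

from collections.abc import Callable

KEEP = ("TMUX", "TMUX_PANE")


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse NUL-separated KEY=VALUE bytes, keeping only tmux variables."""
    env: dict[str, str] = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if not sep:
            continue  # skip empty or malformed entries
        name = key.decode("utf-8", "replace")
        if name in KEEP:
            env[name] = value.decode("utf-8", "replace")
    return env


def walk_parents(
    pid: int,
    read_environ: Callable[[int], bytes],
    parent_of: Callable[[int], int | None],
    max_depth: int = 8,
) -> dict[str, str] | None:
    """Walk up from ``pid`` until a process with TMUX_PANE is found."""
    for _ in range(max_depth):
        try:
            env = parse_environ(read_environ(pid))
            if "TMUX_PANE" in env:
                return env
            parent = parent_of(pid)
        except OSError:
            return None  # fail closed: unreadable process, no discovery
        if parent is None or parent <= 1:
            return None
        pid = parent
    return None
```

On Linux the two callbacks would typically read `/proc/<pid>/environ` and the parent pid from `/proc/<pid>/stat`; injecting them is what lets the walk be unit-tested without `/proc`.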
why: An adversarial review of the caller-discovery work surfaced
fail-unsafe and over-broad edges in the new guards.

what:
- guard_self_kill and the others=True sibling guards now resolve the
  caller's own pane to its window/session through a fail-safe helper: a
  caller pane absent from the engine's server is not a self-kill, so a
  kill on an unrelated target no longer raises a raw tmux error.
- Scope the ambient (socket=None) branch of both comparators to a
  process-env caller, so a parent-walked caller is not matched to an
  unbound default engine (which would mis-target resolve_origin reads
  and over-refuse kills under --no-caller-socket).
- Wrap socket_matches' realpath comparisons (a $TMUX-controlled path can
  raise) so the read-only tools degrade instead of crashing.
- Guard the op_* per-operation kill/respawn surface too, closing the
  bypass when --operations is enabled.
- Name the caller pane in the others=True refusal hints; correct the
  get_caller_context docstring; document the socket/caller precedence in
  --help; make the adapter tests hermetic (no host /proc walk).
why: Agents need to know when a command in a pane finishes without
hard-coding a needle (regex/sentinel) the tool must guess, and
without blocking the server. tmux stops emitting %output the instant
a pane goes quiet, so idle-since-last-%output is a structural signal
the agent interprets via the captured chunk plus pane metadata.

what:
- Add _settle.py pure core: decode_output (tmux octal), output_payload
  (per-pane filter, split not join so inner whitespace survives), and
  accumulate_until_settle (settle/byte/time/end fold over an injected
  async stream + clock). All doctested; no I/O, no fastmcp.
- Add wait_for_output edge tool in events.py: folds decoded %output to
  a frozen MonitorResult, reads DoneMetadata (pane_dead/status,
  pane_current_command) so the agent disambiguates finished vs blocked.
  ctx.info for live partials; aclosing for cancellation safety; each
  call runs in its own task.
- _ensure_attached: a bare tmux -C client emits no %output until
  attach-session, so attach (sticky per engine) before folding; raise
  on a failed attach so a stale session never yields a silent capture.
- Tests: pure settle unit + cancellation (test_settle.py); offline
  integration + attach/dropped/done coverage (test_events.py); live
  end-to-end against real tmux (test_monitor_live.py).
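
The octal decoding step can be sketched as below. tmux control mode replaces bytes below ASCII 32 and the backslash itself with three-digit octal escapes in `%output` payloads; this `decode_output` is an illustrative bytes-to-bytes version, and the real helper's signature is an assumption.

```python
import re

# A backslash followed by exactly three octal digits, capped at \377.
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def decode_output(payload: bytes) -> bytes:
    """Undo tmux's octal escaping in a %output payload."""
    # A single left-to-right pass is enough: a literal backslash arrives
    # as \134, and the backslash it decodes to is never rescanned.
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), payload)
```

Decoding on bytes and leaving text decoding to the caller avoids splitting multi-byte UTF-8 sequences that span two `%output` frames.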
why: A capable agent asked to run tests in a pane and wait fell back
to sleep + capture_pane polling -- it never found wait_for_output
because the tool's surface said "watch pane output / settles /
needle-free", not the agent's intent "run a command and wait for it
to finish". The capability shipped in round 4; this surfaces it.

what:
- _instructions(): add a run-a-command-and-wait paragraph naming the
  split_pane/send_input -> wait_for_output workflow, the test/build
  use case, the "prefer over sleep + capture_pane polling" steer, and
  the "settled is not success" caveat. Gate it (events_enabled) so the
  sync server never names a tool it does not register.
- wait_for_output: enrich description= with discovery vocabulary
  (completion, exit/return code, success/failed) and add a NumPy
  Parameters section -- FastMCP parses it into per-param schema
  descriptions even when description= is overridden.
- docs: rename the colliding sync polling helper wait_for_output ->
  wait_for_text and point agents at the event-backed tool.
- tests: lock the discoverable wording -- instructions name the tool +
  workflow + anti-polling steer; tool metadata carries the vocab +
  per-param descriptions; events=off omits the live-output guidance.
why: The self-kill guards left two known gaps: the per-op (op_*) kill
surface skipped the others=True sibling case, and the conservative
socket match relied on path reconstruction that diverges on macOS.

what:
- Add an authoritative conservative_socket() that queries the engine's
  #{socket_path} for a -L name or ambient socket, so the guard's socket
  scoping survives a macOS $TMUX_TMPDIR divergence; an explicit -S path
  is used as-is.
- Lift the others=True guards (guard_kill_other_panes / _windows) into
  _resolve so the curated tools and guard_destructive_op share them; the
  per-op op_kill_pane/op_kill_window now refuse killing the caller's
  sibling pane/window too.
- Cover with offline (conservative_socket) and live (curated + per-op
  others=True) regression tests.
why: An adversarial review found the needle-free monitor could falsely
report a command "settled" and could raise or mislead when the watched
pane died -- the exact cases the design targets.

what:
- Make AsyncControlModeEngine.subscribe() a true broadcast: each
  subscriber gets its own queue, so wait_for_output, watch_events, and
  poll_events no longer steal each other's %output frames (which caused
  a premature settle with truncated text).
- Make wait_for_output fail-safe when the watched pane is gone:
  _read_done no longer raises or fabricates pane_dead=False on a blank
  or fallback-pane probe (pane_dead becomes Optional/unknown, keyed on
  #{pane_id}), and the settle snapshot capture is guarded; the result
  is preserved.
- Add a derived exit_code to MonitorResult.
- Supervise the pull ring drainer (aclosing + recorded error surfaced
  via poll_events) so a reader failure cannot silently freeze it.
- Regression test: two concurrent subscribers each see every event.
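
The broadcast fix can be sketched with one queue per subscriber. `Broadcast` is a hypothetical stand-in for the engine's event fan-out, not `AsyncControlModeEngine.subscribe()` itself.

```python
from __future__ import annotations

import asyncio


class Broadcast:
    """Fan each published event out to every subscriber's own queue."""

    def __init__(self) -> None:
        self._queues: set[asyncio.Queue[str]] = set()

    def subscribe(self) -> asyncio.Queue[str]:
        queue: asyncio.Queue[str] = asyncio.Queue()
        self._queues.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue[str]) -> None:
        self._queues.discard(queue)

    def publish(self, event: str) -> None:
        # Every subscriber gets its own copy, so one consumer draining
        # its queue cannot starve another.
        for queue in self._queues:
            queue.put_nowait(event)
```

With a single shared queue, whichever consumer called `get()` first would take the frame and the others would see a gap, which is how a monitor could observe a false quiet period and settle early.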


1 participant