[orchestrator-v2] Fix shell cache persistence for fast startup#3640
[orchestrator-v2] Fix shell cache persistence for fast startup#3640mwolson wants to merge 1 commit into
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 8866ebd. Configure here.
ApprovabilityVerdict: Needs human review 2 blocking correctness issues found. This PR modifies cache persistence timing and adds concurrent state handling. Unresolved review comments identify potential blocking/hanging bugs in the finalizer loop and disconnect handler that should be addressed before merging. You can customize Macroscope's approvability policy. Learn more. |
8866ebd to
191408f
Compare
191408f to
d6a0e95
Compare
Dismissing prior approval to re-evaluate d6a0e95
d6a0e95 to
275c738
Compare
275c738 to
a9a74af
Compare
| yield* persist(snapshot.value); | ||
| }); | ||
|
|
||
| const setDisconnected = flushLiveShellSnapshot.pipe( |
There was a problem hiding this comment.
🟡 Medium state/shell.ts:116
setDisconnected now calls flushLiveShellSnapshot — which synchronously runs cache.saveShell — before updating the SubscriptionRef status. Because setDisconnected runs inside Stream.runForEach on supervisor.state, a slow or hanging persistence backend blocks the stream, so the shell state stays stuck at "live" and stops processing later connection transitions until disk I/O completes. Consider making setDisconnected update the status first and flush the snapshot in the background, or fork the flush so connection transitions are not gated on persistence.
🤖 Copy this AI Prompt to have your agent fix this:
In file @packages/client-runtime/src/state/shell.ts around line 116:
`setDisconnected` now calls `flushLiveShellSnapshot` — which synchronously runs `cache.saveShell` — before updating the `SubscriptionRef` status. Because `setDisconnected` runs inside `Stream.runForEach` on `supervisor.state`, a slow or hanging persistence backend blocks the stream, so the shell state stays stuck at `"live"` and stops processing later connection transitions until disk I/O completes. Consider making `setDisconnected` update the status first and flush the snapshot in the background, or fork the flush so connection transitions are not gated on persistence.
a9a74af to
fcce425
Compare
Ignore Grok ACP JSON-RPC responses with non-numeric ids before they reach Effect RpcClient so agent-internal messages such as skills-reload do not break provider startup. Persist and round-trip the latest live shell snapshot through IndexedDB so project and thread shells can hydrate from cache before live synchronization catches up.
fcce425 to
daa9283
Compare
| Effect.forkScoped, | ||
| ); | ||
|
|
||
| yield* Effect.addFinalizer(() => flushLiveShellSnapshot); |
There was a problem hiding this comment.
🟡 Medium state/shell.ts:201
The scope finalizer calls flushLiveShellSnapshot, which runs persist's while (true) loop until latestLiveSnapshot stops advancing. But the finalizer runs before the forked subscription fiber is interrupted, so applyItem can keep setting latestLiveSnapshot (lines 171-177) with newer sequences while cache.saveShell is still in progress. Each iteration sees a newer snapshot, so the loop's exit condition (latestSnapshot.value.snapshotSequence <= nextSnapshot.snapshotSequence) never holds, and the finalizer hangs indefinitely — blocking environment teardown and app shutdown. Consider stopping the subscription stream before flushing, or having flushLiveShellSnapshot read and clear latestLiveSnapshot atomically so concurrent applyItem calls can't keep the loop alive.
🤖 Copy this AI Prompt to have your agent fix this:
In file @packages/client-runtime/src/state/shell.ts around line 201:
The scope finalizer calls `flushLiveShellSnapshot`, which runs `persist`'s `while (true)` loop until `latestLiveSnapshot` stops advancing. But the finalizer runs before the forked subscription fiber is interrupted, so `applyItem` can keep setting `latestLiveSnapshot` (lines 171-177) with newer sequences while `cache.saveShell` is still in progress. Each iteration sees a newer snapshot, so the loop's exit condition (`latestSnapshot.value.snapshotSequence <= nextSnapshot.snapshotSequence`) never holds, and the finalizer hangs indefinitely — blocking environment teardown and app shutdown. Consider stopping the subscription stream before flushing, or having `flushLiveShellSnapshot` read and clear `latestLiveSnapshot` atomically so concurrent `applyItem` calls can't keep the loop alive.

Summary
skills-reloadfrom breaking provider startup.subscribeShellsynchronization is still starting.Startup Impact Observed
Local AppImage validation showed three separate startup-cache blockers:
61b9d3bf-...with 17 projects, 5 active threads, 1 archived thread, and snapshot sequence2195612.latestVisibleMessage.updatedAtwas stored as an ISO string while the cache envelope decoded it asDateTime.Utc. This branch now fixes that JSON boundary and adds a regression test that round-trips the cached JSON shape.With these fixes in place, a local AppImage restart on July 2, 2026 showed fast startup spans:
desktop.startupcompleted in about 327ms,server.startupin about 350ms, orchestration v2 projection verification in about 238ms, and orchestration v2 recovery in about 82ms.Problem and Fix
skills-reloadwith non-numeric ids. Those responses are not valid Effect RPC responses and could fail BigInt id decoding during provider startup.What This Does Not Change
subscribeShellremains authoritative after startup hydration.Validation
vp test packages/effect-acp/src/protocol.test.ts packages/client-runtime/src/state/shell-sync.test.ts packages/client-runtime/src/platform/orchestrationCache.test.tspassedvp checkpassedvp run typecheckpasseddesktop.startupat about 327ms andserver.startupat about 350msNote
Fix shell cache persistence to guarantee latest snapshot is saved on disconnect
OrchestrationV2ShellSnapshotJson,OrchestrationV2ThreadProjectionJson, etc.) in orchestrationV2.ts so shell and thread snapshots serialize date fields as ISO strings.StoredOrchestrationShellSnapshotandStoredOrchestrationThreadSnapshotin orchestrationCache.ts to encode/decode against these JSON-mapped schemas.Refand loop until the newest snapshot is persisted, preventing stale saves when updates arrive faster than the debounce.Macroscope summarized daa9283.