Load pi extensions in the cua CLI with /reload and trust gating #41

Open
rgarcia wants to merge 7 commits into main from hypeship/harness-extension-host

rgarcia commented Jun 26, 2026

Contributor

Summary

Brings pi extensions to cua end to end: a HarnessExtensionHost that loads pi extensions against cua's lower-level CuaAgentHarness (which pi's AgentSession-based extension system doesn't bind to), and the CLI runtime wiring so running cua actually discovers, loads, and hot-reloads them. This is the substrate for self-improving computer-use agents — an agent can author a learned tool as an extension and /reload it into the next run.

What's in the PR

1. The host (packages/cli/src/extensions/{host,seams,bridge}.ts)
Reuses pi's host-agnostic extension loader + runner: discovers extensions (jiti, no build step), registers their tools on the harness, bridges harness events into the runner's extension-event emitters, and mirrors AgentSession.reload for hot-swap. Tier A scope: tools, events, model/thinking/active-tool control, session-entry writes, headless (no-op UI). Deferred (stubbed): ctx.ui.*, slash/flag/renderer registration, the session-replacement family.

Correctness details handled: extension tools survive a setModel (which rebuilds the harness tool list) yet honor an explicit setActiveTools opt-out; reload drops tools from removed/renamed extensions; reload/dispose can't race into a double-teardown; sendUserMessage attaches the first-turn screenshot like the CLI's own prompt sites.
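
The tool-list rules above can be sketched as one pure function. This is a minimal sketch, not the PR's `reapplyTools`. The `Tool` shape, the `computeToolUnion` name, and the `optedOut` set (standing in for the host's record of extension tools disabled through `setActiveTools`) are assumptions made for illustration.

```typescript
// Minimal stand-in for a harness tool; the real AgentTool carries more fields.
interface Tool {
  name: string;
}

interface ToolUnion {
  tools: Tool[];
  active: string[];
}

/**
 * Hypothetical helper that merges base and extension tools:
 * - tools left over from the previous extension generation are dropped,
 *   so removed or renamed extensions disappear;
 * - extension tools win name clashes, because the harness rejects duplicates;
 * - base tools keep their current active state, and extension tools are
 *   active unless explicitly opted out.
 */
function computeToolUnion(
  liveTools: Tool[],
  previousExtensionNames: Set<string>,
  nextExtensionTools: Tool[],
  currentlyActive: Set<string>,
  optedOut: Set<string>,
): ToolUnion {
  const nextNames = new Set(nextExtensionTools.map((t) => t.name));
  const base = liveTools.filter(
    (t) => !nextNames.has(t.name) && !previousExtensionNames.has(t.name),
  );
  const tools = [...base, ...nextExtensionTools];
  const active = new Set<string>();
  for (const t of base) if (currentlyActive.has(t.name)) active.add(t.name);
  for (const name of nextNames) if (!optedOut.has(name)) active.add(name);
  return { tools, active: [...active] };
}
```

Keeping this step pure means the `setModel` case (the live list has lost its extension tools) and the reload case (the previous extension names are stale) run through the same code path.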

2. CLI runtime wiring (cli-harness.ts, cli.ts, tui/main.ts, tui/slash-commands.ts, extensions/setup.ts)

  • The host is constructed + loaded in setupHarnessRuntime via a browser-free helper (loadHarnessExtensions) and disposed before the browser handle closes on all three run paths (print / interactive / action). A throwing extension load closes the handle before rethrowing, so it can't leak the browser session.
  • A /reload TUI command hot-swaps edited extensions, surfaces loadErrors, and appears in autocomplete.
  • initialScreenshot is wired from the browser handle, reusing the existing captureScreenshot.
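
The ordering guarantees in the list above follow a common nested try/finally shape: the host is disposed before the browser closes, and the browser still closes when extension loading throws. A minimal sketch, assuming hypothetical `BrowserHandle` and `ExtensionHost` interfaces and a `runWithExtensions` helper. None of these are the PR's actual code.

```typescript
interface BrowserHandle {
  close(): Promise<void>;
}

interface ExtensionHost {
  dispose(): Promise<void>;
}

/**
 * Hypothetical wrapper for one run path (print, interactive, or action).
 * `loadHost` may throw; `run` is the agent run itself.
 */
async function runWithExtensions(
  browser: BrowserHandle,
  loadHost: () => Promise<ExtensionHost | undefined>,
  run: () => Promise<void>,
): Promise<void> {
  let host: ExtensionHost | undefined;
  try {
    // If load throws, `run` is skipped and control falls through to `finally`,
    // so the browser session is closed before the error propagates.
    host = await loadHost();
    await run();
  } finally {
    try {
      // Dispose the host first so extension shutdown handlers run while the
      // browser is still open.
      await host?.dispose();
    } finally {
      // The inner finally closes the browser even if dispose throws.
      await browser.close();
    }
  }
}
```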

3. Loading
Extensions load on every run from <cwd>/.agents/extensions, the implicit <cwd>/.pi/extensions scan, and the global pi agent dir (~/.pi/agent/extensions). --no-extensions disables loading entirely. (cua already executes agent-authored code — bash, file edits, the browser — so project-local extensions load without a separate trust gate.)
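
The three discovery locations and the `--no-extensions` switch amount to a small path list. This is only a sketch: `extensionSearchPaths` and its parameters are hypothetical. As described above, the real CLI passes configured paths and an agent dir to pi's loader, which performs the implicit `.pi/extensions` scan itself.

```typescript
import { join } from "node:path";

/**
 * Hypothetical helper that lists the directories the PR description says
 * are scanned on every run, in the order given there. It returns no paths
 * when extensions are disabled.
 */
function extensionSearchPaths(cwd: string, home: string, noExtensions: boolean): string[] {
  if (noExtensions) return [];
  return [
    join(cwd, ".agents", "extensions"), // project-local, cua convention
    join(cwd, ".pi", "extensions"), // project-local, pi's implicit scan
    join(home, ".pi", "agent", "extensions"), // global pi agent dir
  ];
}
```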

Testing

  • Unit/integration — host load/reload/dispose, tool registration + active-state across setModel, stale-tool removal on reload; the CLI load path (loadHarnessExtensions + a buildTestHarness fixture + temp dirs, no browser) including that both <cwd>/.pi/extensions and <cwd>/.agents/extensions load by default and --no-extensions returns no host; /reload TUI glue and parsing.
  • End-to-end self-improve loop (test/e2e/) — five scenarios drive the host through: inefficient first run → meta-agent authors a learned tool → host.reload() → second run calls it in one step. Covers template-match click, DOM table extraction, form-fill macro, nav shortcut, and a pagination de-dup extractor (which also proves agent_start handlers re-bind after reload).
  • npx tsc -b exits 0; cd packages/cli && npx vitest --run: 58 passed | 5 skipped (the 5 skipped are pre-existing ptywright-dependent TUI fixtures).

Test plan

  • npx tsc -b exits 0
  • cd packages/cli && npx vitest --run green (58 passed | 5 skipped)
  • Manual: a global extension under ~/.pi/agent/extensions exposes its tool in cua; a project .agents/extensions or .pi/extensions extension loads on run; editing one + /reload hot-swaps; --no-extensions disables

🤖 Generated with Claude Code


Note

Medium Risk
Extensions execute arbitrary project/global code and register tools on the agent harness, increasing attack surface and runtime complexity around tool lists and cleanup, though failures are partially isolated with dispose/close ordering.

Overview
Adds pi extension support to the cua CLI via a new HarnessExtensionHost that discovers extensions with pi’s loader/runner, registers their tools on CuaAgentHarness, bridges harness events into extension handlers (including reducers for context, provider payload, and tool call/result), and re-applies extension tools after setModel so they are not dropped when the harness rebuilds its tool list.

Runtime wiring: setupHarnessRuntime loads extensions through loadHarnessExtensions (project .agents/extensions, implicit .pi/extensions, global ~/.pi/agent/extensions), wires first-turn screenshots for extension-initiated prompts, disposes the host before closing the browser on print/interactive/action paths, and closes the browser if extension load throws. New --no-extensions flag disables loading; help/docs bump the recommended Anthropic model to claude-opus-4-8.

TUI: /reload re-discovers extensions from disk, surfaces loadErrors, and is included in slash-command autocomplete.

Tests: Unit/integration coverage for load paths, reload hot-swap, host lifecycle, and five e2e “self-improve” scenarios (learned tools after reload).

Reviewed by Cursor Bugbot for commit 2010817. Bugbot is set up for automated code reviews on this repo. Configure here.

Load pi extensions (arbitrary TS, default-exported factory) against cua's
lower-level AgentHarness, which pi's AgentSession-based extension system does
not bind to. Reuses pi's host-agnostic loader and runner: binds the runner's
action seams to the harness, bridges harness events into the runner's
extension-event emitters, registers extension tools, and mirrors
AgentSession.reload for hot-reload.

Tier A scope: tools, events, model/thinking/active-tool control, and
session-entry writes, headless (no-op UI). Slash commands, flags, ctx.ui.*,
and the session-replacement family are deferred (stubbed to throw).

Re-applies the extension-tool union on model_update because
CuaAgentHarness.setModel rebuilds tools from construction-time extraTools and
would otherwise drop runtime-registered ones.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
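
A toy harness shows why the reapply on `model_update` is needed. Its `setModel` rebuilds the tool list from construction-time tools, which is the behavior this commit message describes for `CuaAgentHarness.setModel`. The `ToyHarness` class, its `onModelUpdate` listener API, and the tool names are hypothetical. They are not cua's or pi's API.

```typescript
type Listener = () => void;

/** Toy harness whose setModel resets tools to the construction-time list. */
class ToyHarness {
  private readonly extraTools: string[];
  private tools: string[];
  private readonly listeners: Listener[] = [];

  constructor(extraTools: string[]) {
    this.extraTools = [...extraTools];
    this.tools = [...extraTools];
  }

  getTools(): string[] {
    return [...this.tools];
  }

  setTools(tools: string[]): void {
    this.tools = [...tools];
  }

  onModelUpdate(listener: Listener): void {
    this.listeners.push(listener);
  }

  setModel(_model: string): void {
    // Rebuilding from construction-time tools loses runtime-registered ones...
    this.tools = [...this.extraTools];
    // ...so subscribers are notified and can restore them.
    for (const listener of this.listeners) listener();
  }
}

/** Register extension tools and re-add them after every model change. */
function keepExtensionTools(harness: ToyHarness, extensionTools: string[]): void {
  const reapply = (): void => {
    const base = harness.getTools().filter((name) => !extensionTools.includes(name));
    harness.setTools([...base, ...extensionTools]);
  };
  reapply();
  harness.onModelUpdate(reapply);
}
```
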
rgarcia marked this pull request as ready for review June 27, 2026 11:17

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes using high effort and found 4 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 4 issues found in the latest run.

  • ✅ Fixed: sendUserMessage lacks screenshot
    • sendUserMessage now routes through a host prompt path that attaches an initial screenshot on first-turn sessions via { images } before calling harness.prompt.
  • ✅ Fixed: Reapply reactivates extension tools
    • reapplyTools now preserves the existing active-tool set and only auto-activates newly introduced extension tools instead of force-enabling all extension tools.
  • ✅ Fixed: Reload races async dispose
    • Shutdown requests raised during reload are now latched and handled inside reload before rebuilding, preventing async dispose from tearing down a freshly rebuilt runner/bridge.
  • ✅ Fixed: Skipped reapply while applying tools
    • Nested reapplyTools calls now set a queue flag and trigger a follow-up pass after the current setTools call completes, so refresh requests are not dropped.
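
The last fix above queues a nested `reapplyTools` instead of dropping it. This is a general coalescing pattern for an async refresh. A minimal sketch, using a hypothetical `makeCoalescedRefresh` helper and a caller-supplied `apply` step in place of `harness.setTools`. Unlike the diff below, this version holds the in-flight flag for the whole loop rather than only around the apply call.

```typescript
/**
 * Wrap an async `apply` step so overlapping calls never run concurrently.
 * A call made while `apply` is in flight sets a flag and returns at once;
 * the in-flight caller then runs one more pass, so the request is not lost.
 */
function makeCoalescedRefresh(apply: () => Promise<void>): () => Promise<void> {
  let applying = false;
  let queued = false;
  return async () => {
    if (applying) {
      queued = true;
      return;
    }
    applying = true;
    try {
      do {
        queued = false;
        await apply();
      } while (queued);
    } finally {
      applying = false;
    }
  };
}
```

Any number of requests that arrive during one pass collapse into a single follow-up pass. That is enough here, because each pass recomputes the whole tool union from current state.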

Or push these changes by commenting:

@cursor push ee06abc4e1
Preview (ee06abc4e1)
diff --git a/packages/cli/src/extensions/host.ts b/packages/cli/src/extensions/host.ts
--- a/packages/cli/src/extensions/host.ts
+++ b/packages/cli/src/extensions/host.ts
@@ -1,4 +1,5 @@
 import type { AgentHarness, AgentTool, Session } from "@onkernel/cua-agent";
+import type { ImageContent } from "@onkernel/cua-ai";
 import {
 	AuthStorage,
 	discoverAndLoadExtensions,
@@ -23,6 +24,8 @@
 	harness: AgentHarness;
 	/** The same `Session` the harness was constructed with; used for entry writes. */
 	session: Session;
+	/** Capture a screenshot attachment for first-turn user messages. */
+	initialScreenshot: () => Promise<ImageContent[] | undefined>;
 	cwd: string;
 	/** Extension paths passed straight to `discoverAndLoadExtensions`. */
 	configuredPaths: string[];
@@ -48,6 +51,7 @@
 export class HarnessExtensionHost {
 	private readonly harness: AgentHarness;
 	private readonly session: Session;
+	private readonly initialScreenshot: () => Promise<ImageContent[] | undefined>;
 	private readonly cwd: string;
 	private readonly configuredPaths: string[];
 	private readonly agentDir?: string;
@@ -66,6 +70,12 @@
 	private extensionTools: AgentTool[] = [];
 	/** Guards `harness.setTools` so a tools_update never re-enters reapply. */
 	private applyingTools = false;
+	/** Follow-up pass requested while `harness.setTools` is in flight. */
+	private reapplyQueued = false;
+	/** Marks reload critical sections where shutdown requests must not race. */
+	private reloading = false;
+	/** Sticky shutdown request raised by `ctx.shutdown()` or owner disposal. */
+	private shutdownRequested = false;
 	/** Guards `dispose` so `ctx.shutdown()` and an owner call don't double-tear-down. */
 	private disposed = false;
 	private sessionName: string | undefined;
@@ -76,6 +86,7 @@
 	constructor(options: HarnessExtensionHostOptions) {
 		this.harness = options.harness;
 		this.session = options.session;
+		this.initialScreenshot = options.initialScreenshot;
 		this.cwd = options.cwd;
 		this.configuredPaths = options.configuredPaths;
 		this.agentDir = options.agentDir;
@@ -84,6 +95,7 @@
 
 		this.actions = makeExtensionActions(this.harness, this.session, {
 			refreshTools: () => void this.reapplyTools(),
+			sendUserMessage: (text) => this.promptUserMessage(text),
 			getSessionName: () => this.sessionName,
 			setSessionName: (name) => {
 				this.sessionName = name;
@@ -112,17 +124,26 @@
 	 * the loader imports each extension fresh from disk.
 	 */
 	async reload(): Promise<void> {
+		if (this.disposed) return;
 		const flags = this.runner?.getFlagValues() ?? new Map<string, boolean | string>();
-		await this.runner?.emit({ type: "session_shutdown", reason: "reload" });
-		this.teardownBridge?.();
-		this.teardownBridge = undefined;
+		this.reloading = true;
+		try {
+			await this.runner?.emit({ type: "session_shutdown", reason: "reload" });
+			if (await this.disposeIfShutdownRequested()) return;
+			this.teardownBridge?.();
+			this.teardownBridge = undefined;
 
-		await this.buildRunner();
-		for (const [name, value] of flags) this.runner?.setFlagValue(name, value);
+			await this.buildRunner();
+			if (await this.disposeIfShutdownRequested()) return;
+			for (const [name, value] of flags) this.runner?.setFlagValue(name, value);
 
-		await this.reapplyTools();
-		this.installBridge();
-		await this.runner?.emit({ type: "session_start", reason: "reload" });
+			await this.reapplyTools();
+			if (await this.disposeIfShutdownRequested()) return;
+			this.installBridge();
+			await this.runner?.emit({ type: "session_start", reason: "reload" });
+		} finally {
+			this.reloading = false;
+		}
 	}
 
 	/**
@@ -135,6 +156,7 @@
 	 */
 	async dispose(): Promise<void> {
 		if (this.disposed) return;
+		this.shutdownRequested = true;
 		this.disposed = true;
 		this.teardownBridge?.();
 		this.teardownBridge = undefined;
@@ -159,28 +181,45 @@
 
 	/**
 	 * Rebuild the extension-tool union and apply it to the harness as the
-	 * authoritative tool list. Extension tools are de-duped by name (the harness
-	 * rejects duplicates) and kept active alongside the base tools. The
-	 * re-entrancy guard makes a stray `tools_update` subscriber safe; reapply is
-	 * only triggered from `load`/`reload`/`model_update`/`refreshTools`, none of
-	 * which run concurrently.
+	 * authoritative tool list. Existing active-tool choices are preserved for
+	 * both base and extension tools, while newly introduced extension tools start
+	 * active by default. A queued follow-up pass handles refresh requests that
+	 * arrive while `harness.setTools` is still in flight.
 	 */
 	private async reapplyTools(): Promise<void> {
-		if (!this.runner || this.applyingTools) return;
-		this.extensionTools = wrapRegisteredTools(this.runner.getAllRegisteredTools(), this.runner);
-		const extensionNames = new Set(this.extensionTools.map((tool) => tool.name));
-		const base = this.harness.getTools().filter((tool) => !extensionNames.has(tool.name));
-		const final = [...base, ...this.extensionTools];
-		const activeNames = [
-			...this.harness.getActiveTools().map((tool) => tool.name),
-			...extensionNames,
-		];
-		this.applyingTools = true;
-		try {
-			await this.harness.setTools(final, [...new Set(activeNames)]);
-		} finally {
-			this.applyingTools = false;
+		if (!this.runner) return;
+		if (this.applyingTools) {
+			this.reapplyQueued = true;
+			return;
 		}
+		do {
+			this.reapplyQueued = false;
+			if (!this.runner) return;
+
+			const previousExtensionNames = new Set(this.extensionTools.map((tool) => tool.name));
+			const nextExtensionTools = wrapRegisteredTools(this.runner.getAllRegisteredTools(), this.runner);
+			const extensionNames = new Set(nextExtensionTools.map((tool) => tool.name));
+			const base = this.harness.getTools().filter((tool) => !extensionNames.has(tool.name));
+			const final = [...base, ...nextExtensionTools];
+			const finalNames = new Set(final.map((tool) => tool.name));
+			const activeNames = new Set(
+				this.harness
+					.getActiveTools()
+					.map((tool) => tool.name)
+					.filter((name) => finalNames.has(name)),
+			);
+			for (const name of extensionNames) {
+				if (!previousExtensionNames.has(name)) activeNames.add(name);
+			}
+
+			this.extensionTools = nextExtensionTools;
+			this.applyingTools = true;
+			try {
+				await this.harness.setTools(final, [...activeNames]);
+			} finally {
+				this.applyingTools = false;
+			}
+		} while (this.reapplyQueued);
 	}
 
 	private installBridge(): void {
@@ -191,6 +230,35 @@
 	}
 
 	private requestShutdown(): void {
+		this.shutdownRequested = true;
+		if (this.reloading) return;
 		void this.dispose();
 	}
+
+	private async promptUserMessage(text: string): Promise<void> {
+		const images = await this.maybeInitialScreenshot();
+		await this.harness.prompt(text, images ? { images } : undefined);
+	}
+
+	private async maybeInitialScreenshot(): Promise<ImageContent[] | undefined> {
+		const hasPriorTurn = await sessionHasPriorTurn(this.session);
+		if (hasPriorTurn) return undefined;
+		return this.initialScreenshot();
+	}
+
+	private async disposeIfShutdownRequested(): Promise<boolean> {
+		if (!this.shutdownRequested && !this.disposed) return false;
+		await this.dispose();
+		return true;
+	}
 }
+
+async function sessionHasPriorTurn(session: Session): Promise<boolean> {
+	const entries = await session.getBranch();
+	for (const entry of entries) {
+		if (entry.type === "message" && (entry.message.role === "user" || entry.message.role === "assistant")) {
+			return true;
+		}
+	}
+	return false;
+}

diff --git a/packages/cli/src/extensions/seams.ts b/packages/cli/src/extensions/seams.ts
--- a/packages/cli/src/extensions/seams.ts
+++ b/packages/cli/src/extensions/seams.ts
@@ -19,6 +19,8 @@
 export interface SeamHooks {
 	/** Re-apply the authoritative base+extension tool union to the harness. */
 	refreshTools: () => void;
+	/** Forward user text through the host's first-turn image attachment path. */
+	sendUserMessage: (text: string) => Promise<void>;
 	/** Synchronous mirror of the session name (kept because the action getter is sync). */
 	getSessionName: () => string | undefined;
 	/** Record the latest session name set through the action surface. */
@@ -41,7 +43,7 @@
 		},
 		sendUserMessage(content): void {
 			const text = typeof content === "string" ? content : textPartsOf(content);
-			void harness.prompt(text);
+			void hooks.sendUserMessage(text);
 		},
 		appendEntry(customType, data): void {
 			void session.appendCustomEntry(customType, data);

diff --git a/packages/cli/test/extensions.test.ts b/packages/cli/test/extensions.test.ts
--- a/packages/cli/test/extensions.test.ts
+++ b/packages/cli/test/extensions.test.ts
@@ -36,6 +36,7 @@
 	const created = new HarnessExtensionHost({
 		harness: fx.harness,
 		session: fx.session,
+		initialScreenshot: async () => undefined,
 		cwd: fx.cwd,
 		configuredPaths: [makeExtensionDir()],
 		agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),

- reapplyTools: preserve base active state and re-activate extension tools
  unless explicitly deactivated through the host's setActiveTools seam (tracked
  in inactiveExtensionTools). Keeps extension tools active across a setModel —
  which rebuilds the harness tool list and drops them — while honoring an
  opt-out, instead of unconditionally re-enabling every extension tool.
- reapplyTools: coalesce a reapply requested while setTools is in flight into a
  follow-up pass rather than dropping it.
- reload: latch a ctx.shutdown() raised during the reload critical section and
  honor it at await boundaries, so an unawaited dispose can't tear down the
  freshly rebuilt runner and bridge.
- sendUserMessage: route through the host and attach the first-turn screenshot
  via an optional initialScreenshot callback, matching the CLI's prompt sites.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
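
The `sendUserMessage` change attaches a screenshot only when the session has no prior user or assistant turn. The sketch below mirrors `sessionHasPriorTurn` from the earlier Bugbot preview. It works over a simplified, hypothetical `Entry` type and `prompt` callback rather than the real `Session` and harness APIs.

```typescript
/** Simplified session entry; the real Session entry union is richer. */
type Entry =
  | { type: "message"; role: "user" | "assistant" | "toolResult" }
  | { type: "custom"; customType: string };

interface ImageAttachment {
  data: string;
  mimeType: string;
}

function hasPriorTurn(entries: Entry[]): boolean {
  return entries.some(
    (e) => e.type === "message" && (e.role === "user" || e.role === "assistant"),
  );
}

/** Send text, attaching a screenshot only on the session's first turn. */
async function sendUserMessage(
  text: string,
  entries: Entry[],
  captureScreenshot: () => Promise<ImageAttachment[] | undefined>,
  prompt: (text: string, options?: { images: ImageAttachment[] }) => Promise<void>,
): Promise<void> {
  const images = hasPriorTurn(entries) ? undefined : await captureScreenshot();
  await prompt(text, images ? { images } : undefined);
}
```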

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes using high effort and found 4 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 4 issues found in the latest run.

  • ✅ Fixed: Load after dispose breaks lifecycle
    • load() now returns immediately when the host is disposed, preventing runner/bridge reconstruction after terminal shutdown.
  • ✅ Fixed: Repeated load stacks bridge listeners
    • load() now no-ops once a runner already exists, so duplicate bridge installations and stacked harness listeners cannot occur.
  • ✅ Fixed: Nested reload double teardown
    • reload() now exits early when reloading is already true, preventing reentrant reload passes from tearing down each other’s newly built state.
  • ✅ Fixed: Reload skips idle wait
    • reload() now awaits harness.waitForIdle() before shutdown/bridge teardown, so in-flight runs finish before listeners are detached.
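
Together, the four fixes above give `load`, `reload`, and `dispose` a small state machine. The condensed sketch below uses a hypothetical `Lifecycle` class whose injected `Steps` stand in for the runner, the bridge, and the harness calls. It shows only the guards and is not the real host.

```typescript
interface Steps {
  build: () => Promise<void>; // construct runner and install bridge
  teardown: () => void; // remove bridge listeners (assumed idempotent)
  waitForIdle: () => Promise<void>; // let an in-flight run finish
  onShutdown: () => Promise<void>; // emit session_shutdown to extensions
}

class Lifecycle {
  private loaded = false;
  private reloading = false;
  private disposed = false;
  private shutdownRequested = false;
  private readonly steps: Steps;

  constructor(steps: Steps) {
    this.steps = steps;
  }

  async load(): Promise<void> {
    // No rebuild after terminal shutdown, and no stacked bridges on a repeat call.
    if (this.disposed || this.loaded) return;
    this.loaded = true;
    await this.steps.build();
  }

  async reload(): Promise<void> {
    if (this.disposed || this.reloading) return; // reentrancy guard
    this.reloading = true;
    try {
      await this.steps.waitForIdle();
      if (await this.disposeIfRequested()) return;
      await this.steps.onShutdown();
      if (await this.disposeIfRequested()) return;
      this.steps.teardown();
      await this.steps.build();
      if (await this.disposeIfRequested()) return;
    } finally {
      this.reloading = false;
    }
  }

  /** What an extension's ctx.shutdown() would call. */
  requestShutdown(): void {
    this.shutdownRequested = true;
    // During reload the request is latched and honored at the next await boundary.
    if (!this.reloading) void this.dispose();
  }

  async dispose(): Promise<void> {
    if (this.disposed) return;
    this.shutdownRequested = true;
    this.disposed = true;
    this.steps.teardown();
  }

  private async disposeIfRequested(): Promise<boolean> {
    if (!this.shutdownRequested && !this.disposed) return false;
    await this.dispose();
    return true;
  }
}
```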

Or push these changes by commenting:

@cursor push a135464efc
Preview (a135464efc)
diff --git a/packages/cli/src/extensions/host.ts b/packages/cli/src/extensions/host.ts
--- a/packages/cli/src/extensions/host.ts
+++ b/packages/cli/src/extensions/host.ts
@@ -118,6 +118,7 @@
 	}
 
 	async load(): Promise<void> {
+		if (this.disposed || this.runner) return;
 		await this.buildRunner();
 		await this.reapplyTools();
 		this.installBridge();
@@ -132,14 +133,16 @@
 	 * the loader imports each extension fresh from disk.
 	 */
 	async reload(): Promise<void> {
-		if (this.disposed) return;
-		const flags = this.runner?.getFlagValues() ?? new Map<string, boolean | string>();
+		if (this.disposed || this.reloading) return;
 		// `reloading` defers any `ctx.shutdown()` raised by an extension's
 		// session_shutdown handler so an unawaited dispose can't tear down the
-		// runner/bridge mid-rebuild. Each await boundary then honors a pending
-		// request before continuing.
+		// runner/bridge mid-rebuild (including while waiting for the harness to go
+		// idle). Each await boundary then honors a pending request before continuing.
 		this.reloading = true;
 		try {
+			await this.harness.waitForIdle();
+			if (await this.disposeIfShutdownRequested()) return;
+			const flags = this.runner?.getFlagValues() ?? new Map<string, boolean | string>();
 			await this.runner?.emit({ type: "session_shutdown", reason: "reload" });
 			if (await this.disposeIfShutdownRequested()) return;
 			this.teardownBridge?.();

await this.reapplyTools();
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "startup" });
}

Load after dispose breaks lifecycle

High Severity

Calling load() after dispose() rebuilds the runner and bridge while disposed stays true, so dispose(), reload(), and shutdown via ctx.shutdown() become no-ops. The harness keeps forwarding events, but the host cannot be torn down or reloaded cleanly.

Reviewed by Cursor Bugbot for commit e574d60. Configure here.

this.teardownBridge = installBridge(this.harness, this.runner, this.bridgeState, () =>
this.reapplyTools(),
);
}

Repeated load stacks bridge listeners

Medium Severity

A second load() on the same host calls installBridge() without tearing down the previous bridge or emitting session_shutdown on the old runner. Earlier harness subscribers stay registered, so events are forwarded to multiple runners and teardown only removes the latest bridge.

await this.reapplyTools();
if (await this.disposeIfShutdownRequested()) return;
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "reload" });

Nested reload double teardown

Medium Severity

reload() has no reentrancy guard. If an extension triggers another reload() while the outer call is still awaiting session_shutdown, the inner run can finish (new runner, bridge, session_start) and then the outer call continues from its next lines, tearing down that bridge and rebuilding again. Extensions may see duplicate shutdown/start cycles and inconsistent runner state.

await this.runner?.emit({ type: "session_shutdown", reason: "reload" });
if (await this.disposeIfShutdownRequested()) return;
this.teardownBridge?.();
this.teardownBridge = undefined;

Reload skips idle wait

Medium Severity

HarnessExtensionHost.reload() tears down the event bridge immediately without awaiting harness.waitForIdle(), even though the command seam exposes idle waiting and the PR claims parity with AgentSession.reload. Reload during an in-flight agent run detaches extension listeners and reducers while the harness loop keeps running, so extensions miss bridged events for that run and bridgeState can stay wrong until the next agent_start/agent_end.

Five end-to-end tests drive HarnessExtensionHost through the full
learn-a-tool loop: an inefficient first run (base computer-use steps), a
meta-agent-authored learned tool written to disk, host.reload() discovering
it, and a second run that calls the learned tool in one step.

Each models a real computer-use use case — DOM table extraction,
template-match click, parameterized form-fill macro, navigation shortcut,
and a pagination de-dup extractor that additionally proves an extension's
agent_start handler is re-bound after reload (result reports runs=1: the
handler re-fired, and the fresh-from-disk import reset the prior count).

Learned tools are pure JS over inputs that stand in for screenshots/DOM (the
fake harness has no browser), exercising load/reload, reapplyTools
registration+activation, and the event bridge end to end. No host changes
were needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
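
The learned tools in these scenarios are pure functions over stand-in inputs. As an illustration of that style, and not the PR's actual fixture, a pagination de-dup extractor might merge rows from successive pages and drop rows repeated at page boundaries. The `Row` shape and its `key` field are assumptions.

```typescript
interface Row {
  key: string; // stable identifier used to detect repeats across pages
  [field: string]: string;
}

/** Merge pages in order, keeping the first occurrence of each key. */
function mergePages(pages: Row[][]): Row[] {
  const seen = new Set<string>();
  const merged: Row[] = [];
  for (const page of pages) {
    for (const row of page) {
      if (seen.has(row.key)) continue;
      seen.add(row.key);
      merged.push(row);
    }
  }
  return merged;
}
```

Wrapped as an extension tool, a function like this would let the second run collect every page in one call instead of many base computer-use steps.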

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

There are 5 total unresolved issues (including 4 from previous reviews).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Reload keeps removed extension tools
    • reapplyTools now excludes both newly loaded and previously registered extension tool names from the base harness list, so removed extensions are dropped on reload, and a regression test covers this case.

Or push these changes by commenting:

@cursor push 9153a337fb
Preview (9153a337fb)
diff --git a/packages/cli/src/extensions/host.ts b/packages/cli/src/extensions/host.ts
--- a/packages/cli/src/extensions/host.ts
+++ b/packages/cli/src/extensions/host.ts
@@ -218,7 +218,12 @@
 				this.runner,
 			);
 			const extensionNames = new Set(nextExtensionTools.map((tool) => tool.name));
-			const base = this.harness.getTools().filter((tool) => !extensionNames.has(tool.name));
+			const previousExtensionNames = new Set(this.extensionTools.map((tool) => tool.name));
+			const base = this.harness
+				.getTools()
+				.filter(
+					(tool) => !extensionNames.has(tool.name) && !previousExtensionNames.has(tool.name),
+				);
 			const final = [...base, ...nextExtensionTools];
 			const finalNames = new Set(final.map((tool) => tool.name));
 			const activeNames = new Set(

diff --git a/packages/cli/test/extensions.test.ts b/packages/cli/test/extensions.test.ts
--- a/packages/cli/test/extensions.test.ts
+++ b/packages/cli/test/extensions.test.ts
@@ -1,5 +1,5 @@
 import { afterEach, describe, expect, it } from "vitest";
-import { cpSync, mkdtempSync } from "node:fs";
+import { cpSync, mkdtempSync, rmSync } from "node:fs";
 import { tmpdir } from "node:os";
 import { dirname, join } from "node:path";
 import { fileURLToPath } from "node:url";
@@ -104,4 +104,29 @@
 
 		expect(fx!.harness.getTools().map((tool) => tool.name)).toContain("click_visual");
 	});
+
+	it("drops removed extension tools after reload", async () => {
+		fx = await buildTestHarness({
+			turns: [
+				{ steps: [{ type: "tool_call", toolName: "click_visual", args: { description: "the button" } }] },
+				{ steps: [{ type: "text", text: "done" }] },
+			],
+		});
+		const extDir = makeExtensionDir();
+		const created = new HarnessExtensionHost({
+			harness: fx.harness,
+			session: fx.session,
+			cwd: fx.cwd,
+			configuredPaths: [extDir],
+			agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
+		});
+		await created.load();
+		host = created;
+		expect(fx.harness.getTools().map((tool) => tool.name)).toContain("click_visual");
+
+		rmSync(join(extDir, "click-visual.ts"));
+		await created.reload();
+
+		expect(fx.harness.getTools().map((tool) => tool.name)).not.toContain("click_visual");
+	});
 });

reapplyTools built the base tool list by filtering the live harness tools
only against the newly loaded extension set, so a tool registered by a prior
generation that a reload removed or renamed lingered on the harness, bound to
the dead runner generation. Exclude prior-generation extension tool names
from base as well, and cover it with a reload test that renames a tool.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes using high effort and found 2 potential issues.

There are 6 total unresolved issues (including 4 from previous reviews).

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Base tools lost after name clash
    • reapplyTools() now rebuilds base tools from CuaAgentHarness.getRuntimeTools() when available so a removed shadowing extension no longer deletes the underlying built-in tool.
  • ✅ Fixed: Load succeeds after startup shutdown
    • load() now checks for a latched shutdown after startup session_start, disposes, and throws so startup does not report success after extension-triggered shutdown.
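
The first issue above is easiest to see side by side. When an extension tool shadows a built-in, the live harness list holds only the extension's copy under that name. Filtering the live list by prior extension names therefore discards the built-in as well. Filtering a runtime list of built-ins does not. A sketch over plain name lists; both function names are hypothetical, and the runtime list is assumed to hold built-ins only.

```typescript
/** Base tools taken from the live harness list (the approach the issue flags). */
function baseFromLive(
  live: string[],
  priorExtensionNames: Set<string>,
  nextExtensionNames: Set<string>,
): string[] {
  // A shadowed built-in shares its name with the stale extension tool,
  // so this filter discards both.
  return live.filter((name) => !priorExtensionNames.has(name) && !nextExtensionNames.has(name));
}

/** Base tools taken from the harness's runtime tool list (the fix). */
function baseFromRuntime(runtime: string[], nextExtensionNames: Set<string>): string[] {
  // Assuming the runtime list holds only built-ins, only current clashes are excluded.
  return runtime.filter((name) => !nextExtensionNames.has(name));
}
```

For example, take built-ins `computer` and `bash`, where a first-generation extension re-registered `bash`. A reload that drops that extension yields `["computer"]` from `baseFromLive` but `["computer", "bash"]` from `baseFromRuntime`.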

Or push these changes by commenting:

@cursor push 83fff7ee80
Preview (83fff7ee80)
diff --git a/packages/agent/src/agent.ts b/packages/agent/src/agent.ts
--- a/packages/agent/src/agent.ts
+++ b/packages/agent/src/agent.ts
@@ -372,6 +372,10 @@
 		});
 	}
 
+	getRuntimeTools(): AgentTool[] {
+		return this.runtime.tools();
+	}
+
 	/**
 	 * Mirror pi `AgentHarness.setModel()` while accepting CUA model refs.
 	 *

diff --git a/packages/cli/src/extensions/host.ts b/packages/cli/src/extensions/host.ts
--- a/packages/cli/src/extensions/host.ts
+++ b/packages/cli/src/extensions/host.ts
@@ -20,6 +20,10 @@
 	makeExtensionContextActions,
 } from "./seams";
 
+type RuntimeToolAwareHarness = AgentHarness & {
+	getRuntimeTools?: () => AgentTool[];
+};
+
 export interface HarnessExtensionHostOptions {
 	harness: AgentHarness;
 	/** The same `Session` the harness was constructed with; used for entry writes. */
@@ -122,6 +126,9 @@
 		await this.reapplyTools();
 		this.installBridge();
 		await this.runner?.emit({ type: "session_start", reason: "startup" });
+		if (await this.disposeIfShutdownRequested()) {
+			throw new Error("HarnessExtensionHost shut down during startup");
+		}
 	}
 
 	/**
@@ -224,9 +231,12 @@
 				this.runner,
 			);
 			const extensionNames = new Set(nextExtensionTools.map((tool) => tool.name));
-			const base = this.harness
-				.getTools()
-				.filter((tool) => !extensionNames.has(tool.name) && !priorExtensionNames.has(tool.name));
+			const { tools: runtimeBaseTools, authoritative } = this.getCurrentBaseTools();
+			const base = runtimeBaseTools.filter((tool) => {
+				if (extensionNames.has(tool.name)) return false;
+				if (authoritative) return true;
+				return !priorExtensionNames.has(tool.name);
+			});
 			const final = [...base, ...nextExtensionTools];
 			const finalNames = new Set(final.map((tool) => tool.name));
 			const activeNames = new Set(
@@ -248,6 +258,12 @@
 		} while (this.reapplyQueued);
 	}
 
+	private getCurrentBaseTools(): { tools: AgentTool[]; authoritative: boolean } {
+		const runtimeTools = (this.harness as RuntimeToolAwareHarness).getRuntimeTools?.();
+		if (runtimeTools) return { tools: runtimeTools, authoritative: true };
+		return { tools: this.harness.getTools(), authoritative: false };
+	}
+
 	/**
 	 * Apply an extension-requested active-tool set, recording which extension
 	 * tools were turned off so `reapplyTools` won't silently re-enable them.

diff --git a/packages/cli/test/extensions.test.ts b/packages/cli/test/extensions.test.ts
--- a/packages/cli/test/extensions.test.ts
+++ b/packages/cli/test/extensions.test.ts
@@ -132,6 +132,47 @@
 		expect(toolNames).toContain("beta_tool");
 		expect(toolNames).not.toContain("alpha_tool");
 	});
+
+	it("restores colliding base tools when an extension stops registering them", async () => {
+		const extDir = mkdtempSync(join(tmpdir(), "cua-ext-"));
+		const extFile = join(extDir, "shadow.ts");
+		fx = await buildTestHarness({ turns: [{ steps: [{ type: "text", text: "ok" }] }] });
+		const collidingToolName = fx.harness.getTools()[0]?.name;
+		expect(collidingToolName).toBeDefined();
+		writeFileSync(extFile, makeToolExtension(collidingToolName!));
+
+		const created = new HarnessExtensionHost({
+			harness: fx.harness,
+			session: fx.session,
+			cwd: fx.cwd,
+			configuredPaths: [extDir],
+			agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
+		});
+		host = created;
+		await created.load();
+
+		writeFileSync(extFile, makeNoopExtension());
+		await created.reload();
+
+		expect(fx.harness.getTools().map((tool) => tool.name)).toContain(collidingToolName);
+	});
+
+	it("fails startup when an extension requests shutdown during session_start", async () => {
+		const extDir = mkdtempSync(join(tmpdir(), "cua-ext-"));
+		const extFile = join(extDir, "shutdown.ts");
+		writeFileSync(extFile, makeShutdownOnStartupExtension());
+		fx = await buildTestHarness({ turns: [{ steps: [{ type: "text", text: "ok" }] }] });
+		const created = new HarnessExtensionHost({
+			harness: fx.harness,
+			session: fx.session,
+			cwd: fx.cwd,
+			configuredPaths: [extDir],
+			agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
+		});
+		host = created;
+
+		await expect(created.load()).rejects.toThrow("HarnessExtensionHost shut down during startup");
+	});
 });
 
 /** A minimal, import-free extension that registers a single named tool. */
@@ -149,3 +190,16 @@
 		"",
 	].join("\n");
 }
+
+function makeNoopExtension(): string {
+	return ["export default function () {}", ""].join("\n");
+}
+
+function makeShutdownOnStartupExtension(): string {
+	return [
+		"export default function (pi) {",
+		'  pi.on("session_start", (_event, ctx) => ctx.shutdown());',
+		"}",
+		"",
+	].join("\n");
+}


const base = this.harness
	.getTools()
	.filter((tool) => !extensionNames.has(tool.name) && !priorExtensionNames.has(tool.name));
const final = [...base, ...nextExtensionTools];


Base tools lost after name clash

High Severity

reapplyTools() rebuilds the tool list by filtering harness.getTools() and dropping names in priorExtensionNames, without re-merging CUA runtime base tools. If an extension registered a tool whose name matches a built-in tool, then reload removes that extension, the built-in tool can disappear from the harness until something like setModel rebuilds from the runtime.
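A minimal sketch of a collision-safe merge. It assumes the runtime can return an authoritative list of built-in tools, which is the role `getRuntimeTools()` plays in this PR's `getCurrentBaseTools()` change. `Tool`, `mergeTools`, and the `computer`/`bash` names are hypothetical stand-ins, not cua's real types. The base list is never derived from the harness's already-merged list, so a built-in that an extension shadowed comes back as soon as the extension stops registering it.

```typescript
// Hypothetical minimal tool shape; cua's real AgentTool carries more fields.
interface Tool {
	name: string;
	source: "base" | "extension";
}

/**
 * Rebuild the tool list from an authoritative base (assumed to come from the
 * runtime, not from the harness's current merged list) plus the freshly
 * loaded extension tools. Extension tools win on name collisions.
 */
function mergeTools(runtimeBase: Tool[], extensionTools: Tool[]): Tool[] {
	const extensionNames = new Set(extensionTools.map((tool) => tool.name));
	const base = runtimeBase.filter((tool) => !extensionNames.has(tool.name));
	return [...base, ...extensionTools];
}

// Usage: an extension shadows a built-in, then a reload drops that extension.
const builtins: Tool[] = [
	{ name: "computer", source: "base" },
	{ name: "bash", source: "base" },
];
const shadowed = mergeTools(builtins, [{ name: "computer", source: "extension" }]);
const restored = mergeTools(builtins, []);
```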


Reviewed by Cursor Bugbot for commit 7df3659.

await this.reapplyTools();
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "startup" });
}


Load succeeds after startup shutdown

Medium Severity

load() awaits session_start but never checks shutdownRequested or disposed afterward. If an extension calls ctx.shutdown() during that emit, requestShutdown() runs an immediate dispose() because reloading is false, yet load() still resolves successfully and leaves behind a disposed host that looks initialized.
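A sketch of the missing guard. It assumes a host that records shutdown requests in a flag and disposes immediately when it is not reloading, as the comment describes for `requestShutdown()`. `MiniHost` and its methods are hypothetical. The point is that `load()` re-checks state after awaiting the `session_start` handlers and rejects, in the same spirit as the `disposeIfShutdownRequested()` check this PR adds.

```typescript
type StartHandler = () => void | Promise<void>;

// Hypothetical, stripped-down host: just enough state to show the startup check.
class MiniHost {
	private shutdownRequested = false;
	private disposed = false;
	private readonly startHandlers: StartHandler[] = [];

	onSessionStart(handler: StartHandler): void {
		this.startHandlers.push(handler);
	}

	// Outside a reload, a shutdown request disposes the host right away.
	requestShutdown(): void {
		this.shutdownRequested = true;
		this.disposed = true;
	}

	isDisposed(): boolean {
		return this.disposed;
	}

	async load(): Promise<void> {
		// Stand-in for `runner.emit({ type: "session_start", reason: "startup" })`.
		for (const handler of this.startHandlers) await handler();
		// A handler may have shut the host down; fail instead of resolving with a dead host.
		if (this.shutdownRequested || this.disposed) {
			throw new Error("host shut down during startup");
		}
	}
}
```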


Reviewed by Cursor Bugbot for commit 7df3659.

Construct and load HarnessExtensionHost in setupHarnessRuntime via a new
browser-free helper (extensions/setup.ts), carry it on HarnessRuntime, and
dispose it before closing the browser handle on all three run paths
(print/interactive/action). Add a /reload TUI command that hot-swaps edited
extensions and surfaces load errors, wire it into autocomplete, and pass the
first-turn screenshot from the browser handle.
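The teardown order in this commit can be sketched as one wrapper shared by the print, interactive, and action paths. `runWithExtensions`, `Disposable`, and `BrowserHandle` are hypothetical names, not cua's API. The sketch assumes that disposing the host may still touch the browser session, so it runs before `close()`, and that a failed extension load must still close the handle.

```typescript
// Hypothetical minimal interfaces; cua's real host and browser handle have more methods.
interface Disposable {
	dispose(): void | Promise<void>;
}

interface BrowserHandle {
	close(): Promise<void>;
}

/**
 * Run one CLI mode with extensions loaded. The host is always disposed before
 * the browser handle closes, and the handle is closed even when loading an
 * extension throws.
 */
async function runWithExtensions<T>(
	browser: BrowserHandle,
	loadHost: () => Promise<Disposable | undefined>,
	run: (host: Disposable | undefined) => Promise<T>,
): Promise<T> {
	let host: Disposable | undefined;
	try {
		host = await loadHost(); // may throw; `finally` still closes the browser
		return await run(host);
	} finally {
		try {
			await host?.dispose(); // extensions may still touch the browser session here
		} finally {
			await browser.close();
		}
	}
}
```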

Gate project-local extensions behind trust: global <agentDir>/extensions load
on every run, but the implicit <cwd>/.pi/extensions scan and the explicit
<cwd>/.agents/extensions dir only load when the project is trusted (persisted
pi trust or --trust-extensions); --no-extensions disables loading. Project
extensions execute arbitrary TypeScript in-process, so they are never
auto-run by default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
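A sketch of the gating rule in this commit message. It assumes the persisted pi trust decision has already been read into a `boolean | null`. `resolveExtensionDirs` and its option names are hypothetical. The real code hands discovery to pi's loader rather than returning a directory list, and the directory order here is arbitrary.

```typescript
import { join } from "node:path";

// Hypothetical option bag mirroring the flags and trust sources named in the commit message.
interface ExtensionDirOptions {
	cwd: string;
	agentDir: string; // pi agent dir, e.g. ~/.pi/agent
	noExtensions: boolean; // --no-extensions
	trustExtensions: boolean; // --trust-extensions
	persistedTrust: boolean | null; // stored pi trust decision for cwd; null when none recorded
}

function resolveExtensionDirs(opts: ExtensionDirOptions): string[] {
	if (opts.noExtensions) return [];
	// Global extensions load on every run.
	const dirs = [join(opts.agentDir, "extensions")];
	// Project-local extensions run arbitrary TypeScript in-process, so they need trust.
	const projectTrusted = opts.trustExtensions || opts.persistedTrust === true;
	if (projectTrusted) {
		dirs.push(join(opts.cwd, ".pi", "extensions"), join(opts.cwd, ".agents", "extensions"));
	}
	return dirs;
}
```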
rgarcia changed the title from "Add Tier-A pi extension host for CuaAgentHarness" to "Load pi extensions in the cua CLI with /reload and trust gating" on Jun 27, 2026
cua already runs agent-authored code (bash, file edits, the browser), so
gating project-local extensions behind trust was inconsistent and blocked the
self-improve loop — an agent writes a learned tool into the project extension
dir and it should load on the next run. Remove the projectExtensionsTrusted
host option and the --trust-extensions flag: <cwd>/.agents/extensions, the
implicit <cwd>/.pi/extensions scan, and global ~/.pi/agent/extensions all load
on every run. --no-extensions still disables loading entirely.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

cursor bot left a comment

Cursor Bugbot has reviewed your changes using high effort and found 4 potential issues.

There are 10 total unresolved issues (including 6 from previous reviews).

Autofix Details

Bugbot Autofix prepared fixes for all 4 issues found in the latest run.

  • ✅ Fixed: Project extensions skip trust gating
    • Project-local extension loading is now trust-gated via persisted project trust or explicit --trust-extensions, while global agent-dir extensions still load.
  • ✅ Fixed: Reload drops bridge on failure
    • Reload now rebuilds and reapplies tools before bridge teardown and restores prior host state on errors so the bridge remains installed.
  • ✅ Fixed: /reload runs during active agent
    • /reload now waits for harness.waitForIdle() before calling host.reload() to prevent mid-turn bridge or runner swaps.
  • ✅ Fixed: Aborted reload reports success
    • Reload command feedback now checks host disposal state and reports shutdown instead of showing a false 'extensions reloaded' success notice.

Push these changes by commenting:

@cursor push 4fd32a09e4
Preview (4fd32a09e4)
diff --git a/packages/cli/src/cli-harness.ts b/packages/cli/src/cli-harness.ts
--- a/packages/cli/src/cli-harness.ts
+++ b/packages/cli/src/cli-harness.ts
@@ -177,6 +177,7 @@
 	noSession: boolean;
 	noSkills: boolean;
 	noExtensions: boolean;
+	trustExtensions: boolean;
 	debugTui: boolean;
 	jsonlIncludeDeltas: boolean;
 	jsonlIncludeImages: boolean;
@@ -441,6 +442,7 @@
 			session,
 			cwd,
 			noExtensions: flags.noExtensions,
+			trustExtensions: flags.trustExtensions,
 			initialScreenshot,
 		});
 	} catch (err) {

diff --git a/packages/cli/src/cli.ts b/packages/cli/src/cli.ts
--- a/packages/cli/src/cli.ts
+++ b/packages/cli/src/cli.ts
@@ -68,6 +68,8 @@
       --no-extensions            Disable pi extensions, which otherwise load from
                                  <cwd>/.agents/extensions, <cwd>/.pi/extensions,
                                  and the pi agent dir (~/.pi/agent/extensions/)
+      --trust-extensions         Trust project-local extension directories for this
+                                 run (<cwd>/.agents/extensions and <cwd>/.pi/extensions)
       --debug-tui                Enable TUI render diagnostics for manual repros
   -v, --verbose                  Verbose progress output to stderr
   -h, --help                     Show this help
@@ -101,6 +103,7 @@
 	noSession: boolean;
 	noSkills: boolean;
 	noExtensions: boolean;
+	trustExtensions: boolean;
 	debugTui: boolean;
 	jsonlIncludeDeltas: boolean;
 	jsonlIncludeImages: boolean;
@@ -150,6 +153,7 @@
 				skill: { type: "string", multiple: true, default: [] },
 				"no-skills": { type: "boolean", default: false },
 				"no-extensions": { type: "boolean", default: false },
+				"trust-extensions": { type: "boolean", default: false },
 				"debug-tui": { type: "boolean", default: false },
 				output: { type: "string", short: "o" },
 				"jsonl-include-deltas": { type: "boolean", default: false },
@@ -187,6 +191,7 @@
 		noSession: !!parsed.values["no-session"],
 		noSkills: !!parsed.values["no-skills"],
 		noExtensions: !!parsed.values["no-extensions"],
+		trustExtensions: !!parsed.values["trust-extensions"],
 		debugTui: !!parsed.values["debug-tui"],
 		model: parsed.values.model as string | undefined,
 		thinking: parsed.values.thinking as string | undefined,
@@ -216,6 +221,7 @@
 		noSession: flags.noSession,
 		noSkills: flags.noSkills,
 		noExtensions: flags.noExtensions,
+		trustExtensions: flags.trustExtensions,
 		debugTui: flags.debugTui,
 		jsonlIncludeDeltas: flags.jsonlIncludeDeltas,
 		jsonlIncludeImages: flags.jsonlIncludeImages,

diff --git a/packages/cli/src/extensions/host.ts b/packages/cli/src/extensions/host.ts
--- a/packages/cli/src/extensions/host.ts
+++ b/packages/cli/src/extensions/host.ts
@@ -2,9 +2,12 @@
 import type { ImageContent } from "@onkernel/cua-ai";
 import {
 	AuthStorage,
+	DefaultResourceLoader,
 	discoverAndLoadExtensions,
 	ExtensionRunner,
+	getAgentDir,
 	ModelRegistry,
+	SettingsManager,
 	SessionManager,
 	wrapRegisteredTools,
 } from "@earendil-works/pi-coding-agent";
@@ -13,6 +16,7 @@
 	ExtensionCommandContextActions,
 	ExtensionContextActions,
 } from "@earendil-works/pi-coding-agent";
+import { isAbsolute, relative, resolve } from "node:path";
 import { installBridge, type BridgeState } from "./bridge";
 import {
 	makeExtensionActions,
@@ -27,6 +31,8 @@
 	cwd: string;
 	/** Extension paths passed straight to `discoverAndLoadExtensions`. */
 	configuredPaths: string[];
+	/** Whether project-local extension sources under `cwd` may be executed. */
+	projectTrusted: boolean;
 	/** Agent config dir searched for `extensions/`. Pass a temp dir to isolate from `~/.agents`. */
 	agentDir?: string;
 	/**
@@ -58,6 +64,7 @@
 	private readonly session: Session;
 	private readonly cwd: string;
 	private readonly configuredPaths: string[];
+	private readonly projectTrusted: boolean;
 	private readonly agentDir?: string;
 	private readonly initialScreenshot?: () => Promise<ImageContent[] | undefined>;
 	private readonly sessionManager: SessionManager;
@@ -95,6 +102,7 @@
 		this.session = options.session;
 		this.cwd = options.cwd;
 		this.configuredPaths = options.configuredPaths;
+		this.projectTrusted = options.projectTrusted;
 		this.agentDir = options.agentDir;
 		this.initialScreenshot = options.initialScreenshot;
 		this.sessionManager = SessionManager.inMemory(this.cwd);
@@ -111,6 +119,7 @@
 		});
 		this.contextActions = makeExtensionContextActions(this.harness, {
 			isIdle: () => this.bridgeState.isIdle,
+			isProjectTrusted: () => this.projectTrusted,
 			getSignal: () => undefined,
 			shutdown: () => this.requestShutdown(),
 		});
@@ -118,21 +127,30 @@
 	}
 
 	async load(): Promise<void> {
-		await this.buildRunner();
+		const { runner, loadErrors } = await this.buildRunner();
+		this.runner = runner;
+		this.loadErrors = loadErrors;
 		await this.reapplyTools();
 		this.installBridge();
 		await this.runner?.emit({ type: "session_start", reason: "startup" });
 	}
 
+	isDisposed(): boolean {
+		return this.disposed;
+	}
+
 	/**
-	 * Mirror `AgentSession.reload`: carry over flag values, tear down the old
-	 * runner's bridge, re-discover extensions from disk, build a fresh runner over
-	 * the same in-memory services, restore flags, rebind, re-apply tools, reinstall
-	 * the bridge, then emit `session_start`. No extension cache is cleared because
-	 * the loader imports each extension fresh from disk.
+	 * Mirror `AgentSession.reload`: carry over flag values, re-discover
+	 * extensions from disk, build a fresh runner over the same in-memory
+	 * services, restore flags, re-apply tools, swap bridges, then emit
+	 * `session_start`. No extension cache is cleared because the loader imports
+	 * each extension fresh from disk.
 	 */
 	async reload(): Promise<void> {
 		if (this.disposed) return;
+		const previousRunner = this.runner;
+		const previousLoadErrors = this.loadErrors;
+		const previousExtensionTools = this.extensionTools;
 		const flags = this.runner?.getFlagValues() ?? new Map<string, boolean | string>();
 		// `reloading` defers any `ctx.shutdown()` raised by an extension's
 		// session_shutdown handler so an unawaited dispose can't tear down the
@@ -140,19 +158,33 @@
 		// request before continuing.
 		this.reloading = true;
 		try {
-			await this.runner?.emit({ type: "session_shutdown", reason: "reload" });
+			await previousRunner?.emit({ type: "session_shutdown", reason: "reload" });
 			if (await this.disposeIfShutdownRequested()) return;
-			this.teardownBridge?.();
-			this.teardownBridge = undefined;
-
-			await this.buildRunner();
+			const { runner, loadErrors } = await this.buildRunner();
 			if (await this.disposeIfShutdownRequested()) return;
+			this.runner = runner;
+			this.loadErrors = loadErrors;
 			for (const [name, value] of flags) this.runner?.setFlagValue(name, value);
 
 			await this.reapplyTools();
 			if (await this.disposeIfShutdownRequested()) return;
+			this.teardownBridge?.();
+			this.teardownBridge = undefined;
 			this.installBridge();
 			await this.runner?.emit({ type: "session_start", reason: "reload" });
+		} catch (error) {
+			if (!this.disposed) {
+				this.runner = previousRunner;
+				this.loadErrors = previousLoadErrors;
+				this.extensionTools = previousExtensionTools;
+				try {
+					await this.reapplyTools();
+				} catch {
+					// Preserve the original reload error.
+				}
+				if (!this.teardownBridge && this.runner) this.installBridge();
+			}
+			throw error;
 		} finally {
 			this.reloading = false;
 		}
@@ -178,21 +210,46 @@
 		this.runner = undefined;
 	}
 
-	private async buildRunner(): Promise<void> {
-		const result = await discoverAndLoadExtensions(this.configuredPaths, this.cwd, this.agentDir);
-		this.loadErrors = result.errors;
-		this.runner = new ExtensionRunner(
+	private async buildRunner(): Promise<{
+		runner: ExtensionRunner;
+		loadErrors: Array<{ path: string; error: string }>;
+	}> {
+		const result = await this.discoverExtensions();
+		const runner = new ExtensionRunner(
 			result.extensions,
 			result.runtime,
 			this.cwd,
 			this.sessionManager,
 			this.modelRegistry,
 		);
-		this.runner.bindCore(this.actions, this.contextActions);
-		this.runner.bindCommandContext(this.commandActions);
-		this.runner.setUIContext(undefined, "print");
+		runner.bindCore(this.actions, this.contextActions);
+		runner.bindCommandContext(this.commandActions);
+		runner.setUIContext(undefined, "print");
+		return { runner, loadErrors: result.errors };
 	}
 
+	private async discoverExtensions() {
+		if (this.projectTrusted) {
+			return discoverAndLoadExtensions(this.configuredPaths, this.cwd, this.agentDir);
+		}
+		const agentDir = this.agentDir ?? getAgentDir();
+		const settingsManager = SettingsManager.create(this.cwd, agentDir, { projectTrusted: false });
+		const loader = new DefaultResourceLoader({
+			cwd: this.cwd,
+			agentDir,
+			settingsManager,
+			additionalExtensionPaths: this.configuredPaths
+				.map((path) => resolve(this.cwd, path))
+				.filter((path) => !isUnderPath(path, this.cwd)),
+			noSkills: true,
+			noPromptTemplates: true,
+			noThemes: true,
+			noContextFiles: true,
+		});
+		await loader.reload();
+		return loader.getExtensions();
+	}
+
 	/**
 	 * Rebuild the extension-tool union and apply it to the harness as the
 	 * authoritative tool list. Extension tools are de-duped by name (the harness
@@ -308,3 +365,8 @@
 			(entry.message.role === "user" || entry.message.role === "assistant"),
 	);
 }
+
+function isUnderPath(target: string, root: string): boolean {
+	const rel = relative(resolve(root), resolve(target));
+	return rel === "" || (!rel.startsWith("..") && !isAbsolute(rel));
+}

diff --git a/packages/cli/src/extensions/seams.ts b/packages/cli/src/extensions/seams.ts
--- a/packages/cli/src/extensions/seams.ts
+++ b/packages/cli/src/extensions/seams.ts
@@ -98,7 +98,12 @@
 
 export function makeExtensionContextActions(
 	harness: AgentHarness,
-	state: { isIdle: () => boolean; getSignal: () => AbortSignal | undefined; shutdown: () => void },
+	state: {
+		isIdle: () => boolean;
+		isProjectTrusted: () => boolean;
+		getSignal: () => AbortSignal | undefined;
+		shutdown: () => void;
+	},
 ): ExtensionContextActions {
 	return {
 		getModel() {
@@ -107,9 +112,8 @@
 		isIdle() {
 			return state.isIdle();
 		},
-		// Headless host trusts its cwd; project-trust prompts are a TUI concern.
 		isProjectTrusted(): boolean {
-			return true;
+			return state.isProjectTrusted();
 		},
 		getSignal() {
 			return state.getSignal();

diff --git a/packages/cli/src/extensions/setup.ts b/packages/cli/src/extensions/setup.ts
--- a/packages/cli/src/extensions/setup.ts
+++ b/packages/cli/src/extensions/setup.ts
@@ -1,17 +1,22 @@
 import type { CuaAgentHarness, Session } from "@onkernel/cua-agent";
 import type { ImageContent } from "@onkernel/cua-ai";
-import { getAgentDir } from "@earendil-works/pi-coding-agent";
+import {
+	getAgentDir,
+	hasProjectTrustInputs,
+	ProjectTrustStore,
+	SettingsManager,
+} from "@earendil-works/pi-coding-agent";
+import { existsSync } from "node:fs";
 import { join } from "node:path";
 import { HarnessExtensionHost } from "./host";
 
 /**
  * Resolve extension directories and construct + load a {@link HarnessExtensionHost}.
  *
- * Global extensions (`<getAgentDir()>/extensions`) and project-local extensions
- * (`<cwd>/.agents/extensions` plus the loader's implicit `<cwd>/.pi/extensions`
- * scan) all load on every run; `--no-extensions` opts out entirely. This is the
- * substrate for the self-improve loop: an agent writes a learned tool into the
- * project extension dir and it loads on the next run.
+ * Global extensions (`<getAgentDir()>/extensions`) always load; project-local
+ * extensions (`<cwd>/.agents/extensions` plus `<cwd>/.pi/extensions`) only load
+ * when project trust resolves true or `--trust-extensions` is set. `--no-extensions`
+ * opts out entirely.
  *
  * No browser/auth/provisioning happens here, so a test can drive the exact load
  * path the CLI uses with a `buildTestHarness` fixture and temp dirs.
@@ -21,21 +26,45 @@
 	session: Session;
 	cwd: string;
 	noExtensions: boolean;
+	trustExtensions?: boolean;
 	agentDir?: string;
 	configuredPaths?: string[];
 	initialScreenshot?: () => Promise<ImageContent[] | undefined>;
 }): Promise<HarnessExtensionHost | undefined> {
 	if (args.noExtensions) return undefined;
 	const agentDir = args.agentDir ?? getAgentDir();
+	const projectTrusted = resolveProjectExtensionTrust({
+		cwd: args.cwd,
+		agentDir,
+		trustExtensions: args.trustExtensions === true,
+	});
 	const configuredPaths = args.configuredPaths ?? [join(args.cwd, ".agents", "extensions")];
 	const host = new HarnessExtensionHost({
 		harness: args.harness,
 		session: args.session,
 		cwd: args.cwd,
 		configuredPaths,
+		projectTrusted,
 		agentDir,
 		initialScreenshot: args.initialScreenshot,
 	});
 	await host.load();
 	return host;
 }
+
+function resolveProjectExtensionTrust(args: {
+	cwd: string;
+	agentDir: string;
+	trustExtensions: boolean;
+}): boolean {
+	if (args.trustExtensions) return true;
+	if (!hasProjectExtensionInputs(args.cwd)) return true;
+	const trustDecision = new ProjectTrustStore(args.agentDir).get(args.cwd);
+	if (trustDecision !== null) return trustDecision;
+	const settings = SettingsManager.create(args.cwd, args.agentDir, { projectTrusted: false });
+	return settings.getDefaultProjectTrust() === "always";
+}
+
+function hasProjectExtensionInputs(cwd: string): boolean {
+	return hasProjectTrustInputs(cwd) || existsSync(join(cwd, ".agents", "extensions"));
+}

diff --git a/packages/cli/src/tui/main.ts b/packages/cli/src/tui/main.ts
--- a/packages/cli/src/tui/main.ts
+++ b/packages/cli/src/tui/main.ts
@@ -531,16 +531,25 @@
 }
 
 export async function applyReloadCommand(opts: InteractiveOptions, messages: MessageList): Promise<void> {
-	if (!opts.host) {
+	if (!opts.host || opts.host.isDisposed()) {
 		messages.addNotice("extensions are disabled");
 		return;
 	}
 	messages.addNotice("reloading extensions…");
 	try {
+		await opts.harness.waitForIdle();
+		if (opts.host.isDisposed()) {
+			messages.addNotice("extensions are disabled");
+			return;
+		}
 		// reload() emits no harness event, so this helper is the only source of
 		// feedback; surface loadErrors so a broken edited extension isn't silently
 		// dropped with its tool missing.
 		await opts.host.reload();
+		if (opts.host.isDisposed()) {
+			messages.addNotice("extensions were shut down");
+			return;
+		}
 		if (opts.host.loadErrors.length > 0) {
 			for (const { path, error } of opts.host.loadErrors) messages.addError(`${path}: ${error}`);
 		} else {

diff --git a/packages/cli/test/e2e/agent-start-counter-shortcut.test.ts b/packages/cli/test/e2e/agent-start-counter-shortcut.test.ts
--- a/packages/cli/test/e2e/agent-start-counter-shortcut.test.ts
+++ b/packages/cli/test/e2e/agent-start-counter-shortcut.test.ts
@@ -120,6 +120,7 @@
 			session: fx.session,
 			cwd: fx.cwd,
 			configuredPaths: [extDir],
+			projectTrusted: true,
 			agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
 		});
 		await host.load();

diff --git a/packages/cli/test/e2e/dom-table-extraction.test.ts b/packages/cli/test/e2e/dom-table-extraction.test.ts
--- a/packages/cli/test/e2e/dom-table-extraction.test.ts
+++ b/packages/cli/test/e2e/dom-table-extraction.test.ts
@@ -87,6 +87,7 @@
 			session: fx.session,
 			cwd: fx.cwd,
 			configuredPaths: [extDir],
+			projectTrusted: true,
 			agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
 		});
 		await host.load();

diff --git a/packages/cli/test/e2e/form-fill-macro.test.ts b/packages/cli/test/e2e/form-fill-macro.test.ts
--- a/packages/cli/test/e2e/form-fill-macro.test.ts
+++ b/packages/cli/test/e2e/form-fill-macro.test.ts
@@ -90,6 +90,7 @@
 			session: fx.session,
 			cwd: fx.cwd,
 			configuredPaths: [extDir],
+			projectTrusted: true,
 			agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
 		});
 		await host.load();

diff --git a/packages/cli/test/e2e/nav-shortcut-tool.test.ts b/packages/cli/test/e2e/nav-shortcut-tool.test.ts
--- a/packages/cli/test/e2e/nav-shortcut-tool.test.ts
+++ b/packages/cli/test/e2e/nav-shortcut-tool.test.ts
@@ -85,6 +85,7 @@
 			session: fx.session,
 			cwd: fx.cwd,
 			configuredPaths: [extDir],
+			projectTrusted: true,
 			agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
 		});
 		await host.load();

diff --git a/packages/cli/test/e2e/template-match-click.test.ts b/packages/cli/test/e2e/template-match-click.test.ts
--- a/packages/cli/test/e2e/template-match-click.test.ts
+++ b/packages/cli/test/e2e/template-match-click.test.ts
@@ -98,6 +98,7 @@
 			session: fx.session,
 			cwd: fx.cwd,
 			configuredPaths: [extDir],
+			projectTrusted: true,
 			agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
 		});
 		await host.load();

diff --git a/packages/cli/test/extension-loader.test.ts b/packages/cli/test/extension-loader.test.ts
--- a/packages/cli/test/extension-loader.test.ts
+++ b/packages/cli/test/extension-loader.test.ts
@@ -74,7 +74,7 @@
 		expect(fx.harness.getTools().map((t) => t.name)).not.toContain("loader_probe");
 	});
 
-	it("loads the implicit project <cwd>/.pi/extensions scan by default", async () => {
+	it("does not load the implicit project <cwd>/.pi/extensions scan when untrusted", async () => {
 		fx = await buildTestHarness({ turns: [{ steps: [{ type: "text", text: "ok" }] }] });
 		// Unique per run so the whole-harness tool assertion can't collide with
 		// another worker registering the same name under pool concurrency.
@@ -92,10 +92,10 @@
 		});
 
 		expect(host).toBeDefined();
-		expect(fx.harness.getTools().map((t) => t.name)).toContain(probe);
+		expect(fx.harness.getTools().map((t) => t.name)).not.toContain(probe);
 	});
 
-	it("loads project <cwd>/.agents/extensions by default", async () => {
+	it("does not load project <cwd>/.agents/extensions when untrusted", async () => {
 		fx = await buildTestHarness({ turns: [{ steps: [{ type: "text", text: "ok" }] }] });
 		const probe = `agents_probe_${randomUUID().replace(/-/g, "")}`;
 		const projectExtDir = join(fx.cwd, ".agents", "extensions");
@@ -111,6 +111,31 @@
 		});
 
 		expect(host).toBeDefined();
-		expect(fx.harness.getTools().map((t) => t.name)).toContain(probe);
+		expect(fx.harness.getTools().map((t) => t.name)).not.toContain(probe);
 	});
+
+	it("loads project-local extension directories with --trust-extensions", async () => {
+		fx = await buildTestHarness({ turns: [{ steps: [{ type: "text", text: "ok" }] }] });
+		const agentsProbe = `agents_probe_${randomUUID().replace(/-/g, "")}`;
+		const piProbe = `pi_probe_${randomUUID().replace(/-/g, "")}`;
+		const agentsExtDir = join(fx.cwd, ".agents", "extensions");
+		mkdirSync(agentsExtDir, { recursive: true });
+		writeFileSync(join(agentsExtDir, "agents-probe.ts"), makeToolExtension(agentsProbe));
+		const piExtDir = join(fx.cwd, ".pi", "extensions");
+		mkdirSync(piExtDir, { recursive: true });
+		writeFileSync(join(piExtDir, "pi-probe.ts"), makeToolExtension(piProbe));
+
+		host = await loadHarnessExtensions({
+			harness: fx.harness,
+			session: fx.session,
+			cwd: fx.cwd,
+			noExtensions: false,
+			trustExtensions: true,
+			agentDir: tempAgentDir(),
+		});
+
+		expect(host).toBeDefined();
+		expect(fx.harness.getTools().map((t) => t.name)).toContain(piProbe);
+		expect(fx.harness.getTools().map((t) => t.name)).toContain(agentsProbe);
+	});
 });

diff --git a/packages/cli/test/extensions.test.ts b/packages/cli/test/extensions.test.ts
--- a/packages/cli/test/extensions.test.ts
+++ b/packages/cli/test/extensions.test.ts
@@ -38,6 +38,7 @@
 		session: fx.session,
 		cwd: fx.cwd,
 		configuredPaths: [makeExtensionDir()],
+		projectTrusted: true,
 		agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
 	});
 	await created.load();
@@ -119,6 +120,7 @@
 			session: fx.session,
 			cwd: fx.cwd,
 			configuredPaths: [extDir],
+			projectTrusted: true,
 			agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
 		});
 		host = created;

diff --git a/packages/cli/test/reload-command.test.ts b/packages/cli/test/reload-command.test.ts
--- a/packages/cli/test/reload-command.test.ts
+++ b/packages/cli/test/reload-command.test.ts
@@ -15,23 +15,39 @@
 	messages: MessageList;
 	notices: string[];
 	errors: string[];
+	waitForIdle: ReturnType<typeof vi.fn>;
 } {
 	const messages = new MessageList();
 	const notices: string[] = [];
 	const errors: string[] = [];
+	const waitForIdle = vi.fn(async () => {});
 	vi.spyOn(messages, "addNotice").mockImplementation((text) => void notices.push(text));
 	vi.spyOn(messages, "addError").mockImplementation((text) => void errors.push(text));
-	return { opts: { host } as InteractiveOptions, messages, notices, errors };
+	return {
+		opts: {
+			host,
+			harness: { waitForIdle } as unknown as InteractiveOptions["harness"],
+		} as InteractiveOptions,
+		messages,
+		notices,
+		errors,
+		waitForIdle,
+	};
 }
 
 describe("applyReloadCommand (/reload glue)", () => {
 	it("invokes host.reload() and reports a clean reload", async () => {
 		const reload = vi.fn(async () => {});
-		const host = { reload, loadErrors: [] } as unknown as HarnessExtensionHost;
-		const { opts, messages, notices, errors } = setup(host);
+		const host = {
+			reload,
+			loadErrors: [],
+			isDisposed: () => false,
+		} as unknown as HarnessExtensionHost;
+		const { opts, messages, notices, errors, waitForIdle } = setup(host);
 
 		await applyReloadCommand(opts, messages);
 
+		expect(waitForIdle).toHaveBeenCalledTimes(1);
 		expect(reload).toHaveBeenCalledTimes(1);
 		expect(notices).toContain("extensions reloaded");
 		expect(errors).toHaveLength(0);
@@ -42,16 +58,37 @@
 		const host = {
 			reload,
 			loadErrors: [{ path: "/ext/broken.ts", error: "boom" }],
+			isDisposed: () => false,
 		} as unknown as HarnessExtensionHost;
-		const { opts, messages, errors, notices } = setup(host);
+		const { opts, messages, errors, notices, waitForIdle } = setup(host);
 
 		await applyReloadCommand(opts, messages);
 
+		expect(waitForIdle).toHaveBeenCalledTimes(1);
 		expect(reload).toHaveBeenCalledTimes(1);
 		expect(errors).toContain("/ext/broken.ts: boom");
 		expect(notices).not.toContain("extensions reloaded");
 	});
 
+	it("reports disabled when reload disposes the host", async () => {
+		let disposed = false;
+		const reload = vi.fn(async () => {
+			disposed = true;
+		});
+		const host = {
+			reload,
+			loadErrors: [],
+			isDisposed: () => disposed,
+		} as unknown as HarnessExtensionHost;
+		const { opts, messages, notices } = setup(host);
+
+		await applyReloadCommand(opts, messages);
+
+		expect(reload).toHaveBeenCalledTimes(1);
+		expect(notices).toContain("extensions were shut down");
+		expect(notices).not.toContain("extensions reloaded");
+	});
+
 	it("no-ops with a notice when no host is loaded", async () => {
 		const { opts, messages, notices } = setup(undefined);


initialScreenshot: args.initialScreenshot,
});
await host.load();
return host;


Project extensions skip trust gating

High Severity

The loadHarnessExtensions function unconditionally loads project-local extensions from .agents/extensions and .pi/extensions without any trust checks. This bypasses the intended trust gating, allowing arbitrary code execution from untrusted repositories.
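The containment check that the autofix's untrusted path relies on (`isUnderPath` in `host.ts`) can be sketched with `node:path` alone. This version tests for `..` followed by the path separator rather than a bare `..` prefix, so a project directory whose name merely starts with `..` (for example `..cache`) still counts as project-local. `filterUntrustedProjectPaths` is a hypothetical helper showing how an untrusted run would keep only configured paths outside `cwd`.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// True when `target` is `root` itself or somewhere beneath it.
function isUnderPath(target: string, root: string): boolean {
	const rel = relative(resolve(root), resolve(target));
	if (rel === "") return true;
	const escapesRoot = rel === ".." || rel.startsWith(`..${sep}`);
	return !escapesRoot && !isAbsolute(rel);
}

// Hypothetical helper: when the project is untrusted, drop configured
// extension paths that live inside the project directory.
function filterUntrustedProjectPaths(paths: string[], cwd: string, projectTrusted: boolean): string[] {
	const absolute = paths.map((path) => resolve(cwd, path));
	return projectTrusted ? absolute : absolute.filter((path) => !isUnderPath(path, cwd));
}
```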


Reviewed by Cursor Bugbot for commit 2eeeadb.

await this.reapplyTools();
if (await this.disposeIfShutdownRequested()) return;
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "reload" });


Reload drops bridge on failure

Medium Severity

The reload() method tears down the extension event bridge before re-initializing the runner and tools. If an error occurs during these re-initialization steps, the bridge isn't re-installed, which means extensions lose event forwarding and their tools stop working for the session.
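A sketch of the prepare-then-commit shape the autofix moves `reload()` toward: build and apply the new state first, swap the bridge last, and roll back to the previous runner and tools if any step throws. `ReloadableHost`, `HostState`, and the string stand-ins for runners are hypothetical. The real host also carries flag values and load errors across the swap.

```typescript
// Hypothetical snapshot of what a reload replaces; strings stand in for real objects.
interface HostState {
	runner: string;
	tools: string[];
}

class ReloadableHost {
	private state: HostState;
	private bridgedRunner: string;
	private readonly build: () => Promise<HostState>;
	private readonly applyTools: (tools: string[]) => void;

	constructor(initial: HostState, build: () => Promise<HostState>, applyTools: (tools: string[]) => void) {
		this.build = build;
		this.applyTools = applyTools;
		this.state = initial;
		this.applyTools(initial.tools);
		this.bridgedRunner = initial.runner; // bridge starts out attached to the initial runner
	}

	get runner(): string {
		return this.state.runner;
	}

	get bridge(): string {
		return this.bridgedRunner;
	}

	async reload(): Promise<void> {
		const previous = this.state;
		try {
			const next = await this.build(); // discover + construct; may throw
			this.state = next;
			this.applyTools(next.tools); // may throw, e.g. if the harness rejects the list
			this.bridgedRunner = next.runner; // swap the bridge only after everything else succeeded
		} catch (error) {
			this.state = previous;
			try {
				this.applyTools(previous.tools);
			} catch {
				// Keep the original reload error.
			}
			throw error; // the bridge was never detached from the previous runner
		}
	}
}
```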


Reviewed by Cursor Bugbot for commit 2eeeadb.

if (parsed?.command === "reload") {
await applyReloadCommand(opts, messages);
requestRender("reload");
return;


/reload runs during active agent

Medium Severity

The /reload slash command calls host.reload() immediately without waiting for the harness to become idle, unlike other extension operations. This can tear down the event bridge or swap runners while a turn is executing.
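A sketch of gating `/reload` on idleness. It assumes the harness exposes a `waitForIdle()` that resolves once the current turn ends, which is what the autofix calls through `opts.harness.waitForIdle()`. `reloadWhenIdle` and `ToyHarness` are hypothetical; the toy harness exists only to show that the reload is deferred until the turn finishes.

```typescript
// Hypothetical minimal interfaces for the two objects /reload touches.
interface IdleAwareHarness {
	waitForIdle(): Promise<void>;
}

interface Reloadable {
	reload(): Promise<void>;
}

// Defer the reload until any in-flight turn has finished, so the runner and
// bridge are never swapped while the agent is mid-turn.
async function reloadWhenIdle(harness: IdleAwareHarness, host: Reloadable): Promise<void> {
	await harness.waitForIdle();
	await host.reload();
}

// Toy harness whose current turn ends when finishTurn() is called.
class ToyHarness implements IdleAwareHarness {
	private busy = true;
	private waiters: Array<() => void> = [];

	waitForIdle(): Promise<void> {
		if (!this.busy) return Promise.resolve();
		return new Promise<void>((resolve) => {
			this.waiters.push(() => resolve());
		});
	}

	finishTurn(): void {
		this.busy = false;
		for (const wake of this.waiters.splice(0)) wake();
	}
}
```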


Reviewed by Cursor Bugbot for commit 2eeeadb.

for (const { path, error } of opts.host.loadErrors) messages.addError(`${path}: ${error}`);
} else {
messages.addNotice("extensions reloaded");
}


Aborted reload reports success

Medium Severity

When host.reload() exits early because disposeIfShutdownRequested() disposed the host (for example after an extension calls ctx.shutdown() during session_shutdown), applyReloadCommand still shows “extensions reloaded” whenever loadErrors is empty, even though the host was torn down and extensions are no longer wired.
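One way to keep the feedback honest is to derive it from the host's state after `reload()` returns and to check disposal before looking at `loadErrors`. `reloadFeedback`, `ReloadResult`, and `Feedback` are hypothetical; the notice strings match the ones `applyReloadCommand` uses in the autofix.

```typescript
// Hypothetical view of the host after reload() returns.
interface ReloadResult {
	disposed: boolean;
	loadErrors: Array<{ path: string; error: string }>;
}

interface Feedback {
	kind: "notice" | "error";
	text: string;
}

function reloadFeedback(result: ReloadResult): Feedback[] {
	// A host torn down mid-reload has no errors to report but is not a success.
	if (result.disposed) return [{ kind: "notice", text: "extensions were shut down" }];
	if (result.loadErrors.length > 0) {
		return result.loadErrors.map(({ path, error }): Feedback => ({ kind: "error", text: `${path}: ${error}` }));
	}
	return [{ kind: "notice", text: "extensions reloaded" }];
}
```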


Reviewed by Cursor Bugbot for commit 2eeeadb.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes using high effort and found 2 potential issues.

There are 12 total unresolved issues (including 10 from previous reviews).

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Extension sendUserMessage rejects unhandled
    • Extension sendUserMessage now catches prompt failures and records them as extension_error entries so rejections are handled instead of becoming unhandled promises.
  • ✅ Fixed: Extension startup steals first screenshot
    • Initial screenshot attachment now keys only off the run-mode first-prompt/resume flags, so extension-startup turns no longer prevent the first user-driven prompt from receiving images.

Push these changes by commenting:

@cursor push b64dd7022d
Preview (b64dd7022d)
diff --git a/packages/cli/src/action/harness-runner.ts b/packages/cli/src/action/harness-runner.ts
--- a/packages/cli/src/action/harness-runner.ts
+++ b/packages/cli/src/action/harness-runner.ts
@@ -140,23 +140,11 @@
 
 async function maybeInitialScreenshot(opts: HarnessRunOptions): Promise<ImageContent[] | undefined> {
 	if (opts.skipInitialScreenshot) return undefined;
-	const hasPriorTurn = await sessionHasPriorTurn(opts.session);
-	if (hasPriorTurn) return undefined;
 	const png = await captureScreenshot(opts.browserHandle.client, opts.browserHandle.browser.session_id);
 	if (!png) return undefined;
 	return [{ type: "image", data: png.toString("base64"), mimeType: "image/png" }];
 }
 
-async function sessionHasPriorTurn(session: Session): Promise<boolean> {
-	const entries = await session.getBranch();
-	for (const entry of entries) {
-		if (entry.type === "message" && (entry.message.role === "user" || entry.message.role === "assistant")) {
-			return true;
-		}
-	}
-	return false;
-}
-
 function textFromAssistant(message: AssistantMessage): string {
 	const parts: string[] = [];
 	for (const block of message.content) {

diff --git a/packages/cli/src/extensions/seams.ts b/packages/cli/src/extensions/seams.ts
--- a/packages/cli/src/extensions/seams.ts
+++ b/packages/cli/src/extensions/seams.ts
@@ -45,7 +45,16 @@
 		},
 		sendUserMessage(content): void {
 			const text = typeof content === "string" ? content : textPartsOf(content);
-			void hooks.sendUserMessage(text);
+			void hooks.sendUserMessage(text).catch((error: unknown) => {
+				void session
+					.appendCustomMessageEntry(
+						"extension_error",
+						`sendUserMessage failed: ${errorMessage(error)}`,
+						true,
+						{ action: "sendUserMessage" },
+					)
+					.catch(() => {});
+			});
 		},
 		appendEntry(customType, data): void {
 			void session.appendCustomEntry(customType, data);
@@ -166,3 +175,9 @@
 		.map((part) => part.text ?? "")
 		.join("");
 }
+
+function errorMessage(error: unknown): string {
+	if (error instanceof Error && error.message.trim().length > 0) return error.message;
+	if (typeof error === "string" && error.trim().length > 0) return error;
+	return "unknown error";
+}

diff --git a/packages/cli/src/print.ts b/packages/cli/src/print.ts
--- a/packages/cli/src/print.ts
+++ b/packages/cli/src/print.ts
@@ -94,8 +94,6 @@
 
 async function maybeInitialScreenshot(opts: RunPrintOptions): Promise<ImageContent[] | undefined> {
 	if (opts.skipInitialScreenshot) return undefined;
-	const hasPriorTurn = await sessionHasPriorTurn(opts.session);
-	if (hasPriorTurn) return undefined;
 	const png = await captureScreenshot(opts.browserHandle.client, opts.browserHandle.browser.session_id);
 	if (!png) return undefined;
 	return [
@@ -106,13 +104,3 @@
 		},
 	];
 }
-
-async function sessionHasPriorTurn(session: Session): Promise<boolean> {
-	const entries = await session.getBranch();
-	for (const entry of entries) {
-		if (entry.type === "message" && (entry.message.role === "user" || entry.message.role === "assistant")) {
-			return true;
-		}
-	}
-	return false;
-}

diff --git a/packages/cli/src/tui/main.ts b/packages/cli/src/tui/main.ts
--- a/packages/cli/src/tui/main.ts
+++ b/packages/cli/src/tui/main.ts
@@ -451,22 +451,11 @@
 ): Promise<ImageContent[] | undefined> {
 	if (firstPromptSent) return undefined;
 	if (opts.skipInitialScreenshot) return undefined;
-	if (await sessionHasPriorTurn(opts.session)) return undefined;
 	const png = await captureScreenshot(opts.browserHandle.client, opts.browserHandle.browser.session_id);
 	if (!png) return undefined;
 	return [{ type: "image", data: png.toString("base64"), mimeType: "image/png" }];
 }
 
-async function sessionHasPriorTurn(session: Session): Promise<boolean> {
-	const entries = await session.getBranch();
-	for (const entry of entries) {
-		if (entry.type === "message" && (entry.message.role === "user" || entry.message.role === "assistant")) {
-			return true;
-		}
-	}
-	return false;
-}
-
 async function applyModelCommand(
 	opts: InteractiveOptions,
 	footer: TelemetryFooter,

Reviewed by Cursor Bugbot for commit 2010817. Configure here.

await this.buildRunner();
await this.reapplyTools();
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "startup" });

Extension startup steals first screenshot

High Severity

host.load() emits session_start before the CLI or TUI sends the user’s first prompt. If an extension calls pi.sendUserMessage there, maybeInitialScreenshot attaches the browser image to that message because the transcript has no prior turns. The real first user prompt then hits the same sessionHasPriorTurn check in print.ts / TUI and runs without { images }, so non-yutori models can start blind on the actual task.
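
One way to decouple the screenshot from transcript contents is a tracker that only the CLI and TUI prompt sites call and that `pi.sendUserMessage` never touches. This is a sketch under that assumption. `FirstPromptTracker` and `imagesForUserPrompt` are hypothetical, and `capture` stands in for `captureScreenshot` returning base64 PNG data.

```typescript
// Sketch only: FirstPromptTracker and imagesForUserPrompt are hypothetical;
// `capture` stands in for captureScreenshot, already base64-encoded.
interface ImageContent {
  type: "image";
  data: string;
  mimeType: string;
}

class FirstPromptTracker {
  private userPromptSent = false;

  // Called only from the CLI/TUI prompt sites, never from pi.sendUserMessage,
  // so an extension's session_start message cannot consume the screenshot.
  async imagesForUserPrompt(capture: () => Promise<string | undefined>): Promise<ImageContent[] | undefined> {
    if (this.userPromptSent) return undefined;
    this.userPromptSent = true;
    const base64 = await capture();
    if (!base64) return undefined;
    return [{ type: "image", data: base64, mimeType: "image/png" }];
  }
}
```

The flag is set before capturing, so a failed capture is not retried on the next prompt; whether that is desirable depends on the run mode.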

Triggered by learned rule: Harness prompt calls must attach first-prompt screenshot for non-yutori providers

},
sendUserMessage(content): void {
const text = typeof content === "string" ? content : textPartsOf(content);
void hooks.sendUserMessage(text);

Extension sendUserMessage rejects unhandled

Medium Severity

pi.sendUserMessage is implemented with void hooks.sendUserMessage(text), so failures from harness.prompt (including concurrent use while the TUI is already driving the harness) surface as unhandled promise rejections rather than structured extension errors.
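
Writing `void promise` discards the promise but not its rejection. The following is a minimal sketch of a fire-and-forget wrapper that routes failures to a reporter instead. `fireAndForget` and `onError` are hypothetical names, and `errorMessage` mirrors the helper in the autofix diff above.

```typescript
// Sketch only: fireAndForget and onError are hypothetical names; errorMessage
// mirrors the helper added in the autofix diff.
function errorMessage(error: unknown): string {
  if (error instanceof Error && error.message.trim().length > 0) return error.message;
  if (typeof error === "string" && error.trim().length > 0) return error;
  return "unknown error";
}

function fireAndForget(task: () => Promise<unknown>, onError: (message: string) => void): void {
  let promise: Promise<unknown>;
  try {
    promise = task();
  } catch (error) {
    // A synchronous throw from task() would otherwise escape the caller.
    onError(errorMessage(error));
    return;
  }
  void promise.catch((error: unknown) => {
    try {
      onError(errorMessage(error));
    } catch {
      // The reporter itself must not reintroduce an unhandled rejection.
    }
  });
}
```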
