Skip to content

Add fork identity wait plumbing#298

Open
sjmiller609 wants to merge 2 commits into
mainfrom
hypeship/fork-identity-wait
Open

Add fork identity wait plumbing#298
sjmiller609 wants to merge 2 commits into
mainfrom
hypeship/fork-identity-wait

Conversation

@sjmiller609

@sjmiller609 sjmiller609 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

  • add optional fork identity wait mode, enabled by KERNEL_FORK_IDENTITY_WAIT=true
  • add internal API routes for host-injected identity:
    • POST /internal/fork-identity
    • GET /internal/fork-identity/config
  • add shared forkidentity payload/env/path helpers
  • update wrapper startup so Chrome and kernel-images-api can stay warm while wrapper waits for identity, then apply identity-bound env before starting identity-bound services

Default Behavior

No behavior changes unless KERNEL_FORK_IDENTITY_WAIT=true is set.

Without that env:

  • wrapper startup does not wait for a fork identity payload
  • identity injection returns conflict
  • config reads return not found when no payload exists
  • kernel-images-api follows the existing startup/restart path

Tests

  • go test ./cmd/api ./cmd/wrapper ./lib/forkidentity -count=1
  • git diff --check

Note

Medium Risk
Changes VM boot ordering and accepts host-injected identity secrets (JWT, URLs) on internal HTTP endpoints; risk is mitigated because the feature is off unless KERNEL_FORK_IDENTITY_WAIT is enabled.

Overview
Adds optional fork identity wait mode (KERNEL_FORK_IDENTITY_WAIT=true) so forked snapshot restores can keep Chrome and CDP warm while the control plane injects per-instance identity.

A new forkidentity library coordinates payload files under /run/kernel/, maps JSON fields to process env (including JWT and metro URLs), and exposes extension-facing config. Internal API routes (outside OpenAPI): POST /internal/fork-identity accepts a bounded JSON payload, writes it, and blocks until the wrapper marks identity applied; GET /internal/fork-identity/config returns 202 while waiting and JSON instanceName / metroApiUrl once applied.

The wrapper replaces the prior FORK HOOK placeholder: in wait mode it starts kernel-images-api after Chromium is up, probes public CDP, stops envoy and polls for the payload file, applies env via Setenv/Unsetenv, writes the applied marker, then runs init-envoy without restarting kernel-images-api so CDP sessions stay connected. Shutdown cancels the startup wait via context.

Default behavior is unchanged when the wait env is unset.

Reviewed by Cursor Bugbot for commit d905fac. Bugbot is set up for automated code reviews on this repo. Configure here.

@sjmiller609 sjmiller609 marked this pull request as ready for review June 25, 2026 15:07
@sjmiller609 sjmiller609 requested a review from hiroTamada June 25, 2026 15:07
// services use `restart` so the same code path works for boot (start a
// stopped service) and post-fork (stop+start to force a re-read of
// refreshed envs).
if !waitForForkIdentityIfEnabled(startupCtx, forkIdentityWait) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early POST loses payload

High Severity

With fork identity wait enabled, kernel-images-api starts early and can accept and persist a fork identity payload. The wrapper then deletes the payload file, leading to a race where an injected payload is immediately removed, causing injections to fail and boot to stall until retried.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 252875c. Configure here.

restartAll("kernel-images-api")
if !forkIdentityWait {
restartAll("kernel-images-api")
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API keeps pre-identity env

Medium Severity

In fork identity wait mode, kernel-images-api starts before identity is applied and isn't restarted. Since it loads configuration like S2 basin, token, and stream only at process start, fork-injected identity environment variables are not picked up. This can result in stale or disabled S2 telemetry streams.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 252875c. Configure here.

w.WriteHeader(http.StatusAccepted)
return
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale config without wait mode

Low Severity

When fork-identity wait is disabled, GET /internal/fork-identity/config returns 200 with JSON whenever fork-identity.json exists, without checking the applied marker. A leftover payload from a snapshot or prior run can expose the wrong instance or metro URL to consumers that treat 200 as authoritative.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 252875c. Configure here.

@hiroTamada hiroTamada left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved — reviewed as opt-in plumbing.

everything is gated behind KERNEL_FORK_IDENTITY_WAIT. with it unset the boot path is unchanged: the if forkIdentityWait branches are skipped, kernel-images-api is still restarted in the identity phase, and the two new internal routes answer 409/404. traced the default path end to end — WaitEnabled() returns (false, nil) on empty env, so there's no error/fatal and no wait. the forkidentity lib is self-contained and unit-tested.

one non-blocking nit:

  • forkidentity/payload.go WaitAppliedMarker and wrapper/fork_identity.go waitForForkIdentityPayload busy-spin with runtime.Gosched() and no sleep, so they peg a core for up to the 30s timeout. the other wrapper wait loops (waitForSocket, waitForHTTPProbe) use time.Sleep(20ms) — worth matching for consistency.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit d905fac. Configure here.

return nil, ctx.Err()
case <-time.After(20 * time.Millisecond):
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No timeout waiting payload

Medium Severity

waitForForkIdentityPayload polls forever until a payload file appears or startupCtx is canceled. Unlike the POST handler’s 30-second apply wait, a missing or failed host injection leaves the wrapper stuck in startup with no automatic failure.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit d905fac. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants