Skip to content

feat: summarize video transitions#410

Open
thymikee wants to merge 3 commits into
mainfrom
codex/video-transition-summary
Open

feat: summarize video transitions#410
thymikee wants to merge 3 commits into
mainfrom
codex/video-transition-summary

Conversation

@thymikee

Copy link
Copy Markdown
Member

Summary

Add transition summaries for visual frame sequences and recordings via diff frames and diff video.

diff frames accepts a PNG directory or explicit PNG frame paths, while diff video uses external ffmpeg/ffprobe to sample recordings into frames without adding JS dependencies. The summarizer reuses screenshot diff region/OCR hints only on selected transition boundaries, supports gesture telemetry labels such as after tap, and writes keyframe/diff artifacts under --out.

Example from a synthetic Settings-like frame sequence:

Frame transition summary: 1 transition, sampled 5 frames
  Artifacts: /tmp/.../out3

1. 0ms-300ms after tap x=150 y=74
   screen replacement; large center large-area changed
   peak=7.79% avg=6.35% duration=300ms
   changed: large center large-area, 51.08% of diff, mixed
   keyframes: before=/tmp/.../frames/settings-0.png mid=/tmp/.../frames/settings-2.png after=/tmp/.../frames/settings-3.png
   diff: /tmp/.../out3/transition-1.diff.png

diff video prerequisite behavior when FFmpeg is missing:

Error (TOOL_MISSING): diff video requires ffmpeg and ffprobe in PATH
Hint: Install FFmpeg, then retry diff video.

Touched-file count: 10. Scope stayed within the diff/media command family plus docs/skill guidance.

Validation

  • pnpm format
  • pnpm check:quick
  • pnpm vitest run src/utils/__tests__/transition-summary.test.ts src/__tests__/cli-diff.test.ts src/utils/__tests__/cli-option-schema.test.ts
  • pnpm test:smoke
  • pnpm build
  • git diff --check
  • Manual synthetic Settings-like pnpm ad diff frames <frames-dir> --out <out-dir> --telemetry <telemetry.json> --threshold 0
  • Manual pnpm ad diff video /tmp/missing.mp4 --out /tmp/transition-video-out in an environment without FFmpeg/FFprobe to verify the prerequisite error

Known gap: full pnpm check:unit/all-unit Vitest runs were attempted, but the tool session was terminated before Vitest printed a final summary; targeted unit/CLI tests, smoke tests, typecheck/lint, and build passed.

@github-actions

github-actions Bot commented Apr 13, 2026

Copy link
Copy Markdown
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-410/

Built to branch gh-pages at 2026-04-14 07:39 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a7e96b9406

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/utils/video-frames.ts
Comment on lines +69 to +71
const files = (await fs.readdir(params.outputDir))
.filter((entry) => /^frame-\d+\.png$/i.test(entry))
.sort((left, right) => left.localeCompare(right, undefined, { numeric: true }));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Clear stale frame artifacts before reading extracted frames

When diff video is run with an existing --out directory, this code reads every frame-*.png in that folder after extraction, but the folder is never cleaned first. If a previous run produced more frames than the current run, leftover PNGs are included in files, so transition analysis uses stale frames and reports incorrect timing/content. This is reproducible by running diff video twice against the same output dir with different --max-frames or video lengths.

Useful? React with 👍 / 👎.

@thymikee

Copy link
Copy Markdown
Member Author

Re-checking this PR today: it is stale and not mergeable as-is. GitHub reports conflicts (mergeStateStatus: DIRTY, mergeable: CONFLICTING), and the last checks are from April with unit/typecheck failures.

If this feature is still intended to land, the branch needs to be rebased onto current main, conflicts resolved, and validation rerun before a useful final review. Until then I would not merge it.

Copy link
Copy Markdown
Member Author

Code review

Verdict: minor issues — no blockers; the security-sensitive parts are done correctly, but there are a few edge-case and hygiene gaps, and the branch is significantly stale against main.

Security (checked, no findings)

  • Command injection: clean. runCmd uses spawn(..., { shell: false }) with array args; the video path and fps values are never shell-interpolated, and the path is resolved absolute before being passed so it can't be mistaken for an ffmpeg option.
  • Stale-frame cleanup (ddeeb9f): correct and safely scoped. removeStaleFrames (src/utils/video-frames.ts:84-95) only deletes files matching ^frame-\d+\.png$ inside the command's own <out>/frames subdirectory.
  • Memory: frames are compared pairwise, two PNGs in memory at a time, capped at --max-frames ≤ 500.

Findings

  1. Minorsrc/utils/video-frames.ts:122-126 (formatFps): the clamped fps maxFrames / durationSec can format to "0" (e.g. --max-frames 2 on a ~70+ minute recording), making ffmpeg fail with an opaque COMMAND_FAILED instead of clamping to a minimum positive fps.

  2. Minorsrc/utils/video-frames.ts:44-47 + src/utils/transition-summary.ts:92: a very short video (0-1 sampled frames) or a corrupt video that ffmpeg exits 0 on surfaces as INVALID_ARGS: 'transition summary requires at least two frames' — blaming the user's arguments rather than the video.

  3. Minorsrc/utils/transition-summary.ts:122-124: the ddeeb9f fix clears stale frame-*.png but not stale transition-N.diff.png in --out; rerunning into the same dir with fewer detected transitions leaves misleading leftover diff images — the same bug class the latest commit fixed, one level up.

  4. Minordiff video branch in src/cli/commands/screenshot.ts: the output dir (or mkdtemp dir) is created before the ffmpeg TOOL_MISSING probe, littering empty temp dirs on failure; temp dirs are also never cleaned on success when --out is omitted.

  5. Minorsrc/utils/transition-summary.ts:181-189: directory mode ingests every *.png, so diff frames <dir> --out <same dir> makes the next run treat its own transition-1.diff.png as an input frame — no filtering or warning.

  6. Minorsrc/utils/video-frames.ts:22: the fixed 60 s VIDEO_EXTRACT_TIMEOUT_MS can SIGKILL extraction of long/high-res recordings (the exact case --max-frames 500 targets), with no flag to extend it.

  7. Minor (tests)video-frames.test.ts only covers TOOL_MISSING and stale-frame cleanup; no tests for the fps-clamp math, timestamp parsing/fallback, ffprobe failure, or the <2-extracted-frames path.

  8. Major (process, not code) — staleness/rebase risk: the branch is based on 0.12.4 (April); main has ~50 commits since, several touching exactly this PR's shared files — notably screenshot-diff.ts was refactored to a worker thread (b8172e3; options now have maxPixels, and this PR's includeOcr flag doesn't exist there), plus type-consolidation passes and large drift in command-schema.ts and output.ts. The includeOcr opt-out is load-bearing — without re-porting it onto main's worker version, the pairwise loop would run OCR on up to ~499 comparisons.

Overall

Well-structured feature: clean exec hygiene, bounded memory, sensible heuristics, and the latest artifact-cleanup fix is correct and conservative. The biggest practical concern is the two-month-stale base — this needs a careful rebase, re-threading includeOcr through the new worker-thread diff pipeline in particular, before merge.


Generated by Claude Code

@thymikee

Copy link
Copy Markdown
Member Author

diff frames / diff video add a transition-summary pipeline over PNG sequences and ffmpeg-sampled recordings. The video path is correctly routed through the shared exec.ts helpers (whichCmd/runCmd with allowFailure + explicit VIDEO_PROBE_TIMEOUT_MS/VIDEO_EXTRACT_TIMEOUT_MS), tooling is preflight-gated with a clean TOOL_MISSING hint, and the analysis is fully opt-in (new subcommands, no change to the default record/screenshot/snapshot hot path). Nice work on the prerequisite gating and stale-frame cleanup.

Regression risk: Medium — the new commands are additive and the default capture path is untouched, but the branch predates the capture-facet / router refactor and the one edit it makes to a shared file (screenshot-diff.ts) is now stale against a restructured OCR block.

Blocking

  • Stale base — will not apply; needs a full rebase. src/cli/commands/screenshot.ts:178 was rewritten on main: diff screenshot now routes through createAgentDevice(...).capture.diffScreenshot(...) and imports ClientCommandHandler from ./router-types.ts, not ./router.ts. The PR patches the old inline-compareScreenshots version, so the frames/video branches and the new helpers (readDiffThreshold, rejectUnsupportedDiffFlags, resolveTransitionOutputDir) must be re-grafted onto the new handler shape.
  • src/utils/command-schema.ts:60 / :928 / :1092 no longer own these definitions. On main command-schema.ts is a thin re-export (export type { CliFlags, ... }); CliFlags lives in src/utils/cli-flags.ts (flag defs at cli-flags.ts:1087) and the diff usageOverride/allowedFlags moved to src/commands/capture/index.ts:173. The new flags (sampleFps, maxFrames, frameIntervalMs, telemetry) and allowsExtraPositionals must be re-added there or they won't be registered/parsed.
  • src/utils/screenshot-diff.ts:140 — the includeOcr hook is stale and would regress shared OCR behavior. Main's compareScreenshots no longer has a bare differentPixels > 0 ? summarizeScreenshotOcr(...); OCR is now produced unconditionally and then filtered by a shouldIncludeOcr content check, and ScreenshotDiffOptions gained maxPixels. Re-apply the opt-out as options.includeOcr !== false && against the current gate, and confirm it still short-circuits the actual summarizeScreenshotOcr call (the perf win) rather than just the result assembly — otherwise the per-pair diffs in transition-summary.ts:790 still pay the OCR cost they intend to skip.

Non-blocking

  • Output size / token cost. formatTransitionSummaryText (src/utils/output.ts:334) emits ~6 lines per transition × up to DEFAULT_MAX_TRANSITIONS = 5 (transition-summary.ts:775) plus absolute keyframe/diff paths. That's reasonable and capped, but since agent token budget matters, consider whether emitting full absolute paths for before/mid/after and diff per transition in the default text mode is necessary, or whether those belong primarily in --json.
  • --out artifact cleanup for diff video. Frames are extracted into path.join(outputDir, 'frames') (screenshot.ts:233) inside a possibly user-supplied --out dir. removeStaleFrames (video-frames.ts:144) clears frame-*.png on re-run, but on the temp-dir fallback (resolveTransitionOutputDir, screenshot.ts:299) nothing is ever cleaned up — sampled frames + diff PNGs accumulate in os.tmpdir(). Worth a note in docs or a cleanup-on-temp path.
  • Determinism / flakiness. MIN_SIGNIFICANT_MISMATCH_PERCENTAGE = 0.5 and SEGMENT_GAP_TOLERANCE_PAIRS = 1 (transition-summary.ts:773) are fixed magic thresholds; segment selection also depends on ffmpeg's showinfo pts_time parsing (video-frames.ts:172). The diff video path has no end-to-end test that exercises real ffmpeg output, so timestamp/segmentation regressions in real recordings won't be caught — the existing video test only mocks runCmd.
  • durationMs on findTrigger (transition-summary.ts:969) reads 'durationMs' in event, but per daemon/types.ts:145 some variants (tap/scroll/swipe) make it optional/absent while pinch requires it — fine today, but the ±250ms/1000ms overlap windows are undocumented heuristics worth a comment justifying the constants.
  • Test coverage is otherwise solid (cli-diff.test.ts, transition-summary.test.ts, video-frames.test.ts cover flag rejection, telemetry anchoring, TOOL_MISSING, and stale-frame clearing) and no existing tests were weakened. Per the PR's own "Known gap", the full unit suite never completed — re-run pnpm check:unit after rebase.

Please rebase onto current main and re-validate against the capture-facet handler and the relocated flag/schema definitions before this can land.

🤖 Automated review via Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant