feat(run-ops): ClickHouse multi-source replication fan-in + admin ops#4119

Open
d-cs wants to merge 7 commits into main from runops/pr07-replication

@d-cs d-cs commented Jul 2, 2026

Collaborator

What

Extends the ClickHouse runs-replication service to fan in from multiple Postgres sources (the control-plane DB and the run-ops DB) instead of a single source, plus the admin operations to run and observe it.

  • Multi-source fan-in (services/runsReplicationService.server.ts, new runsReplicationInstance.server.ts, runsReplicationGlobal.server.ts): factors the replication service into per-source instances and a coordinator so a single ClickHouse target is fed from more than one Postgres source.
  • Admin ops (routes/admin.api.v1.runs-replication.status.ts, admin.api.v1.runs-replication.backfill.ts, v3/services/adminWorker.server.ts): adds a status endpoint that reports per-source replication state, including whether each source's leader lock is held, and updates the backfill entrypoint and admin worker to use the active replication service, which may be the multi-source one.
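
The fan-in shape described above can be sketched as follows. This is a minimal illustration, not the PR's code: the class name, the "runops" source id, and the row shape are assumptions (only the "legacy" id is confirmed by a later commit). Each per-source instance gets its own emitter that tags rows with its source id, and the single ClickHouse sink drains one combined batch.

```typescript
// Hypothetical ids: "legacy" matches the id the PR gives the implicit single source;
// "runops" is an assumed id for the run-ops DB source.
type SourceId = "legacy" | "runops";

// Assumed minimal row shape; the real replicated columns are not shown on this page.
interface RunRow {
  runId: string;
  status: string;
}

// A row tagged with the Postgres source it was decoded from.
interface SourcedRow extends RunRow {
  sourceId: SourceId;
}

class FanInCoordinator {
  // One shared buffer feeding a single ClickHouse target.
  private readonly buffer: SourcedRow[] = [];

  // Each per-source instance gets its own emitter, which tags rows with that source's id.
  emitterFor(sourceId: SourceId): (rows: RunRow[]) => void {
    return (rows) => {
      for (const row of rows) {
        this.buffer.push({ ...row, sourceId });
      }
    };
  }

  // Hands everything buffered so far (from every source) to the sink as one batch.
  drain(): SourcedRow[] {
    return this.buffer.splice(0, this.buffer.length);
  }
}

const coordinator = new FanInCoordinator();
const emitLegacy = coordinator.emitterFor("legacy");
const emitRunOps = coordinator.emitterFor("runops");

emitLegacy([{ runId: "run_a", status: "EXECUTING" }]);
emitRunOps([{ runId: "run_b", status: "PENDING" }]);

// One batch containing rows from both sources.
const batch = coordinator.drain();
```

Tagging at the emitter keeps the per-source instances unaware of each other while the sink still sees a single stream, which matches the PR's statement that the ClickHouse-facing output remains one runs stream.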

Why

PR7 of the run-ops split stack, and the final piece: once run state can live in a separate run-ops DB (earlier PRs), the analytics replication into ClickHouse has to consume both sources so runs stay queryable whichever DB they reside in. This changes the replication service's internals; the ClickHouse-facing output is unchanged (still one runs stream), and single-source operation is preserved when the split is not enabled.

Tests

New vitest coverage: runsReplicationInstance.test.ts (per-source instance behavior) and runsReplicationService.part8/part9 suites exercising the multi-source coordinator. Testcontainers-backed (ClickHouse + Postgres); no mocks.

Notes

Draft, stacked on #4118 (runops/pr06-write-path). Review that first; this diff is against it.

Server-change / changeset note to be added at stack-assembly time.

🤖 Generated with Claude Code

changeset-bot Bot commented Jul 2, 2026

⚠️ No Changeset found

Latest commit: 29bd826

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai Bot commented Jul 2, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 093bb72b-3aca-48e5-82d9-a46afe4591d0

📥 Commits

Reviewing files that changed from the base of the PR and between 766138c and 29bd826.

📒 Files selected for processing (1)
  • apps/webapp/app/services/runsReplicationInstance.server.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/webapp/app/services/runsReplicationInstance.server.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout. (12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (9, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (10, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 10)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
⚠️ CI failures not shown inline (4)

GitHub Actions: 🔎 REVIEW.md Drift Audit / 0_audit.txt: feat(run-ops): ClickHouse multi-source replication fan-in + admin ops

Conclusion: failure

GitHub Actions: 🔎 REVIEW.md Drift Audit / audit: feat(run-ops): ClickHouse multi-source replication fan-in + admin ops

Conclusion: failure

##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
 with:
   anthropic_***REDACTED***
   use_sticky_comment: true
   allowed_bots: devin-ai-integration[bot]
   claude_args: --max-turns 30
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
   prompt: You are auditing this PR for drift against `.claude/REVIEW.md`.
## Context
`.claude/REVIEW.md` is the repo's source of truth for what AI / agent code reviewers should treat as critical findings (rolling-deploy safety, hot-table indexes, recovery-path queries, testcontainers usage, Lua versioning, etc.). It is consumed by review agents to calibrate severity. If REVIEW.md goes stale, every future agent review degrades.
## Strategy — read this first
You have a hard turn budget. Spend it on signal, not coverage. The audit is allowed to miss things; it is NOT allowed to time out.
1. Read `.claude/REVIEW.md` once, in full.
2. Run `git diff origin/main...HEAD --name-only` to get the list of changed files. Do NOT read the diff content yet.
3. Scan the file-list for relevance to REVIEW.md scope. Relevance signals: changes to Prisma schema, Redis / queue / Lua code, hot tables, recovery / restart loops, new packages, deletions of paths REVIEW.md cites. Skim everything else.
4. Open at most **5 files** total — only the ones most likely to surface a real signal. If nothing in the file-list looks relevant to any REVIEW.md rule, do NOT read any files; go straight to the verdict.
5. Form a verdict and stop. Do not exhaust the turn budget exploring.
Large PRs (>50 files changed) are a strong signal to be MORE selective, not more thorough. Pick 3-5 files at most.
## What to look for
- **Stale references** — does any REVIEW.md rule cite a file, directory, function, table, Prisma model, or package name that has been removed or renamed in this PR (or is already gone from `main`)?
- **Contradictions** — does code in this PR clearly violate a current REVIEW.md rule? (Don't re-review the PR. Only flag if REVIE...

GitHub Actions: 📝 CLAUDE.md Audit / audit: feat(run-ops): ClickHouse multi-source replication fan-in + admin ops

Conclusion: failure

##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
 with:
   anthropic_***REDACTED***
   use_sticky_comment: true
   allowed_bots: devin-ai-integration[bot]
   claude_args: --max-turns 25
--model claude-opus-4-8
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
   prompt: You are reviewing a PR to check whether any CLAUDE.md files or .claude/rules/ files need updating.
## Your task
1. Run `git diff origin/main...HEAD --name-only` to see which files changed in this PR.
2. For each changed directory, check if there's a CLAUDE.md in that directory or a parent directory.
3. Determine if any CLAUDE.md or .claude/rules/ file should be updated based on the changes. Consider:
   - New files/directories that aren't covered by existing documentation
   - Changed architecture or patterns that contradict current CLAUDE.md guidance
   - New dependencies, services, or infrastructure that Claude should know about
   - Renamed or moved files that are referenced in CLAUDE.md
   - Changes to build commands, test patterns, or development workflows
## Response format
If NO updates are needed, respond with exactly:
✅ CLAUDE.md files look current for this PR.
If updates ARE needed, respond with a short list:
📝 **CLAUDE.md updates suggested:**
- `path/to/CLAUDE.md`: [what should be added/changed]
- `.claude/rules/file.md`: [what should be added/changed]
Keep suggestions specific and brief. Only flag things that would actually mislead Claude in future sessions.
Do NOT suggest updates for trivial changes (bug fixes, small refactors within existing patterns).
Do NOT suggest creating new CLAUDE.md files - only updates to existing ones.
   trigger_phrase: `@claude`
   label_trigger: claude
   branch_prefix: claude/
   use_bedrock: false
   use_vertex: false
   use_foundry: false
   classify_inline_comments: true
   use_commit_signing: false
   bot_id: 41898282
   bot_name: claude[bot]
   track_progress: false
   include_fix_links: true
   display_report: false...

GitHub Actions: 📝 CLAUDE.md Audit / 0_audit.txt: feat(run-ops): ClickHouse multi-source replication fan-in + admin ops

Conclusion: failure

Walkthrough

This change introduces multi-source replication support to RunsReplicationService, with per-source runtime state, source-specific versioning, acknowledgment handling, and metrics. It adds a global configured-sources registry, a new admin status loader that reports per-source leader-lock state, and split-gated instance initialization that can rebuild the service with dual sources or fail on misconfiguration. Backfill and worker call sites now select the active replication service from the global store when available. Tests cover source selection, split gating, backfill versioning, leader locks, dedup behavior, and metric labels.

Changes

  • Replication service core: multi-source runtime, per-source transaction and acknowledgment handling, source validation, and per-source metrics/versioning
  • Global registry: new configured-sources storage and accessors
  • Instance initialization: split-aware boot flow, dual-source rebuild, and fatal misconfiguration handling
  • Admin endpoints: new status loader plus backfill/worker selection of the active replication service
  • Docs: new changelog entry for fan-in replication
  • Tests: unit and integration coverage for source building, split checks, leader locks, dedup, backfill versioning, and metrics
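
"Per-source transaction and acknowledgment handling" implies each Postgres source is acknowledged independently. The sketch below assumes acknowledgments are Postgres LSNs (printed as two hex halves, e.g. `16/B374D848`) and that one successful ClickHouse insert may contain rows from several sources; `FlushedItem`, `lsnToBigInt`, and `acksPerSource` are hypothetical names, not the PR's API.

```typescript
// Postgres prints an LSN as "<high 32 bits hex>/<low 32 bits hex>".
function lsnToBigInt(lsn: string): bigint {
  const parts = lsn.split("/");
  if (parts.length !== 2) throw new Error(`Invalid LSN: ${lsn}`);
  const [hi, lo] = parts as [string, string];
  return (BigInt(`0x${hi}`) << BigInt(32)) | BigInt(`0x${lo}`);
}

// Hypothetical shape: one replicated row plus the commit LSN of its source transaction.
interface FlushedItem {
  sourceId: string;
  lsn: string;
}

// After a batch is flushed, each source may be acknowledged up to the highest LSN
// *it* contributed. LSNs from different sources are unrelated, so they are never compared.
function acksPerSource(batch: FlushedItem[]): Map<string, string> {
  const best = new Map<string, { lsn: string; value: bigint }>();
  for (const item of batch) {
    const value = lsnToBigInt(item.lsn);
    const current = best.get(item.sourceId);
    if (!current || value > current.value) {
      best.set(item.sourceId, { lsn: item.lsn, value });
    }
  }
  const acks = new Map<string, string>();
  for (const [sourceId, { lsn }] of best) {
    acks.set(sourceId, lsn);
  }
  return acks;
}
```

Keying by source id is the point of the design: a single global "last acked LSN" would either stall one source or acknowledge the other past data it has not yet flushed.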

Sequence Diagram(s)

sequenceDiagram
  participant Init as initializeRunsReplicationInstance
  participant Global as runsReplicationGlobal
  participant Split as isSplitEnabled
  participant Service as RunsReplicationService
  Init->>Service: create legacy-only service
  Init->>Global: setRunsReplicationGlobal(legacy)
  Init->>Split: resolve split gate
  Split-->>Init: splitEnabled
  Init->>Init: buildReplicationSources()
  Init->>Init: assertReplicationCoversSplit()
  Init->>Service: recreate with dual sources (if enabled)
  Init->>Global: setRunsReplicationGlobal(dual service)
  Init->>Service: start()
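
The `buildReplicationSources()` and `assertReplicationCoversSplit()` steps in the diagram above are not shown with signatures, so the following is only a sketch of the idea under stated assumptions: `buildSources`, `assertCoversSplit`, the config shape, and the "runops" id are hypothetical. It shows why a misconfiguration is fatal rather than silently degraded.

```typescript
// Hypothetical shapes; the real source options are not shown on this page.
interface ReplicationSource {
  id: string; // "legacy" is the PR's id for the control-plane source; "runops" is assumed
  connectionUrl: string;
}

interface SourceConfig {
  legacyUrl: string;
  runOpsUrl?: string; // present only when a run-ops DB is configured
}

function buildSources(config: SourceConfig, splitEnabled: boolean): ReplicationSource[] {
  const sources: ReplicationSource[] = [{ id: "legacy", connectionUrl: config.legacyUrl }];
  if (splitEnabled && config.runOpsUrl) {
    sources.push({ id: "runops", connectionUrl: config.runOpsUrl });
  }
  return sources;
}

// With the split on, runs can live in the run-ops DB; replicating only the legacy
// source would silently leave those runs out of ClickHouse, so refuse to boot instead.
function assertCoversSplit(sources: ReplicationSource[], splitEnabled: boolean): void {
  if (splitEnabled && !sources.some((s) => s.id === "runops")) {
    throw new Error("run-ops split is enabled but no run-ops replication source is configured");
  }
}
```

With the split disabled this yields the single "legacy" source, matching the PR's note that single-source operation is preserved.
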
sequenceDiagram
  participant Loader as admin.api.v1.runs-replication.status loader
  participant Global as getRunsReplicationConfiguredSources
  participant Redis
  Loader->>Global: fetch configured sources
  Loader->>Redis: check leader-lock key per source
  Redis-->>Loader: exists boolean
  Loader-->>Loader: build JSON response with leader flags
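
The status loader in the second diagram boils down to one Redis existence check per configured source. A hedged sketch: it assumes an ioredis-style client whose `exists(key)` resolves to the number of keys present, and the key format in `leaderLockKey` is hypothetical. The later commit that registers the implicit source as "legacy" illustrates why the probe must derive the key from exactly the same id the service locks on.

```typescript
// Minimal stand-in for the one Redis call needed; ioredis's `exists`
// resolves to the number of the given keys that exist.
interface RedisLike {
  exists(key: string): Promise<number>;
}

// Hypothetical key format; the real leader-lock key is defined by the replication service.
function leaderLockKey(sourceId: string): string {
  return `runs-replication:leader:${sourceId}`;
}

interface SourceStatus {
  id: string;
  leaderLockHeld: boolean; // true if some process currently holds this source's lock
}

// Probes every configured source in parallel and returns a JSON-serializable body.
async function replicationStatus(
  redis: RedisLike,
  sourceIds: string[]
): Promise<{ sources: SourceStatus[] }> {
  const sources = await Promise.all(
    sourceIds.map(async (id) => ({
      id,
      leaderLockHeld: (await redis.exists(leaderLockKey(id))) > 0,
    }))
  );
  return { sources };
}
```

A key's existence only shows that some replica holds the lock, not that replication is making progress; lag would need a separate signal.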

Related PRs: None identified.

Suggested labels: area: webapp, type: feature

Suggested reviewers: None identified.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title clearly matches the main change: multi-source ClickHouse replication fan-in with admin operations.
  • Description check: ✅ Passed. The description covers the key What/Why/Tests points, but it omits the template's issue reference, checklist items, and screenshots section.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@d-cs d-cs force-pushed the runops/pr06-write-path branch from 515b897 to cb97148 Compare July 2, 2026 18:02
@d-cs d-cs force-pushed the runops/pr07-replication branch from 5de29cb to 5c2d010 Compare July 2, 2026 18:02
pkg-pr-new Bot commented Jul 2, 2026

@trigger.dev/build

npm i https://pkg.pr.new/@trigger.dev/build@ef571f9

trigger.dev

npm i https://pkg.pr.new/trigger.dev@ef571f9

@trigger.dev/core

npm i https://pkg.pr.new/@trigger.dev/core@ef571f9

@trigger.dev/python

npm i https://pkg.pr.new/@trigger.dev/python@ef571f9

@trigger.dev/react-hooks

npm i https://pkg.pr.new/@trigger.dev/react-hooks@ef571f9

@trigger.dev/redis-worker

npm i https://pkg.pr.new/@trigger.dev/redis-worker@ef571f9

@trigger.dev/rsc

npm i https://pkg.pr.new/@trigger.dev/rsc@ef571f9

@trigger.dev/schema-to-json

npm i https://pkg.pr.new/@trigger.dev/schema-to-json@ef571f9

@trigger.dev/sdk

npm i https://pkg.pr.new/@trigger.dev/sdk@ef571f9

commit: ef571f9

@d-cs d-cs force-pushed the runops/pr06-write-path branch from c59d9c5 to d5d7fa1 Compare July 2, 2026 19:25
@d-cs d-cs force-pushed the runops/pr07-replication branch from 8fb6e8a to 4e08dc7 Compare July 2, 2026 19:25
@d-cs d-cs force-pushed the runops/pr06-write-path branch from a1ff262 to a8068e9 Compare July 2, 2026 20:23
@d-cs d-cs force-pushed the runops/pr07-replication branch from 4e08dc7 to 11dc0b7 Compare July 2, 2026 20:23
@d-cs d-cs force-pushed the runops/pr06-write-path branch from a8068e9 to 0ef3a6b Compare July 2, 2026 20:38
@d-cs d-cs force-pushed the runops/pr07-replication branch from 11dc0b7 to a9bc9e6 Compare July 2, 2026 20:38
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 0db90f0 to d5415e8 Compare July 2, 2026 21:44
@d-cs d-cs force-pushed the runops/pr07-replication branch 2 times, most recently from 277ecea to 2bba3b8 Compare July 3, 2026 00:17
@d-cs d-cs force-pushed the runops/pr06-write-path branch from aa55b6b to 3153bc4 Compare July 3, 2026 08:51
@d-cs d-cs force-pushed the runops/pr07-replication branch from 2bba3b8 to 0f1da3f Compare July 3, 2026 08:51
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 3153bc4 to d561590 Compare July 3, 2026 10:02
@d-cs d-cs force-pushed the runops/pr07-replication branch from 0f1da3f to 5a17a98 Compare July 3, 2026 10:02
@d-cs d-cs force-pushed the runops/pr06-write-path branch from d561590 to 9e7c367 Compare July 3, 2026 10:36
@d-cs d-cs force-pushed the runops/pr07-replication branch from 5a17a98 to 6b7fb6d Compare July 3, 2026 10:36
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 9e7c367 to e23432d Compare July 3, 2026 10:44
@d-cs d-cs force-pushed the runops/pr07-replication branch from 6b7fb6d to ac94ac3 Compare July 3, 2026 10:44
@d-cs d-cs force-pushed the runops/pr06-write-path branch from e23432d to 8dff8b2 Compare July 3, 2026 11:08
@d-cs d-cs force-pushed the runops/pr07-replication branch from ac94ac3 to 62ce160 Compare July 3, 2026 11:08
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 8dff8b2 to 891d81a Compare July 3, 2026 12:08
@d-cs d-cs force-pushed the runops/pr07-replication branch from 62ce160 to 2be7e51 Compare July 3, 2026 12:08
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 891d81a to 5140cbc Compare July 3, 2026 15:42
@d-cs d-cs force-pushed the runops/pr07-replication branch from 2be7e51 to b4bee3f Compare July 3, 2026 15:42
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 5140cbc to f8f3096 Compare July 3, 2026 16:33
@d-cs d-cs force-pushed the runops/pr07-replication branch from b4bee3f to 3538e4e Compare July 3, 2026 16:33
@d-cs d-cs force-pushed the runops/pr06-write-path branch from f8f3096 to 4bda37a Compare July 3, 2026 16:44
@d-cs d-cs force-pushed the runops/pr07-replication branch from 3538e4e to e497155 Compare July 3, 2026 16:44
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 4bda37a to ea22f52 Compare July 3, 2026 17:08
@d-cs d-cs force-pushed the runops/pr07-replication branch from e497155 to ef571f9 Compare July 3, 2026 17:08
Base automatically changed from runops/pr06-write-path to main July 3, 2026 17:52
d-cs and others added 6 commits July 3, 2026 18:53
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…comments/test names

Remove the internal plan-enumeration labels from runs-replication
comments and test names, keeping the behavioral descriptions intact.
Comment/label hygiene only; no product logic or test behavior changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, fix test status type

- Register the implicit single source with id "legacy" so its leader-lock key
  matches the id the admin status route probes; otherwise leadership always
  reads false in the non-split config.
- Guard the shutdown-path client.stop() fan-out against re-firing per incoming
  transaction and add a catch so rejections don't surface as unhandled.
- Use the TaskRunStatus type alias (not the const value) for status annotations
  in the dual-source dedup tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
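
The shutdown-path fix in the commit above (stop the clients once, and attach a catch so rejections are not unhandled) follows a common one-shot pattern. A minimal sketch under assumptions: `StoppableClient`, `ShutdownFanOut`, and the `onError` callback are hypothetical names, not the service's real members.

```typescript
// Assumed minimal client surface.
interface StoppableClient {
  stop(): Promise<void>;
}

class ShutdownFanOut {
  private stopping = false;
  private readonly clients: StoppableClient[];
  private readonly onError: (err: unknown) => void;

  constructor(clients: StoppableClient[], onError: (err: unknown) => void) {
    this.clients = clients;
    this.onError = onError;
  }

  // Safe to call on every incoming transaction: only the first call stops the clients.
  stopAll(): void {
    if (this.stopping) return;
    this.stopping = true;
    for (const client of this.clients) {
      // Fire-and-forget, but route rejections to onError instead of leaving them unhandled.
      client.stop().catch(this.onError);
    }
  }
}
```
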
…tion fan-in

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@d-cs d-cs force-pushed the runops/pr07-replication branch from ef571f9 to 766138c Compare July 3, 2026 17:53
@d-cs d-cs marked this pull request as ready for review July 3, 2026 17:54
devin-ai-integration[bot]

This comment was marked as resolved.

…fore replacing it

The legacy-only instance constructed at boot opens a replication client (Redis + Redlock)
eagerly; when the split gate resolves to a multi-source service it was replaced without
cleanup, leaking one Redis connection per split-enabled boot. shutdown() the bootstrap
instance first (idempotent, safe on the never-started instance).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
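
The leak described in the commit above comes from replacing the bootstrap instance without releasing it. A sketch of the corrected ordering, assuming hypothetical names (`ReplicationServiceLike`, `bootReplication`, a module-level `activeService` standing in for the global registry) and relying on the commit's statement that `shutdown()` is idempotent and safe on a never-started instance.

```typescript
// Assumed minimal surface; the real RunsReplicationService has more methods.
interface ReplicationServiceLike {
  start(): Promise<void>;
  shutdown(): Promise<void>;
}

// Hypothetical stand-in for the global registry in runsReplicationGlobal.server.ts.
let activeService: ReplicationServiceLike | undefined;

function getActiveService(): ReplicationServiceLike | undefined {
  return activeService;
}

async function bootReplication(
  createLegacyOnly: () => ReplicationServiceLike,
  createMultiSource: () => ReplicationServiceLike,
  resolveSplitGate: () => Promise<boolean>
): Promise<ReplicationServiceLike> {
  // The bootstrap instance may already hold a Redis/Redlock client although start() never ran.
  let service = createLegacyOnly();
  activeService = service;

  if (await resolveSplitGate()) {
    // Release the bootstrap instance before dropping the reference, so a split-enabled
    // boot does not leak its connection.
    await service.shutdown();
    service = createMultiSource();
    activeService = service;
  }

  await service.start();
  return service;
}
```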

@devin-ai-integration devin-ai-integration Bot left a comment
Contributor

Devin Review found 1 new potential issue.

Comment thread apps/webapp/test/runsReplicationService.part8.test.ts
@d-cs d-cs enabled auto-merge (squash) July 3, 2026 18:58

Labels: None yet

Projects: None yet

1 participant