[flink][spark] Support dry_run in drop_global_index procedure by XiaoHongbo-Hope · Pull Request #8309 · apache/paimon

XiaoHongbo-Hope · 2026-06-21T07:29:01Z

Purpose

Tests

Add an optional `dry_run` BOOLEAN argument (default false) to the drop_global_index procedure. When true, it reports how many index files would be dropped without committing any change, mirroring the dry_run convention of remove_orphan_files. This lets users verify the index_type / column match (e.g. lumina vs the legacy lumina-vector-ann alias) before the destructive commit, since the delete filter matches index files by exact index type and field ids.
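A minimal sketch of why the preview helps, assuming a simplified in-memory model rather than Paimon's actual classes: `IndexFile` and this Python `drop_global_index` function are hypothetical stand-ins, and the `btree` entry is only illustrative data. Because the filter compares the index type by exact string, asking for `lumina` matches nothing when the files were written under the legacy `lumina-vector-ann` name, and a dry run surfaces that before anything is deleted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexFile:
    """Hypothetical stand-in for one global index file entry."""
    index_type: str
    field_id: int


def drop_global_index(files, index_type, field_ids, dry_run=False):
    """Return (remaining_files, matched_count).

    Matching is exact on index type and field id. With dry_run=True the
    file list is returned unchanged and only the count is reported.
    """
    def matches(f):
        return f.index_type == index_type and f.field_id in field_ids

    matched_count = sum(1 for f in files if matches(f))
    if dry_run:
        return list(files), matched_count
    return [f for f in files if not matches(f)], matched_count


files = [
    IndexFile("lumina-vector-ann", 1),
    IndexFile("lumina-vector-ann", 1),
    IndexFile("btree", 2),
]

# Preview with the new name: the exact-match filter finds nothing.
kept, count = drop_global_index(files, "lumina", {1}, dry_run=True)
# Preview with the legacy alias: both files would be dropped.
kept_legacy, count_legacy = drop_global_index(
    files, "lumina-vector-ann", {1}, dry_run=True
)
# Real run with the legacy alias: the two files are actually removed.
remaining, dropped = drop_global_index(files, "lumina-vector-ann", {1})
```

In this model a preview count of 0 is the signal to recheck the index type or column before running the destructive call.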

Add the same optional `dry_run` argument to the Spark drop_global_index procedure (mirroring remove_orphan_files): when true it scans the matching index files and returns without committing, so the deletion can be previewed before the destructive commit. Also move the Flink dry-run branch before the empty-match check so a dry run always reports preview semantics (e.g. "0 would be dropped") instead of the "no index found" message, and drop the misleading quotes around the boolean placeholder in the Flink docs.
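The effect of the reordered Flink branches can be sketched as follows. This is an illustration only: `drop_message` is a hypothetical helper and the message strings are placeholders, not the text the procedure actually emits.

```python
def drop_message(matched_count, dry_run):
    """Pick the reported message; the dry-run branch is checked first."""
    if dry_run:
        # Even with zero matches, a dry run keeps preview semantics.
        return f"dry run: {matched_count} index files would be dropped"
    if matched_count == 0:
        return "no index found"
    return f"dropped {matched_count} index files"


preview_empty = drop_message(0, dry_run=True)
real_empty = drop_message(0, dry_run=False)
real_some = drop_message(3, dry_run=False)
```

Checking `dry_run` before the empty-match case is what keeps a preview with zero matches from falling through to the "no index found" branch.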

Address review feedback on dry_run: the Spark procedure previously returned only a boolean, so the matched count -- the whole point of a dry run -- was visible only in the logs. Return the count instead, as remove_orphan_files does, so both dry_run and the normal path report how many index files were (or would be) dropped. The output column changes from `result` (boolean) to `dropped_file_count` (long); the existing tests are updated to assert the returned count.

drop_global_index already shipped in release-1.4 with a `result` BOOLEAN output column, so replacing it would break that released output contract. Add `dropped_file_count` as a second column instead of replacing `result`: existing callers that read `result` keep working, while dry_run and the normal path also return the (would-be-)dropped count.

- Flink: annotate the optional partitions/dry_run arguments with @Nullable, consistent with other procedures (e.g. RemoveUnexistingFilesProcedure).
- Spark: short-circuit when no index files match, returning (true, 0) instead of committing an empty change, matching the Flink behavior.
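The Spark output contract described above (the released `result` column kept, `dropped_file_count` added, and no commit when nothing matches) can be sketched roughly like this. `DropResult`, `run_drop`, and the `commit` callback are hypothetical names for illustration, and returning `result = True` from a dry run is an assumption; the source only states the `(true, 0)` value for the empty-match case.

```python
from typing import Callable, NamedTuple


class DropResult(NamedTuple):
    result: bool             # column already released in 1.4, kept for existing callers
    dropped_file_count: int  # new second column


def run_drop(matched_count: int, dry_run: bool,
             commit: Callable[[int], None]) -> DropResult:
    """Return the two-column row; commit only when there is real work."""
    if matched_count == 0:
        # Short-circuit: do not commit an empty change.
        return DropResult(True, 0)
    if dry_run:
        # Assumption: a dry run also reports result=True alongside the preview count.
        return DropResult(True, matched_count)
    commit(matched_count)
    return DropResult(True, matched_count)


commits = []  # records every commit the sketch performs
empty = run_drop(0, dry_run=False, commit=commits.append)
preview = run_drop(4, dry_run=True, commit=commits.append)
real = run_drop(4, dry_run=False, commit=commits.append)
```

A caller that only reads the first column sees the same boolean as before, while newer callers can read the count from the second column.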

Cover the dry_run + partition-filter interaction: a partition-scoped dry run reports a different (smaller) preview count than an unscoped one, and neither commits any change.
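The property this test covers can be illustrated with a toy model. The `preview_count` helper, partition strings, and file names below are made up for the sketch and are not taken from the ITCase.

```python
def preview_count(index_files, partitions=None):
    """Count the (partition, file_name) entries a dry run would drop.

    partitions=None means unscoped. The input list is never modified.
    """
    return sum(
        1 for partition, _ in index_files
        if partitions is None or partition in partitions
    )


index_files = [
    ("dt=2026-06-20", "index-0"),
    ("dt=2026-06-20", "index-1"),
    ("dt=2026-06-21", "index-2"),
]
before = list(index_files)

unscoped = preview_count(index_files)
scoped = preview_count(index_files, partitions={"dt=2026-06-21"})
```

Here the scoped preview is strictly smaller than the unscoped one, and comparing `index_files` with `before` confirms that neither call changed anything.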

JingsongLi · 2026-06-22T10:06:39Z

+1

XiaoHongbo-Hope added 5 commits June 20, 2026 21:17

XiaoHongbo-Hope marked this pull request as ready for review June 21, 2026 08:49

[flink] Add dry_run + partitions ITCase for drop_global_index

f9f1912


JingsongLi closed this Jun 21, 2026

JingsongLi reopened this Jun 21, 2026

JingsongLi merged commit 25b7b0d into apache:master Jun 22, 2026
24 of 26 checks passed


JingsongLi merged 6 commits into apache:master from XiaoHongbo-Hope:drop_global_index_dry_run
