[flink][spark] Support dry_run in drop_global_index procedure #8309

Merged
JingsongLi merged 6 commits into apache:master from XiaoHongbo-Hope:drop_global_index_dry_run
Jun 22, 2026
Conversation

@XiaoHongbo-Hope

Contributor

Purpose

Tests

Add an optional `dry_run` BOOLEAN argument (default false) to the
drop_global_index procedure. When true, it reports how many index files
would be dropped without committing any change, mirroring the dry_run
convention of remove_orphan_files.

This lets users verify the index_type / column match (e.g. lumina vs the
legacy lumina-vector-ann alias) before the destructive commit, since the
delete filter matches index files by exact index type and field ids.
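Why the alias matters can be illustrated with a small model of an exact-match filter. This is only a sketch: `IndexFile`, `countMatches`, and the use of set equality for field ids are hypothetical stand-ins, not Paimon's actual classes or filter logic.

```java
import java.util.List;
import java.util.Set;

final class ExactMatchPreview {
    // Hypothetical model of an index-file entry; not Paimon's actual class.
    record IndexFile(String indexType, Set<Integer> fieldIds) {}

    // Counts files whose index type and field ids both match exactly
    // (assumption: "exact" is modeled here as string and set equality).
    static long countMatches(List<IndexFile> files, String indexType, Set<Integer> fieldIds) {
        return files.stream()
                .filter(f -> f.indexType().equals(indexType) && f.fieldIds().equals(fieldIds))
                .count();
    }

    public static void main(String[] args) {
        List<IndexFile> files = List.of(
                new IndexFile("lumina-vector-ann", Set.of(3)),
                new IndexFile("lumina", Set.of(3)),
                new IndexFile("lumina", Set.of(4)));
        // "lumina" does not pick up the file written under the legacy alias.
        System.out.println(countMatches(files, "lumina", Set.of(3)));
        System.out.println(countMatches(files, "lumina-vector-ann", Set.of(3)));
    }
}
```

Under this model a caller who passes the wrong type name silently matches fewer files than expected, which is the situation a dry run is meant to surface before anything is committed.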

Add the same optional `dry_run` argument to the Spark drop_global_index
procedure (mirroring remove_orphan_files): when true it scans the matching
index files and returns without committing, so the deletion can be
previewed before the destructive commit.

Also move the Flink dry-run branch before the empty-match check so a dry
run always reports preview semantics (e.g. "0 would be dropped") instead
of the "no index found" message, and drop the misleading quotes around the
boolean placeholder in the Flink docs.
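The effect of checking the dry-run flag before the empty-match check can be sketched as follows. The class name, method, and message strings are hypothetical, chosen only to mirror the wording above; the real procedure's output may differ.

```java
import java.util.List;

final class FlinkDryRunOrder {
    // Hypothetical control flow; message wording is an assumption.
    static String drop(List<String> matchedFiles, boolean dryRun) {
        if (dryRun) {
            // Checked first, so an empty match still yields a preview message.
            return matchedFiles.size() + " index files would be dropped";
        }
        if (matchedFiles.isEmpty()) {
            return "no index found";
        }
        // A real implementation would commit the deletion at this point.
        return matchedFiles.size() + " index files dropped";
    }
}
```

With this ordering, `drop(List.of(), true)` reports a zero-count preview, while `drop(List.of(), false)` still falls through to the "no index found" message.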

Address review feedback on dry_run: the Spark procedure previously
returned only a boolean, so the matched count -- the whole point of a dry
run -- was visible only in the logs. Return the count instead, as
remove_orphan_files does, so both dry_run and the normal path report how
many index files were (or would be) dropped.

The output column changes from `result` (boolean) to `dropped_file_count`
(long); the existing tests are updated to assert the returned count.

drop_global_index already shipped in release-1.4 with a `result` BOOLEAN
output column, so replacing it would break that released output contract.
Add `dropped_file_count` as a second column instead of replacing `result`:
existing callers that read `result` keep working, while dry_run and the
normal path also return the (would-be-)dropped count.

- Flink: annotate the optional partitions/dry_run arguments with `@Nullable`,
  consistent with other procedures (e.g. RemoveUnexistingFilesProcedure).
- Spark: short-circuit when no index files match, returning (true, 0)
  instead of committing an empty change, matching the Flink behavior.
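The two-column Spark result and the empty-match short-circuit might look roughly like the model below. `Row`, `drop`, and the `commits` counter are hypothetical, and the assumption that a dry run also returns `result = true` is not stated in the description.

```java
import java.util.List;

final class SparkDropResult {
    // Hypothetical output row: the released `result` column plus the new count.
    record Row(boolean result, long droppedFileCount) {}

    // Counts simulated commits so the short-circuit can be observed.
    static int commits = 0;

    static Row drop(List<String> matchedFiles, boolean dryRun) {
        if (dryRun || matchedFiles.isEmpty()) {
            // Nothing is committed: either a preview or an empty match, i.e. (true, 0).
            return new Row(true, matchedFiles.size());
        }
        commits++; // stands in for the real delete commit
        return new Row(true, matchedFiles.size());
    }
}
```

Keeping `result` as the first field reflects the compatibility argument above: callers that read only the boolean see no change, while newer callers can also read the count.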

@XiaoHongbo-Hope marked this pull request as ready for review June 21, 2026 08:49

Cover the dry_run + partition-filter interaction: a partition-scoped dry
run reports a different (smaller) preview count than an unscoped one, and
neither commits any change.

@JingsongLi closed this Jun 21, 2026
@JingsongLi reopened this Jun 21, 2026

@JingsongLi

Contributor

+1

@JingsongLi merged commit 25b7b0d into apache:master Jun 22, 2026
24 of 26 checks passed