[flink][spark] Support dry_run in drop_global_index procedure #8309
Merged
JingsongLi merged 6 commits on Jun 22, 2026
Add an optional `dry_run` BOOLEAN argument (default false) to the drop_global_index procedure. When true, it reports how many index files would be dropped without committing any change, mirroring the dry_run convention of remove_orphan_files. This lets users verify the index_type / column match (e.g. lumina vs the legacy lumina-vector-ann alias) before the destructive commit, since the delete filter matches index files by exact index type and field ids.
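To illustrate why a preview helps here, the following is a minimal sketch of exact matching on index type and field id. It is not the Paimon implementation: `IndexMatchSketch`, `IndexFile`, `countMatches`, and the sample field ids are hypothetical names and values chosen for illustration. Only the `lumina` and `lumina-vector-ann` type names come from the description above.

```java
import java.util.Arrays;
import java.util.List;

final class IndexMatchSketch {
    /** Hypothetical stand-in for the metadata of one global index file. */
    static final class IndexFile {
        final String indexType;
        final int fieldId;

        IndexFile(String indexType, int fieldId) {
            this.indexType = indexType;
            this.fieldId = fieldId;
        }
    }

    /** Counts files whose type and field id both match exactly (no alias resolution). */
    static long countMatches(List<IndexFile> files, String indexType, int fieldId) {
        return files.stream()
                .filter(f -> f.indexType.equals(indexType) && f.fieldId == fieldId)
                .count();
    }

    /** Assumed sample data: two files under the legacy alias, one under the new name. */
    static List<IndexFile> sampleFiles() {
        return Arrays.asList(
                new IndexFile("lumina-vector-ann", 3),
                new IndexFile("lumina-vector-ann", 3),
                new IndexFile("lumina", 5));
    }

    public static void main(String[] args) {
        List<IndexFile> files = sampleFiles();
        // Asking for "lumina" on field 3 matches nothing, because the files
        // on that field were written under the legacy alias.
        System.out.println(countMatches(files, "lumina", 3));
        System.out.println(countMatches(files, "lumina-vector-ann", 3));
    }
}
```

Under these assumptions, a dry run with the wrong type name would report a count of zero, which signals the mismatch before anything is committed.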
Add the same optional `dry_run` argument to the Spark drop_global_index procedure (mirroring remove_orphan_files): when true it scans the matching index files and returns without committing, so the deletion can be previewed before the destructive commit. Also move the Flink dry-run branch before the empty-match check so a dry run always reports preview semantics (e.g. "0 would be dropped") instead of the "no index found" message, and drop the misleading quotes around the boolean placeholder in the Flink docs.
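The branch ordering described for Flink can be sketched roughly as follows. This is an illustration under assumptions, not the actual procedure code: `DryRunOrderSketch`, `describe`, and the exact message strings are hypothetical.

```java
final class DryRunOrderSketch {
    /**
     * Builds the message a call would report. The dry-run check comes first,
     * so an empty match under dry run still uses preview wording.
     * All message strings here are assumed, not taken from Paimon.
     */
    static String describe(long matched, boolean dryRun) {
        if (dryRun) {
            return "Dry run: " + matched + " index file(s) would be dropped.";
        }
        if (matched == 0) {
            return "No matching index found.";
        }
        return "Dropped " + matched + " index file(s).";
    }

    public static void main(String[] args) {
        System.out.println(describe(0, true));
        System.out.println(describe(0, false));
        System.out.println(describe(4, false));
    }
}
```

Had the empty-match check come first, `describe(0, true)` would have returned the "no index" wording and hidden the fact that the call was only a preview.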
Address review feedback on dry_run: the Spark procedure previously returned only a boolean, so the matched count -- the whole point of a dry run -- was visible only in the logs. Return the count instead, as remove_orphan_files does, so both dry_run and the normal path report how many index files were (or would be) dropped. The output column changes from `result` (boolean) to `dropped_file_count` (long); the existing tests are updated to assert the returned count.
drop_global_index already shipped in release-1.4 with a `result` BOOLEAN output column, so replacing it would break that released output contract. Add `dropped_file_count` as a second column instead of replacing `result`: existing callers that read `result` keep working, while dry_run and the normal path also return the (would-be-)dropped count.
- Flink: annotate the optional partitions/dry_run arguments with @Nullable, consistent with other procedures (e.g. RemoveUnexistingFilesProcedure).
- Spark: short-circuit when no index files match, returning (true, 0) instead of committing an empty change, matching the Flink behavior.
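The two-column output and the empty-match short-circuit on the Spark side might look like the sketch below. The class, the `Row` type, and the commit counter are hypothetical. Returning `true` in the `result` column for a dry run is also an assumption, since the description only states the `(true, 0)` case explicitly.

```java
final class DropResultSketch {
    /** Hypothetical output row: the released `result` column plus the new count. */
    static final class Row {
        final boolean result;
        final long droppedFileCount;

        Row(boolean result, long droppedFileCount) {
            this.result = result;
            this.droppedFileCount = droppedFileCount;
        }
    }

    /** Number of simulated commits, so callers can check that none happened. */
    int commits = 0;

    Row drop(long matched, boolean dryRun) {
        // Short-circuit: nothing matched, or the caller asked only for a preview.
        if (matched == 0 || dryRun) {
            return new Row(true, matched);
        }
        commits++; // stands in for the destructive commit
        return new Row(true, matched);
    }

    public static void main(String[] args) {
        DropResultSketch sketch = new DropResultSketch();
        Row preview = sketch.drop(3, true);
        System.out.println(preview.result + ", " + preview.droppedFileCount + ", commits=" + sketch.commits);
        Row real = sketch.drop(3, false);
        System.out.println(real.result + ", " + real.droppedFileCount + ", commits=" + sketch.commits);
    }
}
```

Keeping `result` as the first column means a caller that reads only that column sees no change, while newer callers can read `droppedFileCount` as well.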
Cover the dry_run + partition-filter interaction: a partition-scoped dry run reports a different (smaller) preview count than an unscoped one, and neither commits any change.
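The interaction being tested can be pictured with a small in-memory model. This is a sketch only: `PartitionDryRunSketch`, its methods, and the `dt=...` partition strings are hypothetical, and a `null` partition is assumed to mean "no partition filter".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PartitionDryRunSketch {
    /** One entry per index file, holding the partition it belongs to. */
    private final List<String> filePartitions = new ArrayList<>();

    PartitionDryRunSketch(List<String> partitions) {
        filePartitions.addAll(partitions);
    }

    /** Returns the matched count; removes the files only when dryRun is false. */
    long drop(String partition, boolean dryRun) {
        long matched = filePartitions.stream()
                .filter(p -> partition == null || p.equals(partition))
                .count();
        if (!dryRun) {
            filePartitions.removeIf(p -> partition == null || p.equals(partition));
        }
        return matched;
    }

    int remaining() {
        return filePartitions.size();
    }

    public static void main(String[] args) {
        PartitionDryRunSketch table =
                new PartitionDryRunSketch(Arrays.asList("dt=1", "dt=1", "dt=2"));
        System.out.println(table.drop("dt=1", true)); // scoped preview
        System.out.println(table.drop(null, true));   // unscoped preview
        System.out.println(table.remaining());        // unchanged by either preview
    }
}
```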
+1