Skip to content

fix: re-enabled null-equal join dynamic filters with an IS NULL predicate.#23106

Open
mdashti wants to merge 3 commits into
apache:mainfrom
paradedb:moe/null-equal-dynamic-filter
Open

fix: re-enabled null-equal join dynamic filters with an IS NULL predicate.#23106
mdashti wants to merge 3 commits into
apache:mainfrom
paradedb:moe/null-equal-dynamic-filter

Conversation

@mdashti

@mdashti mdashti commented Jun 23, 2026

Copy link
Copy Markdown

Note

Stacked on #23104, which adds the shared probe-NULL helper this builds on. That one should land first; until it does, this PR's diff includes its two commits.

Which issue does this close?

Re-enables the dynamic filter that #22965 disabled (#22964), with the proper null-equal semantics.

Rationale for this change

#22965 disabled hash-join dynamic filter pushdown for null-equal joins: the build-side bounds and membership predicates evaluate to NULL for a probe-side NULL key, so they prune rows that should null-match a build-side NULL. Its description already named the better fix, "generate a predicate with OR IS NULL". #23104 does that for null-aware anti joins; this re-enables the null-equal case the same way.

What changes are included in this PR?

  • Revert the null-equal return false in allow_join_dynamic_filter_pushdown.
  • Generalize the shared probe-NULL helper to cover both null-aware (single-key) and null-equal (multi-key) joins: OR key IS NULL for every nullable probe key. A NOT NULL key never widens the filter, so an all-NOT-NULL join keeps full selectivity.

Are these changes tested?

Yes. #22965's SLT now asserts the filter is back on the probe with the result unchanged, plus a multi-key null-equal case. The reject unit test flips to assert pushdown is allowed, and preserve_probe_nulls unit tests cover both the mixed nullable/NOT NULL case (only the nullable key widens) and the all-NOT-NULL case (no widening).

Are there any user-facing changes?

Null-equal joins regain dynamic filter pushdown, so they prune the probe scan again while returning correct results.

mdashti added 2 commits June 22, 2026 18:29
The hash-join dynamic filter pushed `key IN build_keys` down to the probe
scan for null-aware anti joins too. That drops the probe-side NULL, but
`NOT IN` three-valued logic needs it to collapse the result to zero rows,
so the join silently returned rows.

OR `probe_key IS NULL` into the pushed predicate. Non-NULL probe rows
still get filtered; only the NULL additionally survives.
Exercises the pushdown path the existing in-memory tests miss: parquet with
row-level filtering, so the pushed dynamic filter actually drops rows. Without
the fix `id NOT IN (SELECT eid ...)` returns 1 and 3 instead of zero rows.
@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) physical-plan Changes to the physical-plan crate labels Jun 23, 2026
@mdashti mdashti force-pushed the moe/null-equal-dynamic-filter branch 2 times, most recently from 64e1820 to 3721aa9 Compare June 23, 2026 04:04
@mdashti

mdashti commented Jun 23, 2026

Copy link
Copy Markdown
Author

@adriangb Can you please take a look?

@mdashti mdashti changed the title Re-enabled null-equal join dynamic filters with an IS NULL predicate. fix: re-enabled null-equal join dynamic filters with an IS NULL predicate. Jun 23, 2026
apache#22965 disabled dynamic filter pushdown for null-equal joins because the
build-side predicate prunes a probe-side NULL that can null-match a build-side
NULL. Push the filter with `OR key IS NULL` over the nullable probe keys
instead, the way apache#23104 does for null-aware anti joins. A NOT NULL key never
widens the filter, so an all-NOT-NULL join keeps full selectivity.
@mdashti mdashti force-pushed the moe/null-equal-dynamic-filter branch from 3721aa9 to 9f4e40c Compare June 23, 2026 04:23
let any_key_is_null = self
.on_right
.iter()
.filter(|key| key.nullable(&self.probe_schema).unwrap_or(true))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we widen when we are unable to check column nullability ? i.e. unwrap_or(true).
From what I see this can only happen when on_right and schema are out of sync which seems to be an invalid state ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants