Skip to content

fix: preserve AND/OR precedence in composite filters#29

Open
jpetey75 wants to merge 1 commit into
mainfrom
fix/composite-filter-semantics
Open

fix: preserve AND/OR precedence in composite filters#29
jpetey75 wants to merge 1 commit into
mainfrom
fix/composite-filter-semantics

Conversation

@jpetey75

Copy link
Copy Markdown
Contributor

Closes #23

Problem

Composite filters were flat — they held a list of rules with a single aggregation. Two consequences, both of which silently changed the query a user wrote:

  1. .filter() after an OR group appended into the OR instead of AND-ing:

    model.query()
      .filter((status == "active") | (status == "pending"))
      .filter(status != "deleted")

    Docs promise (active OR pending) AND (not deleted); the SDK produced active OR pending OR (not deleted).

  2. Nested expressions flattened, losing precedence: a & (b | c) serialized as a & b & c.

For exploratory/notebook work this is a silent wrong-population bug.

Change

  • CompositeFilter is now a treefilters may contain nested CompositeFilter groups. A shared _combine() flattens a child group only when it already uses the same aggregation, so a & (b | c) keeps (b | c) nested.
  • to_dict() serializes nested groups and projects dimension vs table-calculation filters into their separate top-level groups. The one shape the API genuinely can't represent — an OR across the two field types — now raises a clear ValueError instead of serializing to the wrong thing (it previously produced a payload the API silently AND-ed).
  • Query.filter() always ANDs the new condition with everything so far, keeping any existing OR group nested.

Verified

Live, end-to-end through the SDK — the exact bug case:

(month == "2025-01-01" | month == "2025-03-01")  &  (month != "2025-01-01")
  • Payload: dimensions.and = [ {or: [Jan, Mar]}, {notEquals Jan} ]
  • Result: only March (pre-fix would return every month).
  • Separately confirmed the v2 endpoint applies nested groups correctly.

Tests

tests/test_composite_filter_semantics.py (12): nested AND-of-OR / OR-of-AND, the documented complex example, same-operator flattening still works (incl. the #20 same-field range), .filter()-always-ANDs incl. after an OR, dimension+calc AND splitting, and the cross-type-OR error. Full unit suite green (143 passed).

Docs

docs/SDK_GUIDE.md updated to state precedence is preserved, show the .filter()-as-a-unit AND example, and note the dimension/table-calc OR limitation.

🤖 Generated with Claude Code

Composite filters were flat: nested boolean expressions collapsed into a single
aggregation, and `.filter()` after an OR group appended into that OR instead of
AND-ing. Both silently changed the query population a user wrote.

- `CompositeFilter` is now a tree — `filters` may hold nested groups. A shared
  `_combine()` flattens a child only when it already uses the same aggregation,
  so `a & (b | c)` keeps `(b | c)` nested.
- `to_dict()` serializes nested groups and projects dimension vs table-calc
  filters into their separate top-level groups. The one shape the API cannot
  represent — OR across the two field types — raises a clear error instead of
  serializing to the wrong thing.
- `Query.filter()` always ANDs the new condition with everything so far, keeping
  any existing OR group nested: `.filter(a | b).filter(c)` => `(a OR b) AND c`.

Verified live: `(month==Jan | month==Mar).filter(month != Jan)` now returns only
March (previously every month). The API confirmed to apply nested groups.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jpetey75

Copy link
Copy Markdown
Contributor Author

Backward-compatibility verification (for review)

Since this reworks the filter engine, I compared to_dict() output between main and this branch across a battery of existing filter patterns. All correct existing usage serializes byte-for-byte identically — only the three previously-broken cases change.

Existing patterns — identical on main vs. this branch ✅

Verified producing the exact same payload: single filter · a & b · a | b · chained a & b & c · chained a | b | c · same-field range (d >= x) & (d <= y) (the #20 behaviour) · CompositeFilter(..., aggregation="and") · CompositeFilter(..., aggregation="or") · table-calc-only · mixed dim & calc · .filter(a).filter(b) · single .filter().

The only behavioural changes — all cases main got wrong

Expression main (before) This branch (after)
.filter(a | b).filter(c) a OR b OR c (a OR b) AND c
a & (b | c) a AND b AND c a AND (b OR c)
(dimension) | (table_calc) dim AND calc (silently!) ❌ raises a clear ValueError

Notes for the reviewer

  • Nothing here is released — current PyPI is 1.0.1; this ships with the rest of the unreleased filter work in the next version, so there are no existing released users to affect.
  • filter.py is net +30 lines (41 removed / 71 added): the removed lines are the old verbose __and__/__or__ branches + flat to_dict(), replaced by the shared _combine() helper and a recursive serializer — refactor + extend, no lost functionality.
  • The one shape the Lightdash API can't represent — OR across the dimensions and tableCalculations groups (they're separate groups the API always ANDs) — now raises instead of mis-serializing.

Live end-to-end

(month == Jan | month == Mar).filter(month != Jan) → payload nests as AND[ OR[Jan, Mar], notEquals Jan ] → returns only March against the live API (pre-fix would return every month). Nested-group support independently confirmed against the v2 endpoint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clarify or fix composite filter boolean semantics

1 participant