feat: Add support for writing bloom filters #3265
Thanks for working on this @renaudb. I've noticed that in the |
|
@Fokko the issue is with Bodo, not Boto. Bodo forces |
@Fokko this should be ready for review.
@pytest.mark.integration
@skip_if_bloom_filter_not_supported
@pytest.mark.parametrize("format_version", [1, 2])
def test_write_parquet_bloom_filter_properties(
Would it make sense to assert that `pq.ParquetWriter` is called with `bloom_filter_options`?
Added a check that the filter offset and length exist in the underlying files. However, this requires pyarrow 25, which won't be officially released for a few days.
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
Closes #850
Note: This PR is currently held back by Bodo requiring `pyarrow<=23.1`, as bloom filter write support was added in pyarrow 24.
Rationale for this change
Add support for writing bloom filters to Parquet files. This change leverages the new `bloom_filter_options` `write_parquet` argument in pyarrow 24.
Are these changes tested?
Added tests for the metadata parsing. Added a very basic test for the writing path (there is currently no way to test for the existence of a bloom filter in a Parquet file using pyarrow).
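For readers unfamiliar with the metadata side, here is a minimal sketch of how bloom filter settings could be read out of Iceberg table properties. It assumes the documented Iceberg property names `write.parquet.bloom-filter-enabled.column.<name>` and `write.parquet.bloom-filter-fpp.column.<name>`. The helper name `parse_bloom_filter_properties` and its return shape are hypothetical and are not taken from this PR; how the result would be handed to pyarrow is deliberately left out.

```python
from typing import Dict

# Iceberg table property prefixes (as documented for Iceberg tables).
ENABLED_PREFIX = "write.parquet.bloom-filter-enabled.column."
FPP_PREFIX = "write.parquet.bloom-filter-fpp.column."
DEFAULT_FPP = 0.01  # documented Iceberg default false-positive probability


def parse_bloom_filter_properties(properties: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: map each enabled column to its bloom filter settings."""
    columns: Dict[str, Dict[str, float]] = {}
    for key, value in properties.items():
        # Only columns explicitly set to "true" get a bloom filter.
        if key.startswith(ENABLED_PREFIX) and value.strip().lower() == "true":
            column = key[len(ENABLED_PREFIX):]
            # Fall back to the default when no per-column fpp is configured.
            fpp = float(properties.get(FPP_PREFIX + column, DEFAULT_FPP))
            columns[column] = {"fpp": fpp}
    return columns


example = {
    "write.parquet.bloom-filter-enabled.column.id": "true",
    "write.parquet.bloom-filter-fpp.column.id": "0.05",
    "write.parquet.bloom-filter-enabled.column.name": "false",
}
print(parse_bloom_filter_properties(example))
```

Keeping the parsing separate from the write call means it can be unit tested without a pyarrow version that supports bloom filter writes.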
Are there any user-facing changes?
N/A