refactor[fsst]: take ArrayRef in fsst_compress, decide offset width upfront#7900
Conversation
Merging this PR will improve performance by 16.33%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] |
169 µs | 205.9 µs | -17.93% |
| ⚡ | Simulation | chunked_bool_canonical_into[(1000, 10)] |
27 µs | 16.3 µs | +66.21% |
| ⚡ | Simulation | eq_pushdown_low_match |
968.7 µs | 786 µs | +23.24% |
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] |
206.1 µs | 169.6 µs | +21.5% |
| ⚡ | Simulation | eq_pushdown_high_match |
1,063.9 µs | 880.5 µs | +20.83% |
| ⚡ | Simulation | chunked_varbinview_opt_into_canonical[(1000, 10)] |
219.4 µs | 183 µs | +19.95% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] |
244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | pushdown_compare[(10000, 16, 8)] |
264.8 µs | 239.3 µs | +10.64% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] |
304.7 ns | 275.6 ns | +10.58% |
| ⚡ | Simulation | eq_i64_constant |
318.9 µs | 289.1 µs | +10.32% |
| 🆕 | Simulation | eq_pushdown_high_match_view |
N/A | 882.1 µs | N/A |
| 🆕 | Simulation | eq_pushdown_low_match_view |
N/A | 779.4 µs | N/A |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing mp/fsst-compress-refactor (a4b73a5) with develop (d53301e)
Polar Signals Profiling ResultsLatest Run
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingVortex (geomean): 0.895x ✅ datafusion / vortex-file-compressed (0.895x ✅, 5↑ 0↓)
|
File Sizes: PolarSignals ProfilingNo file size changes detected. |
Benchmarks: FineWeb NVMeVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.975x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.983x ➖, 1↑ 1↓)
datafusion / parquet (0.939x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.987x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.011x ➖, 0↑ 1↓)
duckdb / parquet (0.951x ➖, 1↑ 0↓)
Full attributed analysis
|
File Sizes: FineWeb NVMeNo file size changes detected. |
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.002x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.994x ➖, 0↑ 0↓)
datafusion / parquet (1.003x ➖, 0↑ 1↓)
datafusion / arrow (1.000x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.997x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.990x ➖, 0↑ 0↓)
duckdb / parquet (0.999x ➖, 0↑ 1↓)
duckdb / duckdb (0.997x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=1 on NVMENo file size changes detected. |
|
What happens if the compress actually increase the array code size this estimate might still be incorrect |
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.010x ➖, 1↑ 1↓)
datafusion / vortex-compact (1.011x ➖, 0↑ 0↓)
datafusion / parquet (1.012x ➖, 1↑ 4↓)
duckdb / vortex-file-compressed (1.013x ➖, 1↑ 0↓)
duckdb / vortex-compact (1.010x ➖, 0↑ 1↓)
duckdb / parquet (1.009x ➖, 0↑ 0↓)
duckdb / duckdb (1.001x ➖, 3↑ 2↓)
Full attributed analysis
|
File Sizes: TPC-DS SF=1 on NVMENo file size changes detected. |
Benchmarks: FineWeb S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.942x ➖, 1↑ 0↓)
datafusion / vortex-compact (1.142x ➖, 0↑ 0↓)
datafusion / parquet (1.078x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.033x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.031x ➖, 0↑ 0↓)
duckdb / parquet (1.056x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: Random AccessVortex (geomean): 0.964x ➖ unknown / unknown (0.980x ➖, 2↑ 0↓)
|
Benchmarks: Statistical and Population GeneticsVerdict: No clear signal (low confidence) duckdb / vortex-file-compressed (1.009x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.999x ➖, 0↑ 0↓)
duckdb / parquet (0.987x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: Statistical and Population GeneticsNo file size changes detected. |
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.004x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.002x ➖, 0↑ 0↓)
datafusion / parquet (1.007x ➖, 0↑ 0↓)
datafusion / arrow (1.007x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.000x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.999x ➖, 0↑ 0↓)
duckdb / parquet (1.008x ➖, 0↑ 0↓)
duckdb / duckdb (0.996x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=10 on NVMENo file size changes detected. |
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.968x ➖, 0↑ 1↓)
datafusion / parquet (0.939x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (0.986x ➖, 2↑ 6↓)
duckdb / parquet (0.967x ➖, 1↑ 0↓)
duckdb / duckdb (0.950x ➖, 5↑ 0↓)
Full attributed analysis
|
File Sizes: Clickbench on NVMEFile Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Totals:
|
Benchmarks: TPC-H SF=1 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.134x ➖, 0↑ 4↓)
datafusion / vortex-compact (1.044x ➖, 0↑ 1↓)
datafusion / parquet (1.049x ➖, 1↑ 2↓)
duckdb / vortex-file-compressed (1.033x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.084x ➖, 0↑ 0↓)
duckdb / parquet (1.026x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: CompressionVortex (geomean): 1.008x ➖ unknown / unknown (1.004x ➖, 0↑ 3↓)
|
Benchmarks: TPC-H SF=10 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.068x ➖, 0↑ 2↓)
datafusion / vortex-compact (1.056x ➖, 0↑ 1↓)
datafusion / parquet (0.994x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.062x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.051x ➖, 0↑ 0↓)
duckdb / parquet (1.087x ➖, 0↑ 0↓)
Full attributed analysis
|
|
the upper bound of fsst is len*2+7, we can choose the type that fits this value and that WILL always work, so anything in 2**31-4 length will be u32 and anything above will be u64 |
|
I think you still want a compress method that can take ArrayRef so callers don't have to know what they're compressing and we don't have to expose internal of how fsst does compression, then internally you can switch on the encoding type. |
|
was there a merge that went wrong? There seems to be a bunch of changes in here that are not relevant to this PR |
|
re-merged develop |
|
This PR has been marked as stale because it has been open for 14 days with no activity. Please comment or remove the stale label if you wish to keep it active, otherwise it will be closed in 7 days |
3dec60f to
ceb5796
Compare
|
Rebased onto develop as a single clean commit (the earlier merge noise is gone — the diff is now 23 fsst-related files). Status of review feedback:
The rebase also folds in develop-side changes: the #8324 test optimization (empty symbol table, ~half the memory) now runs through the new API, |
…pfront fsst_compress and fsst_train_compressor take an ArrayRef and dispatch internally on VarBinView/VarBin encoding, so callers don't need to know which string encoding they hold. The codes-offsets width (i32 vs i64) is chosen upfront from the FSST worst-case bound (2 * input bytes + 7 per non-null row), which always covers the actual output. Follow-up to #7853 addressing review feedback on #7833. Signed-off-by: mprammer <martin@spiraldb.com> Signed-off-by: Robert Kruszewski <github@robertk.io> Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
e626e80 to
330f68d
Compare
Signed-off-by: Robert Kruszewski <github@robertk.io>
Summary
Refactors
fsst_compressandfsst_train_compressorto take anArrayRefand dispatch internally on the input encoding (VarBinVieworVarBin), so callers don't need to know which string encoding they hold. The codes-offsets width (i32 vs i64) is decided upfront from the FSST worst-case bound (2 × input bytes + 7 per non-null row), which always covers the actual output regardless of how well the data compresses.Follow-up to #7853 addressing review feedback on #7833.
🤖 Generated with Claude Code