[parquet] add vector support for parquet #8282

Merged
JingsongLi merged 2 commits into apache:master from steFaiz:support_vector_parquet
Jun 22, 2026

Conversation

steFaiz commented Jun 18, 2026

Contributor

Purpose

Currently, the Parquet format does not recognize the vector type. This causes an UnsupportedOperationException if users do not store vectors in a separate format such as Lance or Vortex.

Tests

Unit tests

@JingsongLi

Contributor

Thanks for adding Parquet vector support. One correctness concern: VectorType is fixed-size, but this implementation encodes it as a Parquet LIST and currently accepts whatever list length is present. On the write path, ParquetRowDataWriter writes row.getVector(...).size() without checking it against VectorType.getLength(); on the read path, CastedVectorColumnVector returns a ColumnarVec using the Parquet list length, not the declared vector length. That can let malformed vectors be written/read and then fail later in Spark/DataConverter or vector indexing with less context. Could we validate the vector length at the Parquet boundary, and ideally add a test for mismatched length?
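
For illustration, a minimal sketch of the kind of boundary check meant here. It assumes a hypothetical helper, `VectorLengthCheck.checkVectorLength`, that both the write path and the read path could call with the declared length (from `VectorType.getLength()`) and the element count actually seen in the Parquet list. The class name, method name, and error message are assumptions for this example, not code from this PR or from Paimon.

```java
/** Hypothetical helper for illustration only; not part of the Paimon code base. */
final class VectorLengthCheck {

    private VectorLengthCheck() {}

    /**
     * Fails fast when the element count does not match the declared vector length.
     *
     * @param fieldName column name, used only to give the error message context
     * @param declaredLength fixed length declared by the vector type
     * @param actualLength element count observed at the Parquet boundary
     */
    static void checkVectorLength(String fieldName, int declaredLength, int actualLength) {
        if (actualLength != declaredLength) {
            throw new IllegalArgumentException(
                    String.format(
                            "Vector field '%s' declares length %d but the Parquet list has %d elements.",
                            fieldName, declaredLength, actualLength));
        }
    }

    public static void main(String[] args) {
        // Matching length: the check passes silently.
        checkVectorLength("embedding", 3, 3);

        // Mismatched length: the check rejects the value with field-level context.
        try {
            checkVectorLength("embedding", 3, 2);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at this point keeps the field name and both lengths in the error, rather than surfacing later in a converter or index build where that context is gone.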

steFaiz commented Jun 21, 2026

Contributor Author

Thanks for your review! Fixed

@JingsongLi

Contributor

+1

@JingsongLi JingsongLi merged commit 6342ba8 into apache:master Jun 22, 2026
12 checks passed