delta lake python/yaml#39052
Summary of Changes
This pull request introduces support for reading Delta Lake tables in Apache Beam's YAML SDK. It adds the infrastructure needed to manage Delta Lake read transforms: it updates the expansion service, defines the read transform, and adds integration tests.
/gemini review
Code Review
This pull request introduces support for reading Delta Lake tables within Apache Beam, enabling the integration of the Delta Lake read schema transform into the Python SDK and YAML pipelines. The changes include adding Java tests, updating the expansion service, mapping the new transform in Python, and adding YAML integration tests. The review feedback recommends lazily importing optional dependencies like pyarrow to avoid import errors, specifying UTF-8 encoding when writing files, and using Dict instead of Mapping for type hints to prevent potential NameErrors.
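The two Python suggestions in this review (lazy import of an optional dependency, explicit UTF-8 when writing files) can be illustrated with a minimal sketch. The helper names `make_arrow_table` and `write_lines` are hypothetical and are not taken from the PR; only `pyarrow.table` and the built-in `open` are real APIs.

```python
import os
import tempfile


def make_arrow_table(columns):
    """Hypothetical test helper: build a pyarrow Table from a dict of columns."""
    # Lazy import: pyarrow is only required when this helper is called, so
    # importing the surrounding module succeeds even if pyarrow is missing.
    import pyarrow as pa
    return pa.table(columns)


def write_lines(path, lines):
    """Hypothetical test helper: write text lines with an explicit encoding."""
    # Without encoding=..., open() falls back to a locale-dependent default.
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "rows.txt")
    write_lines(path, ["delta", "Δ"])
    with open(path, "rb") as f:
        print(f.read().decode("utf-8").splitlines())
```

With this layout, a missing pyarrow surfaces as an ImportError only in the test that calls the helper, rather than at module import time.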
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Code Review
This pull request adds support for reading Delta Lake tables in Apache Beam, including Java-based schema transform provider tests, Python managed read transforms, and YAML integration tests. The review feedback suggests several improvements: replacing File.mkdirs() with Files.createDirectories() in Java tests to handle directory creation failures properly, importing pyarrow lazily in Python integration tests to avoid module-level import errors, and specifying encoding="utf-8" when opening files for writing.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #39052 +/- ##
=========================================
Coverage 55.89% 55.89%
Complexity 3913 3913
=========================================
Files 1320 1320
Lines 180971 181029 +58
Branches 2671 2671
=========================================
+ Hits 101148 101187 +39
- Misses 76931 76950 +19
Partials 2892 2892
Code Review
This pull request introduces support for reading Delta Lake tables in Apache Beam, including Java tests, Python bindings, and YAML integration. The feedback recommends importing pyarrow locally within the integration test helper to prevent module-level ImportErrors when the library is not installed. Additionally, it suggests restricting the Parquet write transform in the Java tests to a single shard to avoid potential race conditions when writing to a static filename.
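To see why a static filename is a problem with more than one shard, here is a small sketch in plain Python (it is not Beam code and not the PR's Java test). The function names and paths are hypothetical; the sharded pattern mirrors the `-SSSSS-of-NNNNN` style of shard naming.

```python
def static_target(directory, filename, shard, num_shards):
    """Static naming: the shard index is ignored, so all shards share a path."""
    return f"{directory}/{filename}"


def sharded_target(directory, prefix, shard, num_shards, suffix=".parquet"):
    """Templated naming: each shard gets its own path."""
    return f"{directory}/{prefix}-{shard:05d}-of-{num_shards:05d}{suffix}"


def distinct_targets(name_fn, num_shards):
    """Return the set of output paths produced by num_shards writers."""
    return {name_fn(shard, num_shards) for shard in range(num_shards)}


if __name__ == "__main__":
    static = distinct_targets(
        lambda s, n: static_target("/tmp/table", "data.parquet", s, n), 3)
    sharded = distinct_targets(
        lambda s, n: sharded_target("/tmp/table", "data", s, n), 3)
    print(len(static), len(sharded))
```

With three shards and a static name, three writers target one path, which is the race the review points at. Limiting the write to a single shard leaves only one writer for that path.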
with:
  python-version: default
  java-version: |
    17
"11 17" is effectively the same as "17". Either change it to 17 (and remove the java17Home parameter), or add a "testJavaVersion=17" parameter and make the target honor it (as in https://github.com/apache/beam/pull/39064/changes#diff-0435a83a413ec063bf7e682cadcd56776cd18fc878f197cc99a65fc231ef2047).
java-version: |
  17
  11
  17
Fixes: #38709