Download spark_python_task workspace files in bundle generate job #5799
Merged
bundle generate job only downloaded notebook tasks; spark_python_task files were left as absolute /Workspace paths in the generated config. Download workspace files referenced by spark_python_task and rewrite them to a relative path, matching notebook handling. Git-sourced files and cloud URIs (dbfs:/, s3:/, adls:/, gcs:/) are left untouched.

Co-authored-by: Isaac
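A minimal Go sketch of the rule described above, not the CLI's actual code. It assumes a simplified, hypothetical `SparkPythonTask` type with `PythonFile` and `Source` fields, and hypothetical `src` and `resources` directories for downloaded files and generated config. Workspace files get a download entry and a path relative to the config directory. Git-sourced files and cloud URIs pass through unchanged.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// SparkPythonTask is a hypothetical, trimmed-down stand-in for a job task's
// spark_python_task settings; the real SDK type and field names may differ.
type SparkPythonTask struct {
	PythonFile string // e.g. "/Workspace/Users/me/main.py" or "dbfs:/jobs/main.py"
	Source     string // "GIT", "WORKSPACE", or "" when unset
}

// cloudPrefixes lists the URI schemes the PR leaves untouched.
var cloudPrefixes = []string{"dbfs:/", "s3:/", "adls:/", "gcs:/"}

// isWorkspaceFile reports whether the task points at a workspace file that
// should be downloaded: not Git-sourced and not a cloud URI.
func isWorkspaceFile(t SparkPythonTask) bool {
	if t.Source == "GIT" {
		return false
	}
	for _, prefix := range cloudPrefixes {
		if strings.HasPrefix(t.PythonFile, prefix) {
			return false
		}
	}
	return true
}

// rewritePythonFile returns the python_file value to write into the generated
// config and the remote path to download ("" when nothing is downloaded).
// sourceDir and configDir are hypothetical directories relative to the bundle
// root for downloaded files and generated YAML, respectively.
func rewritePythonFile(t SparkPythonTask, sourceDir, configDir string) (string, string, error) {
	if !isWorkspaceFile(t) {
		// Leave Git-sourced paths and cloud URIs exactly as they were.
		return t.PythonFile, "", nil
	}
	// The local copy keeps the remote file's base name.
	localFile := filepath.Join(sourceDir, filepath.Base(t.PythonFile))
	// Express it relative to the directory that holds the generated config.
	rel, err := filepath.Rel(configDir, localFile)
	if err != nil {
		return "", "", err
	}
	return filepath.ToSlash(rel), t.PythonFile, nil
}

func main() {
	tasks := []SparkPythonTask{
		{PythonFile: "/Workspace/Users/me@example.com/etl/main.py"},
		{PythonFile: "dbfs:/jobs/main.py"},
		{PythonFile: "etl/main.py", Source: "GIT"},
	}
	for _, t := range tasks {
		p, dl, err := rewritePythonFile(t, "src", "resources")
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-45s -> %-20s download=%q\n", t.PythonFile, p, dl)
	}
}
```

Keeping only the base name is a simplification: two tasks whose files share a name would collide, which a real implementation would need to handle.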
Integration test report (commit 15267a1)
23 interesting tests: 13 SKIP, 10 RECOVERED
pietern approved these changes on Jul 2, 2026
andersrexdb approved these changes on Jul 3, 2026
Changes
`bundle generate job` only downloaded notebook tasks. Files referenced by `spark_python_task` were left as absolute `/Workspace/...` paths in the generated config, so the source file was never downloaded and the config wasn't portable.

It now downloads workspace files referenced by `spark_python_task` and rewrites them to a relative path, reusing the same `markFileForDownload` helper already used for pipeline libraries. Git-sourced files (`source: GIT`) and cloud URIs (`dbfs:/`, `s3:/`, `adls:/`, `gcs:/`) are left untouched.

Why
Reported by a user: generating a job with a notebook task and a `spark_python_task` downloaded only the notebook. The `spark_python_task` branch was simply never handled in `MarkTaskForDownload`.

Tests
- `bundle/generate/downloader_test.go` covering the download+rewrite path and the skipped cases (cloud URI, `source: GIT`).
- `acceptance/bundle/generate/spark_python_task_job` exercising the full CLI: a workspace-file task is downloaded and rewritten, and a `dbfs:/` cloud-URI task is preserved. Output is identical on both the `terraform` and `direct` engines.

This PR was written by Isaac, an AI coding agent.
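To illustrate the gap described under Why, here is a hedged Go sketch of a task dispatcher that has the previously missing `spark_python_task` branch next to the notebook branch. `Task`, `downloader`, `mark`, and `markTaskForDownload` are hypothetical stand-ins, not the CLI's `MarkTaskForDownload` or `markFileForDownload`, and the notebook branch is deliberately simplified. The `main` function mirrors the acceptance scenario listed under Tests: a workspace-file task is rewritten, and a `dbfs:/` task is preserved.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Hypothetical, simplified task shapes; the real Jobs API / SDK types differ.
type NotebookTask struct{ NotebookPath string }

type SparkPythonTask struct {
	PythonFile string
	Source     string // "GIT", "WORKSPACE", or ""
}

type Task struct {
	Key             string
	NotebookTask    *NotebookTask
	SparkPythonTask *SparkPythonTask
}

// downloader collects files to fetch, keyed by remote workspace path.
type downloader struct {
	sourceRel string            // source dir as seen from the config dir, e.g. "../src"
	files     map[string]string // remote workspace path -> local relative path
}

// mark records a remote file for download and returns its relative path.
func (d *downloader) mark(remote string) string {
	local := path.Join(d.sourceRel, path.Base(remote))
	d.files[remote] = local
	return local
}

func isCloudURI(p string) bool {
	for _, prefix := range []string{"dbfs:/", "s3:/", "adls:/", "gcs:/"} {
		if strings.HasPrefix(p, prefix) {
			return true
		}
	}
	return false
}

// markTaskForDownload handles notebook tasks (simplified) and, as in the fix,
// spark_python_task as well.
func (d *downloader) markTaskForDownload(t *Task) {
	switch {
	case t.NotebookTask != nil:
		t.NotebookTask.NotebookPath = d.mark(t.NotebookTask.NotebookPath)
	case t.SparkPythonTask != nil:
		spt := t.SparkPythonTask
		if spt.Source == "GIT" || isCloudURI(spt.PythonFile) {
			return // Git-sourced files and cloud URIs are left untouched.
		}
		spt.PythonFile = d.mark(spt.PythonFile)
	}
}

func main() {
	d := &downloader{sourceRel: "../src", files: map[string]string{}}
	tasks := []*Task{
		{Key: "nb", NotebookTask: &NotebookTask{NotebookPath: "/Workspace/Users/me/etl_notebook"}},
		{Key: "py", SparkPythonTask: &SparkPythonTask{PythonFile: "/Workspace/Users/me/main.py"}},
		{Key: "cloud", SparkPythonTask: &SparkPythonTask{PythonFile: "dbfs:/jobs/main.py"}},
	}
	for _, t := range tasks {
		d.markTaskForDownload(t)
	}
	fmt.Println(tasks[1].SparkPythonTask.PythonFile) // ../src/main.py
	fmt.Println(tasks[2].SparkPythonTask.PythonFile) // dbfs:/jobs/main.py
	fmt.Println(len(d.files))                        // 2
}
```

Before the fix, a dispatcher shaped like this would have had only the notebook case, so the Python file was never recorded for download and its path stayed absolute.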