feat: Milestone 2 — Structured Run Manifest System (#64) #67

Open
DhanashreePetare wants to merge 27 commits into dbpedia:gsoc-2026 from DhanashreePetare:gsoc-2026

Conversation

@DhanashreePetare

Collaborator

Pull Request

Description

Introduces the Structured Run Manifest System (Milestone 2) for the Databus Python Client. When --manifest is passed to any of the three existing commands (download, deploy, delete), a JSON-LD manifest file is written recording the complete details of the operation — input parameters, per-file URLs, checksums, byte sizes, timestamps, success/failure status, and a structured execution summary.
The manifest uses the DataID vocabulary (the same vocabulary used by the Databus platform itself), is versioned via dbus:schemaVersion, and never contains sensitive credentials. If a file already exists at the given manifest path, the client writes to an auto-suffixed path (run_1.jsonld, run_2.jsonld, ...) and prints a warning rather than silently overwriting. Passing a directory path instead of a file path raises a clear error. If writing the manifest fails, the client warns and continues; the exit code reflects the actual operation result, not the manifest write.
The manifest is written even when the operation itself fails, capturing whatever partial results were recorded before the failure — enabling the debugging use case described in the proposal (Use Case 5).
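
The path handling described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `resolve_manifest_path` is hypothetical, and it assumes the counter is appended to the file stem (so `run.jsonld` becomes `run_1.jsonld`).

```python
import warnings
from pathlib import Path


def resolve_manifest_path(requested: str) -> Path:
    """Return a path that can be written without overwriting anything.

    Hypothetical helper: the name and exact behaviour are assumptions
    based on the PR description, not the PR's actual implementation.
    """
    path = Path(requested)
    if path.is_dir():
        # A directory cannot be used as a manifest file.
        raise IsADirectoryError(f"--manifest expects a file path, got directory: {path}")
    if not path.exists():
        return path
    # Try <stem>_1, <stem>_2, ... until an unused name is found.
    counter = 1
    while True:
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        if not candidate.exists():
            warnings.warn(f"{path} already exists; writing manifest to {candidate}")
            return candidate
        counter += 1
```

Raising on a directory up front keeps the error separate from the "warn and continue" handling of write failures, which would wrap the actual write call.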

Related Issues
#64

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • This change requires a documentation update
  • Housekeeping

Checklist:

  • My code follows the ruff code style of this project.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
    • poetry run pytest - all tests passed
    • poetry run ruff check - no linting errors

DhanashreePetare added 25 commits June 5, 2026 15:28
@coderabbitai

coderabbitai Bot commented Jun 30, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



Comment thread README.md Outdated
Comment on lines +678 to +704
## Manifest

All three commands support an optional `--manifest` flag that writes a structured JSON-LD record of the operation to disk:

```bash
databusclient download https://databus.dbpedia.org/dbpedia/generic/labels/2023.12.01 \
--manifest ./manifests/labels-download.jsonld

databusclient deploy --version-id https://databus.dbpedia.org/myaccount/mygroup/mydata/1.0 \
--title "My Dataset" --abstract "..." --description "..." \
--license https://creativecommons.org/licenses/by-sa/3.0/ \
--apikey YOUR_KEY --manifest ./manifests/deploy-run.jsonld \
myfile.nt

databusclient delete https://databus.dbpedia.org/myaccount/mygroup/mydata/1.0 \
--databus-key YOUR_KEY --manifest ./manifests/delete-run.jsonld
```

The manifest records input parameters, per-file URLs, checksums, byte sizes, timestamps, and success/failure status for each file. It uses the DataID vocabulary and is versioned via `dbus:schemaVersion`.

- If the target path already exists, the manifest is written to an auto-suffixed path (e.g. `run_1.jsonld`) with a warning.
- Sensitive fields (API keys, vault tokens) are never written.
- If manifest writing fails, a warning is printed and the exit code reflects the actual operation result.

See `examples/reproducible-download.md` for a full walkthrough.
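
The rule that sensitive fields are never written can be illustrated with a small filter applied before the parameters are serialized. This is only a sketch under assumptions: the key names in `SENSITIVE_KEYS` and the helper `strip_sensitive` are hypothetical and may not match the names used in the client.

```python
# Hypothetical parameter names treated as secrets; the real client
# may use different names.
SENSITIVE_KEYS = frozenset({"apikey", "databus_key", "vault_token"})


def strip_sensitive(params: dict) -> dict:
    """Return a copy of params without any sensitive entries."""
    return {key: value for key, value in params.items() if key not in SENSITIVE_KEYS}


replay_params = strip_sensitive(
    {
        "databusURIs": ["https://example.org/account/group/artifact/1.0"],
        "dry_run": False,
        "databus_key": "secret-value",
    }
)
print(replay_params)
```

Dropping keys by name is easy to audit, but any new credential option has to be added to the set by hand, so an allow-list of recorded fields would be the stricter alternative.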

Contributor

The placement in the README could be better. At the top of the README is a table of contents. I would list it there under:

- [CLI Usage](#cli-usage)
  - [Download](#cli-download)
  - [Deploy](#cli-deploy)
  - [Delete](#cli-delete)
  - [Manifest](#cli-manifest) <-- new

Accordingly, the Manifest documentation should be placed below cli-delete.

Collaborator Author

Updated

Comment thread .gitignore Outdated
Comment on lines +33 to +37
# Explicitly un-ignore the manifest module folder (MANIFEST above is for Python packaging artifacts)
!databusclient/manifest/
!databusclient/manifest/**
databusclient/manifest/__pycache__/
*.py[cod]

Contributor

Looks like code from an agent :D
Unless there is a reason to keep it, remove it. Moreover, *.py[cod] is already present on line 8, and __pycache__/ on line 7.

Collaborator Author

Actually, these lines were added to fix a Windows-specific gitignore conflict: the MANIFEST pattern on line 33 (for Python packaging artifacts) was case-insensitively matching databusclient/manifest/ on Windows, which prevented the manifest module directory from being committed. The negation lines were a workaround at the time, but since the files are now tracked by git, the conflict no longer applies and I've removed them. Thanks for pointing this out.
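
The collision described in this reply can be modelled in a few lines. This is a rough illustration only, assuming the case-insensitive matching the author describes; it is not git's real matcher, and `component_matches` is a hypothetical helper that treats a slash-free pattern as matching any single path component.

```python
from fnmatch import fnmatchcase


def component_matches(pattern: str, path: str, ignore_case: bool) -> bool:
    """Return True if any path component matches the pattern.

    Simplified model of a slash-free ignore pattern; real gitignore
    rules (negation, anchoring, directory-only patterns) are not covered.
    """
    if ignore_case:
        pattern, path = pattern.lower(), path.lower()
    return any(fnmatchcase(part, pattern) for part in path.split("/") if part)


# Case-sensitive matching: MANIFEST does not hit the module directory.
print(component_matches("MANIFEST", "databusclient/manifest/", ignore_case=False))
# Case-insensitive matching: the same pattern now matches it.
print(component_matches("MANIFEST", "databusclient/manifest/", ignore_case=True))
```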

Comment thread README.md Outdated
All three commands support an optional `--manifest` flag that writes a structured JSON-LD record of the operation to disk:

```bash
databusclient download https://databus.dbpedia.org/dbpedia/generic/labels/2023.12.01 \
```

Contributor

Please use existing examples: https://databus.dbpedia.org/dbpedia/generic/labels/2023.12.01 does not exist

Collaborator Author

Updated

Comment on lines 152 to 154
if queue is not None:
    queue.add_uri(databusURI)
    return

Contributor

Real delete manifests are empty for successful deletions.

databusclient/api/delete.py:152 queues every non-dry-run delete and returns before recording. The public delete() always uses a queue, but DeleteQueue.execute() calls _delete_list() without passing manifest_context at databusclient/api/delete.py:73. Result: databusclient delete ... --manifest out.jsonld can delete resources successfully while the manifest reports zero files.

{
  "@context": {
    "dataid": "http://dataid.dbpedia.org/ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "dbus": "http://databus.dbpedia.org/manifest/ns#"
  },
  "@type": "dbus:OperationManifest",
  "dbus:schemaVersion": "1.0",
  "dbus:clientVersion": "0.15",
  "dbus:command": "delete",
  "dcterms:issued": {
    "@value": "2026-07-03T09:15:02.095679+00:00",
    "@type": "xsd:dateTime"
  },
  "dbus:replayParams": {
    "databusURIs": [
      "https://databus.dev.dbpedia.link/fhofer/group1/artifact1/2027-07-03"
    ],
    "dry_run": false
  },
  "dataid:distribution": {
    "@type": "dataid:Distribution",
    "dataid:file": []
  },
  "dbus:executionResult": {
    "@type": "dbus:ExecutionSummary",
    "dbus:totalFiles": 0,
    "dbus:succeeded": 0,
    "dbus:failed": 0,
    "dbus:totalBytes": 0
  }
}
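
One way to close the gap described above is to let the queue hold the manifest context and forward it when it executes. The sketch below is an assumption-laden illustration, not the fix applied in this PR: the class and method names echo those mentioned in the review, but the signatures, the `ManifestContext` stand-in, and the no-op `delete_resource` are hypothetical.

```python
class ManifestContext:
    """Hypothetical stand-in that collects per-resource results."""

    def __init__(self) -> None:
        self.files: list[dict] = []

    def record_file(self, url: str, status: str) -> None:
        self.files.append({"url": url, "status": status})


def delete_resource(uri: str) -> None:
    """Placeholder for the real HTTP delete call (does nothing here)."""


def delete_list(uris, manifest_context=None) -> None:
    for uri in uris:
        delete_resource(uri)
        if manifest_context is not None:
            manifest_context.record_file(url=uri, status="success")


class DeleteQueue:
    def __init__(self, manifest_context=None) -> None:
        self.uris: list[str] = []
        # Keep the context so queued deletions can still be recorded.
        self.manifest_context = manifest_context

    def add_uri(self, uri: str) -> None:
        self.uris.append(uri)

    def execute(self) -> None:
        # Forwarding the context is the step the review found missing.
        delete_list(self.uris, manifest_context=self.manifest_context)


ctx = ManifestContext()
queue = DeleteQueue(manifest_context=ctx)
queue.add_uri("https://example.org/account/group/artifact/1.0")
queue.execute()
print(len(ctx.files))
```

Storing the context on the queue avoids threading an extra argument through every `add_uri` call, at the cost of tying one queue to one manifest.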

Collaborator Author

Updated, verified by deleting a deployed test dataset.
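
A check along the following lines could catch the empty-manifest symptom from this thread in an automated test. It is a sketch under assumptions: the key names are taken from the manifest posted in the review, while the two invariants (the summary counts agree with the file list, and a non-dry-run records at least one file) are guesses about intended behaviour rather than documented guarantees.

```python
def summary_matches_files(manifest: dict) -> bool:
    """Check that the summary counts agree with the recorded file list."""
    files = manifest["dataid:distribution"]["dataid:file"]
    summary = manifest["dbus:executionResult"]
    total = summary["dbus:totalFiles"]
    return total == len(files) and summary["dbus:succeeded"] + summary["dbus:failed"] == total


def empty_for_real_run(manifest: dict) -> bool:
    """Flag a non-dry-run manifest that recorded no files at all."""
    dry_run = manifest.get("dbus:replayParams", {}).get("dry_run", False)
    return not dry_run and manifest["dbus:executionResult"]["dbus:totalFiles"] == 0


# Trimmed copy of the manifest posted in the review.
reported = {
    "dbus:command": "delete",
    "dbus:replayParams": {"dry_run": False},
    "dataid:distribution": {"@type": "dataid:Distribution", "dataid:file": []},
    "dbus:executionResult": {
        "dbus:totalFiles": 0,
        "dbus:succeeded": 0,
        "dbus:failed": 0,
        "dbus:totalBytes": 0,
    },
}
print(summary_matches_files(reported), empty_for_real_run(reported))
```

The reported manifest is internally consistent, which is why a count check alone would not have caught the problem; the second check is the one that flags it.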

Comment thread databusclient/api/download.py Outdated
Comment on lines +528 to +535
if manifest_context is not None:
    manifest_context.record_file(
        url=url,
        status="success",
        sha256=actual_checksum or expected_checksum,
        size_bytes=total_size_in_bytes if total_size_in_bytes else None,
        downloaded_at=datetime.now(timezone.utc).isoformat(),
    )

Contributor

Download manifests can record a file as successful before the requested conversion succeeds.

_download_file() writes the success entry at databusclient/api/download.py:526, but compression/format conversion runs afterward through databusclient/api/download.py:537. If convert_file() or recompression fails later, the manifest still contains a successful file entry for an operation whose final output was not produced.
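
The ordering problem can be avoided by recording the entry only once the final outcome is known. The following is a minimal sketch, not the change made in this PR: `download_and_convert`, `fetch`, and `convert` are hypothetical names, and the `"failed"` status string is an assumption (only `"success"` appears in the diff).

```python
from datetime import datetime, timezone


def download_and_convert(url, fetch, convert, manifest_context=None):
    """Fetch a file, convert it, then record exactly one manifest entry.

    `fetch` and `convert` are hypothetical callables standing in for the
    real download and conversion steps.
    """
    status = "failed"  # assumed status value for unsuccessful files
    try:
        raw_path = fetch(url)
        final_path = convert(raw_path)  # may raise
        status = "success"
        return final_path
    finally:
        # Runs on both paths, so the entry reflects the final outcome.
        if manifest_context is not None:
            manifest_context.record_file(
                url=url,
                status=status,
                downloaded_at=datetime.now(timezone.utc).isoformat(),
            )
```

Using `finally` keeps a single recording site, so a conversion error still propagates to the caller while the manifest shows the file as failed.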

Collaborator Author

Updated


2 participants