2 changes: 2 additions & 0 deletions .gitignore
@@ -1,5 +1,7 @@
# project-specific
tmp/
vault-token.dat
test-download/

# Byte-compiled / optimized / DLL files
__pycache__/
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,24 @@
# Changelog

All notable changes to this project will be documented in this file.

## [0.15] - 2025-12-31

### Added
- Vault authentication improvements with host-restricted token exchange
- Comprehensive tests for Vault authentication behavior
- Enhanced docstrings across all modules for better documentation coverage
- Support for download redirect handling

### Fixed
- Vault token exchange now restricted to known hosts for improved security
- Clearer authentication error messages
- README instructions now consistent with PyPI release

### Changed
- Updated CLI usage documentation to reflect current command structure
- Improved error handling in download operations

### Notes
- Version 0.15 skips 0.13 and 0.14 as requested in issue #35
- This release updates the PyPI package to align with current repository features
8 changes: 8 additions & 0 deletions README.md
@@ -41,6 +41,12 @@ Before using the client, install it via pip:
python3 -m pip install databusclient
```

Note: this repository prepares version `0.15`, which brings the PyPI release in line with the CLI documented here. If you previously installed `databusclient` via `pip` and observe different CLI behavior, upgrade to the latest release:

```bash
python3 -m pip install --upgrade databusclient==0.15
```

You can then use the client in the command line:

```bash
@@ -164,6 +170,8 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOAD
- If no `--localdir` is provided, the current working directory is used as base directory. The downloaded files will be stored in the working directory in a folder structure according to the Databus layout, i.e. `./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/`.
- `--vault-token`
- If the dataset/files to be downloaded require vault authentication, you need to provide a vault token with `--vault-token /path/to/vault-token.dat`. See [Registration (Access Token)](#registration-access-token) for details on how to get a vault token.

Note: Vault tokens are required only for certain protected Databus hosts (for example, `data.dbpedia.io` and `data.dev.dbpedia.link`). The client now detects these hosts and fails early with a clear message if a token is required but not provided. Do not pass `--vault-token` for public downloads.
- `--databus-key`
- If the databus is protected and needs API key authentication, you can provide the API key with `--databus-key YOUR_API_KEY`.
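
The host check described in the note on `--vault-token` can be pictured roughly as follows. This is a minimal sketch, not the client's actual code: the names `VAULT_HOSTS`, `requires_vault_token` and `check_download_auth`, and the error message, are assumptions made for illustration. Only the two host names come from the note itself.

```python
from typing import Optional
from urllib.parse import urlparse

# Hosts named in the README note; the real client may know others (assumption).
VAULT_HOSTS = {"data.dbpedia.io", "data.dev.dbpedia.link"}


def requires_vault_token(url: str) -> bool:
    """Return True if the URL points at a host that needs a vault token."""
    host = urlparse(url).hostname or ""
    return host in VAULT_HOSTS


def check_download_auth(url: str, vault_token_file: Optional[str]) -> None:
    """Fail early when a protected host is requested without a token file."""
    if requires_vault_token(url) and not vault_token_file:
        host = urlparse(url).hostname
        raise ValueError(f"{host} requires a vault token; pass --vault-token")
```

Checking the parsed hostname against an explicit set, rather than matching substrings of the URL, is what keeps a token from being sent to an unrelated host whose URL merely contains one of these names.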

97 changes: 97 additions & 0 deletions RELEASE_NOTES.md
@@ -0,0 +1,97 @@
# Release Notes for databusclient 0.15

## Overview
This release addresses issue #35 by publishing a new PyPI package (version 0.15) so that `pip install databusclient` delivers the latest CLI features and bug fixes.

## Version
**0.15** (skipping 0.13 and 0.14 as requested)

## What's New

### Features & Improvements
- **Vault Authentication Enhancement**: Host-restricted token exchange for improved security
- **Better Error Messages**: Clearer authentication error messages for easier debugging
- **Download Redirect Handling**: Improved handling of redirects during file downloads
- **Comprehensive Documentation**: Enhanced docstrings across all modules

### Bug Fixes
- Fixed Vault token exchange to only work with known hosts
- Improved error handling in download operations
- Aligned README with current CLI behavior

### Testing
- Added comprehensive tests for Vault authentication
- Improved test coverage overall

## Installation

After this release is published to PyPI, users can install or upgrade with:

```bash
pip install databusclient==0.15
# or to upgrade
pip install --upgrade databusclient
```

## Build Artifacts

The following distribution files have been created and validated:
- `databusclient-0.15-py3-none-any.whl` (wheel format)
- `databusclient-0.15.tar.gz` (source distribution)

Both files have passed `twine check` validation.

## Publishing Instructions

### Prerequisites
1. PyPI account with maintainer access to the `databusclient` package
2. PyPI API token configured

### Steps to Publish

1. **Verify the build artifacts** (already done):
```bash
poetry build
twine check dist/*
```

2. **Upload to TestPyPI** (recommended first):
```bash
twine upload --repository testpypi dist/*
```
Then test installation:
```bash
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ databusclient==0.15
```

3. **Upload to PyPI**:
```bash
twine upload dist/*
```

4. **Create a Git tag**:
```bash
git tag -a v0.15 -m "Release version 0.15"
git push origin v0.15
```

5. **Create a GitHub Release**:
- Go to GitHub repository → Releases → Draft a new release
- Choose tag `v0.15`
- Release title: `databusclient 0.15`
- Copy content from CHANGELOG.md
- Attach the dist files as release assets

## Verification

After publishing, verify the release:
```bash
pip install --upgrade databusclient==0.15
databusclient --version
databusclient --help
```
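
The same check can also be done from Python, which is handy in a smoke test after publishing. The sketch below uses only the standard library; the helper names `installed_version` and `is_expected_release` are illustrative and not part of `databusclient`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of ``package``, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_expected_release(package: str, expected: str) -> bool:
    """Return True only if ``package`` is installed at exactly ``expected``."""
    return installed_version(package) == expected
```

For this release, `is_expected_release("databusclient", "0.15")` should return `True` once the upgrade has gone through.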

## Notes
- This release resolves issue #35
- The PyPI package will now be consistent with the repository's CLI documentation
- Version numbers 0.13 and 0.14 were intentionally skipped as requested
14 changes: 14 additions & 0 deletions databusclient/__init__.py
@@ -1,8 +1,22 @@
"""Top-level package for the databus Python client.

This module exposes a small set of convenience functions and the CLI
entrypoint so the package can be used as a library or via
``python -m databusclient``.
"""

from databusclient import cli
from databusclient.api.deploy import create_dataset, create_distribution, deploy

__version__ = "0.15"
__all__ = ["create_dataset", "deploy", "create_distribution"]


def run():
    """Start the Click CLI application.

    This function is used by the ``__main__`` module and the package
    entrypoint to invoke the command line interface.
    """

    cli.app()
18 changes: 17 additions & 1 deletion databusclient/__main__.py
@@ -1,3 +1,19 @@
"""Module used for ``python -m databusclient`` execution.

Runs the package's CLI application.
"""

from databusclient import cli


def main():
    """Invoke the CLI application.

    Kept as a named function for easier testing and clarity.
    """

    cli.app()


if __name__ == "__main__":
    main()
27 changes: 27 additions & 0 deletions databusclient/api/delete.py
@@ -1,3 +1,10 @@
"""Helpers for deleting Databus resources via the Databus HTTP API.

This module provides utilities to delete groups, artifacts and versions on a
Databus instance using authenticated HTTP requests. The class `DeleteQueue`
also allows batching of deletions.
"""

import json
from typing import List

@@ -16,23 +23,43 @@ class DeleteQueue:
    """

    def __init__(self, databus_key: str):
        """Create a DeleteQueue bound to a given Databus API key.

        Args:
            databus_key: API key used to authenticate deletion requests.
        """
        self.databus_key = databus_key
        self.queue: set[str] = set()

    def add_uri(self, databusURI: str):
        """Add a single Databus URI to the deletion queue.

        The URI will be deleted when `execute()` is called.
        """
        self.queue.add(databusURI)

    def add_uris(self, databusURIs: List[str]):
        """Add multiple Databus URIs to the deletion queue.

        Args:
            databusURIs: Iterable of full Databus URIs.
        """
        for uri in databusURIs:
            self.queue.add(uri)

    def is_empty(self) -> bool:
        """Return True if the queue is empty."""
        return len(self.queue) == 0

    def is_not_empty(self) -> bool:
        """Return True if the queue contains any URIs."""
        return len(self.queue) > 0

    def execute(self):
        """Execute all queued deletions.

        Each queued URI will be deleted using `_delete_resource`.
        """
        for uri in self.queue:
            print(f"[DELETE] {uri}")
            _delete_resource(
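
To illustrate how the batching described in these docstrings might be used, here is a self-contained sketch. `SketchDeleteQueue` is a stand-in that mirrors only the methods visible in this hunk, not the real class. The `delete_fn` callback is a hypothetical replacement for `_delete_resource`, whose arguments are not shown here, and the example URIs are made up.

```python
from typing import Callable, List, Set


class SketchDeleteQueue:
    """Stand-in mirroring the queue methods visible in the diff (not the real class)."""

    def __init__(self, databus_key: str, delete_fn: Callable[[str, str], None]):
        self.databus_key = databus_key
        self.queue: Set[str] = set()  # a set, so duplicate URIs collapse
        self._delete_fn = delete_fn  # hypothetical hook replacing _delete_resource

    def add_uri(self, databus_uri: str) -> None:
        self.queue.add(databus_uri)

    def add_uris(self, databus_uris: List[str]) -> None:
        self.queue.update(databus_uris)

    def is_empty(self) -> bool:
        return not self.queue

    def execute(self) -> None:
        # Sorted only to make this sketch deterministic; a set has no order.
        for uri in sorted(self.queue):
            self._delete_fn(uri, self.databus_key)


deleted: List[str] = []
queue = SketchDeleteQueue("dummy-key", lambda uri, key: deleted.append(uri))
queue.add_uris(["https://example.org/a/g/art/v1", "https://example.org/a/g/art/v1"])
queue.add_uri("https://example.org/a/g/art/v2")
queue.execute()
```

Because the queue is a set, adding the same URI twice results in a single deletion request.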
42 changes: 42 additions & 0 deletions databusclient/api/deploy.py
@@ -1,3 +1,10 @@
"""Build and publish Databus datasets (JSON-LD) from provided metadata.

This module exposes helpers to create distribution strings, compute file
information (sha256 and size), construct dataset JSON-LD payloads and
publish them to a Databus instance using the Databus publish API.
"""

import hashlib
import json
from enum import Enum
@@ -25,6 +32,13 @@ class DeployLogLevel(Enum):


def _get_content_variants(distribution_str: str) -> Optional[Dict[str, str]]:
    """Parse content-variant key/value pairs from a distribution string.

    The CLI supports passing a distribution as ``url|lang=en_type=parsed|...``.
    This helper extracts the ``lang``/``type`` style key/value pairs as a
    dictionary.
    """

    args = distribution_str.split("|")

    # cv string is ALWAYS at position 1 after the URL
@@ -50,6 +64,12 @@ def _get_content_variants(distribution_str: str) -> Optional[Dict[str, str]]:
def _get_filetype_definition(
    distribution_str: str,
) -> Tuple[Optional[str], Optional[str]]:
    """Extract an explicit file format and compression from a distribution string.

    Returns (file_extension, compression) where each may be ``None`` if the
    format should be inferred from the URL path.
    """

    file_ext = None
    compression = None

@@ -87,6 +107,12 @@ def _get_filetype_definition(


def _get_extensions(distribution_str: str) -> Tuple[str, str, str]:
    """Return tuple `(extension_part, format_extension, compression)`.

    ``extension_part`` is the textual extension appended to generated
    filenames (e.g. ".ttl.gz").
    """

    extension_part = ""
    format_extension, compression = _get_filetype_definition(distribution_str)

@@ -126,6 +152,11 @@ def _get_extensions(distribution_str: str) -> Tuple[str, str, str]:


def _get_file_stats(distribution_str: str) -> Tuple[Optional[str], Optional[int]]:
    """Parse an optional ``sha256sum:length`` tuple from a distribution string.

    Returns (sha256sum, content_length) or (None, None) when not provided.
    """

    metadata_list = distribution_str.split("|")[1:]
    # check whether there is the shasum:length tuple separated by :
    if len(metadata_list) == 0 or ":" not in metadata_list[-1]:
@@ -146,6 +177,12 @@ def _get_file_stats(distribution_str: str) -> Tuple[Optional[str], Optional[int]


def _load_file_stats(url: str) -> Tuple[str, int]:
    """Download the file at ``url`` and compute its SHA-256 and length.

    This is used as a fallback when the caller did not supply checksum/size
    information in the CLI or metadata file.
    """

    resp = requests.get(url, timeout=30)
    if resp.status_code >= 400:
        raise requests.exceptions.RequestException(response=resp)
@@ -156,6 +193,11 @@ def _load_file_stats(url: str) -> Tuple[str, int]:


def get_file_info(distribution_str: str) -> Tuple[Dict[str, str], str, str, str, int]:
    """Return parsed file information for a distribution string.

    Returns a tuple `(cvs, format_extension, compression, sha256sum, size)`.
    """

    cvs = _get_content_variants(distribution_str)
    extension_part, format_extension, compression = _get_extensions(distribution_str)
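
The distribution string format that these docstrings describe can be summarised with a small standalone parser. This is a sketch under assumptions, not the module's implementation: `parse_distribution` is a hypothetical name, and it handles only the content-variant field directly after the URL and the optional trailing `sha256sum:length` field. The format and compression fields in between are ignored here.

```python
from typing import Dict, Optional, Tuple


def parse_distribution(dist: str) -> Tuple[str, Dict[str, str], Optional[str], Optional[int]]:
    """Split ``url|key=val_key=val|...|sha256sum:length`` into its parts."""
    url, *rest = dist.split("|")

    # Content variants sit directly after the URL, written as "k1=v1_k2=v2".
    cvs: Dict[str, str] = {}
    if rest and "=" in rest[0]:
        for pair in rest[0].split("_"):
            key, _, value = pair.partition("=")
            cvs[key] = value

    # The optional checksum/size pair is the last field and contains a colon.
    sha256sum: Optional[str] = None
    length: Optional[int] = None
    if rest and ":" in rest[-1]:
        sha256sum, _, raw_length = rest[-1].partition(":")
        length = int(raw_length)

    return url, cvs, sha256sum, length
```

Only the fields after the first `|` are inspected for a colon, so the `https://` in the URL is never mistaken for a checksum separator.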
