
Use uv project manager and add CI tests #12


Open: wengh wants to merge 10 commits into main


@wengh wengh (Collaborator) commented May 28, 2025

Changes

  • Migrate from the Poetry-specific pyproject.toml to the standard format, and switch to the uv project manager
    • Lock files are committed so that the environment is reproducible
  • Add ci.yml to test combinations of Python (3.9, 3.13) and PySpark (3.5.6, >=4.0.0) versions on GitHub PRs
  • Fix tests to make them compatible with PySpark 3.5.6, and skip write tests if HF_TOKEN is missing
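The "skip write tests if HF_TOKEN is missing" behaviour can be sketched with the standard library alone. This is an illustration, not the PR's code: the PR uses a pytest fixture, whereas the sketch below uses unittest.skipUnless (pytest honours it on unittest.TestCase classes; pytest.mark.skipif is the pytest-native equivalent). The names get_hf_token, requires_hf_token and HuggingFaceWriterTests are hypothetical.

```python
import os
import unittest


def get_hf_token():
    """Return the Hugging Face token from the environment, or None if unset or empty."""
    return os.environ.get("HF_TOKEN") or None


# Evaluated once, at import time: decorated tests are reported as skipped
# (not failed) when no token is available, e.g. on PRs from forks.
requires_hf_token = unittest.skipUnless(
    get_hf_token(), "HF_TOKEN is not set; skipping Hugging Face write tests"
)


class HuggingFaceWriterTests(unittest.TestCase):
    @requires_hf_token
    def test_token_is_available(self):
        # Placeholder body (hypothetical): a real write test would push a dataset here.
        self.assertIsNotNone(get_hf_token())
```

Because a skip counts as a pass, CI stays green without secrets while still exercising the read tests.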

Testing

  • See CI results

@wengh wengh requested a review from Copilot May 28, 2025 02:54

@Copilot Copilot AI left a comment


Pull Request Overview

Migrates the project from Poetry to the uv build system by adopting PEP 621 metadata and restructuring dependencies.

  • Consolidate metadata under a PEP 621 [project] table instead of [tool.poetry]
  • Define core and development dependencies using [project.dependencies] and [dependency-groups]
  • Configure uv_build as the PEP 517 build backend and pin the development Python version

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File descriptions:
  • pyproject.toml: Replace Poetry fields with PEP 621 [project], add dependency-groups, and switch to the uv_build backend
  • .python-version: Pin the Python interpreter version to match the project’s minimum requirement

@wengh wengh changed the title Use uv project manager Use uv project manager and add CI tests May 30, 2025
@wengh wengh requested a review from Copilot May 30, 2025 03:28

@Copilot Copilot AI left a comment


Pull Request Overview

This PR integrates the uv project manager for dependency management and adds CI tests to ensure cross-version compatibility with Spark and Python.

  • Updates test files to improve compatibility with PySpark 3.x by replacing the use of DataFrame.toArrow(), which was only added in PySpark 4.0.
  • Migrates project metadata to the new PEP 621 format and switches to uv_build.
  • Introduces a GitHub Actions workflow that runs tests on multiple Python versions and package combinations.
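A GitHub Actions strategy.matrix runs one job per combination of its axes (the Cartesian product, before any include/exclude entries), so the two Python versions and two PySpark versions from the PR description yield four jobs. The sketch below mimics that expansion in plain Python; the axis names python-version and pyspark-version are hypothetical, not taken from the PR's ci.yml.

```python
from itertools import product

# Axis values from the PR description; the key names are hypothetical.
matrix = {
    "python-version": ["3.9", "3.13"],
    "pyspark-version": ["3.5.6", ">=4.0.0"],
}


def expand(matrix):
    """Return one dict per job, like a matrix without include/exclude."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


jobs = expand(matrix)
for job in jobs:
    print(job)
```

Adding a third value to either axis would grow the job count multiplicatively, which is worth keeping in mind when extending the matrix.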

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

File descriptions:
  • tests/test_huggingface_writer.py: Updates test logic and the fixture for an environment-based token, and converts the DataFrame using pyarrow for Spark 3.x compatibility.
  • tests/test_huggingface.py: Adds an import for pyspark_huggingface.
  • pyproject.toml: Migrates project configuration to PEP 621 and defines dependency groups using uv_build.
  • .python-version: Specifies the default Python version for local development.
  • .github/workflows/ci.yml: Introduces a CI workflow with a matrix of Python and package versions.
Comments suppressed due to low confidence (1)

tests/test_huggingface_writer.py:134

  • Collecting all rows from the DataFrame into memory for conversion to a PyArrow table may impact performance if the test data scale increases. Consider using a method that processes data in batches or limiting the test dataset size.
arrow_table = pa.Table.from_pylist([row.asDict() for row in df.collect()], schema=to_arrow_schema(df.schema))
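One possible way to address the suppressed comment while keeping PySpark 3.5.x support is sketched below. It is an assumption-laden illustration, not the PR's code: it uses DataFrame.toArrow() when available (PySpark 4.0+) and otherwise streams rows with DataFrame.toLocalIterator() into Arrow record batches, reusing to_arrow_schema as the original line does. The helper names chunked and dataframe_to_arrow and the batch_size default are hypothetical. The resulting table still holds all rows; the gain is avoiding one large intermediate Python list.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def dataframe_to_arrow(df, batch_size=10_000):
    """Convert a PySpark DataFrame to a pyarrow.Table on PySpark 3.5.x or 4.x."""
    # DataFrame.toArrow() only exists from PySpark 4.0 onwards.
    if hasattr(df, "toArrow"):
        return df.toArrow()

    # PySpark 3.5.x fallback; imported lazily so the 4.x path does not need them here.
    import pyarrow as pa
    from pyspark.sql.pandas.types import to_arrow_schema

    schema = to_arrow_schema(df.schema)
    # toLocalIterator() fetches one partition at a time instead of collect()-ing
    # everything; recursive=True also converts nested Rows into dicts for Arrow.
    batches = [
        pa.RecordBatch.from_pylist(
            [row.asDict(recursive=True) for row in rows], schema=schema
        )
        for rows in chunked(df.toLocalIterator(), batch_size)
    ]
    return pa.Table.from_batches(batches, schema=schema)
```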

@@ -0,0 +1 @@
3.9

Copilot AI May 30, 2025


The .python-version file statically specifies Python 3.9, yet the CI workflow includes Python 3.13. It would be helpful to document the intended supported Python versions or update .python-version accordingly for consistency.


@wengh wengh marked this pull request as ready for review May 30, 2025 03:32
@wengh wengh requested review from lhoestq and allisonwang-db May 31, 2025 01:08
@lhoestq lhoestq (Member) left a comment


nice !

Labels
None yet
Projects
None yet

2 participants