
Chore/housekeeping #76


Merged (7 commits) on May 5, 2025

24 changes: 13 additions & 11 deletions .github/workflows/benchmark.yml
@@ -2,9 +2,9 @@ name: Performance Benchmarks

on:
push:
branches: [main, develop]
branches: [main, dev]
pull_request:
branches: [main, develop]
branches: [main, dev]
# Schedule benchmarks to run weekly
schedule:
- cron: "0 0 * * 0" # Run at midnight on Sundays
@@ -13,12 +13,12 @@ jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Fetch all history for proper comparison

- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: "3.10"
cache: "pip"
@@ -28,10 +28,9 @@ jobs:
python -m pip install --upgrade pip
pip install -e .
pip install -r requirements-dev.txt
pip install pytest-benchmark

- name: Restore benchmark data
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: .benchmarks
key: benchmark-${{ runner.os }}-${{ hashFiles('**/requirements*.txt') }}
@@ -41,15 +40,16 @@
- name: Run benchmarks and save baseline
run: |
# Run benchmarks and save results
pytest tests/benchmark_text_service.py -v --benchmark-autosave
python -m pytest tests/benchmark_text_service.py -v --benchmark-autosave --benchmark-json=benchmark-results.json

- name: Check for performance regression
run: |
# Compare against the previous benchmark if available
# Fail if performance degrades by more than 10%
if [ -d ".benchmarks" ]; then
BASELINE=$(ls -t .benchmarks/Linux-CPython-3.10-64bit | head -n 2 | tail -n 1)
CURRENT=$(ls -t .benchmarks/Linux-CPython-3.10-64bit | head -n 1)
benchmark_dir=".benchmarks/Linux-CPython-3.10-64bit"
BASELINE=$(ls -t $benchmark_dir | head -n 2 | tail -n 1)
CURRENT=$(ls -t $benchmark_dir | head -n 1)
if [ -n "$BASELINE" ] && [ "$BASELINE" != "$CURRENT" ]; then
# Set full paths to the benchmark files
BASELINE_FILE="$benchmark_dir/$BASELINE"
@@ -71,10 +71,12 @@ jobs:
fi

- name: Upload benchmark results
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v4
with:
name: benchmark-results
path: .benchmarks/
path: |
.benchmarks/
benchmark-results.json

- name: Alert on regression
if: failure()
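Note: pytest-benchmark also ships built-in comparison flags, so the ls/head bookkeeping in the regression step could likely be replaced with a single invocation. A minimal sketch, assuming the autosaved runs stay in the default .benchmarks storage this workflow already uses:

# Compare against the most recent autosaved run; fail on a >10% regression of the mean
python -m pytest tests/benchmark_text_service.py --benchmark-only \
  --benchmark-compare --benchmark-compare-fail=mean:10%
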
8 changes: 5 additions & 3 deletions .github/workflows/dev-cicd.yml
@@ -15,13 +15,13 @@ jobs:
- name: Check out repo
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Install pre-commit
run: pip install pre-commit
- name: Run pre-commit
run: pre-commit run --all-files
run: pre-commit run --all-files --show-diff-on-failure

build:
runs-on: ubuntu-latest
@@ -46,7 +46,7 @@ jobs:
docker-images: true
swap-storage: true
- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install Tesseract OCR
@@ -64,6 +64,8 @@ jobs:
tesseract --list-langs
- name: Install Dependencies
run: |
# Create pip cache directory if it doesn't exist
mkdir -p ~/.cache/pip
pip install -U pip
pip install -e .
pip install tox just pre-commit
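The --show-diff-on-failure flag added to the pre-commit step prints each fixer's diff in the CI log. The same check can be reproduced locally before pushing, for example:

# Install the git hook once, then run the full suite the way CI does
pre-commit install
pre-commit run --all-files --show-diff-on-failure
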
42 changes: 42 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,42 @@
name: Lint

on:
push:
branches: [main, dev]
pull_request:
branches: [main, dev]

jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.10"
cache: "pip"

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements-dev.txt

- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

- name: Lint with ruff
run: |
# Run ruff but don't fail the build yet (exit-zero)
ruff check . --exit-zero

- name: Type check with mypy
run: |
# Run mypy but don't fail the build yet
# Use --ignore-missing-imports to ignore missing stubs for third-party libraries
mypy datafog/ --ignore-missing-imports || true
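
Because ruff and mypy run in advisory mode here (--exit-zero and || true), a quick way to preview what they would eventually enforce is to run the same commands locally without the escape hatches. A rough sketch, assuming requirements-dev.txt pins flake8, ruff, and mypy:

# Local equivalent of the lint job, advisory flags dropped
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
ruff check .                            # no --exit-zero, so violations fail here
mypy datafog/ --ignore-missing-imports
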
83 changes: 79 additions & 4 deletions .github/workflows/publish-pypi.yml
@@ -1,6 +1,7 @@
name: PyPI Release

on:
# Manual trigger with version input
workflow_dispatch:
inputs:
version:
@@ -10,17 +11,29 @@ on:
description: "Confirm all tests have passed"
type: boolean
required: true
is_prerelease:
description: "Is this a pre-release?"
type: boolean
default: false
required: false
# Auto-trigger for beta releases when merged to dev
push:
branches:
- dev

jobs:
release:
# Job for manual releases (stable or pre-release)
manual_release:
runs-on: ubuntu-latest
if: github.event.inputs.confirm_tests == 'true'
if: github.event_name == 'workflow_dispatch' && github.event.inputs.confirm_tests == 'true'
permissions:
contents: write
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Install dependencies
@@ -37,9 +50,71 @@ jobs:
git config user.email github-actions@github.com
git tag v${{ github.event.inputs.version }}
git push origin v${{ github.event.inputs.version }}
gh release create v${{ github.event.inputs.version }} --generate-notes
if [ "${{ github.event.inputs.is_prerelease }}" == "true" ]; then
gh release create v${{ github.event.inputs.version }} --prerelease --generate-notes
else
gh release create v${{ github.event.inputs.version }} --generate-notes
fi
- name: Publish to PyPI
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
run: twine upload dist/*

# Job for automatic beta releases on merge to dev
auto_beta_release:
runs-on: ubuntu-latest
if: github.event_name == 'push' && github.ref == 'refs/heads/dev'
permissions:
contents: write
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install build twine setuptools-scm
- name: Generate beta version
id: beta_version
run: |
# Get the latest tag
LATEST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "0.0.0")
# Remove the 'v' prefix if present
LATEST_VERSION=${LATEST_TAG#v}
# Split version into components
IFS='.' read -r MAJOR MINOR PATCH <<< "$LATEST_VERSION"
# Extract any existing beta suffix
PATCH_NUM=${PATCH%%b*}
# Get commit count since last tag for unique beta number
COMMIT_COUNT=$(git rev-list --count $LATEST_TAG..HEAD 2>/dev/null || echo "1")
# Generate beta version
BETA_VERSION="$MAJOR.$MINOR.$(($PATCH_NUM))b$COMMIT_COUNT"
echo "Generated beta version: $BETA_VERSION"
echo "version=$BETA_VERSION" >> $GITHUB_OUTPUT
# Update version in setup.py or pyproject.toml if needed
# This depends on how your versioning is configured
- name: Build package
run: python -m build
- name: Create GitHub Pre-Release
env:
GITHUB_TOKEN: ${{ secrets.pypi }}
BETA_VERSION: ${{ steps.beta_version.outputs.version }}
run: |
git config user.name github-actions
git config user.email github-actions@github.com
git tag v$BETA_VERSION
git push origin v$BETA_VERSION
gh release create v$BETA_VERSION --prerelease --title "Beta Release v$BETA_VERSION" --notes "Automated beta release from dev branch"
- name: Publish to PyPI as Beta
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
BETA_VERSION: ${{ steps.beta_version.outputs.version }}
run: |
# Ensure package is marked as beta in PyPI
twine upload --skip-existing dist/*
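
To make the tag arithmetic in auto_beta_release concrete, here is a worked sketch with hypothetical inputs (a v4.1.0 tag with five commits on top). The result is a PEP 440 pre-release, which pip only installs when --pre is passed or the version is pinned explicitly:

# Hypothetical inputs: latest tag v4.1.0, 5 commits since that tag
LATEST_TAG="v4.1.0"; COMMIT_COUNT=5
LATEST_VERSION=${LATEST_TAG#v}                      # 4.1.0
IFS='.' read -r MAJOR MINOR PATCH <<< "$LATEST_VERSION"
PATCH_NUM=${PATCH%%b*}                              # 0
echo "$MAJOR.$MINOR.$(($PATCH_NUM))b$COMMIT_COUNT"  # prints 4.1.0b5
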
40 changes: 40 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,40 @@
name: Tests

on:
push:
branches: [main, dev]
pull_request:
branches: [main, dev]

jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.10", "3.11", "3.12"]

steps:
- uses: actions/checkout@v4

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: "pip"

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -e .
pip install -r requirements-dev.txt

- name: Run tests with pytest
run: |
python -m pytest tests/ --cov=datafog --cov-report=xml

- name: Upload coverage report
uses: codecov/codecov-action@v4
with:
file: ./coverage.xml
fail_ci_if_error: true
token: ${{ secrets.CODECOV_TOKEN }}
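
For reproducing CI failures locally, the same pytest-cov invocation works outside the workflow; swapping the xml report for term-missing prints uncovered lines straight to the terminal:

python -m pytest tests/ --cov=datafog --cov-report=term-missing
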
70 changes: 45 additions & 25 deletions .gitignore
@@ -1,41 +1,61 @@
# Python bytecode
*.pyc
*.swp
*.prof
__pycache__/

# Distribution / packaging
MANIFEST
dist/
build/
*.egg-info/

# Testing
.coverage
.coverage.*
.cache/
*.egg-info/
.pytest_cache/
.tox/
src/datafog/__pycache__/
src/datafog/pii_tools/__pycache__/
tests/__pycache__/
coverage.xml
htmlcov/
.benchmarks/
tests/scratch.py
tests/.datafog_env/
node_modules/
datafog_debug.log
sotu_2023.txt
.DS_Store
venv/
datafog-python/datafog/processing/image_processing/__pycache__/
datafog-python/datafog/processing/text_processing/__pycache__/
datafog-python/datafog/services/__pycache__/
datafog-python/datafog/processing/__pycache__/
datafog-python/datafog/__pycache__/
error_log.txt

# Environment
.env
coverage.xml
htmlcov/
.venv/
node_modules/
.DS_Store
.venv
venv/
env/
examples/venv/
error_log.txt

# Editors
*.swp
*.swo
.idea/
.vscode/
*.sublime-*

# OS specific
.DS_Store
Thumbs.db

# Logs and debugging
*.log
*.prof
datafog_debug.log

# Project specific
sotu_2023.txt
node_modules/
scratch.py

# Documentation build
docs/_build/
docs/*
!docs/*.rst
!docs/conf.py
scratch.py
.coverage*
.benchmarks
!docs/Makefile
!docs/make.bat

# Keep all directories but ignore their contents
*/**/__pycache__/