Commit b1e789f

Merge pull request #76 from DataFog/chore/housekeeping
# Housekeeping for Version 4.1.0

## Summary

This PR implements comprehensive housekeeping improvements for the 4.1.0 release, enhancing build reliability, code quality, and CI/CD processes.

## Key Changes

### Dependency Management

- Pinned exact versions in requirements.txt and requirements-dev.txt for reproducible builds
- Updated dependencies to latest compatible versions (Sphinx 7.2.6, pytest 7.4.0, etc.)
- Ensured compatibility across Python 3.10, 3.11, and 3.12

### CI/CD Improvements

- Updated GitHub Actions workflows to use latest versions (actions/setup-python@v5)
- Changed branch references from 'develop' to 'dev' across all workflow files
- Made mypy type checking non-blocking with appropriate flags
- Added ruff linting to pre-commit configuration
- Fixed benchmark workflow comparison script
- Improved dev-cicd.yml workflow with better error handling
- Enhanced PyPI release workflow to support automatic beta releases on merge to dev

### Code Quality

- Fixed multiple mypy type errors across the codebase
- Added proper type annotations in key files
- Added `__all__` declarations to module `__init__.py` files
- Improved code formatting with black and isort

### Documentation & Organization

- Extended .gitignore to better handle build artifacts and development files
- Updated CHANGELOG.MD with 4.1.0 features and improvements
- Added CI workflow badges to README.md

## Next Steps

- Continue addressing remaining mypy type issues
- Monitor CI workflows for any issues
- Begin planning for version 4.2.0 features
2 parents 1306c01 + 3ed4075 commit b1e789f

39 files changed, +308 -148 lines changed

.github/workflows/benchmark.yml

Lines changed: 13 additions & 11 deletions
@@ -2,9 +2,9 @@ name: Performance Benchmarks
 
 on:
   push:
-    branches: [main, develop]
+    branches: [main, dev]
   pull_request:
-    branches: [main, develop]
+    branches: [main, dev]
   # Schedule benchmarks to run weekly
   schedule:
     - cron: "0 0 * * 0" # Run at midnight on Sundays
@@ -13,12 +13,12 @@ jobs:
   benchmark:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
         with:
           fetch-depth: 0 # Fetch all history for proper comparison
 
       - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.10"
           cache: "pip"
@@ -28,10 +28,9 @@ jobs:
           python -m pip install --upgrade pip
           pip install -e .
           pip install -r requirements-dev.txt
-          pip install pytest-benchmark
 
       - name: Restore benchmark data
-        uses: actions/cache@v3
+        uses: actions/cache@v4
         with:
           path: .benchmarks
           key: benchmark-${{ runner.os }}-${{ hashFiles('**/requirements*.txt') }}
@@ -41,15 +40,16 @@ jobs:
       - name: Run benchmarks and save baseline
         run: |
           # Run benchmarks and save results
-          pytest tests/benchmark_text_service.py -v --benchmark-autosave
+          python -m pytest tests/benchmark_text_service.py -v --benchmark-autosave --benchmark-json=benchmark-results.json
 
       - name: Check for performance regression
         run: |
           # Compare against the previous benchmark if available
           # Fail if performance degrades by more than 10%
           if [ -d ".benchmarks" ]; then
-            BASELINE=$(ls -t .benchmarks/Linux-CPython-3.10-64bit | head -n 2 | tail -n 1)
-            CURRENT=$(ls -t .benchmarks/Linux-CPython-3.10-64bit | head -n 1)
+            benchmark_dir=".benchmarks/Linux-CPython-3.10-64bit"
+            BASELINE=$(ls -t $benchmark_dir | head -n 2 | tail -n 1)
+            CURRENT=$(ls -t $benchmark_dir | head -n 1)
             if [ -n "$BASELINE" ] && [ "$BASELINE" != "$CURRENT" ]; then
               # Set full paths to the benchmark files
               BASELINE_FILE="$benchmark_dir/$BASELINE"
@@ -71,10 +71,12 @@ jobs:
           fi
 
       - name: Upload benchmark results
-        uses: actions/upload-artifact@v3
+        uses: actions/upload-artifact@v4
         with:
           name: benchmark-results
-          path: .benchmarks/
+          path: |
+            .benchmarks/
+            benchmark-results.json
 
       - name: Alert on regression
         if: failure()
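
The body of the regression check falls outside the hunks shown for this file, so the actual comparison logic is not visible in this diff. As a rough illustration only, the "fail if performance degrades by more than 10%" rule described in the workflow comments could be sketched in Python as below. The function names `load_means` and `find_regressions` are hypothetical, and the sketch assumes the JSON layout that pytest-benchmark writes (a top-level `benchmarks` list whose entries carry `name` and `stats.mean`).

```python
import json
from pathlib import Path

# Assumption: mirrors the "more than 10%" comment in the workflow step.
THRESHOLD = 0.10


def load_means(path):
    """Map benchmark name -> mean runtime (seconds) from a pytest-benchmark JSON file."""
    data = json.loads(Path(path).read_text())
    return {b["name"]: b["stats"]["mean"] for b in data["benchmarks"]}


def find_regressions(baseline, current, threshold=THRESHOLD):
    """Return {name: relative_slowdown} for benchmarks slower than baseline by more than threshold."""
    regressions = {}
    for name, base_mean in baseline.items():
        # Skip benchmarks missing from the current run or with an unusable baseline.
        if name not in current or base_mean <= 0:
            continue
        slowdown = (current[name] - base_mean) / base_mean
        if slowdown > threshold:
            regressions[name] = slowdown
    return regressions
```

A CI step could call `find_regressions(load_means(baseline_file), load_means(current_file))` and exit non-zero when the result is non-empty, which would then trigger the `if: failure()` alert step.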

.github/workflows/dev-cicd.yml

Lines changed: 5 additions & 3 deletions
@@ -15,13 +15,13 @@ jobs:
       - name: Check out repo
         uses: actions/checkout@v4
       - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.10"
       - name: Install pre-commit
         run: pip install pre-commit
       - name: Run pre-commit
-        run: pre-commit run --all-files
+        run: pre-commit run --all-files --show-diff-on-failure
 
   build:
     runs-on: ubuntu-latest
@@ -46,7 +46,7 @@ jobs:
           docker-images: true
           swap-storage: true
       - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}
       - name: Install Tesseract OCR
@@ -64,6 +64,8 @@ jobs:
           tesseract --list-langs
       - name: Install Dependencies
         run: |
+          # Create pip cache directory if it doesn't exist
+          mkdir -p ~/.cache/pip
           pip install -U pip
           pip install -e .
           pip install tox just pre-commit

.github/workflows/lint.yml

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+name: Lint
+
+on:
+  push:
+    branches: [main, dev]
+  pull_request:
+    branches: [main, dev]
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+          cache: "pip"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements-dev.txt
+
+      - name: Lint with flake8
+        run: |
+          # stop the build if there are Python syntax errors or undefined names
+          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+          # exit-zero treats all errors as warnings
+          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+
+      - name: Lint with ruff
+        run: |
+          # Run ruff but don't fail the build yet (exit-zero)
+          ruff check . --exit-zero
+
+      - name: Type check with mypy
+        run: |
+          # Run mypy but don't fail the build yet
+          # Use --ignore-missing-imports to ignore missing stubs for third-party libraries
+          mypy datafog/ --ignore-missing-imports || true

.github/workflows/publish-pypi.yml

Lines changed: 79 additions & 4 deletions
@@ -1,6 +1,7 @@
 name: PyPI Release
 
 on:
+  # Manual trigger with version input
   workflow_dispatch:
     inputs:
       version:
@@ -10,17 +11,29 @@ on:
         description: "Confirm all tests have passed"
         type: boolean
         required: true
+      is_prerelease:
+        description: "Is this a pre-release?"
+        type: boolean
+        default: false
+        required: false
+  # Auto-trigger for beta releases when merged to dev
+  push:
+    branches:
+      - dev
 
 jobs:
-  release:
+  # Job for manual releases (stable or pre-release)
+  manual_release:
     runs-on: ubuntu-latest
-    if: github.event.inputs.confirm_tests == 'true'
+    if: github.event_name == 'workflow_dispatch' && github.event.inputs.confirm_tests == 'true'
     permissions:
       contents: write
     steps:
       - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
       - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.10"
       - name: Install dependencies
@@ -37,9 +50,71 @@ jobs:
           git config user.email github-actions@github.com
           git tag v${{ github.event.inputs.version }}
           git push origin v${{ github.event.inputs.version }}
-          gh release create v${{ github.event.inputs.version }} --generate-notes
+          if [ "${{ github.event.inputs.is_prerelease }}" == "true" ]; then
+            gh release create v${{ github.event.inputs.version }} --prerelease --generate-notes
+          else
+            gh release create v${{ github.event.inputs.version }} --generate-notes
+          fi
       - name: Publish to PyPI
         env:
           TWINE_USERNAME: __token__
           TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
         run: twine upload dist/*
+
+  # Job for automatic beta releases on merge to dev
+  auto_beta_release:
+    runs-on: ubuntu-latest
+    if: github.event_name == 'push' && github.ref == 'refs/heads/dev'
+    permissions:
+      contents: write
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install build twine setuptools-scm
+      - name: Generate beta version
+        id: beta_version
+        run: |
+          # Get the latest tag
+          LATEST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "0.0.0")
+          # Remove the 'v' prefix if present
+          LATEST_VERSION=${LATEST_TAG#v}
+          # Split version into components
+          IFS='.' read -r MAJOR MINOR PATCH <<< "$LATEST_VERSION"
+          # Extract any existing beta suffix
+          PATCH_NUM=${PATCH%%b*}
+          # Get commit count since last tag for unique beta number
+          COMMIT_COUNT=$(git rev-list --count $LATEST_TAG..HEAD 2>/dev/null || echo "1")
+          # Generate beta version
+          BETA_VERSION="$MAJOR.$MINOR.$(($PATCH_NUM))b$COMMIT_COUNT"
+          echo "Generated beta version: $BETA_VERSION"
+          echo "version=$BETA_VERSION" >> $GITHUB_OUTPUT
+          # Update version in setup.py or pyproject.toml if needed
+          # This depends on how your versioning is configured
+      - name: Build package
+        run: python -m build
+      - name: Create GitHub Pre-Release
+        env:
+          GITHUB_TOKEN: ${{ secrets.pypi }}
+          BETA_VERSION: ${{ steps.beta_version.outputs.version }}
+        run: |
+          git config user.name github-actions
+          git config user.email github-actions@github.com
+          git tag v$BETA_VERSION
+          git push origin v$BETA_VERSION
+          gh release create v$BETA_VERSION --prerelease --title "Beta Release v$BETA_VERSION" --notes "Automated beta release from dev branch"
+      - name: Publish to PyPI as Beta
+        env:
+          TWINE_USERNAME: __token__
+          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
+          BETA_VERSION: ${{ steps.beta_version.outputs.version }}
+        run: |
+          # Ensure package is marked as beta in PyPI
+          twine upload --skip-existing dist/*
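
To make the "Generate beta version" step easier to reason about, here is a minimal Python sketch of the same string manipulation. It is an illustration, not part of the commit: `beta_version` is a hypothetical helper, it takes the tag and commit count as arguments instead of calling git, and it assumes tags with three dot-separated components (optionally prefixed with `v` and optionally carrying a `bN` suffix on the patch component).

```python
def beta_version(latest_tag: str, commits_since_tag: int) -> str:
    """Mirror the shell step: strip 'v', drop any beta suffix, append b<commit count>."""
    # ${LATEST_TAG#v}: remove a single leading 'v' if present.
    version = latest_tag[1:] if latest_tag.startswith("v") else latest_tag
    # IFS='.' read -r MAJOR MINOR PATCH: the third field keeps any remainder.
    major, minor, patch = version.split(".", 2)
    # ${PATCH%%b*}: drop everything from the first 'b' onward, then treat as an integer.
    patch_num = int(patch.split("b", 1)[0])
    return f"{major}.{minor}.{patch_num}b{commits_since_tag}"


print(beta_version("v4.1.0", 5))
print(beta_version("4.1.0b3", 2))
```

Because the patch number is reused unchanged, a tag of `v4.1.0` yields versions such as `4.1.0b5`, which PEP 440 orders before `4.1.0` itself. The workflow comments also indicate that the generated value is only used for the git tag and release title unless the package version is updated separately.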

.github/workflows/tests.yml

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+name: Tests
+
+on:
+  push:
+    branches: [main, dev]
+  pull_request:
+    branches: [main, dev]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.11", "3.12"]
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: "pip"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e .
+          pip install -r requirements-dev.txt
+
+      - name: Run tests with pytest
+        run: |
+          python -m pytest tests/ --cov=datafog --cov-report=xml
+
+      - name: Upload coverage report
+        uses: codecov/codecov-action@v4
+        with:
+          file: ./coverage.xml
+          fail_ci_if_error: true
+          token: ${{ secrets.CODECOV_TOKEN }}

.gitignore

Lines changed: 45 additions & 25 deletions
@@ -1,41 +1,61 @@
+# Python bytecode
 *.pyc
-*.swp
-*.prof
+__pycache__/
+
+# Distribution / packaging
 MANIFEST
 dist/
 build/
+*.egg-info/
+
+# Testing
 .coverage
+.coverage.*
 .cache/
-*.egg-info/
 .pytest_cache/
 .tox/
-src/datafog/__pycache__/
-src/datafog/pii_tools/__pycache__/
-tests/__pycache__/
+coverage.xml
+htmlcov/
+.benchmarks/
 tests/scratch.py
 tests/.datafog_env/
-node_modules/
-datafog_debug.log
-sotu_2023.txt
-.DS_Store
-venv/
-datafog-python/datafog/processing/image_processing/__pycache__/
-datafog-python/datafog/processing/text_processing/__pycache__/
-datafog-python/datafog/services/__pycache__/
-datafog-python/datafog/processing/__pycache__/
-datafog-python/datafog/__pycache__/
+error_log.txt
+
+# Environment
 .env
-coverage.xml
-htmlcov/
-.venv/
-node_modules/
-.DS_Store
 .venv
+venv/
+env/
 examples/venv/
-error_log.txt
+
+# Editors
+*.swp
+*.swo
+.idea/
+.vscode/
+*.sublime-*
+
+# OS specific
+.DS_Store
+Thumbs.db
+
+# Logs and debugging
+*.log
+*.prof
+datafog_debug.log
+
+# Project specific
+sotu_2023.txt
+node_modules/
+scratch.py
+
+# Documentation build
+docs/_build/
 docs/*
 !docs/*.rst
 !docs/conf.py
-scratch.py
-.coverage*
-.benchmarks
+!docs/Makefile
+!docs/make.bat
+
+# Keep all directories but ignore their contents
+*/**/__pycache__/
