ff #46

Merged 71 commits on Jul 22, 2024
71 commits
155234f
Moved PySpark -> Optional Dependency
May 18, 2024
ecc5d4c
beta_pypi_testing_notebook
May 18, 2024
554b8ef
otel
May 18, 2024
7b42867
Create feature-ci-cd
sidmohan0 May 18, 2024
8121384
feature-cicd-tests
May 18, 2024
ffe5f8d
Delete .github/workflows/feature-ci-cd
sidmohan0 May 18, 2024
0781aaf
feature-cicd-tests
May 18, 2024
8adff1a
fixed justformat
May 18, 2024
4aeb2f0
fixed pydantic-spacy ver issue
May 18, 2024
2facea8
otel init
May 28, 2024
7b58346
gitignore
May 28, 2024
f56db58
Merge pull request #30 from DataFog/feature/otel-v3.2.1
sidmohan0 May 28, 2024
cf36dd6
v3.2.1 release
May 28, 2024
4311508
Merge pull request #28 from DataFog/feature/v3.2.1
sidmohan0 May 28, 2024
c95509d
v3.2.2 fixed OTel issue
Jun 20, 2024
f0e2ed5
add synchronous text pipelines
sroy9675 Jul 8, 2024
d77f6c1
recovered 3.2.2
Jul 12, 2024
1998a08
removed otel-sdk req
Jul 12, 2024
55c115d
Apply .gitignore retroactively
Jul 12, 2024
bbe1f08
numpy>=1.21.0,<2.0.0
Jul 12, 2024
298c152
updated .pre-commit-config, .flake8
Jul 12, 2024
1069669
+ tesseract install dev-cicd
Jul 13, 2024
72ebad4
feature-cicd
Jul 13, 2024
fcccb1f
updated main-cicd, dev-cicd
Jul 13, 2024
6d4230c
pre-commit passed locally
Jul 13, 2024
aa30f80
fixed tox issue w/ external command tesseract
Jul 13, 2024
ae80156
added py311
Jul 13, 2024
873fc30
fixed just format
Jul 13, 2024
213e286
updated setup.py, rebuilt requirements.txt
Jul 13, 2024
d954f35
documented dev flow, README.md
Jul 13, 2024
7bd1e42
removed .env, update gitignore
Jul 13, 2024
71e0c8d
removed py311 in tox
Jul 13, 2024
781bbb8
removed py311 in main-cicd.yml
Jul 13, 2024
3f48a3f
updated feature-cicd.yml
Jul 13, 2024
1734cf9
updated pre-commit, all passed locally, lazy import spark in pyspark_…
Jul 13, 2024
968c768
updated build stage feature-ci-cd
Jul 13, 2024
26658ad
updated pip install in feature-cicd
Jul 13, 2024
45291ca
added just as dev req
Jul 13, 2024
5c455af
removed codeblocks README.md
Jul 13, 2024
ce18eec
Merge pull request #33 from DataFog/feature/v3.2.1
sidmohan0 Jul 13, 2024
0f33170
updated main, dev cicd files
Jul 13, 2024
4228dd9
update checkout action version in cicd yaml files
Jul 13, 2024
d3066cd
Merge pull request #35 from DataFog/temp-update-cicd
sidmohan0 Jul 13, 2024
6d9c178
pre-commit passed
Jul 13, 2024
ba7aed9
Merge pull request #36 from DataFog/temp-pre-commit-fix
sidmohan0 Jul 13, 2024
baac07c
submitting PR for updated ymls
Jul 13, 2024
a7c6151
Merge pull request #37 from DataFog/hotfix-main-dev-cicd
sidmohan0 Jul 13, 2024
be84114
merge release 3.2.2 from main
sroy9675 Jul 13, 2024
f8fb354
Merge branch 'main' into synchronous_processing
sroy9675 Jul 13, 2024
f3546cd
fix test break
sroy9675 Jul 13, 2024
236c6d0
add missing SpacyPIIAnnotator import
sroy9675 Jul 13, 2024
7e9e0a7
add tests to increase coverage
sroy9675 Jul 13, 2024
aaac136
fix isort precommit error
sroy9675 Jul 13, 2024
bc1ce83
revert isort changes to files other than main
sroy9675 Jul 13, 2024
f88a09a
try to fix black errors
sroy9675 Jul 13, 2024
0b3b084
update version for release
sroy9675 Jul 14, 2024
4389855
Merge pull request #32 from DataFog/synchronous_processing
sidmohan0 Jul 14, 2024
d000e65
publish-pypi.yml
Jul 14, 2024
b4266d9
updated publish-pypi.yml
Jul 14, 2024
c6633ad
Merge pull request #40 from DataFog/dev-local
sidmohan0 Jul 14, 2024
0351a4e
Update publish-pypi.yml
sidmohan0 Jul 14, 2024
44d0bb1
Update publish-pypi.yml
sidmohan0 Jul 14, 2024
66ed686
remove en_spacy_pii_fast from setup.py
Jul 14, 2024
7cf6203
updated en_spacy_pii_fast imports
Jul 14, 2024
213e960
Merge pull request #41 from DataFog/hotfix/spacy-pypi-issue
sidmohan0 Jul 14, 2024
5cbfcb6
- pip instal in publish-pypi
Jul 14, 2024
1e9a024
Merge pull request #42 from DataFog/dev
sidmohan0 Jul 14, 2024
7253932
fix link to the getting started collab notebook
sroy9675 Jul 15, 2024
9891799
Merge pull request #44 from DataFog/bug_fix/readme_link
pselvana Jul 15, 2024
23fda90
remove extraneous debug prints
sroy9675 Jul 19, 2024
23a9e03
Merge pull request #45 from DataFog/patch/debug_prints
sidmohan0 Jul 20, 2024
6 changes: 3 additions & 3 deletions .flake8
@@ -1,5 +1,5 @@
 [flake8]
-ignore = E203, E266, E501, W503
-max-line-length = 80
+ignore = E203, E266, E501, W503, B006, B007, B008, F401, C416, B950, B904
+max-line-length = 88
 max-complexity = 18
-select = B,C,E,F,W,T4,B9
+select = B,C,E,F,W,T4,B9
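To see at a glance which codes the new `ignore` line adds, the two values from the hunk can be compared as sets. This is only a minimal sketch: `parse_codes` is a hypothetical helper written for this illustration, not part of flake8.

```python
def parse_codes(value: str) -> set[str]:
    """Split a comma-separated list of flake8 codes into a set."""
    return {code.strip() for code in value.split(",") if code.strip()}


# The two `ignore` values exactly as they appear in the .flake8 hunk.
old_ignore = parse_codes("E203, E266, E501, W503")
new_ignore = parse_codes(
    "E203, E266, E501, W503, B006, B007, B008, F401, C416, B950, B904"
)

# Codes that are ignored after the change but were reported before it.
newly_ignored = sorted(new_ignore - old_ignore)
print(newly_ignored)
```

The `B` and `C4` codes come from the flake8-bugbear and flake8-comprehensions plugins that the pre-commit configuration in this PR lists as `additional_dependencies`.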
42 changes: 0 additions & 42 deletions .github/workflows/dev-cicd-tests.yml

This file was deleted.

64 changes: 64 additions & 0 deletions .github/workflows/dev-cicd.yml
@@ -0,0 +1,64 @@
name: dev-cicd-setup-and-test

on:
  push:
    branches:
      - dev
  pull_request:
    branches:
      - dev

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repo
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install pre-commit
        run: pip install pre-commit
      - name: Run pre-commit
        run: pre-commit run --all-files

  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10"]
    steps:
      - name: Check out repo
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Tesseract OCR
        run: |
          sudo apt-get update
          sudo apt-get install -y software-properties-common
          sudo add-apt-repository -y ppa:alex-p/tesseract-ocr-devel
          sudo apt-get update
          sudo apt-get install -y tesseract-ocr libtesseract-dev
          tesseract --version
          dpkg -l | grep tesseract
      - name: Verify Tesseract Installation
        run: |
          which tesseract
          tesseract --list-langs
      - name: Install Dependencies
        run: |
          pip install -U pip
          pip install -e .
          pip install tox just pre-commit
      - name: Run Tests with tox
        run: tox -- --cov datafog --cov-report xml --cov-report term --codeblocks
      - name: Submit to Codecov
        uses: codecov/codecov-action@v3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: ./coverage.xml
          flags: unittests
          name: codecov-umbrella
64 changes: 64 additions & 0 deletions .github/workflows/feature-cicd.yml
@@ -0,0 +1,64 @@
name: feature-cicd-setup-and-test

on:
  push:
    branches:
      - feature/*
  pull_request:
    branches:
      - feature/*

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repo
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install pre-commit
        run: pip install pre-commit
      - name: Run pre-commit
        run: pre-commit run --all-files

  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10"]
    steps:
      - name: Check out repo
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Tesseract OCR
        run: |
          sudo apt-get update
          sudo apt-get install -y software-properties-common
          sudo add-apt-repository -y ppa:alex-p/tesseract-ocr-devel
          sudo apt-get update
          sudo apt-get install -y tesseract-ocr libtesseract-dev
          tesseract --version
          dpkg -l | grep tesseract
      - name: Verify Tesseract Installation
        run: |
          which tesseract
          tesseract --list-langs
      - name: Install Dependencies
        run: |
          pip install -U pip
          pip install -e .
          pip install tox just pre-commit
      - name: Run Tests with tox
        run: tox -- --cov datafog --cov-report xml --cov-report term --codeblocks
      - name: Submit to Codecov
        uses: codecov/codecov-action@v3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: ./coverage.xml
          flags: unittests
          name: codecov-umbrella
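The `feature/*` filter in this workflow is worth a second look: in GitHub Actions branch filters, `*` does not match `/`, while `**` does, and for `pull_request` the filter is applied to the base branch. The sketch below imitates just those two wildcards with a regular expression. It is an approximation for illustration only: `branch_filter_to_regex` and `matches` are hypothetical helpers, other filter syntax (`?`, `+`, `[]`, `!`) is not handled, and `feature/ocr/tesseract` is a made-up branch name.

```python
import re


def branch_filter_to_regex(pattern: str) -> re.Pattern:
    """Translate a filter made of literals, `*`, and `**` into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # `**` may cross `/`
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # `*` stops at `/`
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, branch: str) -> bool:
    """Return True if the whole branch name matches the filter."""
    return branch_filter_to_regex(pattern).fullmatch(branch) is not None


print(matches("feature/*", "feature/otel-v3.2.1"))     # single-level name
print(matches("feature/*", "feature/ocr/tesseract"))   # nested name (hypothetical)
print(matches("feature/**", "feature/ocr/tesseract"))  # nested name with `**`
```

Under these rules a nested branch name would not trigger the workflow as written; switching the filter to `feature/**` would be one way to cover it.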
42 changes: 0 additions & 42 deletions .github/workflows/main-cicd-tests.yml

This file was deleted.

64 changes: 64 additions & 0 deletions .github/workflows/main-cicd.yml
@@ -0,0 +1,64 @@
name: main-cicd-setup-and-test

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repo
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install pre-commit
        run: pip install pre-commit
      - name: Run pre-commit
        run: pre-commit run --all-files

  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10"]
    steps:
      - name: Check out repo
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Tesseract OCR
        run: |
          sudo apt-get update
          sudo apt-get install -y software-properties-common
          sudo add-apt-repository -y ppa:alex-p/tesseract-ocr-devel
          sudo apt-get update
          sudo apt-get install -y tesseract-ocr libtesseract-dev
          tesseract --version
          dpkg -l | grep tesseract
      - name: Verify Tesseract Installation
        run: |
          which tesseract
          tesseract --list-langs
      - name: Install Dependencies
        run: |
          pip install -U pip
          pip install -e .
          pip install tox just pre-commit
      - name: Run Tests with tox
        run: tox -- --cov datafog --cov-report xml --cov-report term --codeblocks
      - name: Submit to Codecov
        uses: codecov/codecov-action@v3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: ./coverage.xml
          flags: unittests
          name: codecov-umbrella
45 changes: 45 additions & 0 deletions .github/workflows/publish-pypi.yml
@@ -0,0 +1,45 @@
name: PyPI Release

on:
  workflow_dispatch:
    inputs:
      version:
        description: "Version to release (e.g., 1.2.3)"
        required: true
      confirm_tests:
        description: "Confirm all tests have passed"
        type: boolean
        required: true

jobs:
  release:
    runs-on: ubuntu-latest
    if: github.event.inputs.confirm_tests == 'true'
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build twine
      - name: Build package
        run: python -m build
      - name: Create GitHub Release
        env:
          GITHUB_TOKEN: ${{ secrets.pypi }}
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git tag v${{ github.event.inputs.version }}
          git push origin v${{ github.event.inputs.version }}
          gh release create v${{ github.event.inputs.version }} --generate-notes
      - name: Publish to PyPI
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
        run: twine upload dist/*
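The release job has two pieces of logic that are easy to reason about in isolation: the `confirm_tests` gate, which compares against the string `'true'` because `github.event.inputs` carries values as strings, and the tag name, which is the version input prefixed with `v`. The sketch below mirrors both. `release_tag` is a hypothetical helper, and the version format check is an added assumption based on the "1.2.3" example in the input description; the workflow itself performs no such validation.

```python
import re
from typing import Optional

# Assumption: versions look like the "1.2.3" example in the input description.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def release_tag(version: str, confirm_tests: str) -> Optional[str]:
    """Return the tag the job would push, or None if the job would be skipped."""
    # Mirrors `if: github.event.inputs.confirm_tests == 'true'`.
    if confirm_tests != "true":
        return None
    # Extra safety check that is NOT part of the workflow in this PR.
    if VERSION_RE.fullmatch(version) is None:
        raise ValueError(f"unexpected version format: {version!r}")
    # Mirrors `git tag v${{ github.event.inputs.version }}`.
    return f"v{version}"


print(release_tag("3.2.2", "true"))
print(release_tag("3.2.2", "false"))
```

A check like this would catch an input such as `v3.2.2`, which the workflow as written would turn into the tag `vv3.2.2`.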
18 changes: 12 additions & 6 deletions .gitignore
@@ -9,18 +9,24 @@ build/
*.egg-info/
.pytest_cache/
.tox/
/src/datafog/__pycache__/
/src/datafog/pii_tools/__pycache__/
/tests/__pycache__/
/tests/scratch.py
src/datafog/__pycache__/
src/datafog/pii_tools/__pycache__/
tests/__pycache__/
tests/scratch.py
tests/.datafog_env/
node_modules/
datafog_debug.log
sotu_2023.txt
.DS_Store
/venv/
venv/
datafog-python/datafog/processing/image_processing/__pycache__/
datafog-python/datafog/processing/text_processing/__pycache__/
datafog-python/datafog/services/__pycache__/
datafog-python/datafog/processing/__pycache__/
datafog-python/datafog/__pycache__/

.env
coverage.xml
htmlcov/
.venv/
node_modules/
.DS_Store
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -4,19 +4,25 @@ repos:
    hooks:
      - id: isort
        args: [--profile=black]
        exclude: .venv

  - repo: https://github.com/psf/black
    rev: 24.2.0
    hooks:
      - id: black
        language_version: python3
        exclude: .venv
  - repo: https://github.com/pycqa/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
        args: [--max-line-length=88] # Match Black's line length
        additional_dependencies: [flake8-bugbear, flake8-comprehensions]
        exclude: .venv
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v4.0.0-alpha.8
    hooks:
      - id: prettier
        exclude: .venv
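One detail of `exclude: .venv`: the pre-commit documentation describes `files` and `exclude` as Python regular expressions matched with `re.search`, so the leading `.` matches any character and the pattern is not anchored to the start of the path. The sketch below shows the practical effect on a few hypothetical paths; the anchored alternative `^\.venv/` is a suggestion for comparison, not something this PR uses.

```python
import re

# Hypothetical file paths, chosen only to illustrate the matching behaviour.
paths = [".venv/lib/site.py", "src/datafog/main.py", "tools/myvenv/run.py"]

# The pattern exactly as written in the config: unanchored, `.` is a wildcard.
loose = re.compile(".venv")
kept_loose = [p for p in paths if not loose.search(p)]
print(kept_loose)

# A stricter alternative: literal dot, anchored at the repository root.
strict = re.compile(r"^\.venv/")
kept_strict = [p for p in paths if not strict.search(p)]
print(kept_strict)
```

With the loose pattern, any path containing `venv` preceded by at least one character is skipped by the hook, which may or may not be the intent.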