Merge branch 'main' into issue-886-pyarrow
fealho authored Oct 28, 2024
2 parents 00e7e0d + e15cb50 commit 513d4d1
Showing 12 changed files with 232 additions and 23 deletions.
52 changes: 52 additions & 0 deletions .github/workflows/release_notes.yml
@@ -0,0 +1,52 @@
name: Release Notes Generator

on:
  workflow_dispatch:
    inputs:
      branch:
        description: 'Branch to merge release notes into.'
        required: true
        default: 'main'
      version:
        description:
          'Version to use for the release. Must be in format: X.Y.Z.'
      date:
        description:
          'Date of the release. Must be in format YYYY-MM-DD.'

jobs:
  releasenotesgeneration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install requests==2.31.0
      - name: Generate release notes
        env:
          GH_ACCESS_TOKEN: ${{ secrets.GH_ACCESS_TOKEN }}
        run: >
          python -m scripts.release_notes_generator
          -v ${{ inputs.version }}
          -d ${{ inputs.date }}
      - name: Create pull request
        id: cpr
        uses: peter-evans/create-pull-request@v4
        with:
          token: ${{ secrets.GH_ACCESS_TOKEN }}
          commit-message: Release notes for v${{ inputs.version }}
          author: "github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>"
          committer: "github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>"
          title: v${{ inputs.version }} Release Notes
          body: "This is an auto-generated PR to update the release notes."
          branch: release-notes
          branch-suffix: short-commit-hash
          base: ${{ inputs.branch }}
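Reviewer note: a `workflow_dispatch` workflow like this one can be started from the Actions tab or through GitHub's REST API ("create a workflow dispatch event"). The sketch below only builds such a request and sends nothing. The helper name `build_dispatch_request` and the example values are illustrative assumptions, not part of this commit; the input names mirror the `inputs` block of the workflow.

```python
"""Sketch: build (but do not send) a workflow_dispatch request for release_notes.yml."""

# Same repository root that scripts/release_notes_generator.py uses.
API_ROOT = 'https://api.github.com/repos/sdv-dev/rdt'


def build_dispatch_request(version, date, branch='main', ref='main'):
    """Return the URL and JSON payload for a workflow dispatch event.

    ``build_dispatch_request`` is a hypothetical helper written for this note.
    """
    url = f'{API_ROOT}/actions/workflows/release_notes.yml/dispatches'
    payload = {
        # Branch or tag whose copy of the workflow file should run.
        'ref': ref,
        # Keys must match the names declared under ``inputs`` in the workflow.
        'inputs': {'branch': branch, 'version': version, 'date': date},
    }
    return url, payload


url, payload = build_dispatch_request('1.13.0', '2024-10-08')
```

Sending it would be a `requests.post(url, json=payload, headers=..., timeout=10)` call with a bearer token that is allowed to run workflows; that step is left out here on purpose.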
2 changes: 1 addition & 1 deletion .github/workflows/static_code_analysis.yml
@@ -19,7 +19,7 @@ jobs:
         python -m pip install --upgrade pip
         python -m pip install bandit==1.7.7
     - name: Save code analysis
-      run: bandit -r . -x ./tests -f txt -o static_code_analysis.txt --exit-zero
+      run: bandit -r . -x ./tests,./scripts -f txt -o static_code_analysis.txt --exit-zero
     - name: Create pull request
       id: cpr
       uses: peter-evans/create-pull-request@v4
10 changes: 10 additions & 0 deletions HISTORY.md
@@ -1,5 +1,15 @@
 # History

+## v1.13.0 - 2024-10-08
+
+### New Features
+
+* Align text/id sdtypes to the SDV library - Issue [#880](https://github.com/sdv-dev/RDT/issues/880)
+
+### Internal
+
+* Add workflow to generate release notes - Issue [#889](https://github.com/sdv-dev/RDT/issues/889) by @amontanez24
+
 ## v1.12.4 - 2024-09-05

 This release enables the `create_anonymized_columns` method to support multi-column transformers.
2 changes: 1 addition & 1 deletion latest_requirements.txt
@@ -1,4 +1,4 @@
-Faker==30.0.0
+Faker==30.6.0
 copulas==0.11.1
 numpy==2.0.2
 pandas==2.2.3
7 changes: 4 additions & 3 deletions pyproject.toml
@@ -139,7 +139,7 @@ collect_ignore = ['pyproject.toml']
 exclude_lines = ['NotImplementedError()']

 [tool.bumpversion]
-current_version = "1.12.5.dev0"
+current_version = "1.13.1.dev0"
 parse = '(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\.(?P<release>[a-z]+)(?P<candidate>\d+))?'
 serialize = [
     '{major}.{minor}.{patch}.{release}{candidate}',
@@ -205,10 +205,11 @@ select = [
     # print statements
     "T201",
     # pandas-vet
-    "PD"
+    "PD",
+    # numpy 2.0
+    "NPY201"
 ]
 ignore = [
-    "E501",
     # pydocstyle
     "D107", # Missing docstring in __init__
     "D417", # Missing argument descriptions in the docstring, this is a bug from pydocstyle: https://github.com/PyCQA/pydocstyle/issues/449
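Reviewer note: the `parse` pattern in the `[tool.bumpversion]` hunk explains why `1.13.1.dev0` is a valid `current_version`. The sketch below applies that same pattern with Python's `re` module. The helper `parse_version` and the use of `fullmatch` are choices made for this illustration; how the bump tool itself applies the pattern is not shown in this commit.

```python
import re

# The ``parse`` pattern from [tool.bumpversion], copied from pyproject.toml.
PARSE = re.compile(
    r'(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)'
    r'(\.(?P<release>[a-z]+)(?P<candidate>\d+))?'
)


def parse_version(version):
    """Split a version string into its named parts (hypothetical helper)."""
    match = PARSE.fullmatch(version)
    if match is None:
        raise ValueError(f'Unrecognized version: {version!r}')
    return match.groupdict()


# A development version fills every group.
dev = parse_version('1.13.1.dev0')

# A final release leaves the optional ``release``/``candidate`` groups as None.
final = parse_version('1.13.0')
```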
2 changes: 1 addition & 1 deletion rdt/__init__.py
@@ -4,7 +4,7 @@

 __author__ = 'DataCebo, Inc.'
 __email__ = 'info@sdv.dev'
-__version__ = '1.12.5.dev0'
+__version__ = '1.13.1.dev0'


 import sys
3 changes: 1 addition & 2 deletions rdt/transformers/pii/anonymizer.py
@@ -133,8 +133,7 @@ def __init__(
         self.provider_name = provider_name if provider_name else 'BaseProvider'
         if self.provider_name != 'BaseProvider' and function_name is None:
             raise TransformerInputError(
-                'Please specify the function name to use from the '
-                f"'{self.provider_name}' provider."
+                f"Please specify the function name to use from the '{self.provider_name}' provider."
             )

         self.function_name = function_name if function_name else 'lexify'
152 changes: 152 additions & 0 deletions scripts/release_notes_generator.py
@@ -0,0 +1,152 @@
"""Script to generate release notes."""

import argparse
import os
from collections import defaultdict

import requests

LABEL_TO_HEADER = {
    'feature request': 'New Features',
    'bug': 'Bugs Fixed',
    'internal': 'Internal',
    'maintenance': 'Maintenance',
    'customer success': 'Customer Success',
    'documentation': 'Documentation',
    'misc': 'Miscellaneous',
}
ISSUE_LABELS = [
    'documentation',
    'maintenance',
    'internal',
    'bug',
    'feature request',
    'customer success',
]
ISSUE_LABELS_ORDERED_BY_IMPORTANCE = [
    'feature request',
    'customer success',
    'bug',
    'documentation',
    'internal',
    'maintenance',
]
NEW_LINE = '\n'
GITHUB_URL = 'https://api.github.com/repos/sdv-dev/rdt'
GITHUB_TOKEN = os.getenv('GH_ACCESS_TOKEN')


def _get_milestone_number(milestone_title):
    url = f'{GITHUB_URL}/milestones'
    headers = {'Authorization': f'Bearer {GITHUB_TOKEN}'}
    query_params = {'milestone': milestone_title, 'state': 'all', 'per_page': 100}
    response = requests.get(url, headers=headers, params=query_params, timeout=10)
    body = response.json()
    if response.status_code != 200:
        raise Exception(str(body))

    milestones = body
    for milestone in milestones:
        if milestone.get('title') == milestone_title:
            return milestone.get('number')

    raise ValueError(f'Milestone {milestone_title} not found in past 100 milestones.')


def _get_issues_by_milestone(milestone):
    headers = {'Authorization': f'Bearer {GITHUB_TOKEN}'}
    # get milestone number
    milestone_number = _get_milestone_number(milestone)
    url = f'{GITHUB_URL}/issues'
    page = 1
    query_params = {'milestone': milestone_number, 'state': 'all'}
    issues = []
    while True:
        query_params['page'] = page
        response = requests.get(url, headers=headers, params=query_params, timeout=10)
        body = response.json()
        if response.status_code != 200:
            raise Exception(str(body))

        issues_on_page = body
        if not issues_on_page:
            break

        # Filter out PRs
        issues_on_page = [issue for issue in issues_on_page if issue.get('pull_request') is None]
        issues.extend(issues_on_page)
        page += 1

    return issues


def _get_issues_by_category(release_issues):
    category_to_issues = defaultdict(list)

    for issue in release_issues:
        issue_title = issue['title']
        issue_number = issue['number']
        issue_url = issue['html_url']
        line = f'* {issue_title} - Issue [#{issue_number}]({issue_url})'
        assignee = issue.get('assignee')
        if assignee:
            login = assignee['login']
            line += f' by @{login}'

        # Check if any known label is marked on the issue
        labels = [label['name'] for label in issue['labels']]
        found_category = False
        for category in ISSUE_LABELS:
            if category in labels:
                category_to_issues[category].append(line)
                found_category = True
                break

        if not found_category:
            category_to_issues['misc'].append(line)

    return category_to_issues


def _create_release_notes(issues_by_category, version, date):
    title = f'## v{version} - {date}'
    release_notes = f'{title}{NEW_LINE}{NEW_LINE}'

    for category in ISSUE_LABELS_ORDERED_BY_IMPORTANCE + ['misc']:
        issues = issues_by_category.get(category)
        if issues:
            section_text = (
                f'### {LABEL_TO_HEADER[category]}{NEW_LINE}{NEW_LINE}'
                f'{NEW_LINE.join(issues)}{NEW_LINE}{NEW_LINE}'
            )

            release_notes += section_text

    return release_notes


def update_release_notes(release_notes):
    """Add the release notes for the new release to the ``HISTORY.md``."""
    file_path = 'HISTORY.md'
    with open(file_path, 'r') as history_file:
        history = history_file.read()

    token = '# HISTORY\n\n'
    split_index = history.find(token) + len(token) + 1
    header = history[:split_index]
    new_notes = f'{header}{release_notes}{history[split_index:]}'

    with open(file_path, 'w') as new_history_file:
        new_history_file.write(new_notes)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v', '--version', type=str, help='Release version number (ie. v1.0.1)')
    parser.add_argument('-d', '--date', type=str, help='Date of release in format YYYY-MM-DD')
    args = parser.parse_args()
    release_number = args.version
    release_issues = _get_issues_by_milestone(release_number)
    issues_by_category = _get_issues_by_category(release_issues)
    release_notes = _create_release_notes(issues_by_category, release_number, args.date)
    update_release_notes(release_notes)
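Reviewer note on `update_release_notes`: the search token is `'# HISTORY\n\n'`, while the heading in the HISTORY.md hunk of this commit is `# History`. With that heading, `str.find` returns -1, so `split_index` becomes `-1 + 11 + 1 = 11`, which happens to equal `len('# History\n\n')`. The split therefore lands in the right place, but it appears to depend on the failed search and the `+ 1` cancelling out. Below is a minimal sketch of a variant that locates the heading explicitly. `insert_release_notes` is a hypothetical name, it works on strings rather than on the file, and it assumes the heading is the first line of the document.

```python
def insert_release_notes(history, release_notes, header='# History'):
    """Return ``history`` with ``release_notes`` placed right after the heading.

    Hypothetical string-based variant of ``update_release_notes``.
    """
    first_line, _, rest = history.partition('\n')
    if first_line.strip() != header:
        raise ValueError(f'Expected the file to start with {header!r}, found {first_line!r}.')

    # Drop the blank line(s) that followed the heading; they are re-added below.
    previous_entries = rest.lstrip('\n')
    return f'{header}\n\n{release_notes}{previous_entries}'


old_history = '# History\n\n## v1.12.4 - 2024-09-05\n\nPrevious notes.\n'
new_section = '## v1.13.0 - 2024-10-08\n\n### Internal\n\n* Example entry\n\n'
updated = insert_release_notes(old_history, new_section)
```

Failing loudly when the heading is missing is a design choice of this sketch; the committed function instead falls back to whatever index the arithmetic produces.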
4 changes: 2 additions & 2 deletions static_code_analysis.txt
@@ -1,10 +1,10 @@
-Run started:2024-09-05 19:41:22.889700
+Run started:2024-10-09 15:39:00.488390

 Test results:
 No issues identified.

 Code scanned:
-Total lines of code: 5543
+Total lines of code: 5629
 Total lines skipped (#nosec): 0
 Total potential issues skipped due to specifically being disabled (e.g., #nosec BXXX): 0

17 changes: 6 additions & 11 deletions tests/unit/test_hyper_transformer.py
@@ -553,7 +553,7 @@ def test_validate_config_not_unique_field(self):

         # Run
         error_msg = re.escape(
-            'Error: Invalid config. Please provide unique keys for the sdtypes ' 'and transformers.'
+            'Error: Invalid config. Please provide unique keys for the sdtypes and transformers.'
         )
         with pytest.raises(InvalidConfigError, match=error_msg):
             HyperTransformer._validate_config(config)
@@ -858,8 +858,7 @@ def test_set_config_already_fitted(self, mock_warnings):

         # Assert
         expected_warnings_msg = (
-            'For this change to take effect, please refit your data using '
-            "'fit' or 'fit_transform'."
+            "For this change to take effect, please refit your data using 'fit' or 'fit_transform'."
         )
         mock_warnings.warn.assert_called_once_with(expected_warnings_msg)

@@ -2511,8 +2510,7 @@ def test_update_transformers_fitted(self, mock_warnings):

         # Assert
         expected_message = (
-            "For this change to take effect, please refit your data using 'fit' "
-            "or 'fit_transform'."
+            "For this change to take effect, please refit your data using 'fit' or 'fit_transform'."
         )

         mock_warnings.warn.assert_called_once_with(expected_message)
@@ -2921,8 +2919,7 @@ def test_update_sdtypes_fitted(self, mock_warnings, mock_logger):

         # Assert
         expected_message = (
-            "For this change to take effect, please refit your data using 'fit' "
-            "or 'fit_transform'."
+            "For this change to take effect, please refit your data using 'fit' or 'fit_transform'."
         )
         user_message = (
             'The transformers for these columns may change based on the new sdtype.\n'
@@ -3470,8 +3467,7 @@ def test_remove_transformers_fitted(self, mock_warnings):

         # Assert
         expected_warnings_msg = (
-            'For this change to take effect, please refit your data using '
-            "'fit' or 'fit_transform'."
+            "For this change to take effect, please refit your data using 'fit' or 'fit_transform'."
         )
         mock_warnings.warn.assert_called_once_with(expected_warnings_msg)
         assert ht.field_transformers == {
@@ -3558,8 +3554,7 @@ def test_remove_transformers_by_sdtype(self, mock_warnings):
             'column3': None,
         }
         expected_warnings_msg = (
-            'For this change to take effect, please refit your data using '
-            "'fit' or 'fit_transform'."
+            "For this change to take effect, please refit your data using 'fit' or 'fit_transform'."
         )
         mock_warnings.warn.assert_called_once_with(expected_warnings_msg)

2 changes: 1 addition & 1 deletion tests/unit/transformers/pii/test_anonymizer.py
@@ -425,7 +425,7 @@ def test___init__no_function_name(self):
         """
         # Run / Assert
         expected_message = (
-            'Please specify the function name to use from the ' "'credit_card' provider."
+            "Please specify the function name to use from the 'credit_card' provider."
         )
         with pytest.raises(TransformerInputError, match=expected_message):
             AnonymizedFaker(provider_name='credit_card', locales=['en_US', 'fr_FR'])
2 changes: 1 addition & 1 deletion tests/unit/transformers/test_base.py
@@ -170,7 +170,7 @@ def test_get_input_sdtype_raises_warning(self, mock_get_supported_sdtypes):

         # Run
         expected_message = (
-            '`get_input_sdtype` is deprecated. Please use ' '`get_supported_sdtypes` instead.'
+            '`get_input_sdtype` is deprecated. Please use `get_supported_sdtypes` instead.'
         )
         with pytest.warns(FutureWarning, match=expected_message):
             input_sdtype = BaseTransformer.get_input_sdtype()
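Reviewer note: the message edits in the test files of this commit merge adjacent string literals into a single literal. Python joins adjacent literals at compile time, so the expected messages should be unchanged; only the source layout differs. A small check, using one of the messages from `test_anonymizer.py` (the variable names are illustrative):

```python
# Two adjacent literals, as the test was written before this commit.
split_message = (
    'Please specify the function name to use from the ' "'credit_card' provider."
)

# The single literal used after this commit.
joined_message = "Please specify the function name to use from the 'credit_card' provider."

# Both spellings produce the same string object contents.
same_text = split_message == joined_message
```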
