bdewilde (Collaborator) commented Aug 30, 2019

Description

  • Improved, extended, and added data augmentation transform functions
    • word-level transforms can now be limited to words of particular part(s) of speech
    • char-level transforms, broadly analogous to word-level, are now available for substituting, inserting, swapping, and deleting individual characters
  • Added an Augmenter class to combine multiple transforms and randomly apply them to spaCy Docs in a variety of ways
  • Refactored augmentation "utils" code, and changed a field on the AugTok named tuple
  • Added tests for all of this
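To make the ideas above concrete, here is a minimal, self-contained sketch of POS-limited char-level transforms and a combiner that randomly applies them. It is not the code from this PR: `Tok`, `delete_chars`, `swap_chars`, and `SimpleAugmenter` are hypothetical names invented for illustration, `Tok` only loosely stands in for the `AugTok` named tuple (whose real fields may differ), and plain tuples are used instead of spaCy Docs so the example runs without any model installed.

```python
import random
from collections import namedtuple
from functools import partial

# Hypothetical stand-in for an augmentation token; NOT the PR's AugTok definition.
Tok = namedtuple("Tok", ["text", "pos"])


def _pick(toks, num, pos, rng):
    """Return indices of up to `num` tokens longer than 1 char whose POS is allowed."""
    eligible = [
        i for i, tok in enumerate(toks)
        if len(tok.text) > 1 and (pos is None or tok.pos in pos)
    ]
    return set(rng.sample(eligible, min(num, len(eligible))))


def delete_chars(toks, num=1, pos=None, rng=random):
    """Delete one random character from each of up to `num` eligible tokens."""
    chosen = _pick(toks, num, pos, rng)
    out = []
    for i, tok in enumerate(toks):
        if i in chosen:
            j = rng.randrange(len(tok.text))
            tok = tok._replace(text=tok.text[:j] + tok.text[j + 1:])
        out.append(tok)
    return out


def swap_chars(toks, num=1, pos=None, rng=random):
    """Swap one random pair of adjacent characters in up to `num` eligible tokens."""
    chosen = _pick(toks, num, pos, rng)
    out = []
    for i, tok in enumerate(toks):
        if i in chosen:
            t = tok.text
            j = rng.randrange(len(t) - 1)
            tok = tok._replace(text=t[:j] + t[j + 1] + t[j] + t[j + 2:])
        out.append(tok)
    return out


class SimpleAugmenter:
    """Apply all transforms in order, or a random subset of size `num`."""

    def __init__(self, transforms, num=None, seed=None):
        self.transforms = list(transforms)
        self.num = num
        self.rng = random.Random(seed)

    def apply(self, toks):
        if self.num is None:
            selected = self.transforms
        else:
            k = min(self.num, len(self.transforms))
            # Sample indices, then sort so the original transform order is kept.
            idxs = sorted(self.rng.sample(range(len(self.transforms)), k))
            selected = [self.transforms[i] for i in idxs]
        for transform in selected:
            toks = transform(toks, rng=self.rng)
        return toks


toks = [Tok("The", "DET"), Tok("quick", "ADJ"), Tok("fox", "NOUN"), Tok("jumps", "VERB")]
# Limit deletion to nouns; swapping may touch any token.
augmenter = SimpleAugmenter([partial(delete_chars, pos={"NOUN"}), swap_chars], seed=1)
augmented = augmenter.apply(toks)
print(" ".join(tok.text for tok in augmented))
```

Binding `pos` with `functools.partial` keeps every transform callable with the same `(toks, rng=...)` shape, which is one simple way for a combiner to treat word-level and char-level transforms uniformly. The actual `Augmenter` in this PR operates on spaCy Docs and may be organized differently.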

Motivation and Context

My first pass on data augmentation (PR #268) was a start, but woefully insufficient. This PR brings the sub-package up to a passable level.

How Has This Been Tested?

Wrote a bunch of tests and ran the code locally, and everything checks out.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • [TODO] My change requires a change to the documentation, and I have updated it accordingly.

@bdewilde bdewilde marked this pull request as ready for review August 30, 2019 14:58
@bdewilde bdewilde merged commit 9d3526a into develop Aug 30, 2019
@bdewilde bdewilde deleted the feature/improve-data-augmentation branch August 30, 2019 15:04