Releases: tensorflow/text
v2.0.0
Major Updates
- Added a regex_split op.
- Fixed a bug in the case_fold_utf8 and normalize_utf8 ops where they were unable to locate the ICU data file.
- Fixed a problem where the BertTokenizer used merge_dims, which is unreleased for the corresponding version of TensorFlow.
- Updated the BertTokenizer to use regex_split to match the exact regex used by original BERT.
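To illustrate the kind of splitting a regex-based split op performs, here is a minimal pure-Python sketch. It is not the TF.Text implementation: the function name regex_split_sketch and its parameter names are hypothetical, and it assumes the behavior is "split on a delimiter pattern, optionally keeping some delimiters as tokens".

```python
import re


def regex_split_sketch(text, delim_pattern, keep_delim_pattern=""):
    """Split text on delim_pattern; keep delimiters matching keep_delim_pattern.

    Hypothetical helper for illustration only, not a TF.Text API.
    """
    delim_re = re.compile(delim_pattern)
    keep_re = re.compile(keep_delim_pattern) if keep_delim_pattern else None
    tokens = []
    pos = 0
    for match in delim_re.finditer(text):
        if match.end() == match.start():
            continue  # ignore empty delimiter matches
        if match.start() > pos:
            tokens.append(text[pos:match.start()])  # text before the delimiter
        if keep_re is not None and keep_re.fullmatch(match.group()):
            tokens.append(match.group())  # delimiter kept as its own token
        pos = match.end()
    if pos < len(text):
        tokens.append(text[pos:])  # trailing text after the last delimiter
    return tokens


# Whitespace is dropped, punctuation is kept as separate tokens.
print(regex_split_sketch("hello, world!", r"\s|[,!]", r"[,!]"))
```

Keeping punctuation while discarding whitespace is the sort of behavior a BERT-style basic tokenizer needs, which is presumably why the tokenizer was moved onto a regex-based split.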
v1.15.0
Major Updates
- Added a regex_split op.
- Fixed a bug in the case_fold_utf8 and normalize_utf8 ops where they were unable to locate the ICU data file.
- Fixed a problem where the BertTokenizer used merge_dims, which is unreleased for the corresponding version of TensorFlow.
- Updated the BertTokenizer to use regex_split to match the exact regex used by original BERT.
v2.0.0-rc0
Please note that moving forward our releases and branches will match the major & minor versions of core TensorFlow. This should prevent future confusion. As such, this (previously 1.0) release is 2.0, and we will be skipping straight to 1.15 for the next 1.x release to support TF 1.15.
Major Updates:
- SentencepieceTokenizer has been added. Please see https://github.com/google/sentencepiece for more information on Sentencepiece.
- New ToDense Keras layer for RaggedTensor conversion
- Pipeline for generating a Wordpiece Vocabulary has been added to tools.
- New Rouge-L metric op for measuring text similarity. A new colab has been added to the examples directory which provides usage examples.
- New BertTokenizer which mimics the preprocessing performed in the original BERT model.
- New Detokenizer abstract class has been added to the TF.Text Tokenizer API.
- Many previously released ops have been added to the TensorFlow Serving model server. Please see https://github.com/tensorflow/serving for more information.
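For readers unfamiliar with Rouge-L, the metric is based on the longest common subsequence (LCS) between a hypothesis and a reference token sequence. The following is a minimal pure-Python sketch of that idea, not the TF.Text op: the function names are hypothetical, and it assumes a plain harmonic-mean F-measure rather than whatever weighting the released op applies.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_sketch(hypothesis, reference):
    """Return (f_measure, precision, recall) based on LCS length.

    Hypothetical helper; assumes an unweighted harmonic mean for F.
    """
    lcs = lcs_length(hypothesis, reference)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(hypothesis)
    recall = lcs / len(reference)
    f_measure = 2 * precision * recall / (precision + recall)
    return f_measure, precision, recall


hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(rouge_l_sketch(hyp, ref))
```

The colab in the examples directory remains the reference for how the actual op is called.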
Minor Updates:
- API docs have received an update that should make finding relevant information easier.
- Wordpiece: Add support for splitting unknown characters
- Wordpiece: Add support for max characters per token
- Wordshape: Fix finding of currency symbols.
- Update Whitespace & UnicodeScript Tokenizers to accept scalar values.
- Build includes CC library targets. Useful for statically linking in TF.Text custom ops. Specifically useful for building into TF.Serving's model server.
- Build environment: Updated to match core TF's update.
v1.15.0-rc0
Please note that moving forward our releases and branches will match the major & minor versions of core TensorFlow. This should prevent future confusion. As such, we are skipping straight to v1.15 for our TF 1.15 support.
Major Updates:
- SentencepieceTokenizer has been added. Please see https://github.com/google/sentencepiece for more information on Sentencepiece.
- New ToDense Keras layer for RaggedTensor conversion
- Pipeline for generating a Wordpiece Vocabulary has been added to tools.
- New Rouge-L metric op for measuring text similarity. A new colab has been added to the examples directory which provides usage examples.
- New BertTokenizer which mimics the preprocessing performed in the original BERT model.
- New Detokenizer abstract class has been added to the TF.Text Tokenizer API.
Minor Updates:
- API docs have received an update that should make finding relevant information easier.
- Update Whitespace & UnicodeScript Tokenizers to accept scalar values.
- Build includes CC library targets. Useful for statically linking in TF.Text custom ops. Specifically useful for building into TF.Serving's model server.
- Build environment: Updated to match core TF's update.
- Many previously released ops have been added to the TensorFlow Serving model server, and should be in a coming 1.x release. Please see https://github.com/tensorflow/serving for more information.
v0.1.0 (TF 1.14)
Minor Updates:
- Wordpiece: Add support for splitting unknown characters
- Wordpiece: Add support for max characters per token
- Wordshape: Fix finding of currency symbols
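As a rough illustration of what a max-characters-per-token limit means for Wordpiece, here is a minimal pure-Python sketch of greedy longest-match-first subword splitting. It is not the TF.Text implementation: the function name, the "##" suffix marker, the "[UNK]" token, and the default limit are assumptions, and the splitting of unknown characters is not modeled.

```python
def wordpiece_sketch(word, vocab, unknown_token="[UNK]",
                     suffix_indicator="##", max_chars_per_token=100):
    """Greedy longest-match-first subword split of a single word.

    Hypothetical helper for illustration only, not a TF.Text API.
    """
    # Words over the character limit are mapped straight to the unknown token.
    if len(word) > max_chars_per_token:
        return [unknown_token]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = suffix_indicator + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unknown_token]  # no vocabulary entry covers this position
        pieces.append(match)
        start = end
    return pieces


vocab = {"un", "##aff", "##able"}
print(wordpiece_sketch("unaffable", vocab))
print(wordpiece_sketch("unaffable", vocab, max_chars_per_token=5))
```

Capping the word length bounds the cost of the inner search, since the number of candidate substrings tried grows with the length of the word.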
v0.1.0-rc2
This is a TF 1.14 compatible release that has everything in TF.Text 1.0.0-beta2. Most ongoing TF.Text development is for TensorFlow 2.0, but we wanted to provide a library for those who have not transitioned yet.
v1.0.0-beta2
Major updates:
- Fixes problem in build from beta1
v1.0.0-beta1
Major updates:
- Include data necessary for normalize & case folding ops.
Minor updates:
- Update unit tests to work with Python3.
- Fix some wordpiece corner case bugs.
- Wordpiece efficiency improvements.
- Add missing shape_fn to Wordpiece's tokenizeWithOffsets.
v1.0.0-beta0
Initial prerelease for TF.Text library.