Releases: tensorflow/text

v2.0.0

08 Nov 20:22

Major Updates

  • Added a regex_split op.
  • Fixed a bug where the case_fold_utf8 and normalize_utf8 ops were unable to locate the ICU data file.
  • Fixed a problem where the BertTokenizer used merge_dims, which is unreleased for the corresponding version of TensorFlow.
  • Updated the BertTokenizer to use regex_split to match the exact regex used by the original BERT.
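
For readers unfamiliar with the idea behind regex_split, here is a minimal plain-Python sketch of splitting on a delimiter regex while optionally keeping some delimiters as tokens. It uses only the standard `re` module. The function name `regex_split_sketch` and its parameters are illustrative assumptions, not the TF.Text API, and the real op works on tensors rather than Python strings.

```python
import re


def regex_split_sketch(text, delim_pattern, keep_delim_pattern=None):
    """Split `text` at every match of `delim_pattern`.

    Hypothetical helper for illustration only. A delimiter is emitted as
    its own token only if it fully matches `keep_delim_pattern`;
    otherwise it is dropped.
    """
    tokens = []
    pos = 0
    for match in re.finditer(delim_pattern, text):
        # Emit the text between the previous delimiter and this one.
        if match.start() > pos:
            tokens.append(text[pos:match.start()])
        delim = match.group(0)
        if keep_delim_pattern and re.fullmatch(keep_delim_pattern, delim):
            tokens.append(delim)
        pos = match.end()
    # Emit whatever follows the last delimiter.
    if pos < len(text):
        tokens.append(text[pos:])
    return tokens


# Split on whitespace and punctuation, keeping only the punctuation.
print(regex_split_sketch("hello, world!", r"\s|[,!]", r"[,!]"))
```

Keeping punctuation while dropping whitespace is the kind of behavior a BERT-style basic tokenizer needs, which is presumably why the BertTokenizer was moved onto this op.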

v1.15.0

08 Nov 20:21

Major Updates

  • Added a regex_split op.
  • Fixed a bug where the case_fold_utf8 and normalize_utf8 ops were unable to locate the ICU data file.
  • Fixed a problem where the BertTokenizer used merge_dims, which is unreleased for the corresponding version of TensorFlow.
  • Updated the BertTokenizer to use regex_split to match the exact regex used by the original BERT.

v2.0.0-rc0

19 Oct 00:08

Please note that going forward, our releases and branches will match the major and minor versions of core TensorFlow. This should prevent future confusion. As such, this release (previously 1.0) is 2.0, and we will be skipping straight to 1.15 for the next 1.x release to support TF 1.15.
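
As a small illustration of the versioning policy above, the sketch below derives a pip requirement string that pins TF.Text to the same major and minor version as an installed TensorFlow. The helper name `matching_text_requirement` is hypothetical, and the `==X.Y.*` pinning style is only one reasonable choice, assumed here for illustration.

```python
def matching_text_requirement(tf_version):
    """Return a pip requirement pinning TF.Text to TensorFlow's major.minor.

    Hypothetical helper: it only does string handling and does not
    check that such a release actually exists.
    """
    major, minor = tf_version.split(".")[:2]
    return f"tensorflow-text=={major}.{minor}.*"


print(matching_text_requirement("2.0.0"))
print(matching_text_requirement("1.15.0"))
```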

Major Updates:

  • SentencepieceTokenizer has been added. Please see https://github.com/google/sentencepiece for more information on Sentencepiece.
  • New ToDense Keras layer for RaggedTensor conversion
  • Pipeline for generating a Wordpiece Vocabulary has been added to tools.
  • New Rouge-L metric op for measuring text similarity. A new colab has been added to the examples directory which provides usage examples.
  • New BertTokenizer which mimics the preprocessing performed in the original BERT model.
  • New Detokenizer abstract class has been added to the TF.Text Tokenizer API.
  • Many previously released ops have been added to the TensorFlow Serving model server. Please see https://github.com/tensorflow/serving for more information.
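
To make the Rouge-L item above more concrete, here is a minimal pure-Python sketch of the underlying idea: score a hypothesis against a reference by the length of their longest common subsequence (LCS). The function names are illustrative assumptions, and the F-measure is a plain harmonic mean of precision and recall. The TF.Text op operates on tensors and may weight precision and recall differently, so treat this as an outline of the technique rather than the library's implementation.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_sketch(hypothesis, reference):
    """Return (f_measure, precision, recall) based on the LCS length."""
    lcs = lcs_length(hypothesis, reference)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(hypothesis)
    recall = lcs / len(reference)
    f_measure = 2 * precision * recall / (precision + recall)
    return f_measure, precision, recall


hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(rouge_l_sketch(hyp, ref))
```

In this example the LCS is "the cat on the mat" (five tokens), so precision and recall are both 5/6.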

Minor Updates:

  • API docs have received an update that should make finding relevant information easier.
  • Wordpiece: Add support for splitting unknown characters
  • Wordpiece: Add support for max characters per token
  • Wordshape: Fix finding of currency symbols.
  • Update Whitespace & UnicodeScript Tokenizers to accept scalar values.
  • The build now includes CC library targets. These are useful for statically linking in TF.Text custom ops, and specifically for building them into TF.Serving's model server.
  • Build environment: Updated to match core TF's update.
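
The two Wordpiece items above can be pictured with a greedy longest-match-first sketch in plain Python. Everything here is an assumption made for illustration: the function name, the parameter names, the `##` continuation prefix, and the exact semantics. In this sketch `max_chars_per_token` caps the length of each candidate subword, and `split_unknown_chars` emits an unknown token for a single unmatched character instead of marking the whole word as unknown. The actual TF.Text op may behave differently.

```python
def wordpiece_sketch(word, vocab, unk="[UNK]", max_chars_per_token=None,
                     split_unknown_chars=False, prefix="##"):
    """Greedy longest-match-first wordpiece split of a single word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        if max_chars_per_token is not None:
            # Never try candidates longer than the configured cap.
            end = min(end, start + max_chars_per_token)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            if not split_unknown_chars:
                # No match: the whole word becomes a single unknown token.
                return [unk]
            # Split out just the unmatched character and keep going.
            pieces.append(unk)
            start += 1
        else:
            pieces.append(piece)
            start = end
    return pieces


vocab = {"un", "##aff", "##able"}
print(wordpiece_sketch("unaffable", vocab))
print(wordpiece_sketch("unaffable!", vocab))
print(wordpiece_sketch("unaffable!", vocab, split_unknown_chars=True))
```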

v1.15.0-rc0

19 Oct 00:13

Please note that going forward, our releases and branches will match the major and minor versions of core TensorFlow. This should prevent future confusion. As such, we are skipping straight to v1.15 for our TF 1.15 support.

Major Updates:

  • SentencepieceTokenizer has been added. Please see https://github.com/google/sentencepiece for more information on Sentencepiece.
  • New ToDense Keras layer for RaggedTensor conversion
  • Pipeline for generating a Wordpiece Vocabulary has been added to tools.
  • New Rouge-L metric op for measuring text similarity. A new colab has been added to the examples directory which provides usage examples.
  • New BertTokenizer which mimics the preprocessing performed in the original BERT model.
  • New Detokenizer abstract class has been added to the TF.Text Tokenizer API.

Minor Updates:

  • API docs have received an update that should make finding relevant information easier.
  • Update Whitespace & UnicodeScript Tokenizers to accept scalar values.
  • The build now includes CC library targets. These are useful for statically linking in TF.Text custom ops, and specifically for building them into TF.Serving's model server.
  • Build environment: Updated to match core TF's update.
  • Many previously released ops have been added to the TensorFlow Serving model server and should be in an upcoming 1.x release. Please see https://github.com/tensorflow/serving for more information.

v0.1.0 (TF 1.14)

09 Oct 00:03

Minor Updates:

  • Wordpiece: Add support for splitting unknown characters
  • Wordpiece: Add support for max characters per token
  • Wordshape: Fix finding of currency symbols

v0.1.0-rc2

01 Aug 19:40
Pre-release

This is a TF 1.14-compatible release that has everything in TF.Text 1.0.0-beta2. Most ongoing TF.Text development is for TensorFlow 2.0, but we wanted to provide a library for those who have not yet transitioned.

v1.0.0-beta2

01 Aug 19:16
Pre-release

Major updates:

  • Fixes a problem in the build from beta1.

v1.0.0-beta1

23 Jul 18:53
Pre-release

Major updates:

  • Include data necessary for normalize & case folding ops.

Minor updates:

  • Update unit tests to work with Python3.
  • Fix some wordpiece corner case bugs.
  • Wordpiece efficiency improvements.
  • Add missing shape_fn to Wordpiece's tokenizeWithOffsets.

v1.0.0-beta0

10 Jun 04:29
Pre-release

Initial pre-release of the TF.Text library.