Commit 7ffb569

Merge pull request #36 from paulgc/master
Project import generated by Copybara.
2 parents bd35a86 + 6517ed3 commit 7ffb569

52 files changed: +1820 / -1138 lines changed

README.md

Lines changed: 4 additions & 11 deletions
@@ -20,7 +20,7 @@ TF Data Validation includes:
 * An anomalies viewer so that you can see what features have anomalies and
 learn more in order to correct them.
 
-For instructions on using TFDV, see the [get started guide](g3doc/get_started.md)
+For instructions on using TFDV, see the [get started guide](https://github.com/tensorflow/data-validation/blob/master/g3doc/get_started.md)
 and try out the [example notebook](https://colab.research.google.com/github/tensorflow/data-validation/blob/master/examples/chicago_taxi/chicago_taxi_tfdv.ipynb).
 
 Caution: TFDV may be backwards incompatible before version 1.0.
@@ -34,14 +34,6 @@ The recommended way to install TFDV is using the
 pip install tensorflow-data-validation
 ```
 
-TFDV 0.9.0 currently requires TensorFlow Transform 0.9.0. Make sure to
-force install Transform 0.9.0 after installing TFDV, using the following
-command.
-
-```bash
-pip install tensorflow_transform==0.9.0
-```
-
 ## Installing from source
 
 ### 1. Prerequisites
@@ -55,7 +47,7 @@ directions](https://www.scipy.org/scipylib/download.html).
 
 #### Install Bazel
 
-If bazel is not installed on your system, install it now by following [these
+If Bazel is not installed on your system, install it now by following [these
 directions](https://bazel.build/versions/master/docs/install.html).
 
 ### 2. Clone the TFDV repository
@@ -115,7 +107,8 @@ other *untested* combinations may also work.
 
 |tensorflow-data-validation |tensorflow |apache-beam[gcp]|
 |---------------------------|--------------|----------------|
-|GitHub master |nightly (1.x) |2.6.0 |
+|GitHub master |nightly (1.x) |2.8.0 |
+|0.11.0 |1.11 |2.8.0 |
 |0.9.0 |1.9 |2.6.0 |
 
 ## Questions

RELEASE.md

Lines changed: 10 additions & 3 deletions
@@ -1,20 +1,27 @@
-# Current version (not yet released; still in development)
+# Release 0.11.0
 
 ## Major Features and Improvements
 
+* Add option to infer feature types from schema when generating statistics over CSV data.
+* Add utility method `set_domain` to set the domain of a feature in the schema.
+* Add option to compute weighted statistics by providing a weight feature.
 * Add a PTransform for decoding TF examples.
-* Add utility methods to write and load the schema protocol buffer.
+* Add utility methods `write_schema_text` and `load_schema_text` to write and load the schema protocol buffer.
 * Add option to compute statistics over a sample.
-* Add support for computing weighted common statistics.
+* Optimize performance of statistics computation (~2x improvement on benchmark datasets).
 
 ## Bug Fixes and Other Changes
 
+* Depends on `apache-beam[gcp]>=2.8,<3`.
+* Depends on `tensorflow-transform>=0.11,<0.12`.
+* Depends on `tensorflow-metadata>=0.9,<0.10`.
 * Fix bug in clearing oneof domain\_info field in Feature proto.
 * Fix overflow error for large integers by casting them to STRING type.
 * Added API docs.
 
 ## Breaking changes
 
+* Requires pre-installed `tensorflow>=1.11,<2`.
 * Make tf.Example decoder to represent a feature with no value list as a
 missing value (None).
 * Make StatsOptions as a class.
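
Review note: the dependency bounds added to RELEASE.md above (`>=2.8,<3`, `>=0.11,<0.12`, `>=0.9,<0.10`, `>=1.11,<2`) are half-open ranges with an inclusive lower bound and an exclusive upper bound. The following is a minimal sketch of how such bounds could be checked against an environment. It assumes plain dotted numeric versions and is not how pip resolves specifiers; `parse_version`, `satisfies`, and `out_of_range` are hypothetical helpers, not part of TFDV.

```python
# Sketch only: handles plain dotted numeric versions such as "2.8.0".
# Pre-release tags like "1.11.0rc1" are out of scope for this illustration.

# Bounds copied from the RELEASE.md diff: (inclusive lower, exclusive upper).
REQUIREMENTS = {
    "apache-beam[gcp]": ("2.8", "3"),
    "tensorflow-transform": ("0.11", "0.12"),
    "tensorflow-metadata": ("0.9", "0.10"),
    "tensorflow": ("1.11", "2"),
}


def parse_version(text):
    """Turn '2.8.0' into (2, 8, 0) so components compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, lower, upper):
    """Return True when lower <= installed < upper."""
    versions = [parse_version(v) for v in (installed, lower, upper)]
    # Pad with zeros so that "2.8" and "2.8.0" compare as equal.
    width = max(len(v) for v in versions)
    inst, lo, hi = (v + (0,) * (width - len(v)) for v in versions)
    return lo <= inst < hi


def out_of_range(installed_versions):
    """List the packages whose installed version falls outside its bounds."""
    return [
        name
        for name, (lower, upper) in REQUIREMENTS.items()
        if name in installed_versions
        and not satisfies(installed_versions[name], lower, upper)
    ]
```

For instance, an environment matching the 0.9.0 row of the README compatibility table (`tensorflow` 1.9, `apache-beam[gcp]` 2.6.0) would be flagged for both packages. Comparing integer tuples rather than strings matters here because `"0.10"` sorts before `"0.9"` as text.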

WORKSPACE

Lines changed: 3 additions & 2 deletions
@@ -9,10 +9,11 @@ workspace(name = "tensorflow_data_validation")
 # reliable downloads.
 load("//tensorflow_data_validation:repo.bzl", "tensorflow_http_archive")
 
+# v1.11.0
 tensorflow_http_archive(
 name = "org_tensorflow",
-sha256 = "696c4906d6536ed8d9f8f13c4927d3ccf36dcf3e13bb352ab80cba6b1b9038d4",
-git_commit = "25c197e02393bd44f50079945409009dd4d434f8",
+sha256 = "025b47263af34475dc75da40c76a87934a70f69611e9b0b88445d65730f0fc73",
+git_commit = "c19e29306ce1777456b2dbb3a14f511edf7883a8",
 )
 
 # TensorFlow depends on "io_bazel_rules_closure" so we need this here.
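
Review note: the WORKSPACE change above updates `git_commit` and `sha256` together, which suggests the `sha256` value pins the contents of the archive fetched for that commit. As a rough illustration of what such a pin checks, the sketch below hashes a local file and compares it with an expected digest. It assumes the archive has already been downloaded to a local path; `sha256_of_file` and `matches_pin` are hypothetical helper names, and this is not the code in `repo.bzl`.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, expected_sha256):
    """Compare a file's digest with a pinned value such as the one in WORKSPACE."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

Reading in chunks keeps memory use flat for large archives. A mismatch would normally mean the pinned commit and the pinned digest were not updated together, or the download was altered.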

g3doc/api_docs/python/_toc.yaml

Lines changed: 6 additions & 0 deletions
@@ -24,8 +24,12 @@ toc:
 path: /tfx/data_validation/api_docs/python/tfdv/get_feature
 - title: infer_schema
 path: /tfx/data_validation/api_docs/python/tfdv/infer_schema
+- title: load_schema_text
+path: /tfx/data_validation/api_docs/python/tfdv/load_schema_text
 - title: load_statistics
 path: /tfx/data_validation/api_docs/python/tfdv/load_statistics
+- title: set_domain
+path: /tfx/data_validation/api_docs/python/tfdv/set_domain
 - title: StatsOptions
 path: /tfx/data_validation/api_docs/python/tfdv/StatsOptions
 - title: TFExampleDecoder
@@ -36,3 +40,5 @@ toc:
 path: /tfx/data_validation/api_docs/python/tfdv/validate_statistics
 - title: visualize_statistics
 path: /tfx/data_validation/api_docs/python/tfdv/visualize_statistics
+- title: write_schema_text
+path: /tfx/data_validation/api_docs/python/tfdv/write_schema_text

g3doc/api_docs/python/index.md

Lines changed: 4 additions & 1 deletion
@@ -14,6 +14,9 @@
 * <a href="./tfdv/get_domain.md"><code>tfdv.get_domain</code></a>
 * <a href="./tfdv/get_feature.md"><code>tfdv.get_feature</code></a>
 * <a href="./tfdv/infer_schema.md"><code>tfdv.infer_schema</code></a>
+* <a href="./tfdv/load_schema_text.md"><code>tfdv.load_schema_text</code></a>
 * <a href="./tfdv/load_statistics.md"><code>tfdv.load_statistics</code></a>
+* <a href="./tfdv/set_domain.md"><code>tfdv.set_domain</code></a>
 * <a href="./tfdv/validate_statistics.md"><code>tfdv.validate_statistics</code></a>
-* <a href="./tfdv/visualize_statistics.md"><code>tfdv.visualize_statistics</code></a>
+* <a href="./tfdv/visualize_statistics.md"><code>tfdv.visualize_statistics</code></a>
+* <a href="./tfdv/write_schema_text.md"><code>tfdv.write_schema_text</code></a>

g3doc/api_docs/python/tfdv.md

Lines changed: 8 additions & 2 deletions
@@ -13,9 +13,9 @@ Init module for TensorFlow Data Validation.
 
 [`class DecodeCSV`](./tfdv/DecodeCSV.md): Decodes CSV records into an in-memory dict representation.
 
-[`class GenerateStatistics`](./tfdv/GenerateStatistics.md): Public API for generating data statistics.
+[`class GenerateStatistics`](./tfdv/GenerateStatistics.md): API for generating data statistics.
 
-[`class StatsOptions`](./tfdv/StatsOptions.md): Options for generating data statistics.
+[`class StatsOptions`](./tfdv/StatsOptions.md): Options for generating statistics.
 
 [`class TFExampleDecoder`](./tfdv/TFExampleDecoder.md): A decoder for decoding TF examples into tf data validation datasets.
 
@@ -37,9 +37,15 @@ Init module for TensorFlow Data Validation.
 
 [`infer_schema(...)`](./tfdv/infer_schema.md): Infer schema from the input statistics.
 
+[`load_schema_text(...)`](./tfdv/load_schema_text.md): Loads the schema stored in text format in the input path.
+
 [`load_statistics(...)`](./tfdv/load_statistics.md): Loads data statistics proto from file.
 
+[`set_domain(...)`](./tfdv/set_domain.md): Sets the domain for the input feature in the schema.
+
 [`validate_statistics(...)`](./tfdv/validate_statistics.md): Validate the input statistics against the provided input schema.
 
 [`visualize_statistics(...)`](./tfdv/visualize_statistics.md): Visualize the input statistics using Facets.
 
+[`write_schema_text(...)`](./tfdv/write_schema_text.md): Writes input schema to a file in text format.
+
