This repository was archived by the owner on Nov 11, 2022. It is now read-only.
We have identified an issue in which Dataflow jobs that read from TextIO with the compression type set to GZIP or BZIP2 may lose data during processing.
This is a silent issue: you will not see any error messages or other visible symptoms. The problem occurs when all of the following apply: you are using the Dataflow SDK for Java 1.6.0, you are reading compressed files, and you set the compression mode with withCompressionType to either GZIP or BZIP2.
Current known workarounds:
Recommended option: Use AUTO mode instead of GZIP or BZIP2 mode.
Use withCompressionType(CompressionType.AUTO) with the TextIO source, or leave the compression type unset (AUTO is the default). NOTE: for this to work, compressed files must have a .gz or .bz2 extension (case-insensitive).
Alternative option: Switch to version 1.5.1 of the Dataflow SDK for Java. If you are using Maven, you can do this by specifying version 1.5.1 for the SDK dependency in your pom.xml.
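Because the recommended AUTO workaround depends on file extensions, it may help to check input file names before switching modes. The following is a minimal sketch, not part of the SDK: the class AutoCompressionCheck, its methods, and the gs://my-bucket paths are hypothetical names chosen for illustration. It only mirrors the rule stated above (.gz or .bz2, case-insensitive) and assumes nothing else about how AUTO mode detects compression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical pre-flight helper; it is not part of the Dataflow SDK.
class AutoCompressionCheck {

    // True if the path ends in .gz or .bz2, ignoring case, which is the
    // condition the note above gives for AUTO mode to work.
    static boolean hasRecognizedExtension(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        return lower.endsWith(".gz") || lower.endsWith(".bz2");
    }

    // Returns the paths that lack a recognized extension, so they can be
    // renamed before the job relies on AUTO mode.
    static List<String> withoutRecognizedExtension(List<String> paths) {
        return paths.stream()
                .filter(p -> !hasRecognizedExtension(p))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
                "gs://my-bucket/logs/part-0.gz",
                "gs://my-bucket/logs/part-1.BZ2",
                "gs://my-bucket/logs/part-2.gzip");

        // Prints [gs://my-bucket/logs/part-2.gzip]
        System.out.println(withoutRecognizedExtension(inputs));

        // In the pipeline itself, the read would then look roughly like this
        // (Dataflow SDK 1.x; not compiled as part of this sketch):
        //   TextIO.Read.from("gs://my-bucket/logs/*.gz")
        //              .withCompressionType(TextIO.CompressionType.AUTO)
    }
}
```

Any path the helper reports would need to be renamed to a recognized extension before the job can rely on AUTO mode.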
We are actively working to resolve this and will update this issue with all developments.