Unable to dump unbounded PubSub content to gs:// bucket #42

@turboT4

Hi there,

I'm completely new to Apache Beam, and its programming model is quite surprising. While trying to put together a Parquet writer that reads from PubSub, I can't wrap my head around the following...

I'm taking this template as a base: https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/PubsubToAvro.java

It seems that AvroIO supports windowed writes to bucket paths such as gs://my-bucket/YYYY/MM/DD, where the 'YYYY', 'MM' and 'DD' variables are filled in automatically at runtime by the AvroIO handler.

Is there any way to achieve this using ParquetIO? The only bits of Parquet I've seen are the following ones, but none of them write by date...
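To make the question concrete, here is a rough sketch of the date-resolution step I'm after, written in plain Java with no Beam dependency. The class name `DatePartitionedPath`, the `resolve` helper and the `part-NNNNN-of-NNNNN.parquet` file name are made up for illustration and are not part of Beam or the template. My assumption is that something like this could be called from a custom `FileIO.Write.FileNaming`, since that callback receives the window, but I haven't confirmed it works together with ParquetIO.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper: turns a window start time into a date-partitioned object path.
class DatePartitionedPath {
    // Day-level partition in UTC, e.g. 2020/03/07.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    // Builds "<baseDir>/<yyyy>/<MM>/<dd>/part-<shard>-of-<total>.parquet".
    public static String resolve(String baseDir, Instant windowStart, int shardIndex, int numShards) {
        // Avoid a doubled slash when the base directory already ends with one.
        String base = baseDir.endsWith("/") ? baseDir : baseDir + "/";
        return String.format(Locale.ROOT, "%s%s/part-%05d-of-%05d.parquet",
            base, DAY.format(windowStart), shardIndex, numShards);
    }

    public static void main(String[] args) {
        System.out.println(resolve("gs://my-bucket", Instant.parse("2020-03-07T23:59:00Z"), 0, 1));
    }
}
```

The date is taken from the window start in UTC, so a window that begins just before midnight UTC lands in the earlier day's folder. Whether that is the right choice depends on how the windows are defined.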

I tried a first approach and, even though the code compiles and processes one event after another, I can't get it to run locally with DirectRunner against a bucket. The code I have so far is the following.

SOLVED
