Skip to content

Some data is getting missed while using CassandraIO Read  #375

@nagamanikanta-sml

Description

@nagamanikanta-sml

Related Template(s)

CassandraToBigtable.java

What happened?

We are doing a dataflow batch job to read data from a cassandra database and write it to bigtable.

We are using the CassandraToBigtable.java template and found that some the data is missing while reading from the Cassandra database.

Pipeline p = Pipeline.create(options);

// Create a factory method to inject the CassandraRowMapperFn to allow custom type mapping.
SerializableFunction<Session, Mapper> cassandraObjectMapperFactory = new CassandraRowMapperFactory(options.getCassandraTable(), options.getCassandraKeyspace());

CassandraIO.Read<Row> source =
    CassandraIO.<Row>read()
        .withHosts(hosts)
        .withPort(options.getCassandraPort())
        .withKeyspace(options.getCassandraKeyspace())
        .withTable(options.getCassandraTable())
        .withMapperFactoryFn(cassandraObjectMapperFactory)
        .withEntity(Row.class)
        .withCoder(SerializableCoder.of(Row.class));
        // .withEntity(User.class)
        // .withCoder(SerializableCoder.of(User.class));

After the above step we tried to print the data that is being read and found that some rows are missing, to be precise 15 rows ( first 5 and last 10) are missing out of 512 rows.

Beam Version

Newer than 2.35.0

Relevant log output

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions