Skip to content

Iterating BigQuery query results produces incorrect number of rows #506

Closed
@polleyg

Description

@polleyg

This is in relation to post on Stack Overflow post here.

As per the sample code here, a user should be be able to paginate the results of a BigQuery query using the following code structure:

while (result != null) {
      Iterator<List<FieldValue>> iter = result.iterateAll();
      while (iter.hasNext()) {
        List<FieldValue> row = iter.next();
        //do something with row/data
        System.out.println(row);
      }
      result = result.getNextPage();
    }
  }

However, when the result set is large (in my tests >31,000 rows), more rows are returned/iterated even though calling result.getTotalRows() returns the correct/expected number of rows.

For example, I have a table with 85,250 rows (9.45 MB). When I query it via the Java API, and use the code from the example above, it actually iterates 160,296 times.

Even if I limit the result set in the query using limit 5000, and set setPageSize(1000L), then it iterates 15,000 times e.g:

QueryRequest queryRequest = QueryRequest
                .newBuilder("SELECT * FROM [<my-project-id>:<dataset>.<table_with_85250_rows>] limit 5000")
                .setUseLegacySql(true)
                .setPageSize(1000L)
                .build();
        QueryResponse response = bigQuery.query(queryRequest);
        QueryResult result = response.getResult();
        System.out.println("Total rows: " + result.getTotalRows());
        Integer rowNumber = 1;
        while(result != null){
            Iterator<List<FieldValue>> iter = result.iterateAll();
            while(iter.hasNext()){
                List<FieldValue> row = iter.next();
                System.out.println("Row: " + rowNumber + ", with number of columns: " + row.size());
                rowNumber++;
            }
            result = result.getNextPage();
        }

Output:

Total rows: 5000
Row: 1, with number of columns: 11
Row: 2, with number of columns: 11
Row: 3, with number of columns: 11
Row: 4, with number of columns: 11
Row: 5, with number of columns: 11
Row: 6, with number of columns: 11
Row: 7, with number of columns: 11
Row: 8, with number of columns: 11
Row: 9, with number of columns: 11
Row: 10, with number of columns: 11
Row: 11, with number of columns: 11
[.....]
Row: 14997, with number of columns: 11
Row: 14998, with number of columns: 11
Row: 14999, with number of columns: 11
Row: 15000, with number of columns: 11

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions