Skip to content

Iterating BigQuery query results produces incorrect number of rows #506

Closed
@polleyg

Description

This is in relation to post on Stack Overflow post here.

As per the sample code here, a user should be be able to paginate the results of a BigQuery query using the following code structure:

while (result != null) {
      Iterator<List<FieldValue>> iter = result.iterateAll();
      while (iter.hasNext()) {
        List<FieldValue> row = iter.next();
        //do something with row/data
        System.out.println(row);
      }
      result = result.getNextPage();
    }
  }

However, when the result set is large (in my tests >31,000 rows), more rows are returned/iterated even though calling result.getTotalRows() returns the correct/expected number of rows.

For example, I have a table with 85,250 rows (9.45 MB). When I query it via the Java API, and use the code from the example above, it actually iterates 160,296 times.

Even if I limit the result set in the query using limit 5000, and set setPageSize(1000L), then it iterates 15,000 times e.g:

QueryRequest queryRequest = QueryRequest
                .newBuilder("SELECT * FROM [<my-project-id>:<dataset>.<table_with_85250_rows>] limit 5000")
                .setUseLegacySql(true)
                .setPageSize(1000L)
                .build();
        QueryResponse response = bigQuery.query(queryRequest);
        QueryResult result = response.getResult();
        System.out.println("Total rows: " + result.getTotalRows());
        Integer rowNumber = 1;
        while(result != null){
            Iterator<List<FieldValue>> iter = result.iterateAll();
            while(iter.hasNext()){
                List<FieldValue> row = iter.next();
                System.out.println("Row: " + rowNumber + ", with number of columns: " + row.size());
                rowNumber++;
            }
            result = result.getNextPage();
        }

Output:

Total rows: 5000
Row: 1, with number of columns: 11
Row: 2, with number of columns: 11
Row: 3, with number of columns: 11
Row: 4, with number of columns: 11
Row: 5, with number of columns: 11
Row: 6, with number of columns: 11
Row: 7, with number of columns: 11
Row: 8, with number of columns: 11
Row: 9, with number of columns: 11
Row: 10, with number of columns: 11
Row: 11, with number of columns: 11
[.....]
Row: 14997, with number of columns: 11
Row: 14998, with number of columns: 11
Row: 14999, with number of columns: 11
Row: 15000, with number of columns: 11

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions