Datastore: fetching by incremental offset and constant limit fails #4689

Closed
@frbattid

Description

Hi all,

This is my setup:

$ cat /etc/redhat-release 
CentOS Linux release 7.3.1611 (Core) 
$
$ python --version
Python 2.7.5
$
$ pip freeze | grep google
gapic-google-cloud-datastore-v1==0.15.3
gapic-google-cloud-error-reporting-v1beta1==0.15.3
gapic-google-cloud-logging-v2==0.91.3
google-api-core==0.1.2
google-auth==1.2.1
google-cloud==0.32.0
google-cloud-bigquery==0.28.0
google-cloud-bigquery-datatransfer==0.1.0
google-cloud-bigtable==0.28.1
google-cloud-container==0.1.0
google-cloud-core==0.27.1
google-cloud-datastore==1.4.0
google-cloud-dns==0.28.0
google-cloud-error-reporting==0.28.0
google-cloud-firestore==0.28.0
google-cloud-language==1.0.0
google-cloud-logging==1.4.0
google-cloud-monitoring==0.28.0
google-cloud-pubsub==0.28.4
google-cloud-resource-manager==0.28.0
google-cloud-runtimeconfig==0.28.0
google-cloud-spanner==0.29.0
google-cloud-speech==0.30.0
google-cloud-storage==1.6.0
google-cloud-trace==0.17.0
google-cloud-translate==1.3.0
google-cloud-videointelligence==1.0.0
google-cloud-vision==0.29.0
google-gax==0.15.16
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3
grpc-google-iam-v1==0.11.4
proto-google-cloud-datastore-v1==0.90.4
proto-google-cloud-error-reporting-v1beta1==0.15.3
proto-google-cloud-logging-v2==0.91.3

I'm running the following code:

from google.cloud import datastore

gproject = 'myproject'
gnamespace = ''  # default one
gkind = 'mykind'
datastore_client = datastore.Client(project=gproject, namespace=gnamespace)
query = datastore_client.query(kind=gkind)

print('Getting total amount of entities')
total_amount = len(list(query.fetch()))
print('Total amount of entities is {}'.format(total_amount))

offset = 0
limit = 1000

while True:
    print('Reading entities from {} to {}'.format(offset, offset + limit - 1))
    length = len(list(query.fetch(offset=offset, limit=limit)))

    if length == 0:
        print('Number of entities finally read is 0')
        break

    print('Number of entities finally read is {}'.format(length))
    offset += length

It produces the following output:

$ python test.py 
Getting total amount of entities
E0104 12:41:57.505551523    2926 ev_epollex_linux.cc:1482]   Skipping epollex becuase GRPC_LINUX_EPOLL is not defined.
E0104 12:41:57.505581590    2926 ev_epoll1_linux.cc:1261]    Skipping epoll1 becuase GRPC_LINUX_EPOLL is not defined.
E0104 12:41:57.505588636    2926 ev_epollsig_linux.cc:1761]  Skipping epollsig becuase GRPC_LINUX_EPOLL is not defined.
Total amount of entities is 281600
Reading entities from 0 to 999
Number of entities finally read is 1000
Reading entities from 1000 to 1999
Number of entities finally read is 1000
Reading entities from 2000 to 2999
Number of entities finally read is 1000
Reading entities from 3000 to 3999
Number of entities finally read is 0
$

As you can see, the total number of entities is 281600, so I expect to iterate over all of them by using an incremental offset and a constant limit. This works fine for the first 3000 entities, but after that no more entities are returned. It always fails at the same offset.
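In case it is relevant, this is roughly what I would expect the equivalent cursor-based loop to look like (a sketch following the documented start_cursor / next_page_token pattern; I have not verified whether it runs into the same problem):

from google.cloud import datastore

datastore_client = datastore.Client(project='myproject', namespace='')
query = datastore_client.query(kind='mykind')

cursor = None   # cursor returned by the previous page, None for the first page
total = 0

while True:
    # Fetch one page of up to 1000 entities starting at the last cursor
    iterator = query.fetch(start_cursor=cursor, limit=1000)
    page = next(iterator.pages)
    entities = list(page)

    if not entities:
        break

    total += len(entities)
    cursor = iterator.next_page_token  # cursor for the next page
    print('Read {} entities so far'.format(total))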

There are also the GRPC_LINUX_EPOLL messages (errors? warnings?). As far as I can tell they are related to grpcio, since downgrading it makes them disappear. My current version is 1.8.2:

$ pip freeze | grep grpcio
grpcio==1.8.2
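
For reference, this is the kind of downgrade that silences those messages (assuming anything below the 1.8 series is enough; I have not checked the exact version boundary):

$ pip install 'grpcio<1.8'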

Any help would be really appreciated. Thanks!
