Description
I'm branching this issue from the (currently resolved) issue #2871. The title covers the request, but to add some context see below.
Many Python libraries (pandas, biopython, pillow, etc.) support reading from a "Python file object", which is canonically an object that implements the `io.IOBase` interface. If we could open a blob as a file object, we could perform operations like the example below, where I'm reading just a few records from a (possibly very large) file.
```python
from Bio import SeqIO
from google.cloud.storage import blob

with blob.Blob('/data/example.fastq', 'mybucket').open('rb') as fileobj:
    for i, rec in enumerate(SeqIO.parse(fileobj, 'fastq')):
        print(rec.seq)
        print(' name=%s\n annotations=%r' % (rec.name, rec.annotations))
        if i > 5:
            break
```
google-resumable-media covers some of the need expressed in this issue, but it does not satisfy users who need blobs to be parsed by libraries that expect standard file objects. The standard advice is to download the file first, but that advice ignores the expense and time of downloading a large file when indexed (ranged) reads are available.
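In the meantime, a blob can be adapted into a standard file object by implementing `io.RawIOBase` on top of ranged reads. The sketch below is illustrative, not an existing API: it assumes a hypothetical `fetch(start, end)` callable that returns bytes for a half-open range (for GCS this could be backed by google-resumable-media's ranged downloads), and demonstrates it with an in-memory stand-in for a remote blob.

```python
import io


class RangedReader(io.RawIOBase):
    """Read-only, seekable file object over a ranged-fetch callable.

    `fetch(start, end)` must return the bytes for the half-open range
    [start, end). Both the name and the callable are assumptions for
    this sketch; they are not part of google-cloud-storage.
    """

    def __init__(self, fetch, size):
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        elif whence == io.SEEK_END:
            self._pos = self._size + offset
        return self._pos

    def tell(self):
        return self._pos

    def readinto(self, b):
        # Fetch only the bytes the caller asked for, never the whole object.
        end = min(self._pos + len(b), self._size)
        if end <= self._pos:
            return 0  # EOF
        chunk = self._fetch(self._pos, end)
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


# In-memory stand-in for a remote blob; a real adapter would issue
# ranged HTTP downloads inside the lambda instead.
data = b'@read1\nACGT\n+\nIIII\n'
reader = RangedReader(lambda s, e: data[s:e], len(data))

# Wrapping in BufferedReader provides readline(), peek(), etc., so the
# object can be handed to libraries expecting a binary file object.
buffered = io.BufferedReader(reader)
print(buffered.readline())  # b'@read1\n'
```

Because `io.BufferedReader` batches `readinto` calls, a parser that reads record by record only pulls the byte ranges it touches, which is the point of the request: no full download required.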
Some work has been done toward this end; the following two repositories offer solutions. That said, I think most people would be much happier to adopt a solution from this project, since it has a far larger community and more active maintenance and governance.