add position() #62

Closed
alyst opened this issue Aug 12, 2018 · 3 comments

Comments

@alyst
alyst commented Aug 12, 2018

It would be nice to have Base.position(io::TranscodingStream) to return the current position in the transcoded stream.
There's already state.total_out, so it should be possible to keep track of the bytes transcoded since the last reset (and maybe adjust the count to account for unread()).

The use case.
I came across this issue once again when trying to convert MLDatasets.jl to use CodecZlib.jl instead of the aging GZip.jl.
The gzipped files there are collections of images, each image occupying the same number of bytes. So, given an image index, we can tell its exact position in the decoded stream.
When implementing readimages(io, image_indices::AbstractVector), it was natural to define readimage(io, image_index).
To work, readimage() just needs to know the current stream position, so that it can calculate how many bytes to skip to get to the specified image.
Without position(io) one would have to keep track of this information somewhere externally, which looks like an unnecessary complication to me.
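For illustration, the external bookkeeping described above could look roughly like the sketch below. Everything in it is hypothetical and not part of TranscodingStreams.jl: CountingReader, skipbytes!, and the readimage signature (including the header keyword) are made-up names, and the wrapper counts bytes only from the moment it is created. It relies on Base's generic fallbacks, which build multi-byte reads on top of read(io, UInt8) and eof(io).

```julia
# Hypothetical wrapper that counts the bytes consumed from any readable IO.
mutable struct CountingReader{T<:IO} <: IO
    io::T
    pos::Int   # bytes consumed since the wrapper was created
end
CountingReader(io::IO) = CountingReader(io, 0)

Base.position(r::CountingReader) = r.pos
Base.eof(r::CountingReader) = eof(r.io)

# Base's generic read methods for IO subtypes fall back to this one.
function Base.read(r::CountingReader, ::Type{UInt8})
    b = read(r.io, UInt8)
    r.pos += 1
    return b
end

# Discard n bytes by reading them; works on streams that cannot seek.
function skipbytes!(r::CountingReader, n::Integer)
    for _ in 1:n
        read(r, UInt8)
    end
    return r
end

# Read the image with 1-based `index`, assuming fixed-size images that
# start `header` bytes into the decoded stream.
function readimage(r::CountingReader, index::Integer, imagesize::Integer;
                   header::Integer = 0)
    target = header + (index - 1) * imagesize
    target >= position(r) || error("image $index lies before the current position")
    skipbytes!(r, target - position(r))
    return read(r, imagesize)
end

# Usage with an in-memory stand-in for the decoded stream:
# three "images" of two bytes each.
r = CountingReader(IOBuffer(UInt8[1, 2, 3, 4, 5, 6]))
img = readimage(r, 2, 2)   # bytes 3 and 4
```

With position(io) defined on the transcoding stream itself, the wrapper type would be unnecessary and readimage could query the stream directly.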

@damiendr
damiendr commented Sep 4, 2018

I would also like having position() on transcoding streams, so that I could use them with a parser that does something like this:

len = read(io, UInt32)
offset = position(io)
while position(io) < offset + len
   read_item(io)
end
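
As a runnable sketch of the same pattern, here it is on an IOBuffer, which already supports position(). The names read_block and read_item are placeholders, and treating each item as a UInt16 is purely an assumption made for the example.

```julia
# Placeholder item reader: assume each item is a single UInt16.
read_item(io::IO) = read(io, UInt16)

# Read a UInt32 byte length, then consume items until that many bytes
# have been read, using position(io) to detect the end of the block.
function read_block(io::IO)
    len = read(io, UInt32)
    offset = position(io)
    items = UInt16[]
    while position(io) < offset + len
        push!(items, read_item(io))
    end
    return items
end

# A 4-byte block holding two items, followed by unrelated trailing data.
buf = IOBuffer()
write(buf, UInt32(4), UInt16(10), UInt16(20), UInt16(99))
seekstart(buf)
items = read_block(buf)   # the trailing 99 stays unread
```

The same function would work unchanged on a transcoding stream once position() is available for it.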

@zgornel
zgornel commented Sep 16, 2018

position seems to work for the Noop() stream; however, one has to flush the stream first:

julia> using TranscodingStreams, Serialization
       fid = TranscodingStream(Noop(), open("./__deleteme__.bin","w"))
       serialize(fid, 1)
       @show position(fid)
       flush(fid)
       @show position(fid)
# position(fid) = -9
# position(fid) = 9

@laborg
laborg commented Oct 10, 2018

Looking at CSV.jl again, it appears that implementing seek would be necessary too.

@bicycle1885: I don't want to pressure you, but do you know when you will find time? With this information, dependent packages could more easily decide whether or not to create an intermediate workaround (e.g. load everything into an IOBuffer for RDatasets).
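
A minimal sketch of the IOBuffer workaround mentioned above, using only Base: drain the stream into memory once, then use position() and seek() on the resulting buffer. The helper name inmemory is made up, and the source here is a plain IOBuffer standing in for a decompressing stream. The cost is that the whole decoded payload is held in memory.

```julia
# Drain any readable stream into memory and return a seekable IOBuffer.
inmemory(io::IO) = IOBuffer(read(io))

src = IOBuffer("header|payload")   # stand-in for a decompressing stream
buf = inmemory(src)

seek(buf, 7)                 # jump past the 7-byte "header|" prefix
pos = position(buf)
rest = String(read(buf))
```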
