After coming back to it to add some features, I'm not happy with the `LineReaderAt` interface and, to some extent, the use of `io.ReaderAt`. This is mostly for text patches, since `io.ReaderAt` is actually an ideal interface for the needs of binary patches.
Things I don't like:
- It's hard to know if you are at the end of the input or not. You have to read a minimal amount of data at what you think is the end offset and see whether you get more data or an `io.EOF`.
- It's hard to know how large the input is. As above, you have to read at what you think the length is and see whether you get more data or an `io.EOF`.
- The implementation of `LineReaderAt` wrapping an `io.ReaderAt` feels complicated, but maybe this is inevitable when you need to build a line index dynamically.
- It's hard to control memory usage when reading lines, because you can set the number of lines to read but have no control over the size of each line.
Any solution needs to satisfy the following constraints:
- Supports random access to lines. Strict apply could work without this, but it's required for fuzzy apply, where you slowly backtrack through the file to find a match.
- Is a standard library type or can be created from one; the more widely implemented, the better.
- Allows end users some control over performance and memory usage for special cases.
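The random-access requirement comes from how fuzzy matching moves through the file. This is a deliberately simplified, hypothetical illustration, not the actual fuzzy apply algorithm: `findBackward` and `matchAt` are made-up names, lines are held in a slice for simplicity, and matching is exact. It starts at the expected position and steps back one line at a time, re-reading earlier lines at every step.

```go
package main

import (
	"bytes"
	"fmt"
)

// findBackward looks for the lines in want, starting at line start and
// moving toward the beginning one position at a time. It returns the first
// index i <= start where lines[i:i+len(want)] equals want, or -1.
// Every step jumps back to an arbitrary earlier line, which is why the
// source needs random access to lines rather than a forward-only stream.
func findBackward(lines, want [][]byte, start int) int {
	if start > len(lines)-len(want) {
		start = len(lines) - len(want)
	}
	for i := start; i >= 0; i-- {
		if matchAt(lines, want, i) {
			return i
		}
	}
	return -1
}

// matchAt reports whether want matches lines starting at index i.
func matchAt(lines, want [][]byte, i int) bool {
	for j, w := range want {
		if !bytes.Equal(lines[i+j], w) {
			return false
		}
	}
	return true
}

func main() {
	lines := [][]byte{[]byte("a\n"), []byte("b\n"), []byte("c\n"), []byte("d\n"), []byte("e\n")}
	want := [][]byte{[]byte("b\n"), []byte("c\n")}
	fmt.Println(findBackward(lines, want, 3)) // 1
}
```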
Things I've considered:
- `io.ReaderAt` and `LineReaderAt`: this works well for binary applies (it's the minimal method needed to implement them), but has the problems outlined above for text applies.
- `io.ReadSeeker`: this enables the same features as `io.ReaderAt` (and is implemented by the same standard library types), but the position tracking and `Read` function make some things (like copying) easier. Since I don't plan to support concurrent use of the same source, I'm not sure if there's a major difference between using `Read` and `Seek` versus using `ReadAt`.
- `[]byte`: this is simple and supports random access, but doesn't allow much flexibility. The whole source must be in memory, and the apply functions will compute the line index as needed even if there was a more efficient way to get it. On the other hand, it reduces the need for internal buffers, so the number of allocations is probably lower. For what it's worth, `git` takes this approach and reads the full source file into memory for applies.
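For comparison, a minimal sketch of the `[]byte` option: the whole source is in memory and a line index is built once with `bytes.IndexByte`. `lineIndex`, `newLineIndex`, `Len` and `Line` are hypothetical names; this is just one way the apply functions might compute the line index for a `[]byte` source.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineIndex records the byte offset where each line of src starts, so any
// line can be sliced out of src without copying.
type lineIndex struct {
	src    []byte
	starts []int
}

func newLineIndex(src []byte) *lineIndex {
	idx := &lineIndex{src: src}
	for off := 0; off < len(src); {
		idx.starts = append(idx.starts, off)
		nl := bytes.IndexByte(src[off:], '\n')
		if nl < 0 {
			break // final line has no trailing newline
		}
		off += nl + 1
	}
	return idx
}

// Len returns the number of lines.
func (idx *lineIndex) Len() int { return len(idx.starts) }

// Line returns line i (0 <= i < Len()) including its trailing newline, if any.
func (idx *lineIndex) Line(i int) []byte {
	end := len(idx.src)
	if i+1 < len(idx.starts) {
		end = idx.starts[i+1]
	}
	return idx.src[idx.starts[i]:end]
}

func main() {
	idx := newLineIndex([]byte("one\ntwo\nthree"))
	fmt.Println(idx.Len())          // 3
	fmt.Printf("%q\n", idx.Line(1)) // "two\n"
	fmt.Printf("%q\n", idx.Line(2)) // "three"
}
```

Lines are subslices of the source, so there is no per-line allocation; the cost is that the entire source must already be in memory.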
In my usage so far, everything is already in memory for other reasons, so the `[]byte` option might be the simplest. Or maybe `io.ReaderAt` is the correct interface and I just need a better abstraction on top of it for line operations.