Reconsider source inputs for apply functions

After coming back to the library to add some features, I'm not happy with the `LineReaderAt` interface or, to some extent, with the use of `io.ReaderAt`. This mostly affects text patches, since `io.ReaderAt` is actually an ideal interface for the needs of binary patches.

Things I don't like:

- It's hard to know if you are at the end of the input or not. You have to read a minimal amount of data at what you think is the end offset and see if you get more data or an `io.EOF`.
- It's hard to know how large the input is. As above, you have to read at what you think the length is and see if you get more data or an `io.EOF`.
- The implementation of `LineReaderAt` wrapping an `io.ReaderAt` feels complicated, but maybe this is inevitable when you need to build a line index dynamically.
- It's hard to control memory usage when reading lines: you can limit the number of lines, but you have no control over the size of each line.
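
To make the first two points concrete, here is a minimal sketch of the probing that `io.ReaderAt` forces on callers. The names `atEOF` and `sourceFromString` are hypothetical and exist only for illustration; they are not part of the library.

```go
package main

import (
	"io"
	"strings"
)

// atEOF is a hypothetical helper: it reports whether offset is at or past
// the end of r. io.ReaderAt has no size method, so the only option is to
// attempt a one-byte read and inspect the result.
func atEOF(r io.ReaderAt, offset int64) (bool, error) {
	var buf [1]byte
	n, err := r.ReadAt(buf[:], offset)
	if n > 0 {
		// At least one byte exists at offset, so this is not the end.
		return false, nil
	}
	if err == io.EOF {
		return true, nil
	}
	// Any other error (for example a negative offset) is passed through.
	return false, err
}

// sourceFromString is a demo-only helper; any io.ReaderAt works here,
// such as *os.File or *bytes.Reader.
func sourceFromString(s string) io.ReaderAt {
	return strings.NewReader(s)
}
```

With `src := sourceFromString("abc")`, calling `atEOF(src, 3)` reports the end of input while `atEOF(src, 2)` does not. Each answer costs a read call, which is the awkwardness described above.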

Any solution needs to satisfy the following constraints:

- Support random access to lines. Strict apply could work without this, but it's required for fuzzy apply, where you slowly backtrack through the file to find a match.
- Be a standard library type, or be constructible from one. The more widely implemented, the better.
- Allow end users some control over performance and memory usage for special cases.

Things I've considered:

- `io.ReaderAt` and `LineReaderAt`: this works well for binary applies (it's the minimal method needed to implement them), but has the problems outlined above for text applies.

- `io.ReadSeeker`: this enables the same features as `io.ReaderAt` (and is implemented by the same standard library types) but the position tracking and `Read` function make some things (like copying) easier. Since I don't plan to support concurrent use of the same source, I'm not sure if there's a major difference between using `Read` and `Seek` versus using `ReadAt`.

- `[]byte`: this is simple and supports random access, but doesn't allow much flexibility. The whole source must be in memory, and the apply functions will compute the line index as needed even if there were a more efficient way to get it. On the other hand, it reduces the need for internal buffers, so the number of allocations is probably lower. For what it's worth, `git` takes this approach and reads the full source file into memory for applies.

In my usage so far, everything is already in memory for other reasons, so `[]byte` might be the simplest option. Or maybe `io.ReaderAt` is the correct interface and I just need a better abstraction on top of it for line operations.
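
If `[]byte` does become the input type, the line index could be computed in a single pass. The following is only a sketch under that assumption; `lineIndex` and `lineAt` are hypothetical names.

```go
package main

import "bytes"

// lineIndex returns the byte offset at which each line of src starts.
// A final line without a trailing newline still counts as a line.
func lineIndex(src []byte) []int {
	var starts []int
	for pos := 0; pos < len(src); {
		starts = append(starts, pos)
		i := bytes.IndexByte(src[pos:], '\n')
		if i < 0 {
			break // last line has no trailing newline
		}
		pos += i + 1
	}
	return starts
}

// lineAt returns line n (zero-based), including its trailing newline if
// present. The result aliases src, so no copy or extra buffer is needed.
// The caller must ensure 0 <= n < len(starts).
func lineAt(src []byte, starts []int, n int) []byte {
	end := len(src)
	if n+1 < len(starts) {
		end = starts[n+1]
	}
	return src[starts[n]:end]
}
```

Random access to any line is then a slice operation, which covers the backtracking that fuzzy apply needs, at the cost of holding the whole source in memory.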
