-
Notifications
You must be signed in to change notification settings - Fork 7
Adding byteoffset and line no. to callbacks. #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi moos, Thanks for your interest. I like your improvement in the callbacks but I disagree with the line counter because this can be easily done from outside; it's not necessary to pass it as a callback parameter. Tomorrow I'll take a look at the rest of the code. |
|
I've reviewed the code and I've added only the callbacks improvement. The offsets and the line counter can be handled from outside the class with a simple counter. New version is 0.2.1. Also, take into account that 1 byte is not always 1 char because a utf8 character can be encoded in one or more bytes. You were incrementing the byte offset counter by 1 when the stream had an encoding. |
|
I don't think the offset is trivial, at least for the 'line' event. I suppose they could attach a 'character' or 'byte' event and count #bytes then subtract the current line length to get the line offset, but that's too inefficient and not quite elegant, imho. You're right about the byte-count for utf8. I knew it needed more testing. :) |
|
I've tought a little bit more and you're right. I'll add the offset in all the events. I'm testing how to do that for utf8 encoded chars. EDIT: For testing, you can download the project and execute the |
|
v0.2.2 will have the offset parameter |
|
I finally got a chance to test this and noticed your changes behave differently than mine. Your byte offset returns the post-offset, ie, the offset at the end of the entity just read. Mine is the offset at the beginning of the entity. So 'character' event should return a 0 offset on the first character of the file, 1 on the second, etc. Similarly first 'line' event should return 0 offset (ie, beginning of the line). I think this is more useful and customary. |
|
I return the next byte offset after the reading is done to ease the reading of a file. The offset will point to the next byte to read. If you want to know the previous offset it's easy, just create a variable an initialize it to 0 and increment it. byteOffset example: var BufferedReader = require ("../build/buffered-reader");
console.log ("First line, expected offset: 21");
console.log ("Second line, expected offset: 61\n");
var o = 0;
new BufferedReader ("bmp", { encoding: "utf8" })
.on ("error", function (error){
console.log (error);
})
.on ("line", function (line, byteOffset){
console.log ("start: " + o + ", end: " + (byteOffset - 1));
o += byteOffset;
})
.read ();Result: First line, expected offset: 21 start: 0, end: 20 |
|
Thanks for the suggestion. For me getting the offset of the current event is important, but others may have different use case. I might suggest renaming it to nextOffset or endOffset or at least clarifying in the docs as I think it'll confuse others as well. |
Hi Gabriel,
Thanks for this nice package. I needed a way to get the byte offset of the line I was reading, so I wend ahead and added the byte offset to the callback for the 'byte', 'character', 'buffer', and 'line' events. 'line' event also gets the line no.
Sorry for the white-space mix up, I have it set to no-tabs. If you do a 'git diff --ignore-all-space' you'll see just the changes.
Also I did a bit of optimization on the event callbacks. Don't know if it makes much difference, but better to be on the safe side. (I'm reading multi MB files).
Should probably add test cases if you decide to incorporate these.
Best,
moos
Here's the diff less white-space clutter: