-
Notifications
You must be signed in to change notification settings - Fork 29.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
buffer: add indexOf method #160
Conversation
3d12cad
to
6d4cdc7
Compare
6d4cdc7
to
cc46164
Compare
Adds a `.indexOf` method to `Buffer`, which borrows semantics from both `Array.prototype.indexOf` and `String.prototype.indexOf`. `Buffer.prototype.indexOf` can be invoked with a Buffer, a string or a number as the needle. If the needle a Buffer or string, it will find the first occurrence of this sequence of bytes. If the needle is a number, it will find the first occurrence of this byte. Reviewed-by: Sam Rijs <srijs@airpost.net> Fixes: nodejs#95 PR-URL: nodejs#160
cc46164
to
984a21f
Compare
Though this function might be useful as it is, I think |
@@ -332,6 +332,16 @@ Buffer.prototype.fill = function fill(val, start, end) { | |||
return this; | |||
}; | |||
|
|||
Buffer.prototype.indexOf = function indexOf(needle, pos) { | |||
if (typeof needle === 'number') { | |||
needle = new Buffer([needle]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocating a buffer for a single byte seems too much. Also it's unclear that only one byte numbers are accepted. What if I want to search for 16 byte number? How to specify endianness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This intentionally follows the semantics of Array.prototype.indexOf
.
It can thus only search for an element of the array. If you wanted to search for a sequence of elements (bytes), you can supply a Buffer as the needle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkurchatkin As I understand it, the backing slow buffer will have already been allocated, so this shouldn't be too bad of an allocation.
We could amortize the cost of repeated calls by having a statically available single byte buffer into which we swap the value, which might be a nice optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about special functions on internal
for each case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go with a hybrid interface, I also think different specialised functions would be a cleaner approach.
That depends on how you look at it. The situation is that there is It might be that I am missing something, in that case don't hesitate to point it out :) |
I think that function that finds a buffer inside of another buffer is generally more useful than function that finds a byte inside a buffer. But the spec says that Typed Arrays should behave just like arrays of numbers. And IMO
obj.on('data', function (data) {
// what is data?
// in node it's a `Buffer`
// in browser or ES6-compatible environment it's a `Uint8Array`
// but they are compatible (to some extent), so we don't really care
}); It is maybe too late for full compatibility, but we shouldn't make things worse for no good reason. |
What about using something more efficient for multi-byte needles? I'm thinking of boyer-moore or a similar algorithm. |
boyer-moore-horspool ++ |
However, Boyer-Moore-Horspool's worst case is O(n*m) whereas Boyer-Moore runs in linear time. |
While I would be in favour of using a more sophisticated approach than naive searching, for the case that we go with Go with the current one (hybrid), or choose only one of I think @vkurchatkin has some valid points, I'd be interested to hear what some other people think of it. |
This will make writing parsers so much better ;) |
Checking in. @vkurchatkin I think the ship has sailed re: divergence from Uint8Array. We have already greatly diverged from the TypedArray spec in terms of public methods, and buffer-browserify shows how Buffers can be implemented on top of typed arrays for in-browser use. @srijs There seems to be a preference for implementing this function in terms of Boyer-Moore, would you be interested in continuing this PR in that direction? |
@chrisdickinson Yes, I'd be interested in going that direction. I will start with the implementation in the next couple of days. |
@srijs Ping. Any progress? |
@@ -764,6 +764,13 @@ buffer. | |||
var b = new Buffer(50); | |||
b.fill("h"); | |||
|
|||
### buf.indexOf(value[, fromIndex]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly something like byteOffset
would be better than fromIndex
. The latter makes it sound like the character placement, but that's incorrect for multi-byte strings.
@srijs Thanks for doing this work. There are several optimizations that could be implemented, but it'd take me a while to explain how to do them. Mind if I implement this myself, and I could throw in regex support as well. :) |
+100 |
Adds a
.indexOf
method toBuffer
, which borrows semantics fromboth
Array.prototype.indexOf
andString.prototype.indexOf
.Buffer.prototype.indexOf
can be invoked with a Buffer, a stringor a number as the needle. If the needle a Buffer or string, it will
find the first occurrence of this sequence of bytes. If the needle is
a number, it will find the first occurrence of this byte.
Fixes #95.
If you're unsure about or not happy with the semantics, let's discuss :)