-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stdlib: Add BIF binary:split/2 and binary:split/3 #771
Conversation
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
2f79777
to
7ff7665
Compare
./otp_build all -a failed:
I am a script, I am not human |
I'm unable to reproduce the error reported by @OTP-Maintainer when building from a clean checkout on OS X and SmartOS. I'm assuming it was a resource limitation error while running the automated jenkins build. |
The PR with its speedup looks good, but we don't like all the code duplication. In the spirit to inhibit code bloating the beam, I think some time should be spent on refactoring. To lead the way I have done one refactoring of the "backend" for split https://github.com/sverker/otp/commits/sverk/binary_split_bif I see two approaches for further code reduce:
|
@sverker Thank you for refactoring the bloated code. I admit that I had more code duplicated than was necessary while I was debugging each of the 4 branches a split could follow (global/non-global and BM/AC). When I was implementing this BIF, I wasn't sure whether touching all of the match/matches implementation or trying to keep this BIF isolated was better code/contribution etiquette, so I opted for the latter. I do agree, however, that it would make more sense to deduplicate the very similar match/matches/split underlying implementations.
This option makes the most sense to me, in part because of my lack of knowledge explained below.
I don't entirely understand how everything would function code-organization-wise, especially the BIF traps. Would there just be a single trap for match/matches/split? I'm probably not envisioning things here as you intend, but it will probably click and make sense for me later 😄 For your sverk/binary_split_bif branch, would you like me to merge that into my branch so your changes are present in this pull request, or do you want to wait until any refactoring has been finalized? |
Fetching pull requests from github failed:
I am a script, I am not human |
02cf385
to
87b90a2
Compare
@sverker I just rebased everything onto maint with with my refactor changes included. I'm not sure why I didn't initially understand about your suggestion for the unified trap, but I believe I have what you were suggesting implemented now. Here is the list of changes in the latest commit:
The net change in lines of code from efdc3f1 for All of the |
Looks good. Nittpick #1: Third argument to do_binary_find should be NULL instead of THE_NON_VALUE. You get Nittpick #2: It would nice if you made a struct something like this:
and then use that as type for state_ptr. I think that will make the code nicer when returning |
* Use NULL instead of THE_NON_VALUE for non-Eterm variable. * Add BinaryFindState_bignum struct to avoid unnecessary type casting.
@sverker I just committed changes for nitpicks 1 and 2, hopefully that accomplishes what you had in mind. Let me know if you notice anything else that needs changing. |
Oh, you didn't read my mind ;-) |
@sverker That makes sense and looks much cleaner now. Your commit has been merged into this PR. |
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
Finally merged to master in 1670e9d. I also joined some of the commits. |
@sverker Awesome, thank you! |
While working on an Erlang implementation of the dimroc/etl-language-comparison#10 project, @josevalim pointed out that one of the performance issues was the pure Erlang implementation of
binary:split/2
andbinary:split/3
: [erlang-questions] Trying to understand the performance impact of binary:split/3.Rough estimates on my machine show a 6x speed up for
binary:split/2
and 4x speed upbinary:split/3
between the pure Erlang and native BIF implementation.The implementation itself uses the same Boyer Moore and Aho Corasick algorithms used in
binary:match/2,3
andbinary:matches/2,3
.Update 2015-06-25: Rebased for OTP-18.0 and added support for the
trim_all
option added in #517. Rough estimates show a 6-10x speed up forbinary:split/3
when using thetrim_all
option.