Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add linl33 v2 #678

Merged
merged 1 commit into from
Feb 1, 2024
Merged

Add linl33 v2 #678

merged 1 commit into from
Feb 1, 2024

Conversation

linl33
Copy link
Contributor

@linl33 linl33 commented Jan 31, 2024

Check List:

  • You have run ./mvnw verify and the project builds successfully
  • Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
  • All formatting changes by the build are committed
  • Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
  • Output matches that of calculate_average_baseline.sh
  • For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number)
  • Execution time: 3.5s, ~100ms improvement compared to previous version
  • Execution time of reference implementation:

@linl33 linl33 marked this pull request as ready for review January 31, 2024 09:49
@linl33 linl33 force-pushed the linl33-submit branch 6 times, most recently from 893730b to 1ed0358 Compare January 31, 2024 15:58
Copy link
Owner

@gunnarmorling gunnarmorling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's faster for the standard key set, but takes unduly lon for 10K keys (see create_measurements_3.sh). Could you take a look, would be sad to see it with DNF on the 10K leaderboard (I am evaluating the top 25 entries against that key seet).

Comment on lines +59 to +62
if [ -f ${{ format('src/main/java-22/dev/morling/onebrc/CalculateAverage_{0}.java', github.event.pull_request.user.login || '') }} ]; then
sdk install java 22.ea.32-open || true
sdk use java 22.ea.32-open
fi
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be needed, the evaluate.sh script takes care of installing all required JDKs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_ci.sh installs the JDKs but it runs after maven so without that line, my class under java-22/ wouldn't be compiled.

Here's the original run without the change
https://github.com/gunnarmorling/1brc/actions/runs/7724135717/job/21055662835

@linl33
Copy link
Contributor Author

linl33 commented Feb 1, 2024

The 10k dataset takes ~40s on my laptop (6 cores, no HT) now. Still not fast but I don't think there's anything quick I can do to fix it now.

@tivrfoa
Copy link
Contributor

tivrfoa commented Feb 1, 2024

It's faster for the standard key set, but takes unduly lon for 10K keys (see create_measurements_3.sh). Could you take a look, would be sad to see it with DNF on the 10K leaderboard (I am evaluating the top 25 entries against that key seet).

Hi @gunnarmorling !

One suggestion for when you run the 10k:

  • make 5000 cities with < 50 bytes
  • make 5000 cities with >= 50 bytes
  • have at least one city with 100 bytes

Because the 10k is what this challenge is really asking.
But many solutions took advantage of the small dataset, and most cities are less than 16 bytes ...

The top 10 for the 10k might be really different. =)

@gunnarmorling
Copy link
Owner

gunnarmorling commented Feb 1, 2024

One suggestion for when you run the 10k

It's pretty much doing that (see create_measurements_3.sh which creates that file).

Because the 10k is what this challenge is really asking.
But many solutions took advantage of the small dataset, and most cities are less than 16 bytes ...

Yes, agreed. I had not anticipated this, and in hindsight I should have not exposed the specific key set. But by the time folks started to optimize for the 413 key set and I realized it, I felt it was too late for changing the evaluation key set, as it would have created lots of frustration. In the end, people did what they were asked to do, so that's totally fair :)

The top 10 for the 10k might be really different. =)

Yes, which is why there is that separate leaderboard for the 10K keyset which I update occassionally by running the top 25 or so entries from the regular leaderboard against that keyset. Now there could be entries which score lower on the standard leaderboard but would be in the top 25 for 10K, but I felt that's acceptable, given it's just a side eval. Learning a lot myself on how to run these kinds of things :)

The reason I am asking submitters of top entries like this one to also make sure they do reasonably well with 10K is to make sure that they pass this test, so it's another probe on correctness.

@gunnarmorling
Copy link
Owner

Thanks a lot for the update, @linl33. This looks good now. A bit faster on the standard key set and also comes in at ~19 sec for 10K with the correct results. Thx for participating in 1BRC and creating one of the top entries!

Benchmark 1: timeout -v 300 ./calculate_average_linl33.sh 2>&1
  Time (mean ± σ):      2.810 s ±  0.056 s    [User: 17.796 s, System: 0.807 s]
  Range (min … max):    2.670 s …  2.865 s    10 runs

Summary
  linl33: trimmed mean 2.8200656749800004, raw times 2.8377631459800003,2.85637806698,2.67035041898,2.8544735929800003,2.8047393789800004,2.86473951498,2.81122164398,2.8182784189800003,2.77623454898,2.80143660298

Leaderboard

| # | Result (m:s.ms) | Implementation     | JDK | Submitter     | Notes     |
|---|-----------------|--------------------|-----|---------------|-----------|
|   | 00:02.820 | [link](https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_linl33.java)| 22.ea.32-open | [Li Lin](https://github.com/linl33) | uses Unsafe |

@gunnarmorling gunnarmorling merged commit 9a27939 into gunnarmorling:main Feb 1, 2024
1 check passed
@linl33 linl33 deleted the linl33-submit branch February 2, 2024 02:23
@linl33
Copy link
Contributor Author

linl33 commented Feb 2, 2024

@gunnarmorling Thank you for organizing this event!

@gunnarmorling
Copy link
Owner

Hey @linl33!

Congrats again on being in the Top 20 of the One Billion Row Challenge!

To celebrate this amazing achievement, I would like to send you a 1BRC t-shirt and coffee mug. To claim your prize, fill out this form by Feb 18. After submitting the form, please provide a comment with the random value you've specified in the form, so that I know it is you who submitted it.

All data entered will solely be used in relation to processing this shipment. Shipments can be sent to any country listed here or here (I'll use whichever one is cheaper for me to ship to your location). A big thank you to Decodable for sponsoring these prizes!

Thanks a lot for participating in 1BRC,

--Gunnar

@linl33
Copy link
Contributor Author

linl33 commented Feb 12, 2024

Hi @gunnarmorling,

My 2 random numbers are

12533 32231

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants