Add linl33 v2 #678

linl33 · 2024-01-31T09:49:08Z

Check List:

You have run ./mvnw verify and the project builds successfully
Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
All formatting changes by the build are committed
Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
Output matches that of calculate_average_baseline.sh
For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number)

Execution time: 3.5s, ~100ms improvement compared to previous version
Execution time of reference implementation:

gunnarmorling

It's faster for the standard key set, but takes unduly lon for 10K keys (see create_measurements_3.sh). Could you take a look, would be sad to see it with DNF on the 10K leaderboard (I am evaluating the top 25 entries against that key seet).

gunnarmorling · 2024-01-31T17:27:14Z

.github/workflows/maven.yml

+          if [ -f ${{ format('src/main/java-22/dev/morling/onebrc/CalculateAverage_{0}.java', github.event.pull_request.user.login || '') }} ]; then
+            sdk install java 22.ea.32-open || true
+            sdk use java 22.ea.32-open
+          fi


This should not be needed, the evaluate.sh script takes care of installing all required JDKs.

test_ci.sh installs the JDKs but it runs after maven so without that line, my class under java-22/ wouldn't be compiled.

Here's the original run without the change
https://github.com/gunnarmorling/1brc/actions/runs/7724135717/job/21055662835

linl33 · 2024-02-01T02:30:25Z

The 10k dataset takes ~40s on my laptop (6 cores, no HT) now. Still not fast but I don't think there's anything quick I can do to fix it now.

tivrfoa · 2024-02-01T02:46:00Z

It's faster for the standard key set, but takes unduly lon for 10K keys (see create_measurements_3.sh). Could you take a look, would be sad to see it with DNF on the 10K leaderboard (I am evaluating the top 25 entries against that key seet).

Hi @gunnarmorling !

One suggestion for when you run the 10k:

make 5000 cities with < 50 bytes
make 5000 cities with >= 50 bytes
have at least one city with 100 bytes

Because the 10k is what this challenge is really asking.
But many solutions took advantage of the small dataset, and most cities are less than 16 bytes ...

The top 10 for the 10k might be really different. =)

gunnarmorling · 2024-02-01T10:24:44Z

One suggestion for when you run the 10k

It's pretty much doing that (see create_measurements_3.sh which creates that file).

Because the 10k is what this challenge is really asking.
But many solutions took advantage of the small dataset, and most cities are less than 16 bytes ...

Yes, agreed. I had not anticipated this, and in hindsight I should have not exposed the specific key set. But by the time folks started to optimize for the 413 key set and I realized it, I felt it was too late for changing the evaluation key set, as it would have created lots of frustration. In the end, people did what they were asked to do, so that's totally fair :)

The top 10 for the 10k might be really different. =)

Yes, which is why there is that separate leaderboard for the 10K keyset which I update occassionally by running the top 25 or so entries from the regular leaderboard against that keyset. Now there could be entries which score lower on the standard leaderboard but would be in the top 25 for 10K, but I felt that's acceptable, given it's just a side eval. Learning a lot myself on how to run these kinds of things :)

The reason I am asking submitters of top entries like this one to also make sure they do reasonably well with 10K is to make sure that they pass this test, so it's another probe on correctness.

gunnarmorling · 2024-02-01T10:27:53Z

Thanks a lot for the update, @linl33. This looks good now. A bit faster on the standard key set and also comes in at ~19 sec for 10K with the correct results. Thx for participating in 1BRC and creating one of the top entries!

Benchmark 1: timeout -v 300 ./calculate_average_linl33.sh 2>&1
  Time (mean ± σ):      2.810 s ±  0.056 s    [User: 17.796 s, System: 0.807 s]
  Range (min … max):    2.670 s …  2.865 s    10 runs

Summary
  linl33: trimmed mean 2.8200656749800004, raw times 2.8377631459800003,2.85637806698,2.67035041898,2.8544735929800003,2.8047393789800004,2.86473951498,2.81122164398,2.8182784189800003,2.77623454898,2.80143660298

Leaderboard

| # | Result (m:s.ms) | Implementation     | JDK | Submitter     | Notes     |
|---|-----------------|--------------------|-----|---------------|-----------|
|   | 00:02.820 | [link](https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_linl33.java)| 22.ea.32-open | [Li Lin](https://github.com/linl33) | uses Unsafe |

linl33 · 2024-02-02T02:25:36Z

@gunnarmorling Thank you for organizing this event!

gunnarmorling · 2024-02-11T12:32:31Z

Hey @linl33!

Congrats again on being in the Top 20 of the One Billion Row Challenge!

To celebrate this amazing achievement, I would like to send you a 1BRC t-shirt and coffee mug. To claim your prize, fill out this form by Feb 18. After submitting the form, please provide a comment with the random value you've specified in the form, so that I know it is you who submitted it.

All data entered will solely be used in relation to processing this shipment. Shipments can be sent to any country listed here or here (I'll use whichever one is cheaper for me to ship to your location). A big thank you to Decodable for sponsoring these prizes!

Thanks a lot for participating in 1BRC,

--Gunnar

linl33 · 2024-02-12T07:09:59Z

Hi @gunnarmorling,

My 2 random numbers are

12533 32231

linl33 marked this pull request as ready for review January 31, 2024 09:49

linl33 force-pushed the linl33-submit branch 6 times, most recently from 893730b to 1ed0358 Compare January 31, 2024 15:58

gunnarmorling reviewed Jan 31, 2024

View reviewed changes

linl33 force-pushed the linl33-submit branch from 1ed0358 to 4675717 Compare February 1, 2024 02:28

Add linl33 v2

f14a050

linl33 force-pushed the linl33-submit branch from 4675717 to f14a050 Compare February 1, 2024 02:44

gunnarmorling merged commit 9a27939 into gunnarmorling:main Feb 1, 2024
1 check passed

linl33 deleted the linl33-submit branch February 2, 2024 02:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add linl33 v2 #678

Add linl33 v2 #678

linl33 commented Jan 31, 2024

gunnarmorling left a comment

gunnarmorling Jan 31, 2024

linl33 Feb 1, 2024

linl33 commented Feb 1, 2024

tivrfoa commented Feb 1, 2024

gunnarmorling commented Feb 1, 2024 •

edited

Loading

gunnarmorling commented Feb 1, 2024

linl33 commented Feb 2, 2024

gunnarmorling commented Feb 11, 2024

linl33 commented Feb 12, 2024

Add linl33 v2 #678

Add linl33 v2 #678

Conversation

linl33 commented Jan 31, 2024

Check List:

gunnarmorling left a comment

Choose a reason for hiding this comment

gunnarmorling Jan 31, 2024

Choose a reason for hiding this comment

linl33 Feb 1, 2024

Choose a reason for hiding this comment

linl33 commented Feb 1, 2024

tivrfoa commented Feb 1, 2024

gunnarmorling commented Feb 1, 2024 • edited Loading

gunnarmorling commented Feb 1, 2024

linl33 commented Feb 2, 2024

gunnarmorling commented Feb 11, 2024

linl33 commented Feb 12, 2024

gunnarmorling commented Feb 1, 2024 •

edited

Loading