Bump StarRocks to 3.4 and remove DISTRIBUTED BY HASH(UserID)
#355
Conversation
```diff
-DOWNLOAD_URL=https://releases.starrocks.io/starrocks/StarRocks-3.0.0-preview.tar.gz
 set -e
+VERSION=3.4.2-ubuntu-amd64
```
I guess we can make our lives easier and use the StarRocks docker image, see https://github.com/ClickHouse/JSONBench/tree/main/starrocks (install.sh / uninstall.sh)
Agree — I thought about it, but decided not to make "radical" changes all at once.
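For reference, a docker-based setup could look roughly like the sketch below. This is a hypothetical install.sh, not the script in this PR; the `starrocks/allin1-ubuntu` image name and port mappings follow the StarRocks quick-start docs and may need adjusting:

```bash
#!/bin/bash
# install.sh — sketch of a docker-based StarRocks setup (assumes Docker and a mysql client)
set -e

VERSION=3.4.2

# All-in-one image bundling FE and BE; 9030 is the MySQL-protocol FE port
sudo docker run -d --name starrocks \
    -p 9030:9030 -p 8030:8030 -p 8040:8040 \
    starrocks/allin1-ubuntu:$VERSION

# Wait until the FE accepts connections
while ! mysql -h 127.0.0.1 -P 9030 -u root -e 'SELECT 1' &>/dev/null; do
    sleep 1
done

# uninstall.sh would then reduce to:
#   sudo docker rm -f starrocks
```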
```diff
@@ -39,7 +40,7 @@ sleep 30
 # Prepare Data
 cd ../
 wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -f -d hits.tsv.gz
```
Was there a problem with the previous gzip call which is fixed by force (`-f`)?
When running in a shell, it will ask for confirmation if the file already exists.
Allow me to revert this and make a separate PR that adds `-f` to all usages of gzip in ClickBench.
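For context, a minimal reproduction of the prompt in question (assuming hits.tsv survives from a previous run while hits.tsv.gz has been re-downloaded):

```bash
# Without -f, gzip stops and asks interactively:
#   gzip: hits.tsv already exists; do you wish to overwrite (y or n)?
gzip -d hits.tsv.gz

# With -f it overwrites silently, keeping the benchmark script non-interactive on re-runs
gzip -f -d hits.tsv.gz
```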
```diff
-DUPLICATE KEY (CounterID, EventDate, UserID, EventTime, WatchID)
-DISTRIBUTED BY HASH(UserID) BUCKETS 192
 )
+DUPLICATE KEY (CounterID, EventDate, UserID, EventTime, WatchID)
```
There is a section about tuning in the instructions:
https://github.com/ClickHouse/ClickBench#installation-and-fine-tuning
Maybe we should include stock StarRocks and tuned StarRocks (something similar is done for ClickHouse, see clickhouse/results/*).
```diff
-export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk/
+# NOTE: with latest java-24 the FE crashes and 9030 endpoint is broken, but 17 is used in the official docker images
+sudo yum install -y java-17-amazon-corretto-devel mariadb105
```
All benchmark scripts in ClickBench assume Ubuntu (apt). I'll fix this.
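A sketch of the apt-based equivalent (package names assume stock Ubuntu on amd64; the Corretto and mariadb105 packages above are Amazon Linux-specific):

```bash
# Ubuntu equivalent of the yum line above: OpenJDK 17 for the FE,
# plus a MySQL-compatible client for the 9030 endpoint
sudo apt-get update
sudo apt-get install -y openjdk-17-jdk-headless mysql-client
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
```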
```diff
 # Install
-wget $DOWNLOAD_URL
+wget -q https://releases.starrocks.io/starrocks/StarRocks-$VERSION.tar.gz -O StarRocks-$VERSION.tar.gz
```
Minor: I'll revert `-q` for consistency reasons (#325) --> #359
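i.e., keeping the progress output that the other ClickBench scripts show:

```bash
wget https://releases.starrocks.io/starrocks/StarRocks-$VERSION.tar.gz -O StarRocks-$VERSION.tar.gz
```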
I was looking at some queries where StarRocks was an order of magnitude faster, and it turns out this was due to `DISTRIBUTED BY HASH(UserID) BUCKETS 192`, which looks like a hack. So I've replaced it with `DISTRIBUTED BY RANDOM BUCKETS 1`. The comparison for this change can be found here. After this change, StarRocks became slightly slower.

Also bump the StarRocks version to the latest and fix scripts here and there.
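In DDL terms the change boils down to the distribution clause. A trimmed sketch follows — the short column list is a hypothetical stand-in for the full hits schema, and the FE is assumed to listen on 127.0.0.1:9030:

```bash
# Duplicate-key table as in the benchmark; only the bucketing differs:
#   before: DISTRIBUTED BY HASH(UserID) BUCKETS 192  (pre-places each user's rows in one bucket)
#   after:  DISTRIBUTED BY RANDOM BUCKETS 1          (no data-dependent placement)
mysql -h 127.0.0.1 -P 9030 -u root -e "
CREATE DATABASE IF NOT EXISTS bench;
CREATE TABLE bench.hits_sketch (
    CounterID INT,
    EventDate DATE,
    UserID BIGINT,
    EventTime DATETIME,
    WatchID BIGINT
)
DUPLICATE KEY (CounterID, EventDate, UserID, EventTime, WatchID)
DISTRIBUTED BY RANDOM BUCKETS 1;
"
```

Hash bucketing on UserID effectively pre-optimizes the per-user queries in the benchmark, which is why removing it makes StarRocks slightly slower but more comparable to untuned setups.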