Bump StarRocks to 3.4 and remove DISTRIBUTED BY HASH(UserID) #355

Merged (4 commits) on May 4, 2025

Conversation

@azat (Contributor) commented Apr 27, 2025

I was looking at some queries where StarRocks was an order of magnitude faster, and it turns out this was due to DISTRIBUTED BY HASH(UserID) BUCKETS 192, which looks like a hack. So I've replaced it with DISTRIBUTED BY RANDOM BUCKETS 1. The comparison for this change can be found here.

After this change, StarRocks becomes slightly slower.

This PR also bumps the StarRocks version to the latest release and fixes the scripts here and there.
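For illustration, the distribution change amounts to roughly the following in the CREATE TABLE statement (a sketch based on the description above; the table name hits is assumed from the dataset, the column list and the rest of the DDL are omitted, and the exact final DDL is in the diff further down):

CREATE TABLE hits ( /* ... column list ... */ )
DUPLICATE KEY (CounterID, EventDate, UserID, EventTime, WatchID)
-- before (tuned): DISTRIBUTED BY HASH(UserID) BUCKETS 192
-- after (this PR):
DISTRIBUTED BY RANDOM BUCKETS 1;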

@azat changed the title from "Bump StarRocks results" to "Bump StarRocks to 3.4 and remove DISTRIBUTED BY HASH(UserID)" on Apr 27, 2025
DOWNLOAD_URL=https://releases.starrocks.io/starrocks/StarRocks-3.0.0-preview.tar.gz
set -e

VERSION=3.4.2-ubuntu-amd64
Member:

I guess we can make our lives easier and use the StarRocks Docker image, see https://github.com/ClickHouse/JSONBench/tree/main/starrocks (install.sh / uninstall.sh).
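For reference, a minimal sketch of what a Docker-based setup could look like, assuming the all-in-one image from the StarRocks quick-start docs (the image name, tag, and port mapping are assumptions here, not taken from this PR):

# assumption: starrocks/allin1-ubuntu is the quick-start all-in-one image
sudo docker run -d --name starrocks \
    -p 9030:9030 -p 8030:8030 -p 8040:8040 \
    starrocks/allin1-ubuntu
# once the FE is up, it speaks the MySQL protocol on port 9030
mysql -h 127.0.0.1 -P 9030 -u root -e 'SELECT 1;'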

Contributor Author:

Agreed. I thought about it, but decided not to make "radical" changes all at once.

@@ -39,7 +40,7 @@ sleep 30
# Prepare Data
cd ../
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
gzip -d hits.tsv.gz
gzip -f -d hits.tsv.gz
Member:

Was there a problem with the previous gzip call which is fixed by force (-f)?

Contributor Author:

When running in a shell, gzip asks for confirmation if the output file already exists.
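Concretely, without -f the unpack step blocks on a prompt when hits.tsv is left over from a previous run, while -f just overwrites (illustrative shell lines, not from the PR):

# second run of the script: hits.tsv already exists
gzip -d hits.tsv.gz      # prompts "do you wish to overwrite (y or n)?" and stalls the script
gzip -f -d hits.tsv.gz   # overwrites silently, keeping the run non-interactive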

Member:

Allow me to revert this and make a separate PR which adds -f to all usage of gzip in ClickBench.

DUPLICATE KEY (CounterID, EventDate, UserID, EventTime, WatchID)
DISTRIBUTED BY HASH(UserID) BUCKETS 192
)
DUPLICATE KEY (CounterID, EventDate, UserID, EventTime, WatchID)
Member:

There is a section about tuning in the instructions:
https://github.com/ClickHouse/ClickBench#installation-and-fine-tuning

Maybe we should include both stock StarRocks and tuned StarRocks (something similar is done for ClickHouse, see clickhouse/results/*).


export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk/
# NOTE: with latest java-24 the FE crashes and 9030 endpoint is broken, but 17 is used in the official docker images
sudo yum install -y java-17-amazon-corretto-devel mariadb105
Member:

All benchmark scripts in ClickBench assume Ubuntu (apt). I'll fix this.
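For what it is worth, an apt-based equivalent might look roughly like this (package names and the JAVA_HOME path are assumptions based on what Ubuntu ships, not the final fix):

# assumption: Ubuntu's OpenJDK 17 and MySQL client packages stand in for
# java-17-amazon-corretto-devel and mariadb105 from the yum-based script
sudo apt-get update
sudo apt-get install -y openjdk-17-jdk-headless mysql-client
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64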

# Install
wget $DOWNLOAD_URL
wget -q https://releases.starrocks.io/starrocks/StarRocks-$VERSION.tar.gz -O StarRocks-$VERSION.tar.gz
Member:

Minor: I'll revert -q for consistency reasons (#325)

@rschu1ze (Member) commented May 4, 2025

--> #359
