Run gzip with -f #358

Merged (1 commit, May 3, 2025)
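
For context, the PR itself does not state the motivation, so the following is an assumption based on GNU gzip's standard behavior: `gzip -d` deletes the `.gz` file after decompressing, so on a repeat run `wget --continue` fetches `hits.tsv.gz` again while the `hits.tsv` extracted by the previous run is still on disk. Without `-f`, gzip prompts on a terminal or, when run non-interactively, refuses to overwrite the existing file and exits with a non-zero status, which aborts benchmark scripts that run under `set -e`. A minimal sketch of the difference:

```bash
# Hypothetical re-run of a load step, assuming GNU gzip and a leftover
# hits.tsv produced by an earlier run.
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'

# Without -f: gzip refuses to overwrite the existing hits.tsv (or prompts on a
# terminal) and exits non-zero, so a script running under `set -e` stops here.
gzip -d hits.tsv.gz || echo "hits.tsv already exists; gzip refused to overwrite it"

# With -f (--force): the existing hits.tsv is overwritten, so the step is safe
# to repeat.
gzip -d -f hits.tsv.gz
```
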
2 changes: 1 addition & 1 deletion aurora-mysql/README.md
@@ -38,7 +38,7 @@ Load the data

```
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

mysql -h "${FQDN}" -u admin --password="${PASSWORD}" test < create.sql

2 changes: 1 addition & 1 deletion aurora-postgresql/README.md
@@ -37,7 +37,7 @@ Load the data

```
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

psql -U postgres -h "${FQDN}" -t -c 'CREATE DATABASE test'
psql -U postgres -h "${FQDN}" test -t < create.sql
2 changes: 1 addition & 1 deletion bigquery/README.md
@@ -19,7 +19,7 @@ source .bashrc
Load the data:
```
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz

time bq load --source_format CSV --allow_quoted_newlines=1 test.hits hits.csv
```
2 changes: 1 addition & 1 deletion bytehouse/NOTES.md
@@ -200,7 +200,7 @@ Will try CSV.

```
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz
```

Now it started to work:
2 changes: 1 addition & 1 deletion bytehouse/README.md
@@ -29,7 +29,7 @@ export warehouse='test'

```
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz
```

Load the data:
2 changes: 1 addition & 1 deletion chdb/benchmark.sh
@@ -8,7 +8,7 @@ pip install --break-system-packages chdb==2.2.0b1

# Load the data
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz
./load.py

# Run the queries
2 changes: 1 addition & 1 deletion citus/benchmark.sh
@@ -7,7 +7,7 @@ sudo apt-get install -y postgresql-client
sudo docker run -d --name citus -p 5432:5432 -e POSTGRES_PASSWORD=mypass citusdata/citus:11.0

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

echo "*:*:*:*:mypass" > .pgpass
chmod 400 .pgpass
2 changes: 1 addition & 1 deletion clickhouse/benchmark.sh
@@ -41,7 +41,7 @@ clickhouse-client < create"$SUFFIX".sql
if [ ! -f hits.tsv ]
then
wget --no-verbose --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
fi

clickhouse-client --time --query "INSERT INTO hits FORMAT TSV" < hits.tsv
2 changes: 1 addition & 1 deletion cloudberry/benchmark.sh
@@ -105,7 +105,7 @@ elif [[ $1 == 'test' ]]; then
chmod +x /home/gpadmin/run.sh
chown gpadmin:gpadmin /home/gpadmin/*
if [[ $2 != 'no_dl' ]]; then sudo -iu gpadmin wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'; fi
-if [[ $2 != 'no_dl' ]]; then sudo -iu gpadmin gzip -d hits.tsv.gz; fi
+if [[ $2 != 'no_dl' ]]; then sudo -iu gpadmin gzip -d -f hits.tsv.gz; fi
sudo -iu gpadmin chmod 777 ~ hits.tsv
sudo -iu gpadmin psql -d postgres -f /home/gpadmin/create.sql
sudo -iu gpadmin nohup gpfdist &
2 changes: 1 addition & 1 deletion cratedb/benchmark.sh
@@ -35,7 +35,7 @@ do
done

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz' -O /tmp/hits.tsv.gz
-gzip -d /tmp/hits.tsv.gz
+gzip -d -f /tmp/hits.tsv.gz
chmod 444 /tmp/hits.tsv

psql -U crate -h localhost --no-password -t < $CREATE_FILE
2 changes: 1 addition & 1 deletion doris/benchmark.sh
@@ -89,7 +89,7 @@ mysql -h 127.0.0.1 -P9030 -uroot hits <"$ROOT"/create.sql
# Download data
if [[ ! -f hits.tsv.gz ]] && [[ ! -f hits.tsv ]]; then
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
fi

# Load data
2 changes: 1 addition & 1 deletion druid/benchmark.sh
@@ -27,7 +27,7 @@ echo "druid.query.groupBy.maxMergingDictionarySize=5000000000" >> apache-druid-$
# Load the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

./apache-druid-${VERSION}/bin/post-index-task --file ingest.json --url http://localhost:8081

2 changes: 1 addition & 1 deletion duckdb-memory/benchmark.sh
@@ -9,7 +9,7 @@ pip install --break-system-packages duckdb==1.1.3 psutil
# Load the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz

# Run the queries

2 changes: 1 addition & 1 deletion elasticsearch/benchmark.sh
@@ -38,7 +38,7 @@ curl -k -X PUT "https://localhost:9200/hits?pretty" -u "elastic:${PASSWORD}" -H

# Download and unzip dataset
wget https://datasets.clickhouse.com/hits_compatible/hits.json.gz
-gzip -d hits.json.gz
+gzip -d -f hits.json.gz

# Prepare Elasticsearch for large bulk insert. To do the large upload, you have to break up JSON file into smaller files to prevent 'curl' from OOM while doing it, and adjust ELasticsearch HTTP upload size minimum. This creates roughly 250M files (note it takes a while)
split -l 10000000 hits.json hits_
2 changes: 1 addition & 1 deletion greenplum/benchmark.sh
@@ -64,7 +64,7 @@ sudo chmod 777 /gpmaster /gpdata1 /gpdata2 /gpdata3 /gpdata4 /gpdata5 /gpdata6 /
gpinitsystem -ac gpinitsystem_singlenode
export MASTER_DATA_DIRECTORY=/gpmaster/gpsne-1/
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
chmod 777 ~ hits.tsv
psql -d postgres -f create.sql
nohup gpfdist &
2 changes: 1 addition & 1 deletion heavyai/benchmark.sh
@@ -28,7 +28,7 @@ sudo systemctl enable heavydb
# Load the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz
chmod 777 ~ hits.csv

sudo bash -c "echo 'allowed-import-paths = [\"/home/ubuntu/\"]' > /var/lib/heavyai/heavy.conf_"
2 changes: 1 addition & 1 deletion hyper/benchmark.sh
@@ -5,7 +5,7 @@ sudo apt-get install -y python3-pip
pip install --break-system-packages tableauhyperapi

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz

./load.py

2 changes: 1 addition & 1 deletion infobright/benchmark.sh
@@ -14,7 +14,7 @@ sudo docker run -it --rm --network host mysql:5 mysql --host 127.0.0.1 --port 50
# Load the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

# ERROR 2 (HY000) at line 1: Wrong data or column definition. Row: 93557187, field: 100.
head -n 90000000 hits.tsv > hits90m.tsv
2 changes: 1 addition & 1 deletion locustdb/benchmark.sh
@@ -15,7 +15,7 @@ sudo apt-get install -y g++ capnproto libclang-14-dev
cargo build --features "enable_rocksdb" --features "enable_lz4" --release

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz

target/release/repl --load hits.csv --db-path db

2 changes: 1 addition & 1 deletion mariadb-columnstore/benchmark.sh
@@ -17,7 +17,7 @@ mysql --password="${PASSWORD}" --host 127.0.0.1 test < create.sql
# Load the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

time mysql --password="${PASSWORD}" --host 127.0.0.1 test -e "
LOAD DATA LOCAL INFILE 'hits.tsv' INTO TABLE hits
2 changes: 1 addition & 1 deletion mariadb/benchmark.sh
@@ -15,7 +15,7 @@ sudo service mariadb restart
# Load the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

sudo mariadb -e "CREATE DATABASE test"
sudo mariadb test < create.sql
2 changes: 1 addition & 1 deletion monetdb/benchmark.sh
@@ -23,7 +23,7 @@ sudo apt-get install -y expect
./query.expect "$(cat create.sql)"

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
chmod 777 ~ hits.tsv

./query.expect "COPY INTO hits FROM '$(pwd)/hits.tsv' USING DELIMITERS '\t'"
2 changes: 1 addition & 1 deletion mongodb/benchmark.sh
@@ -56,7 +56,7 @@ time mongosh --quiet --eval 'db.hits.createIndex({"ClientIP": 1, "WatchID": 1, "
#################################
# Load data and import
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

# Use mongo import to load data into mongo. By default numInsertionWorkers is 1 so change to half of VM where it would be run
#time mongoimport --collection hits --type tsv hits.tsv --fieldFile=create.txt --columnsHaveTypes --numInsertionWorkers=8
2 changes: 1 addition & 1 deletion mysql-myisam/benchmark.sh
@@ -10,7 +10,7 @@ sudo service mysql restart
# Load the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

sudo mysql -e "CREATE DATABASE test"
sudo mysql test < create.sql
2 changes: 1 addition & 1 deletion mysql/benchmark.sh
@@ -10,7 +10,7 @@ sudo service mysql restart
# Load the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

sudo mysql -e "CREATE DATABASE test"
sudo mysql test < create.sql
2 changes: 1 addition & 1 deletion oxla/benchmark.sh
@@ -11,7 +11,7 @@ sudo DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential
echo "Download dataset."
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
echo "Unpack dataset."
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz
mkdir data
mv hits.csv data

2 changes: 1 addition & 1 deletion pg_duckdb-indexed/benchmark.sh
@@ -7,7 +7,7 @@ set -eux
#sudo apt-get install -y postgresql-client

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

memory=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
threads=$(nproc)
2 changes: 1 addition & 1 deletion pg_duckdb/benchmark.sh
@@ -7,7 +7,7 @@ set -eux
#sudo apt-get install -y postgresql-client

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

memory=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
threads=$(nproc)
2 changes: 1 addition & 1 deletion pgpro_tam/benchmark.sh
@@ -45,7 +45,7 @@ fi
psql -h 127.0.0.1 -U postgres -t < create/"$CREATE_FILE".sql

#get and unpack hits.tsv
-sudo docker exec pgpro_tam bash -c "cd /tmp && wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz' && gzip -d hits.tsv.gz"
+sudo docker exec pgpro_tam bash -c "cd /tmp && wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz' && gzip -d -f hits.tsv.gz"

#insert data to table
if [ "$1" == "parquet_fd_parall" ] ; then
2 changes: 1 addition & 1 deletion pinot/benchmark.sh
@@ -18,7 +18,7 @@ sleep 30
# Load the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

# Pinot was unable to load data as a single file wihout any errors returned. We have to split the data
split -d --additional-suffix .tsv --verbose -n l/100 hits.tsv parts
2 changes: 1 addition & 1 deletion postgresql-indexed/benchmark.sh
@@ -52,7 +52,7 @@ EOF
sudo systemctl restart postgresql@$PGVERSION-main

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

sudo -u postgres psql -t -c 'CREATE DATABASE test'
sudo -u postgres psql test -t <create.sql
2 changes: 1 addition & 1 deletion postgresql/benchmark.sh
@@ -52,7 +52,7 @@ EOF
sudo systemctl restart postgresql@$PGVERSION-main

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

sudo -u postgres psql -t -c 'CREATE DATABASE test'
sudo -u postgres psql test -t <create.sql
2 changes: 1 addition & 1 deletion questdb/benchmark.sh
@@ -18,7 +18,7 @@ questdb/bin/questdb.sh start
# Import the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz

curl -G --data-urlencode "query=$(cat create.sql)" 'http://localhost:9000/exec'

2 changes: 1 addition & 1 deletion selectdb/benchmark.sh
@@ -90,7 +90,7 @@ mysql -h 127.0.0.1 -P9030 -uroot hits <"$ROOT"/create.sql
# Download data
if [[ ! -f hits.tsv.gz ]] && [[ ! -f hits.tsv ]]; then
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
fi

# Load data
2 changes: 1 addition & 1 deletion siglens/benchmark.sh
@@ -15,7 +15,7 @@ cd ..

echo "Download and unzip dataset"
wget --continue https://datasets.clickhouse.com/hits_compatible/hits.json.gz
-gzip -d hits.json.gz
+gzip -d -f hits.json.gz

# Add the _index line and fix the UserID from string to num and preprocesses the dataset for loading
python3 fix_hits.py
2 changes: 1 addition & 1 deletion singlestore/benchmark.sh
@@ -22,7 +22,7 @@ sudo docker exec -it memsql-ciab memsql -p"${ROOT_PASSWORD}"
# Load the data

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
sudo docker cp hits.tsv memsql-ciab:/

sudo docker exec -it memsql-ciab memsql -p"${ROOT_PASSWORD}" -e "CREATE DATABASE test"
2 changes: 1 addition & 1 deletion sqlite/benchmark.sh
@@ -6,7 +6,7 @@ sudo apt-get install -y sqlite3
sqlite3 mydb < create.sql

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d hits.csv.gz
+gzip -d -f hits.csv.gz

time sqlite3 mydb '.import --csv hits.csv hits'
wc -c mydb
2 changes: 1 addition & 1 deletion starrocks/benchmark.sh
@@ -39,7 +39,7 @@ sleep 30
# Prepare Data
cd ../
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

# Create Table
mysql -h 127.0.0.1 -P9030 -uroot -e "CREATE DATABASE hits"
2 changes: 1 addition & 1 deletion tablespace/benchmark.sh
@@ -7,7 +7,7 @@ sudo apt-get update
sudo apt-get install -y postgresql-client

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
chmod 777 ~ hits.tsv

psql "host=$HOSTNAME port=5432 dbname=csdb user=csuser password=$PASSWORD sslmode=require" < create.sql
2 changes: 1 addition & 1 deletion tembo-olap/benchmark.sh
@@ -7,7 +7,7 @@ sudo apt-get update
sudo apt-get install -y postgresql-client

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
chmod 777 ~ hits.tsv

psql postgresql://postgres:$PASSWORD@$HOSTNAME:5432 -t -c 'CREATE DATABASE test'
2 changes: 1 addition & 1 deletion timescale-cloud/README.md
@@ -10,7 +10,7 @@

```bash
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz

export $CONNECTION_STRING=...
time ./load.sh
2 changes: 1 addition & 1 deletion timescaledb-no-columnstore/benchmark.sh
@@ -17,7 +17,7 @@ sudo -u postgres psql -c "CREATE DATABASE nocolumnstore"
sudo -u postgres psql nocolumnstore -c "CREATE EXTENSION timescaledb WITH VERSION '2.17.2';"

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
sudo chmod og+rX ~
chmod 777 hits.tsv

2 changes: 1 addition & 1 deletion timescaledb/benchmark.sh
@@ -18,7 +18,7 @@ sudo -u postgres psql test -c "CREATE EXTENSION timescaledb WITH VERSION '2.17.2

# Import the data
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
sudo chmod og+rX ~
chmod 777 hits.tsv

2 changes: 1 addition & 1 deletion umbra/benchmark.sh
@@ -9,7 +9,7 @@ sudo apt-get install -y postgresql-client gzip

rm -rf hits.tsv
wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
-gzip -d hits.tsv.gz
+gzip -d -f hits.tsv.gz
chmod 777 hits.tsv

rm -rf umbra-25-01-23.tar.xz umbra