What would you like to see added?
Caveat!
Our understanding is that s5cmd uses MD5 hashes to verify binary content integrity during uploads only, not downloads. More thorough verification, such as checking metadata or using a different hash, requires another tool. A later post in this issue documents how to use rclone check.
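As one way to cover the download side with a different hash, a SHA-256 manifest can be written before the upload and checked against any later copy. This is a minimal sketch using GNU coreutils only; the function names `make_manifest` and `verify_manifest` are hypothetical, and this is not the rclone check approach mentioned above.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative and not part of s5cmd.

# Write a SHA-256 manifest for every regular file under a directory.
# Paths are recorded relative to that directory so the manifest can be
# checked against a copy stored somewhere else.
make_manifest() {
    local src_dir="$1" manifest="$2"
    (cd "$src_dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$manifest"
}

# Verify a directory against a manifest read from stdin.
# Returns non-zero if any file is missing or its hash differs.
verify_manifest() {
    local dir="$1" manifest="$2"
    (cd "$dir" && sha256sum --quiet -c -) < "$manifest"
}
```

Keep the manifest outside the directory being hashed, otherwise the manifest file itself ends up in the listing. After downloading the data again, running `verify_manifest DOWNLOAD_DIR manifest.sha256` reports any file whose content changed.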
Notes
- --stat shows total files transferred, failed, successful, at the end of the job
- --numworkers=$SLURM_CPUS_ON_NODE is perfect for a single-node job
- --endpoint-url=https://s3.lts.rc.uab.edu/ is required for our S3 endpoint
- mv will remove the file from the source!
- cp is what we want until we've verified the files on the destination
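The flags in the notes above can be collected in one place so every transfer uses the same settings. The sketch below is a hedged illustration: `s5cmd_global_flags` is a hypothetical helper, and the fallback to `nproc` when `SLURM_CPUS_ON_NODE` is unset is an assumption for running outside a Slurm job, not something the notes prescribe.

```shell
#!/bin/bash
# Hypothetical helper: prints the s5cmd global flags from the notes, one per line.
s5cmd_global_flags() {
    # Assumption: outside a Slurm job SLURM_CPUS_ON_NODE is unset,
    # so fall back to the number of CPUs reported by nproc.
    local workers="${SLURM_CPUS_ON_NODE:-$(nproc)}"
    printf '%s\n' \
        "--stat" \
        "--numworkers=${workers}" \
        "--endpoint-url=https://s3.lts.rc.uab.edu/"
}

# Usage (commented out so that sourcing this file transfers nothing):
#   mapfile -t flags < <(s5cmd_global_flags)
#   s5cmd "${flags[@]}" cp SOURCE_PATH s3://DESTINATION_PATH/
```

The usage lines call cp rather than mv, so the source files stay in place until the destination has been verified.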
Tests
Tests with 8 CPUs and 8 GB of memory on c0168:
- 39 files @ 1 GiB each: ~5.1 Gbps
- 1000 files @ 10 MiB each: ~0.95 Gbps
Tests with 100 CPUs and 200 GB of memory on c0202 (amd-hdr100):
- 1000 files @ 10 MiB each: ~8.0 Gbps
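A throughput figure like the ones above can be computed from the total bytes transferred and the elapsed wall-clock time. The sketch below assumes the rates are decimal gigabits per second (10^9 bits per second); the issue does not say how its figures were calculated, and `throughput_gbps` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: convert bytes transferred and elapsed seconds
# into gigabits per second (assumes 1 gigabit = 10^9 bits).
throughput_gbps() {
    local bytes="$1" seconds="$2"
    awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.2f\n", (b * 8) / s / 1e9 }'
}

# Example: 1,000,000,000 bytes moved in 8 seconds.
throughput_gbps 1000000000 8   # prints 1.00
```

The byte count can come from `du -sb SOURCE_PATH` (GNU du) and the seconds from a timing wrapper around the transfer command.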
Example
Sample script that times an s5cmd cp transfer:
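#!/bin/bash
# Record the start time in seconds with nanosecond precision (UTC).
start_time="$(date -u +%s.%N)"
# Use cp, not mv, so the source files stay in place until the copy is verified.
s5cmd --stat \
    --numworkers="$SLURM_CPUS_ON_NODE" \
    --endpoint-url=https://s3.lts.rc.uab.edu/ \
    cp \
    SOURCE_PATH \
    s3://DESTINATION_PATH/
end_time="$(date -u +%s.%N)"
# bc handles the fractional-second subtraction that bash arithmetic cannot.
elapsed="$(bc <<<"$end_time-$start_time")"
echo "Total of $elapsed seconds elapsed for process"

Other thoughts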
We don't fully understand the cp flag --concurrency.
There are also open questions about the Rados Gateway frontend configuration.
- The config file is ceph.conf: https://docs.ceph.com/en/latest/radosgw/config-ref/#ceph-object-gateway-config-reference
- Max copy concurrency https://docs.ceph.com/en/latest/radosgw/config-ref/#confval-rgw_max_copy_obj_concurrent_io
- Max HTTP requests https://docs.ceph.com/en/latest/radosgw/config-ref/#confval-rgw_max_concurrent_requests