The streaming dataloader is also particularly useful when your dataset has been moved to a central location.
````diff
-For example:
-```bash
 # This will do the same thing, but stream data to {local} from {remote}.
 # The remote path can be a filesystem or object store URI.
-python ../common/text_data.py /tmp/cache-c4 ./my-copy-c4 # stream from filesystem, e.g. a slow NFS volume to fast local disk
-python ../common/text_data.py /tmp/cache-c4 s3://my-bucket/my-copy-c4 # stream from object store
+python ../common/text_data.py --local_path /tmp/cache-c4 --remote_path ./my-copy-c4 --tokenizer bert-base-uncased # stream from filesystem, e.g. a slow NFS volume to fast local disk
+#python ../common/text_data.py --local_path /tmp/cache-c4 --remote_path s3://my-bucket/my-copy-c4 --tokenizer bert-base-uncased # stream from object store
 ```
````
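To make the local/remote relationship concrete, here is a minimal sketch of the caching idea: a file is copied from the remote location into the local cache the first time it is requested and reused afterwards. This is not the implementation behind `text_data.py` or the streaming dataloader. The `fetch_shard` helper and the shard file name are hypothetical, and the sketch assumes the remote is a filesystem path rather than an object store URI.

```python
import shutil
import tempfile
from pathlib import Path


def fetch_shard(remote_dir: Path, local_dir: Path, name: str) -> Path:
    """Return a local path for `name`, copying it from `remote_dir` on first use.

    Hypothetical helper for illustration only; filesystem remotes only.
    """
    local_path = local_dir / name
    if not local_path.exists():
        local_dir.mkdir(parents=True, exist_ok=True)
        # Copy under a temporary name first so that an interrupted copy
        # is never mistaken for a fully cached shard.
        tmp_path = local_dir / (name + ".tmp")
        shutil.copyfile(remote_dir / name, tmp_path)
        tmp_path.replace(local_path)
    return local_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        remote = Path(root) / "my-copy-c4"  # stands in for a slow NFS volume
        local = Path(root) / "cache-c4"     # stands in for fast local disk
        remote.mkdir()
        (remote / "shard.00000.txt").write_text("example record\n")

        first = fetch_shard(remote, local, "shard.00000.txt")   # copies the file
        second = fetch_shard(remote, local, "shard.00000.txt")  # served from cache
        print(first == second, first.read_text().strip())
```

The second call never touches the remote directory, which is why a slow central volume costs time only on the first pass over the data.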
With our data prepared, we can now start training.