This repository was archived by the owner on Jun 6, 2024. It is now read-only.
Dshuttle integration Plan #4599
Open
Description
P0: Integrate with PAI
Code freeze: 9.31 Endgame: 10.12
Deploy
- Expose Dshuttle as a k8s PV/PVC, leveraging the PAI storage solution. @Binyang2014
1. PAI service config
2. Add alluxio.fuse into /etc/updatedb.conf
3. Expose Dshuttle API to frontend
4. Add dshuttle type in rest-server
5. Refine UI display; needs help from @yiyione
6. A tool to let customers preload data to Dshuttle
- Limit/bound Dshuttle resource usage. @Binyang2014
High memory usage is caused by a gRPC flow-control issue and can be mitigated by changing the default config. 6GB~8GB for CSI seems to be enough.
- Doc for Dshuttle configuration, and how to change the hived config. @Binyang2014 Qianxi
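As a rough illustration of deploy step 2, the sketch below adds an entry to `/etc/updatedb.conf` so that `updatedb` skips the FUSE mount. It assumes the entry belongs in the `PRUNEFS` list, which the plan does not state; the function name `add_prunefs_entry` is hypothetical, and the value `alluxio.fuse` is taken verbatim from the step.

```python
import re


def add_prunefs_entry(conf_text: str, fs_type: str = "alluxio.fuse") -> str:
    """Return conf_text with fs_type present in the PRUNEFS list (idempotent).

    Assumption: the plan's "alluxio.fuse" entry goes into PRUNEFS.
    """
    pattern = re.compile(r'^([ \t]*PRUNEFS[ \t]*=[ \t]*")([^"]*)(")', re.MULTILINE)
    match = pattern.search(conf_text)
    if match is None:
        # No PRUNEFS line yet: append a new one at the end of the file.
        sep = "" if (not conf_text or conf_text.endswith("\n")) else "\n"
        return conf_text + sep + 'PRUNEFS="' + fs_type + '"\n'
    entries = match.group(2).split()
    if fs_type in entries:
        # Already listed: leave the file untouched.
        return conf_text
    entries.append(fs_type)
    new_line = match.group(1) + " ".join(entries) + match.group(3)
    return conf_text[:match.start()] + new_line + conf_text[match.end():]
```

The function works on text rather than on the file itself, so it can be applied to the node's config from whatever deployment script PAI already uses.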
Robust
- Stable interface to upload data to Dshuttle Qianxi
- Figure out Dshuttle failure patterns (Master should not fail) binxuan
1. Worker down/rejoin when running jobs P0
Cases: job reads all data from Dshuttle / partially from Dshuttle / all from UFS.
Expected behavior:
- User job can continue running without any failure.
- Rejoined worker node can serve the request.
- Missing files will be read from UFS
2. Client daemon failure when running jobs P1
- A failed FUSE daemon will not affect other jobs running on the same node
3. Worker down/rejoin when preload data P1
Expected behavior:
- A failed worker will not block the upload process
- A rejoined worker will continue to serve the task
- Test running jobs with:
1. Consume all data from UFS
2. Consume all data from Dshuttle
3. Partial in Dshuttle
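The failure scenarios and data placements listed above could be enumerated as a test matrix. This is only a sketch: crossing every failure scenario with every data placement is an assumption (the plan lists them separately), and the names `FAILURES`, `DATA_PLACEMENTS`, and `build_test_matrix` are hypothetical.

```python
from itertools import product

# Data placements from the "Test running jobs with" list.
DATA_PLACEMENTS = ["all from UFS", "all from Dshuttle", "partial in Dshuttle"]

# Failure scenarios and their priorities from the failure-pattern list.
FAILURES = [
    ("worker down/rejoin while jobs run", "P0"),
    ("client daemon failure while jobs run", "P1"),
    ("worker down/rejoin during preload", "P1"),
]


def build_test_matrix():
    """Return one test case per (failure, placement) pair, P0 cases first."""
    cases = [
        {"failure": failure, "priority": priority, "placement": placement}
        for (failure, priority), placement in product(FAILURES, DATA_PLACEMENTS)
    ]
    # sorted() is stable, so cases keep their listed order within a priority.
    return sorted(cases, key=lambda case: case["priority"])


if __name__ == "__main__":
    for case in build_test_matrix():
        print(f'{case["priority"]}: {case["failure"]} / {case["placement"]}')
```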
User experience
- Show under-file-system for end user
- Provide e-2-e benchmark binxuan
P1
- Integrate with scheduler to preload data to Dshuttle
- API to show folder load percentage in DShuttle
- Provide a suitable way to preload data, and let user know the data is available in Dshuttle
- Provide a CLI to let users force a metadata sync with UFS.
- Cache policy improvement
- Cross job dataLoader optimizer
- Display Dshuttle write type (write through/write back) to end user
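For the folder load percentage item, one possible definition is cached bytes divided by total bytes over all files under the folder. The sketch below assumes per-file size and cached-byte counts are available from somewhere; the `FileStatus` type and `folder_load_percentage` function are hypothetical and not part of any existing Dshuttle API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FileStatus:
    """Hypothetical per-file record: total size and bytes cached in Dshuttle."""
    path: str
    size_bytes: int
    cached_bytes: int


def folder_load_percentage(files: Iterable[FileStatus], folder: str) -> float:
    """Return the percentage of bytes under `folder` that are cached."""
    prefix = folder.rstrip("/") + "/"
    total = 0
    cached = 0
    for status in files:
        if status.path.startswith(prefix):
            total += status.size_bytes
            # Clamp so an inconsistent record cannot push the result past 100%.
            cached += min(status.cached_bytes, status.size_bytes)
    if total == 0:
        # Empty or unknown folder: report 0% (a design choice of this sketch).
        return 0.0
    return 100.0 * cached / total
```

Weighting by bytes rather than by file count keeps a folder with one large uncached file from looking almost fully loaded.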