[KED-1081] Make the folder /data/ as abstract data folder #105
Comments
Hi @arita37 Absolute paths are already supported by all Please have a look here to see how to define such datasets in tha Working with big datasets typically involves working with cloud solutions. For that case,
Hello, this is more of a generic design question (not just about the path). Let me clarify my point:
Hello @arita37, thank you for the feedback! Let me address the comments.
But none of the points above seem to relate to the original question: abstract If you have massive files and complex ways of accessing them, you have a few options:
Does it help somehow? It would be nice to know more about your use-cases and requirements.
Hello,
Thanks for replying.
Question:
We are evaluating vs MLflow,
which is a pipeline for ML. That's why having the data folder alongside the code (as in the docs example) looked strange from a production point of view.
1) It is not clear what happens when the project is packaged in Docker:
which folders are kept and which are removed?
2) Model storage and serialization are not clear.
The same goes for pre-processing with state (e.g. clustering).
How is this handled? Or do we have to do it manually?
In theory, you should have created a folder for model storage (i.e. in the same style as data).
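To make the "do it manually" option concrete, here is a minimal sketch of persisting a fitted model or a stateful pre-processing step (for example cluster centroids) to a dedicated folder. It uses only Python's standard `pickle` module and is not this project's API; the `models` folder name and the helper names `save_state` and `load_state` are assumptions made up for illustration.

```python
import pickle
from pathlib import Path
from typing import Any


def save_state(obj: Any, name: str, models_dir: str = "models") -> Path:
    """Pickle `obj` to <models_dir>/<name>.pkl and return the file path.

    `models_dir` and the `.pkl` naming scheme are hypothetical conventions.
    """
    target = Path(models_dir) / f"{name}.pkl"
    # Create the storage folder on first use.
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("wb") as fh:
        pickle.dump(obj, fh)
    return target


def load_state(name: str, models_dir: str = "models") -> Any:
    """Load an object previously written by `save_state`."""
    with (Path(models_dir) / f"{name}.pkl").open("rb") as fh:
        return pickle.load(fh)
```

Calling `save_state(fitted_clusterer, "clusters")` after training and `load_state("clusters")` at inference time would keep the state next to the data folder in the same style. Only unpickle files from a source you trust.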
Hi @arita37, I'll leave @Flid to handle the bulk of this conversation but I'll just make a few comments about MLflow.
Thanks for your prompt reply.
1) Noted.
However, from a software engineering perspective,
data/ should be abstract (i.e. virtual).
Suppose the code references
some CSV in the data/ folder: it runs fine locally.
But when it is transferred to Docker, it fails
because data/ is not transferred.
The framework should try to enforce better "best practices" (at least by showing them in the docs).
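One way to picture an "abstract" data/ folder is to resolve every dataset path against a root that is configured outside the code, so the same code runs locally and in a container where the data is mounted elsewhere. The sketch below is only an illustration in plain Python, not something this framework provides; the `DATA_ROOT` environment variable and the `data_path` helper are hypothetical names.

```python
import os
from pathlib import Path

# Hypothetical environment variable name, chosen for this example only.
DATA_ROOT_ENV = "DATA_ROOT"


def data_path(relative: str, default_root: str = "data") -> Path:
    """Resolve `relative` against a configurable data root.

    Falls back to the local `data` folder when DATA_ROOT is not set.
    """
    root = Path(os.environ.get(DATA_ROOT_ENV, default_root))
    return root / relative
```

Locally, `data_path("example.csv")` points into the `data` folder of the project. In Docker, the image would not need to contain the data at all: the container could mount a volume and set `DATA_ROOT` to the mount point, and the code would stay unchanged.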
What is missing for production use:
Hi @arita37, I want to see if I can create some actionable items from this issue and then close it with the appropriate tasks. So I have a query about your 3rd point: And to your 4th point:
@arita37 It would be great to get more input from you when you have time. For now, I'll close this issue but I'll be happy to re-open it when you're ready.
Description
Usually, we have to deal with very large datasets (i.e. >100 GB) of text type,
so storing them in the /data/ folder is not possible.