[Dataframes] Call ray.init() on ray.dataframe import #1626
You might want to wrap this in a try/except in case the user has already initialized Ray in their code. Also, what would be the ideal user experience in the cluster setting, where Ray has already been started? |
Wrapped it in a try/except now; thanks! The current experience is that it prints out the web UI URL. Is there some way to avoid this, for example by checking whether Ray has been initialized without calling ray.init? |
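For readers following the thread, here is a minimal sketch of the "check before init" idea raised above. It assumes a Ray release that provides `ray.is_initialized()` (available in later versions, possibly not at the time of this PR). The helper name `ensure_ray_initialized` is hypothetical and not part of Ray; the module is passed in as an argument so the logic can be exercised without a running cluster.

```python
def ensure_ray_initialized(ray_module, **init_kwargs):
    """Start Ray only if this process has not started it yet.

    `ray_module` is expected to be the imported `ray` module (or any object
    exposing `is_initialized()` and `init(**kwargs)`).
    Returns True if this call started Ray, False if it was already running.
    """
    if ray_module.is_initialized():
        # Ray was already started (for example by the user's own code),
        # so skip the second init and the extra startup output.
        return False
    ray_module.init(**init_kwargs)
    return True


# Typical use at import time (requires Ray to be installed):
#   import ray
#   ensure_ray_initialized(ray)
```

Keyword arguments are passed straight through to `init`, so a caller who does not want Ray to claim every CPU could pass something like `num_cpus=2`.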
Test PASSed. |
Test PASSed. |
I would actually advise against this. You might not want Ray to take up all of the available CPU resources. To answer your question, you would probably want to do something like ... Backtracking a little, what's the purpose of this PR? Perhaps there is a more elegant fix. |
We are expecting our users to be traditional Pandas users, and one of the main drivers of our system is that you only have to change the import statement. I don't think most users will care about reducing their number of CPUs below the default. I don't see a problem with it for now, but I do agree that there are cases where users won't want access to all CPUs. We can add some docs for it later, but for now I think it's ok. |
I see. Do you expect the traditional user not to use multiple nodes, then? |
Right now we aren't supporting it, but eventually, yes. It's not a big deal to change. We will put something more effective together later so customization is possible. |
I see - thanks for clarifying. |
Test PASSed. |
Test PASSed. |
Good catch! Thanks @kunalgosar
We probably don't want the workers calling ... When a dataframe remote function is imported by a worker, the worker will try to do ... |
Right now we're only supporting dataframes locally until we can get a |
Right, I'm referring to worker "processes" as opposed to worker "machines". |
Is there some way to check that only the main thread on the main process runs the init function? It seems like the worker.py init check only looks at the name of the thread that is calling it. |
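As a rough illustration of the thread half of this question, the standard library can tell whether the current thread is the process's main thread. The helper names below are hypothetical. Note the limitation: every Ray worker process has its own main thread, so this check alone cannot tell the driver process apart from a worker process; that would need a separate signal from Ray itself.

```python
import threading


def is_main_thread():
    """Return True when called from the main thread of the current process."""
    return threading.current_thread() is threading.main_thread()


def init_once_on_main_thread(init_fn):
    """Call `init_fn` only from the main thread.

    Returns True if `init_fn` was called, False if the call was skipped.
    This does NOT distinguish a driver process from a worker process.
    """
    if not is_main_thread():
        return False
    init_fn()
    return True
```

Calling `init_once_on_main_thread` from a background thread skips the initializer, which is the behavior the comment above asks about for threads within a single process.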
Ensure ray is initialized when dataframes are imported.
The behavior here is interesting because the dataframe import is sent to all the workers. We need to ensure that the init is only run on the main thread.