Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 17.10
- Ray installed from (source or binary): Binary
- Ray version: 0.4.0
- Python version: Python 3.6.5
- Exact command to reproduce: python -m foo.main (with the package layout given under "Source code / logs")
Describe the problem
ray/worker.py:compute_function_id checks whether __main__ has the __file__ attribute to decide whether to include the source code in the function hash. When starting a Python executable using python -m mypkg.executable, dependencies in mypkg can be imported before __main__ is loaded, resulting in this check failing (and source code being excluded) in the driver. By the time the code actually runs, however, __main__ has been loaded and the attribute exists. This can lead to function IDs differing between the driver and worker in nested parallelism settings.
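To make the mechanism concrete, here is a minimal sketch. It is not Ray's actual implementation: sketch_function_id, the main_has_file flag, and the registry dict are hypothetical names used only for illustration. The flag stands in for the __main__/__file__ check, and the source text is passed in as a string to keep the example self-contained. The assumption is that the check fails in the driver and passes in the worker, as described above.

```python
import hashlib


def sketch_function_id(func_name, source, main_has_file):
    """Hash the function name, adding the source only if the check passed.

    Hypothetical stand-in for the behaviour described in this issue.
    """
    digest = hashlib.sha1()
    digest.update(func_name.encode("utf-8"))
    if main_has_file:
        # Source is only mixed into the hash when the check succeeds.
        digest.update(source.encode("utf-8"))
    return digest.digest()


SOURCE = "def f(x):\n    return x\n"

# Driver: package imported before __main__ is loaded, so the check fails.
driver_id = sketch_function_id("foo.bar.f", SOURCE, main_has_file=False)
# Worker (assumed): the check passes, so the source is included.
worker_id = sketch_function_id("foo.bar.f", SOURCE, main_has_file=True)

# The driver registers the function under its ID; the worker looks it up
# under a different ID and the lookup fails.
registry = {driver_id: "foo.bar.f"}
try:
    registry[worker_id]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

Under these assumptions the two IDs differ, and a lookup keyed by the worker's ID raises KeyError, which is the same kind of failure as the one at the end of the log below.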
The error I get is similar to that of #1446, so it may have a common cause. After diagnosing this, I've become convinced that executables are best kept at the top level rather than nested in the package structure; I'm posting this mostly as a warning to others, since the cause of this error is far from obvious.
I think we can make this more robust by using a try-except statement rather than guessing when the source code is present; I'll submit a PR.
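A rough sketch of what the try-except approach could look like follows. This is only an illustration of the idea, not the PR and not Ray's code; robust_function_id is a hypothetical name. It asks inspect.getsource for the source directly and falls back to hashing the name alone when the source cannot be retrieved.

```python
import hashlib
import inspect


def robust_function_id(func_name, func):
    """Hash the name, plus the source whenever it can actually be read.

    Hypothetical sketch: no guess is made from the state of __main__.
    """
    digest = hashlib.sha1()
    digest.update(func_name.encode("utf-8"))
    try:
        source = inspect.getsource(func)
    except (OSError, TypeError):
        # OSError: the source cannot be retrieved (e.g. interactive session).
        # TypeError: built-in or C-implemented callables have no source.
        source = None
    if source is not None:
        digest.update(source.encode("utf-8"))
    return digest.digest()


# A built-in has no retrievable source, so only the name is hashed.
builtin_id = robust_function_id("len", len)
```

With this shape, the result depends on whether the source file is readable rather than on when __main__ was loaded. It should therefore agree between driver and worker as long as both import the module from the same file.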
Source code / logs
This bug is only triggered with a particular package structure, so reproducing it needs a few files. Create a directory foo containing:
# __init__.py
from foo import bar

# main.py
import ray

from foo import bar

if __name__ == '__main__':
    ray.init(redirect_worker_output=True)
    bar.run()

# bar.py
import ray

@ray.remote
def f(x):
    return x

def g(x):
    return f.remote(x)

@ray.remote
def h(x):
    return ray.get(g(x))

def run():
    print(ray.get(h.remote(42)))
Running this produces:
python -m foo.main
Process STDOUT and STDERR is being redirected to /tmp/raylogs/.
Waiting for redis server at 127.0.0.1:39255 to respond...
Waiting for redis server at 127.0.0.1:20998 to respond...
Starting local scheduler with the following resources: {'CPU': 12, 'GPU': 2}.
======================================================================
View the web UI at http://localhost:8889/notebooks/ray_ui70409.ipynb?token=d6d894f5459ef32b1986d4cdf348574421d1787c71960fef
======================================================================
Traceback (most recent call last):
  File "/home/adam/bin/anaconda3/envs/mypirl/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/adam/bin/anaconda3/envs/mypirl/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/adam/dev/ray/foo/main.py", line 7, in <module>
    bar.run()
  File "/home/adam/dev/ray/foo/bar.py", line 15, in run
    print(ray.get(h.remote(42)))
Remote function foo.bar.h failed with:

Traceback (most recent call last):
  File "/home/adam/dev/ray/foo/bar.py", line 12, in h
    return ray.get(g(x))
  File "/home/adam/dev/ray/foo/bar.py", line 8, in g
    return f.remote(x)
  File "/home/adam/bin/anaconda3/envs/mypirl/lib/python3.6/site-packages/ray/worker.py", line 2602, in func_call
    return _submit(args=args, kwargs=kwargs)
  File "/home/adam/bin/anaconda3/envs/mypirl/lib/python3.6/site-packages/ray/worker.py", line 2622, in _submit
    resources=resources)
  File "/home/adam/bin/anaconda3/envs/mypirl/lib/python3.6/site-packages/ray/worker.py", line 2430, in _submit_task
    return global_worker.submit_task(function_id, *args, **kwargs)
  File "/home/adam/bin/anaconda3/envs/mypirl/lib/python3.6/site-packages/ray/worker.py", line 580, in submit_task
    self.task_driver_id.id()][function_id.id()]
KeyError: b'\x8d\x04\x11\xd9\xea\xd4\xbe\xad\x90\x0c\x0b\x10\x0f^\x0cq\xcc\xe3\xb1\x07'