Description
I set up Medaka v0.8.1 to run on a GPU, but it consistently crashes at runtime with this error: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR.
I'm seeing references online to gpu_options.allow_growth = True, but I'm not sure how that would be applied to this code.
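For reference, this is roughly how the allow_growth setting is usually applied in TensorFlow 1.x. It is a minimal sketch, not medaka's own code: the helper name make_growth_session is hypothetical, and the TF_FORCE_GPU_ALLOW_GROWTH environment variable is assumed to be read only by newer TensorFlow releases (it may be ignored by 1.12).

```python
import os

# Assumption: newer TensorFlow releases read this variable when the GPU is
# first initialised, so it has to be set before TensorFlow is imported.
# Older releases (possibly including 1.12) may ignore it.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def make_growth_session():
    """Create a TF 1.x session that allocates GPU memory on demand.

    Hypothetical helper, not part of medaka. TensorFlow is imported here so
    that the environment variable above is already in place.
    """
    import tensorflow as tf

    config = tf.ConfigProto()
    # Grow the GPU allocation as needed instead of reserving all memory up front.
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config)
    # Make tf.keras use this session for models built or loaded afterwards.
    tf.keras.backend.set_session(session)
    return session
```

The helper would have to be called before the Keras model is built or loaded, so that predict_on_batch runs in the configured session; where that belongs in medaka is an open question here. On TensorFlow 2.x the same names live under tf.compat.v1 (tf.compat.v1.ConfigProto, tf.compat.v1.Session, tf.compat.v1.keras.backend.set_session).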
System:
Ubuntu 18.04
Cuda 10.1
tensorflow-gpu 1.12 (also tried 1.14 and 2.0.0-beta1)
2019-08-07 22:35:13.583084: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-07 22:35:13.584900: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/dmdrown/medaka/venv/bin/medaka", line 11, in <module>
    load_entry_point('medaka==0.8.1', 'console_scripts', 'medaka')()
  File "/home/dmdrown/medaka/venv/lib/python3.6/site-packages/medaka-0.8.1-py3.6-linux-x86_64.egg/medaka/medaka.py", line 363, in main
    args.func(args)
  File "/home/dmdrown/medaka/venv/lib/python3.6/site-packages/medaka-0.8.1-py3.6-linux-x86_64.egg/medaka/inference.py", line 462, in predict
    tag_name=args.tag_name, tag_value=args.tag_value, tag_keep_missing=args.tag_keep_missing
  File "/home/dmdrown/medaka/venv/lib/python3.6/site-packages/medaka-0.8.1-py3.6-linux-x86_64.egg/medaka/inference.py", line 388, in run_prediction
    class_probs = model.predict_on_batch(x_data)
  File "/home/dmdrown/medaka/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1294, in predict_on_batch
    outputs = self.predict_function(inputs)
  File "/home/dmdrown/medaka/venv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/dmdrown/medaka/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Fail to find the dnn implementation.
     [[{{node bidirectional/CudnnRNN_1}}]]
     [[classify/truediv/_123]]
  (1) Unknown: Fail to find the dnn implementation.
     [[{{node bidirectional/CudnnRNN_1}}]]