Description
System information
- Docker container: tfra/dev_container:latest-tf2.15.1-python3.9 (CUDA 12.3, cuDNN 8.9)
- TensorFlow 2.16.2, Keras 3.8
- TensorFlow-Recommenders-Addons: 0.8.0
- Python version: 3.9.7
- Is GPU used? YES (NVIDIA H20)
All of the above were installed with "pip install tensorflow==2.16.2 tensorflow-recommenders-addons==0.8.0",
and Horovod was then reinstalled with "HOROVOD_WITH_TENSORFLOW=1 pip install --no-cache-dir horovod".
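Because the container tag names TF 2.15.1 while pip installed TensorFlow 2.16.2, it may help to confirm which versions are actually present in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the package list are illustrative assumptions, not part of the demo or of TFRA:

```python
from importlib import metadata

# Distribution names as published on PyPI; adjust if your environment differs.
PACKAGES = (
    "tensorflow",
    "keras",
    "tf-keras",
    "tensorflow-recommenders-addons",
    "horovod",
)


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the container, before and after the pip commands above, would show whether the preinstalled TensorFlow was replaced and which Keras package ended up installed.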
Describe the bug
[1,0]:Traceback (most recent call last):
[1,0]: File "/mnt/mfs/mfs6/cvr_tf_fm/code_complie/recommenders-addons-0.8.0/demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py", line 782, in <module>
[1,0]: app.run(main)
[1,0]: File "/usr/local/lib/python3.9/site-packages/absl/app.py", line 308, in run
[1,0]: _run_main(main, args)
[1,0]: File "/usr/local/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
[1,0]: sys.exit(main(argv))
[1,0]: File "/mnt/mfs/mfs6/cvr_tf_fm/code_complie/recommenders-addons-0.8.0/demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py", line 770, in main
[1,0]: train()
[1,0]: File "/mnt/mfs/mfs6/cvr_tf_fm/code_complie/recommenders-addons-0.8.0/demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py", line 631, in train
[1,0]: optimizer = de.DynamicEmbeddingOptimizer(optimizer, synchronous=True)
[1,0]: File "/usr/local/lib/python3.9/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_optimizer.py", line 859, in DynamicEmbeddingOptimizer
[1,0]: raise Exception(f"Optimizer type is not supported! got {str(type(self))}")
[1,0]:Exception: Optimizer type is not supported! got <class 'keras.src.optimizers.adam.Adam'>
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
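For context: starting with TensorFlow 2.16, `tf.keras` resolves to Keras 3 by default, unless the environment variable `TF_USE_LEGACY_KERAS=1` is set and the `tf-keras` package is installed. The sketch below only reports which situation applies in the current environment. The helper names `major_version` and `keras_status` are hypothetical, and whether TFRA 0.8.0 accepts the legacy optimizers in this setup is an untested assumption:

```python
import os
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '3.8.0'."""
    return int(version_string.split(".")[0])


def keras_status():
    """Summarize the installed Keras version and the legacy-Keras switch."""
    try:
        keras_version = metadata.version("keras")
    except metadata.PackageNotFoundError:
        keras_version = None
    return {
        "keras_version": keras_version,
        # None means Keras is not installed, so the question does not apply.
        "is_keras3": None if keras_version is None
        else major_version(keras_version) >= 3,
        # Value of the switch as seen by this process (None if unset).
        "TF_USE_LEGACY_KERAS": os.environ.get("TF_USE_LEGACY_KERAS"),
    }


if __name__ == "__main__":
    print(keras_status())
```

If `is_keras3` is true and `TF_USE_LEGACY_KERAS` is unset, the optimizer passed to `de.DynamicEmbeddingOptimizer` is most likely a Keras 3 class, which matches the type printed in the exception.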
Code to reproduce the issue
Run the demo at "https://github.com/tensorflow/recommenders-addons/blob/master/demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py" without any changes.
Other info / logs
I ran the demo with "bash -x start.sh"; the trace output is as follows:
+ rm -rf ./export_dir
++ nvidia-smi --query-gpu=name --format=csv,noheader
++ wc -l
+ gpu_num=8
+ export gpu_num
+ horovodrun -np 8 python movielens-1m-keras-with-horovod.py --mode=train --model_dir=./model_dir --export_dir=./export_dir --steps_per_epoch=20000 --shuffle=True
I also ran it without Horovod, using "python movielens-1m-keras-with-horovod.py --mode=train --model_dir=./model_dir --export_dir=./export_dir --steps_per_epoch=20000 --shuffle=True", and got the same error.