
Demo "movielens-1m-keras-with-horovod.py" fails with "Exception: Optimizer type is not supported! got <class 'keras.src.optimizers.adam.Adam'>" #487

Open
@kingofstorm

Description


System information

  • docker containers : tfra/dev_container:latest-tf2.15.1-python3.9 CUDA 12.3 CUDNN 8.9
  • TensorFlow 2.16.2, Keras 3.8
  • TensorFlow-Recommenders-Addons: 0.8.0
  • Python version: 3.9.7
  • Is GPU used? YES (NVIDIA H20)

All of the above were installed with "pip install tensorflow==2.16.2 tensorflow-recommenders-addons==0.8.0", and Horovod was then reinstalled with "HOROVOD_WITH_TENSORFLOW=1 pip install --no-cache-dir horovod".

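
To double-check which versions actually ended up in the environment after the two pip commands, a small sketch like the following can help. The helper name installed_versions and the PACKAGES list are hypothetical and only for illustration; the lookup relies on the standard-library importlib.metadata module.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they were passed to pip in this report.
PACKAGES = ["tensorflow", "keras", "tensorflow-recommenders-addons", "horovod"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the container shows whether the pip install replaced the TensorFlow 2.15.1 that the image name suggests was preinstalled.
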
Describe the bug

[1,0]:Traceback (most recent call last):
[1,0]: File "/mnt/mfs/mfs6/cvr_tf_fm/code_complie/recommenders-addons-0.8.0/demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py", line 782, in <module>
[1,0]: app.run(main)
[1,0]: File "/usr/local/lib/python3.9/site-packages/absl/app.py", line 308, in run
[1,0]: _run_main(main, args)
[1,0]: File "/usr/local/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
[1,0]: sys.exit(main(argv))
[1,0]: File "/mnt/mfs/mfs6/cvr_tf_fm/code_complie/recommenders-addons-0.8.0/demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py", line 770, in main
[1,0]: train()
[1,0]: File "/mnt/mfs/mfs6/cvr_tf_fm/code_complie/recommenders-addons-0.8.0/demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py", line 631, in train
[1,0]: optimizer = de.DynamicEmbeddingOptimizer(optimizer, synchronous=True)
[1,0]: File "/usr/local/lib/python3.9/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_optimizer.py", line 859, in DynamicEmbeddingOptimizer
[1,0]: raise Exception(f"Optimizer type is not supported! got {str(type(self))}")
[1,0]:Exception: Optimizer type is not supported! got <class 'keras.src.optimizers.adam.Adam'>

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
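
The class named in the exception, keras.src.optimizers.adam.Adam, comes from the standalone keras package. With TensorFlow 2.16, tf.keras resolves to Keras 3 by default unless the TF_USE_LEGACY_KERAS environment variable is set (and the tf_keras package is installed). The sketch below is only a diagnostic for that situation: the function names are hypothetical, the accepted values for the variable are an assumption about how TensorFlow reads it, and whether TFRA 0.8.0 accepts optimizers in legacy mode has not been verified here.

```python
import os


def legacy_keras_requested(environ=None):
    """True if TF_USE_LEGACY_KERAS is set to a value assumed to enable Keras 2."""
    env = os.environ if environ is None else environ
    # Assumption: TensorFlow treats "1", "true" and "True" as enabling legacy Keras.
    return env.get("TF_USE_LEGACY_KERAS", "") in ("1", "true", "True")


def keras_major(version_string):
    """Extract the major version number from a string such as '3.8.0'."""
    return int(version_string.split(".")[0])


def likely_keras3_optimizers(keras_version, environ=None):
    """Guess whether optimizers will be Keras 3 classes in this setup."""
    return keras_major(keras_version) >= 3 and not legacy_keras_requested(environ)


if __name__ == "__main__":
    # Values taken from this report: Keras 3.8, no legacy switch set.
    print(likely_keras3_optimizers("3.8.0", {}))
```

For the environment described in this report (Keras 3.8 with no legacy switch), the check returns True, which is consistent with the optimizer type shown in the exception.
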

Code to reproduce the issue
Run the demo at "https://github.com/tensorflow/recommenders-addons/blob/master/demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py" without any changes.

Other info / logs
I ran the demo with "bash -x start.sh"; the trace output is as follows:

+ rm -rf ./export_dir
++ nvidia-smi --query-gpu=name --format=csv,noheader
++ wc -l
+ gpu_num=8
+ export gpu_num
+ horovodrun -np 8 python movielens-1m-keras-with-horovod.py --mode=train --model_dir=./model_dir --export_dir=./export_dir --steps_per_epoch=20000 --shuffle=True

I also ran the script directly, without horovodrun, using "python movielens-1m-keras-with-horovod.py --mode=train --model_dir=./model_dir --export_dir=./export_dir --steps_per_epoch=20000 --shuffle=True" and got the same error.

