CPython profiler broken with TensorFlow 2.17.0 code in Python 3.12.1+ #122029

Closed
@YingqinSU

Bug report

Bug description:

Lately, I've been testing AI code on various Python interpreters and their corresponding profilers across multiple platforms. After multiple attempts, I've noticed that CPython profilers consistently fail to profile the following code.

import tensorflow as tf

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=2)

val_loss, val_acc = model.evaluate(x_test, y_test)
print(val_loss, val_acc)

model.save("epic_num_reader.keras")

predictions = model.predict([x_test])

print(predictions)

The script is a simple TensorFlow + Keras example that classifies images of handwritten digits. It runs well under Intel Python, whose latest release is based on Python 3.9.19. On CPython, the profiler works and returns results for every version I tested before 3.12.1. From CPython 3.12.1 onward, profiling the script crashes with the following error:

2024-07-19 14:53:51.996897: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-19 14:53:51.997294: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-07-19 14:53:51.999400: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-07-19 14:53:52.005233: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-19 14:53:52.014843: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-19 14:53:52.017590: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-19 14:53:52.025009: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/profile.py", line 615, in <module>
    main()
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/profile.py", line 604, in main
    runctx(code, globs, None, options.outfile, options.sort)
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/profile.py", line 101, in runctx
    return _Utils(Profile).runctx(statement, globals, locals, filename, sort)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/profile.py", line 64, in runctx
    prof.runctx(statement, globals, locals)
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/profile.py", line 424, in runctx
    exec(cmd, globals, locals)
  File "mnist_number.py", line 1, in <module>
    import tensorflow as tf
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/__init__.py", line 47, in <module>
    from tensorflow._api.v2 import __internal__
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/_api/v2/__internal__/__init__.py", line 8, in <module>
    from tensorflow._api.v2.__internal__ import autograph
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/_api/v2/__internal__/autograph/__init__.py", line 8, in <module>
    from tensorflow.python.autograph.core.ag_ctx import control_status_ctx # line: 34
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/python/autograph/core/ag_ctx.py", line 21, in <module>
    from tensorflow.python.autograph.utils import ag_logging
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/python/autograph/utils/__init__.py", line 17, in <module>
    from tensorflow.python.autograph.utils.context_managers import control_dependency_on_returns
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/python/autograph/utils/context_managers.py", line 19, in <module>
    from tensorflow.python.framework import ops
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/python/framework/ops.py", line 50, in <module>
    from tensorflow.python.eager import context
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/python/eager/context.py", line 37, in <module>
    from tensorflow.python.eager import execute
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/python/eager/execute.py", line 21, in <module>
    from tensorflow.python.framework import dtypes
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/python/framework/dtypes.py", line 308, in <module>
    resource = DType(types_pb2.DT_RESOURCE)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/python/framework/dtypes.py", line 81, in __init__
    self._handle_data = handle_data
    ^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/profile.py", line 209, in trace_dispatch_i
    if self.dispatch[event](self, frame, t):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.12.1/lib/python3.12/profile.py", line 293, in trace_dispatch_return
    assert frame is self.cur[-2].f_back, ("Bad return", self.cur[-3])
AssertionError: ('Bad return', ('/home/user/.pyenv/versions/3.12.1/lib/python3.12/site-packages/tensorflow/python/framework/dtypes.py', 1, '<module>'))
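
The traceback shows the failure surfacing inside `profile.py` while the script is executed through `runctx`. As a minimal sketch for checking a single statement on a given interpreter, the same code path can be driven programmatically. The helper name `try_profile` and its return format are illustrative assumptions, not part of the original report:

```python
import profile
import sys


def try_profile(statement, profiler_factory=profile.Profile):
    """Run `statement` under a profiler and report whether an assertion fired.

    Returns (ok, detail). `ok` is False when an AssertionError escaped the run.
    Note: an AssertionError raised by the statement itself is also caught here,
    so `detail` should be inspected before blaming the profiler.
    """
    # Fresh namespace so the profiled statement does not touch our globals.
    namespace = {"__name__": "__main__"}
    prof = profiler_factory()
    version = sys.version.split()[0]
    try:
        prof.runctx(statement, namespace, namespace)
    except AssertionError as exc:
        return False, f"{version}: AssertionError {exc.args!r}"
    return True, f"{version}: profiled without assertion"
```

Calling `try_profile("import tensorflow as tf")` on each installed interpreter would then give one line per version to compare. Passing `cProfile.Profile` as `profiler_factory` would exercise the C profiler instead; the report does not say whether that profiler is affected.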

I ran the code on my laptop, which has a Tiger Lake CPU and no NVIDIA GPU or Tensor Cores. Because I didn't recompile the TensorFlow library, the messages about unused AVX-512 instructions and missing GPU acceleration are expected.
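
Since these messages are unrelated to the profiler failure, it may help to quiet them when reproducing. The sketch below sets two environment variables before TensorFlow is imported: `TF_ENABLE_ONEDNN_OPTS`, which the log output itself suggests, and `TF_CPP_MIN_LOG_LEVEL`, TensorFlow's native log filter. The helper name `quiet_tensorflow_env` is hypothetical, and some early startup messages may still be printed:

```python
import os


def quiet_tensorflow_env(env=None):
    """Set logging-related variables in `env` (defaults to os.environ).

    Must be called before `import tensorflow` to have any effect.
    Existing values are left untouched.
    """
    if env is None:
        env = os.environ
    # Suggested by the TensorFlow log message in the report.
    env.setdefault("TF_ENABLE_ONEDNN_OPTS", "0")
    # 0 = all messages, 1 = hide INFO, 2 = also hide WARNING, 3 = also hide ERROR.
    env.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")
    return env
```

Using `setdefault` keeps any value already exported in the shell, so the helper never overrides an explicit choice.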

Test environment:
WSL2 Ubuntu 20.04
- Python 3.12.4
Manjaro
- Python 3.9.19
- Python 3.10.14
- Python 3.11.9
- Python 3.12.0
- Python 3.12.1
- Python 3.12.2
- Python 3.12.3
- Python 3.12.4
Fedora
- Python 3.12.4

For Python versions prior to 3.12.1, I only received warning messages, and the profiler worked as expected. From 3.12.1 onward, I get the AssertionError shown above. I compared the profile.py file between versions 3.12.0 and 3.12.1, and the two appear to be identical, so the change in behavior presumably comes from elsewhere in the interpreter. It's possible that the introduction of PEP 695 in Python 3.12 is causing this error.
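
The profile.py comparison can be repeated without diffing files by hand. As a rough sketch, the snippet below prints a SHA-256 digest of the `profile.py` shipped with the running interpreter; running it under each version and comparing the digests shows whether the module source differs. The function name `profile_module_fingerprint` is an illustrative assumption:

```python
import hashlib
import profile
import sys
from pathlib import Path


def profile_module_fingerprint():
    """Return (python_version, sha256 hex digest) for this interpreter's profile.py."""
    source_path = Path(profile.__file__)
    digest = hashlib.sha256(source_path.read_bytes()).hexdigest()
    version = ".".join(str(part) for part in sys.version_info[:3])
    return version, digest


version, digest = profile_module_fingerprint()
print(f"Python {version}: profile.py sha256={digest}")
```

Identical digests on two versions mean the pure-Python profiler source is byte-for-byte the same, which points the search at the interpreter core rather than at `profile.py`.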

While waiting for your response, I wish you a good day.

Aaron SU

CPython versions tested on:

3.9, 3.10, 3.11, 3.12

Operating systems tested on:

Linux, Windows

Labels

interpreter-core (Objects, Python, Grammar, and Parser dirs)
type-bug (An unexpected behavior, bug, or error)
