Skip to content

Comments

DQN refactor#1419

Merged
lanctot merged 12 commits intogoogle-deepmind:masterfrom
alexunderch:dqn_refactor
Jan 22, 2026
Merged

DQN refactor#1419
lanctot merged 12 commits intogoogle-deepmind:masterfrom
alexunderch:dqn_refactor

Conversation

@alexunderch
Copy link
Contributor

  • Refactored DQN versions with jax and torch that pass the tests
  • Simpler tree-based replay buffers
  • Changed the example name to dqn_ to improve visibility

Still TBD:

@alexunderch alexunderch changed the title Initial working versions DQN refactor Dec 21, 2025
@alexunderch
Copy link
Contributor Author

I decided to use different flax and jax versions to get access to the latest nnx api.

jax==0.8.1
flax==0.12.1
optax==0.2.6
orbax-checkpoint==0.11.31

@lanctot
Copy link
Collaborator

lanctot commented Jan 2, 2026

I decided to use different flax and jax versions to get access to the latest nnx api.

jax==0.8.1
flax==0.12.1
optax==0.2.6
orbax-checkpoint==0.11.31

Are each of them still supported on Python >= 3.10?

@alexunderch
Copy link
Contributor Author

Docs say that for >=3.10 current top versions are fully compatible...

@alexunderch
Copy link
Contributor Author

Yes, it now fails because of the versions.

@lanctot, what's better for you, to upgrade versions in a separate PR or with this/AlphaZero PR?

@lanctot
Copy link
Collaborator

lanctot commented Jan 3, 2026

Yes, it now fails because of the versions.

@lanctot, what's better for you, to upgrade versions in a separate PR or with this/AlphaZero PR?

I'm still not sure if updating versions will work. Sometimes the cascade of dependencies leads to a problem and it makes it impossible if the Python versions are too old. But let me try in a separate PR.

@lanctot
Copy link
Collaborator

lanctot commented Jan 3, 2026

jax==0.8.1

This one requires >= 3.11, from https://pypi.org/project/jax/0.8.1/

flax==0.12.1

This one too (>= 3.11), from https://pypi.org/project/flax/0.12.1/

optax==0.2.6

This one is ok (>= 3.10), https://pypi.org/project/optax/0.2.6/

orbax-checkpoint==0.11.31

This one is ok too (>= 3.10), from: https://pypi.org/project/orbax-checkpoint/0.11.31/

@lanctot
Copy link
Collaborator

lanctot commented Jan 3, 2026

Ok...... so what do we do?

Well, 3.10 is causing other problems too ( see #1424 ).

I just checked. Seems like Colab is now using 3.12. Normally I wait until EOL before removing support for a version, but that's quite far away (October '26). In this case it's causing multiple issues so I'd be happy to remove it early.

Give me a few weeks. I have to check with a few people (most notably the Kaggle Game Arena who are relying on a stable OpenSpiel for their environments). And I could have sworn that the version of Colab I used for the LLM imitation learning was 3.10, so maybe it's lower for the TPU kernels or maybe I was just mistaken.

@lanctot
Copy link
Collaborator

lanctot commented Jan 3, 2026

@alexunderch

The order I suggest is the following:

  • Wait for me to confirm that removing 3.10 support is ok
  • I will then remove 3.10 support and update github, will let you know when that's done
  • Then, let's update the AlphaZero PR, add it to the tests, and merge it since it's been around for a really long time and looking good
  • Then, we come back to this one and the other ones that use the newer nnx (like deep CFR etc.)

Sound good to you?

@lanctot
Copy link
Collaborator

lanctot commented Jan 3, 2026

Moving the discussion of removing Python 3.10 to a new issue: #1425

@alexunderch
Copy link
Contributor Author

Okay, I will do EVA as well. Haven't heard of the algorithm

@alexunderch
Copy link
Contributor Author

alexunderch commented Jan 14, 2026

DQN breakout (torch and jax)
image
DQN tic-tac-toe
image

For Kuhn poker, however, NFSP converges to different exploitabilty values, but it's for SGD:
image

@alexunderch
Copy link
Contributor Author

alexunderch commented Jan 14, 2026

@lanctot I took away eva test for pytorch because I am not actually sure that the implementation is OK. It may trigger a sampling from an empty buffer which is incompatible to my implementation of it. Not to go deeper into the algo, I decided to come back to it later if it's necessary.

Moreover, jax versions for DQN and NFSP aren't the fastest, to he honest. This morning, I've checked, they don't trigger additional recompilations, so there might be a bottleneck due to interchangable use of pure python and jax. I think we'll get there, I asked a Flax team member if we can do much about that. But we should match pytorch pefr sooner or later across all impls.

@alexunderch
Copy link
Contributor Author

alexunderch commented Jan 15, 2026

@lanctot for Kuhn poker, exploitabilty of both, jax and torch implementations, is below 0.06 (0.02-0.04). Also, I stripped it a little, reusing some of DQN modules.

also, for colabs, I replaced !pip (installs the dependencies in a default local env) with %pip (that should utilise the venv where the kernel is ran from). Could've been a problem if run the notebooks locally.

Technically, if the tests pass, the PR is in mergable condition. I don't think that I can marginally improve the performance without future flax development updates or restructuring the code, which I don't think that might be a good idea to do right away.

@alexunderch
Copy link
Contributor Author

I am sorry, I am not yet used run ALL tests before commits. Will be better

@lanctot
Copy link
Collaborator

lanctot commented Jan 15, 2026

I am sorry, I am not yet used run ALL tests before commits. Will be better

Don't worry about it -- I just have to press a button. But there might be a delay sometimes if I'm in meetings etc.

@lanctot lanctot added the imported This PR has been imported and awaiting internal review. Please avoid any more local changes, thanks! label Jan 17, 2026
@lanctot
Copy link
Collaborator

lanctot commented Jan 19, 2026

Hi @alexunderch,

Thanks for this PR, it's great!

However, I need to request one thing going forward, before we import we need to catch all the Goolge lint errors. This one has many and it's costing us too much time.

Specifically, can you apply Step 9 from this: https://github.com/google-deepmind/open_spiel/blob/master/docs/developer_guide.md#adding-a-game

I will highlight a few of the common issues so you can cross-check that the Python linter with the most recent pylintrc catches them.

@alexunderch
Copy link
Contributor Author

I see, I was worried about the linter comments, thank you for the correction. I update this and my other active PRs ASAP.

@alexunderch
Copy link
Contributor Author

@lanctot also sorry for reiterating, but can we add pylint and some precommit stuff with a separate PR, kind of finishing this effort #1071

@lanctot
Copy link
Collaborator

lanctot commented Jan 19, 2026

@lanctot also sorry for reiterating, but can we add pylint and some precommit stuff with a separate PR, kind of finishing this effort #1071

Yes I replied on that thread too. Very open to it at this point if someone gets a minimal version working. See the reply on that thread.

@alexunderch
Copy link
Contributor Author

alexunderch commented Jan 19, 2026

@lanctot I checked all the files I've changed with the official google's pylintrc

No errors now, only some formatting issues...

@lanctot
Copy link
Collaborator

lanctot commented Jan 19, 2026

@lanctot I checked all the files I've changed with the official google's pylintrc

No errors now, only some formatting issues...

I think you have to fix all the formatting issues too. They turn to errors preventing the import on our side.

I highlighted one example I saw in the commit.

@alexunderch
Copy link
Contributor Author

alexunderch commented Jan 19, 2026

Like trialling spaces and tabs, right?

@lanctot
Copy link
Collaborator

lanctot commented Jan 19, 2026

Like trialling spaces and tabs, right?

Those need to go too. I commented on an example above.

@alexunderch alexunderch requested a review from lanctot January 19, 2026 18:49
@alexunderch
Copy link
Contributor Author

sorry, I should read before I click various buttons

@alexunderch
Copy link
Contributor Author

@lanctot I think you can check internally, I fixed all trailing lines and argument inconsistensies with a more advanced linter, it now should match the doc standard (more or less)...

@lanctot
Copy link
Collaborator

lanctot commented Jan 21, 2026

@lanctot I think you can check internally, I fixed all trailing lines and argument inconsistensies with a more advanced linter, it now should match the doc standard (more or less)...

Still a lot of issues. It's ok, I'll do it for this one and we'll prioritize ensuring that #1071 catched everything and importing it going forward.

@alexunderch
Copy link
Contributor Author

alexunderch commented Jan 21, 2026

Wait, so you want to say that missing doc-strings also count? I think I can fix them.

D101 Missing docstring in public class
  --> open_spiel/python/jax/nfsp.py:47:7
   |
47 | class MODE(enum.Enum):
   |       ^^^^
48 |   BEST_RESPONSE = 0
49 |   AVERAGE_POLICY = 1
   |

Sorry, I just try to understand what should I do to make your life (and my in the end) easier... Because the only things I have ever done is black formatting because I was reliant on external workflows...

@lanctot lanctot added the merged internally The code is now submitted to our internal repo and will be merged in the next github sync. label Jan 22, 2026
@lanctot lanctot merged commit bbc9c1a into google-deepmind:master Jan 22, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

imported This PR has been imported and awaiting internal review. Please avoid any more local changes, thanks! merged internally The code is now submitted to our internal repo and will be merged in the next github sync.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants