DQN refactor by alexunderch · Pull Request #1419 · google-deepmind/open_spiel

alexunderch · 2025-12-21T19:22:03Z

Refactored DQN versions with jax and torch that pass the tests
Simpler tree-based replay buffers
Changed the example name to dqn_ to improve visibility

Still TBD:

The jax implementation is still very slow with plain Python; see "Significant performance difference of NNX relative to equinox" (google/flax#4045). I will fix this with different improvements, like caching.
The only tests are on breakthrough. @lanctot, are there any other representative games: tic-tac-toe, hanabi?

alexunderch · 2025-12-30T10:30:45Z

I decided to use different flax and jax versions to get access to the latest nnx api.

jax==0.8.1
flax==0.12.1
optax==0.2.6
orbax-checkpoint==0.11.31

lanctot · 2026-01-02T11:26:31Z

> I decided to use different flax and jax versions to get access to the latest nnx api.
> jax==0.8.1
> flax==0.12.1
> optax==0.2.6
> orbax-checkpoint==0.11.31

Are each of them still supported on Python >= 3.10?

alexunderch · 2026-01-02T11:31:04Z

Docs say that for >=3.10 current top versions are fully compatible...

alexunderch · 2026-01-02T11:45:09Z

Yes, it now fails because of the versions.

@lanctot, what's better for you, to upgrade versions in a separate PR or with this/AlphaZero PR?

lanctot · 2026-01-03T11:35:46Z

> Yes, it now fails because of the versions.
>
> @lanctot, what's better for you, to upgrade versions in a separate PR or with this/AlphaZero PR?

I'm still not sure if updating versions will work. Sometimes the cascade of dependencies leads to a problem and it makes it impossible if the Python versions are too old. But let me try in a separate PR.

lanctot · 2026-01-03T11:48:38Z

> jax==0.8.1

This one requires >= 3.11, from https://pypi.org/project/jax/0.8.1/

> flax==0.12.1

This one too (>= 3.11), from https://pypi.org/project/flax/0.12.1/

> optax==0.2.6

This one is ok (>= 3.10), from https://pypi.org/project/optax/0.2.6/

> orbax-checkpoint==0.11.31

This one is ok too (>= 3.10), from: https://pypi.org/project/orbax-checkpoint/0.11.31/
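
The findings above can be summarized as a quick check. This is a minimal sketch, not part of the PR: the `MIN_PYTHON` table only copies the minimum Python versions quoted from PyPI in this comment, and `incompatible_pins` is a hypothetical helper name.

```python
import sys

# Minimum Python version per pin, as quoted from the PyPI pages in this thread.
MIN_PYTHON = {
    "jax==0.8.1": (3, 11),
    "flax==0.12.1": (3, 11),
    "optax==0.2.6": (3, 10),
    "orbax-checkpoint==0.11.31": (3, 10),
}


def incompatible_pins(python_version):
  """Returns the pins that cannot be installed on `python_version`."""
  return sorted(
      pin for pin, minimum in MIN_PYTHON.items()
      if tuple(python_version[:2]) < minimum
  )


print("Python 3.10:", incompatible_pins((3, 10)))
print("This interpreter:", incompatible_pins(sys.version_info))
```

On Python 3.10 the helper reports the jax and flax pins, which matches the conclusion that those two are the blockers.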

lanctot · 2026-01-03T11:54:06Z

Ok...... so what do we do?

Well, 3.10 is causing other problems too (see #1424).

I just checked. Seems like Colab is now using 3.12. Normally I wait until EOL before removing support for a version, but that's quite far away (October '26). In this case it's causing multiple issues so I'd be happy to remove it early.

Give me a few weeks. I have to check with a few people (most notably the Kaggle Game Arena who are relying on a stable OpenSpiel for their environments). And I could have sworn that the version of Colab I used for the LLM imitation learning was 3.10, so maybe it's lower for the TPU kernels or maybe I was just mistaken.

lanctot · 2026-01-03T11:59:06Z

@alexunderch

The order I suggest is the following:

Wait for me to confirm that removing 3.10 support is ok
I will then remove 3.10 support and update github, will let you know when that's done
Then, let's update the AlphaZero PR, add it to the tests, and merge it since it's been around for a really long time and looking good
Then, we come back to this one and the other ones that use the newer nnx (like deep CFR etc.)

Sound good to you?

lanctot · 2026-01-03T12:08:20Z

Moving the discussion of removing Python 3.10 to a new issue: #1425

alexunderch · 2026-01-10T18:18:00Z

Okay, I will do EVA as well. Haven't heard of the algorithm

alexunderch · 2026-01-14T10:01:09Z

DQN breakout (torch and jax)

DQN tic-tac-toe

For Kuhn poker, however, NFSP converges to different exploitability values, but it's for SGD:

alexunderch · 2026-01-14T10:04:02Z

@lanctot I took away the EVA test for pytorch because I am not actually sure that the implementation is OK. It may trigger sampling from an empty buffer, which is incompatible with my implementation of it. Rather than go deeper into the algo, I decided to come back to it later if necessary.
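
For illustration, one way to avoid sampling from an empty buffer is to guard the sampling call. This is a minimal sketch under assumptions, not the buffer from this PR or the EVA code: `ReplayBuffer`, `can_sample`, and the fixed-capacity FIFO layout are all hypothetical.

```python
import collections
import random


class ReplayBuffer:
  """Hypothetical fixed-capacity FIFO buffer with a guarded sample()."""

  def __init__(self, capacity, seed=None):
    self._data = collections.deque(maxlen=capacity)
    self._rng = random.Random(seed)

  def __len__(self):
    return len(self._data)

  def add(self, transition):
    # The oldest transition is dropped automatically once capacity is reached.
    self._data.append(transition)

  def can_sample(self, batch_size):
    return 0 < batch_size <= len(self._data)

  def sample(self, batch_size):
    if not self.can_sample(batch_size):
      raise ValueError(
          f"Cannot sample {batch_size} items from a buffer of size {len(self)}")
    return self._rng.sample(list(self._data), batch_size)


buffer = ReplayBuffer(capacity=3, seed=0)
if buffer.can_sample(2):  # False here: the buffer is empty, so skip the update.
  batch = buffer.sample(2)
for step in range(5):
  buffer.add(step)
batch = buffer.sample(2)  # Two distinct items drawn from the retained 2, 3, 4.
```

A caller can either check `can_sample` before each learning step or let `sample` raise, depending on whether an empty buffer is an expected state or a bug.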

Moreover, the jax versions of DQN and NFSP aren't the fastest, to be honest. I checked this morning: they don't trigger additional recompilations, so the bottleneck may be due to the interchangeable use of pure Python and jax. I think we'll get there; I asked a Flax team member whether we can do much about that. But we should match pytorch perf sooner or later across all impls.

alexunderch · 2026-01-15T16:24:59Z

@lanctot for Kuhn poker, the exploitability of both the jax and torch implementations is below 0.06 (0.02-0.04). Also, I stripped it down a little, reusing some of the DQN modules.

Also, for the colabs, I replaced !pip (which installs the dependencies in a default local env) with %pip (which should use the venv the kernel is run from). This could have been a problem when running the notebooks locally.
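
The difference can be illustrated outside a notebook. As a minimal sketch (the helper names `pip_install_command` and `install` are hypothetical), the command below targets the interpreter that is currently running, which is roughly what `%pip` does for the active kernel, whereas a bare `!pip` runs whichever `pip` comes first on the shell's PATH.

```python
import subprocess
import sys


def pip_install_command(*packages):
  """Builds a pip command bound to the interpreter running this code."""
  return [sys.executable, "-m", "pip", "install", *packages]


def install(*packages):
  """Runs the install; not called here, so nothing is installed by this sketch."""
  subprocess.run(pip_install_command(*packages), check=True)


command = pip_install_command("optax==0.2.6")
print(" ".join(command))
```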

Technically, if the tests pass, the PR is in mergeable condition. I don't think I can improve the performance, even marginally, without future flax updates or restructuring the code, which I don't think would be a good idea to do right away.

alexunderch · 2026-01-15T17:38:43Z

I am sorry, I am not yet used to running ALL tests before commits. I will do better.

lanctot · 2026-01-15T17:44:23Z

> I am sorry, I am not yet used to running ALL tests before commits. I will do better.

Don't worry about it -- I just have to press a button. But there might be a delay sometimes if I'm in meetings etc.

lanctot · 2026-01-19T13:30:55Z

Hi @alexunderch,

Thanks for this PR, it's great!

However, I need to request one thing going forward: before we import, we need to catch all the Google lint errors. This one has many and it's costing us too much time.

Specifically, can you apply Step 9 from this: https://github.com/google-deepmind/open_spiel/blob/master/docs/developer_guide.md#adding-a-game

I will highlight a few of the common issues so you can cross-check that the Python linter with the most recent pylintrc catches them.

alexunderch · 2026-01-19T13:38:33Z

I see. I was worried about the linter comments, thank you for the correction. I will update this and my other active PRs ASAP.

open_spiel/python/jax/dqn.py

open_spiel/python/jax/boltzmann_dqn.py

alexunderch · 2026-01-19T13:44:58Z

@lanctot also sorry for reiterating, but can we add pylint and some precommit stuff with a separate PR, kind of finishing this effort #1071

lanctot · 2026-01-19T14:09:29Z

> @lanctot also sorry for reiterating, but can we add pylint and some precommit stuff with a separate PR, kind of finishing this effort #1071

Yes I replied on that thread too. Very open to it at this point if someone gets a minimal version working. See the reply on that thread.

alexunderch · 2026-01-19T16:12:08Z

@lanctot I checked all the files I've changed with Google's official pylintrc

No errors now, only some formatting issues...

lanctot · 2026-01-19T16:44:02Z

> @lanctot I checked all the files I've changed with Google's official pylintrc
>
> No errors now, only some formatting issues...

I think you have to fix all the formatting issues too. They turn to errors preventing the import on our side.

I highlighted one example I saw in the commit.

alexunderch · 2026-01-19T16:50:02Z

Like trailing spaces and tabs, right?

open_spiel/python/jax/boltzmann_dqn.py

lanctot · 2026-01-19T16:51:36Z

> Like trailing spaces and tabs, right?

Those need to go too. I commented on an example above.
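
For a quick local check of these two issues before pushing, a small script can help. This is only a rough sketch and not a substitute for pylint with the project's pylintrc; `find_whitespace_issues` is a hypothetical helper, and it flags nothing beyond trailing whitespace and tab characters.

```python
import re

_TRAILING = re.compile(r"[ \t]+$")


def find_whitespace_issues(source):
  """Returns (line_number, message) pairs for trailing whitespace and tabs."""
  issues = []
  for number, line in enumerate(source.splitlines(), start=1):
    if _TRAILING.search(line):
      issues.append((number, "trailing whitespace"))
    if "\t" in line:
      issues.append((number, "tab character"))
  return issues


# Line 1 ends with two spaces; line 2 is indented with a tab.
sample = "def f():  \n\treturn 1\n"
for number, message in find_whitespace_issues(sample):
  print(f"line {number}: {message}")
```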

alexunderch · 2026-01-19T18:50:00Z

sorry, I should read before I click various buttons

alexunderch · 2026-01-21T18:27:47Z

@lanctot I think you can check internally. I fixed all trailing lines and argument inconsistencies with a more advanced linter, so it should now match the doc standard (more or less)...

lanctot · 2026-01-21T19:00:11Z

> @lanctot I think you can check internally. I fixed all trailing lines and argument inconsistencies with a more advanced linter, so it should now match the doc standard (more or less)...

Still a lot of issues. It's ok, I'll do it for this one and we'll prioritize ensuring that #1071 catches everything and importing it going forward.

alexunderch · 2026-01-21T19:01:47Z

Wait, so you are saying that missing docstrings also count? I think I can fix them.

D101 Missing docstring in public class
  --> open_spiel/python/jax/nfsp.py:47:7
   |
47 | class MODE(enum.Enum):
   |       ^^^^
48 |   BEST_RESPONSE = 0
49 |   AVERAGE_POLICY = 1
   |

Sorry, I am just trying to understand what I should do to make your life (and mine, in the end) easier... Because the only thing I have ever done is black formatting, since I was reliant on external workflows...
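
For the D101 report quoted above, the usual fix is a class docstring. A minimal sketch follows; the class body is copied from the lint output, while the docstring wording is an assumption about what the modes mean rather than text from the PR.

```python
import enum


class MODE(enum.Enum):
  """Policy modes of an NFSP agent (docstring wording is an assumption)."""

  BEST_RESPONSE = 0
  AVERAGE_POLICY = 1


print(MODE.__doc__)
```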

Initial working versions

b4b057d

alexunderch changed the title ~~Initial working versions~~ DQN refactor Dec 21, 2025

jax NFSP and DQN refactors (slow)

9a48c6e

lanctot mentioned this pull request Jan 3, 2026

Remove Python 3.10 support, update abseil version + change uses of MutexLock #1424

Merged

lanctot mentioned this pull request Jan 3, 2026

Proposal: Remove Python 3.10 support earlier than EOL #1425

Closed

alexunderch mentioned this pull request Jan 9, 2026

Python examples and games (what should we do) #1434

Closed

alexunderch added 2 commits January 10, 2026 19:42

DQN + NFSP + Boltzmann + colabs

764163e

Typos

181d0ca

alexunderch added 3 commits January 14, 2026 10:33

Merge remote-tracking branch 'upstream/master' into dqn_refactor

205d46b

Releasable versions of DQN+NFSP

943cc75

Fixed some tests

245ec4d

Fixed NFSP convergence

904a1be

Refactoring typo

5cfcc2f

lanctot added the imported label (This PR has been imported and awaiting internal review. Please avoid any more local changes, thanks!) Jan 17, 2026

lanctot requested changes Jan 19, 2026

open_spiel/python/jax/dqn.py (outdated, resolved)

open_spiel/python/jax/dqn.py (outdated, resolved)

open_spiel/python/jax/dqn.py (outdated, resolved)

open_spiel/python/jax/boltzmann_dqn.py (outdated, resolved)

Fixed linter errors

709aea3

lanctot reviewed Jan 19, 2026

open_spiel/python/jax/boltzmann_dqn.py (outdated, resolved)

Linter fixes

b284272

alexunderch requested a review from lanctot January 19, 2026 18:49

More docstring fixes

1ac5a10

lanctot added the merged internally label (The code is now submitted to our internal repo and will be merged in the next github sync.) Jan 22, 2026

lanctot merged commit bbc9c1a into google-deepmind:master Jan 22, 2026
10 checks passed
