Leela Chess Zero (Lc0) Mind Control Using a Convolutional Sparse Autoencoder

Motivation

The alignment of AI behaviour with human behaviour is an ongoing challenge in machine learning. While techniques that rely on external factors (an extra loss term or input/output manipulation) exist, they treat the ML model as a black box. Sparse autoencoders instead try to explain the neural activations of the network (thus staying very faithful to the actual reasons behind its decisions) by making the model's neurons monosemantic (a single neuron has a single meaning). Such neurons can be used for two purposes:

  1. Explanations - if a particular monosemantic neuron is active, we know that the network is thinking about the corresponding concept.
  2. Behaviour steering - to actually align the network's behaviour with human behaviour, we can artificially increase or decrease the activation of a monosemantic neuron, which either suppresses unwanted behaviour or encourages desired behaviour.

Explanation

A very short explanation of the convolutional sparse autoencoder (CSAE) is provided here. For the full explanation, please take a look at our report.

The CSAE is a neural network with two parts: an encoder and a decoder. The job of the encoder is to take the neural activations of Leela Chess Zero (Lc0), which are polysemantic (a single neuron participates in multiple concepts), and project them into a larger (10x) latent space where each neuron is monosemantic (the activations become sparse). The job of the decoder is to undo this operation and restore the original activations (necessary for training and behaviour steering). The CSAE is trained to reconstruct the original activations faithfully and to keep the latent space as sparse as possible (L1 activation penalty).
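
For intuition, here is a minimal PyTorch sketch of that idea. The layer shapes, channel counts, and class/function names are illustrative assumptions and do not match the exact architecture in ./cnnsae_simple.py:

# Minimal illustrative CSAE sketch (assumed shapes; see ./cnnsae_simple.py for the real architecture)
import torch
import torch.nn as nn

class ToyCSAE(nn.Module):
    def __init__(self, in_channels: int = 256, expansion: int = 10):
        super().__init__()
        hidden = in_channels * expansion  # latent space expanded 10x
        # Encoder: project polysemantic Lc0 activations into a wide latent space
        self.encoder = nn.Sequential(nn.Conv2d(in_channels, hidden, kernel_size=1), nn.ReLU())
        # Decoder: reconstruct the original activations from the sparse latent space
        self.decoder = nn.Conv2d(hidden, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)  # sparse concept (latent) activations
        return self.decoder(z), z

def csae_loss(x, x_hat, z, sparsity_weight: float = 5.0):
    # Faithful reconstruction plus an L1 penalty that pushes the latent activations towards sparsity
    return nn.functional.mse_loss(x_hat, x) + sparsity_weight * z.abs().mean()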

Data/weights Download link

Every time we mention that the user needs to download something, it is located in this Google Drive folder: https://drive.google.com/drive/folders/1M56h5VsIQI1l4wj905DPdt54SbwtxWmT?usp=sharing

Project overview

Files that are not mentioned here are in the repository for legacy reasons and are not important for running the project.

  • ./altered_chess_trajectories.html Output of the mind-control pipeline (./lc0/mind_control.py followed by ./gen_altered.py); open it in any web browser. It displays the normal, uninfluenced trajectory (8 moves) and the altered trajectory side by side for 10 random games.

  • ./cnn_habrok.sh Launching script for CSAE training on Habrok

  • ./cnnsae_simple.py Architecture of the CSAE; this is the final architecture used in our report.

  • ./cnnsae_small.py [Obsolete] Failed version of the CSAE which treats every channel almost independently (very similar to depth-wise separable CNNs).

  • ./cnnsae.py [Obsolete] Failed version of the CSAE, implemented based on Luo, W., Li, J., Yang, J., Xu, W., & Zhang, J. (2017). Convolutional sparse autoencoders for image classification. IEEE Transactions on Neural Networks and Learning Systems, 29(7), 3289-3294.

  • ./csae_train.py Used for training the CSAE.

  • ./mau.py (MAU = most active unit) Used to generate chess games that have a similar concept vector (latent vector).

  • ./mau.ipynb Used for analysis of the most active units (acquired from the test database).

  • ./numpy_loader.py Used to lazily load training/test data

  • ./output.html Output of mau.py; open it in a browser of your choice. It displays around 15 games with a similar concept vector.

  • ./stock.exe Stockfish engine used for evaluating positions. DOES NOT WORK ON LINUX.

  • ./tensorboard.py A somewhat misleading name; used for managing the training/experiments folders.

  • ./data/altered_trajectories Will contain trajectories with behaviour steering (you need to run ./lc0/mind_control.py first).

  • ./data/fens Training FENs. Create this folder and download its contents from the Google Drive link.

  • ./data/fens_test Testing FENs. Create this folder and download its contents from the Google Drive link.

  • ./data/our_activations Training Lc0 activations. Create this folder and download its contents from the Google Drive link.

  • ./data/our_test Test Lc0 activations. Create this folder and download its contents from the Google Drive link.

  • ./data/test_latents CSAE latent spaces generated from test Lc0 activations. Create this folder and download its contents from the Google Drive link.

  • ./data/test_trajectories Normal, uninfluenced trajectories, generated by Lc0 on the test FENs with 8-ply depth. Create this folder and download its contents from the Google Drive link.

  • ./data/mau.parquet Database of the 10 most active units (in the CSAE latent space) generated from the test FENs. Download this file from the Google Drive link.

  • ./experiments/CNN_SAE_factor10 Contains the best checkpoint for our CSAE. Create the experiments folder and download CNN_SAE_factor10 from the Google Drive link.

  • ./lc0/ A separate sub-project just for manipulating Lc0.

  • ./lc0/activations.py Used to generate Lc0 activations and to generate the train/test FENs.

  • ./lc0/cnnsae_simple.py Duplicate CSAE architecture necessary for the Lc0 sub-package.

  • ./lc0/mind_control.py Used for live Lc0 behaviour steering; generates the altered trajectories.

  • ./lc0/t79.onnx Lc0 network (architecture and weights).

  • ./viktor/lichess_db_standard_rated_2024-08.pgn.zst Lichess database of chess games, used to generate the training and testing FENs. Can be downloaded from https://database.lichess.org (August 2024). Download it and place it in the folder with the same name.

Requirements

This project only supports Windows, not Linux (none of our group members have Linux). The only parts that also work on Linux are training and inference of the CSAE (but nothing related to live manipulation of Lc0). There are two reasons for this: stock.exe (Stockfish) is built for Windows only (this should be easy to replace), and the lczerolens package required a very specific manual code fix (explained below) which works only on Windows.

Read the previous section (Project overview) and follow the instructions carefully, as placing a file in the wrong place will result in errors.

Unfortunately, due to version conflicts, this project consists of two packages: the ./lc0/ sub-package and the rest. Each requires a separate installation.

Main package (CSAE)

  1. You have to use Python 3.12 or above (due to its superior type-hinting system).
  2. Create a virtual environment.
  3. Activate the environment.
  4. Install PyTorch according to your hardware (this project supports CPU and CUDA).
  5. Install ./requirements.txt using pip install (example commands below).
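
As a rough sketch, steps 2-5 correspond to commands like the following on Windows (the environment name venv_csae is arbitrary; if you need CUDA, use the PyTorch install command for your hardware from pytorch.org instead of the plain one shown here):

python -m venv venv_csae
venv_csae\Scripts\activate
pip install torch
pip install -r requirements.txt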

Lc0 package

  1. You have to use Python 3.11 for the lczerolens package to work (it does not work on 3.12; fault of the package author).
  2. Create a separate virtual environment.
  3. Activate the environment.
  4. Run (with the active environment) ./lc0/install.bat
  5. Install PyTorch according to your hardware (this project supports CPU and CUDA).
  6. Install ./lc0/requirements.txt using pip install.
  7. Now comes the tricky part. Navigate to .\lc0\lens\Lib\site-packages\onnx2torch\utils\safe_shape_inference.py
  8. Replace this code that creates a temporary file
with tempfile.NamedTemporaryFile(dir=Path(onnx_model_or_path).parent) as tmp_model_file:
    return _shape_inference_by_model_path(onnx_model_or_path, output_path=tmp_model_file.name, **kwargs)

with this code

with tempfile.NamedTemporaryFile(dir=Path(onnx_model_or_path).parent, delete=False) as tmp_model_file:
    tmp_model_path = tmp_model_file.name
    tmp_model_file.close()  # Release the file lock on Windows

    try:
        return _shape_inference_by_model_path(onnx_model_or_path, output_path=tmp_model_path, **kwargs)
    finally:
        Path(tmp_model_path).unlink(missing_ok=True)

Otherwise the Lc0 ONNX model won't load (fault of the package author).

Training

  1. Activate the main virtual environment.
  2. Launch training using
python csae_train.py --name [experiment_name] --lsfactor [int] --sparsity [float]

The lsfactor argument determines how much the latent space expands with respect to the input space (default 10). The sparsity argument determines the weight of the sparsity loss term (default 5.0).
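
For example, a run that reproduces the defaults stated above could look like this (the experiment name is arbitrary):

python csae_train.py --name my_csae --lsfactor 10 --sparsity 5.0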

The trained model will appear in ./experiments/[timestamp]_[experiment_name]. Training takes around 8 hours on Habrok.

Test

Evaluation metrics, latent space activations, and the MAU dataframe can be acquired by running

python test_inference.py --checkpoint [timestamp]_[experiment_name]

where you need to provide the checkpoint folder (leave it empty to use the best CSAE).

Similar games generation

To generate a batch (10-20) of similar chess games (according to the CSAE latent space vector) run

python mau.py

It will generate ./output.html, which you need to open in any browser. The file will contain a few games and the move that Lc0 would play in each game (the game on the left, the game plus the move on the right). A Stockfish evaluation of each game is also provided at the bottom. The games correspond to the concept vector displayed at the top. Rerun the script to get a new group.

Mind control

Once you have acquired some interesting concept vectors (an example is provided below), switch to the separate Lc0 virtual environment. Go to ./lc0/mind_control.py, search for "get_latent_manipulator(", and set the indices and the desired value (factor) that the feature vector should be set to. One possible example:

get_latent_manipulator(105472, 57536, 63744, 61049, factor=2)

This will set the latent vector indices 105472, 57536, etc. to 2. Afterwards, launch

python ./lc0/mind_control.py

This will go over all 50K test FENs, force the concept vector on Lc0, and let the altered Lc0 play 8 plies (both for white and black).
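
Conceptually, the manipulation does something like the following sketch (the function and variable names here are illustrative assumptions, not the actual get_latent_manipulator implementation in ./lc0/mind_control.py):

# Illustrative sketch of behaviour steering with the CSAE (hypothetical names, not the repo API)
import torch

def steer_activations(csae, lc0_activations: torch.Tensor, indices, factor: float = 2.0) -> torch.Tensor:
    # Encode the polysemantic Lc0 activations into the sparse concept (latent) space
    latent = csae.encoder(lc0_activations)
    flat = latent.flatten(start_dim=1)
    # Force the chosen concept units to the desired value (e.g. factor=2)
    flat[:, list(indices)] = factor
    # Decode back; the reconstructed activations are fed into the rest of Lc0
    return csae.decoder(flat.view_as(latent))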

To view random altered trajectories side by side with the unaltered ones, run the following script.

python ./gen_altered.py

It will generate ./altered_chess_trajectories.html with 10 random pairs of trajectories (open it in any browser). Re-run the script to get another 10 pairs.

Plagiarism disclaimer

We based this project on our previous deep-learning project. However, we have changed almost everything since then (the precise points of differentiation are in the report), most notably:

  1. We have used a completely different architecture for the CSAE.
  2. We have created a concept-interpretability pipeline.
  3. We have created a mind-control pipeline (with a completely new and better Lc0 model).

Other libraries

The project was inspired by the paper

@article{poupart2024contrastive,
  title={Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents},
  author={Poupart, Yoann},
  journal={arXiv preprint arXiv:2406.04028},
  year={2024}
}

Our report includes the explicit novelty our project brings. The entire code in the repo was produced by us (since the project tackles a different task), except for a small part of the trajectory sampling (located in ./lc0/activations.py), for which we took inspiration from the above paper.

LLMs

An LLM was used only once, to create the HTML code templates in ./mau.py and ./gen_altered.py. We also used an LLM to look for mistakes in our code, but honestly, we do not remember exactly where that was.
