This repository collects all relevant resources about interpretability in LLMs
Updated Nov 1, 2024
Implementation of the stacked denoising autoencoder in Tensorflow
Pytorch implementations of various types of autoencoders
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
Tensorflow Examples
Experiments with Adversarial Autoencoders using Keras
Sparse autoencoder and regular MNIST classification with mini-batches
Repository of the Deep Propensity Network - Sparse Autoencoder (DPN-SA), which calculates propensity scores using a sparse autoencoder
Multi-Layer Sparse Autoencoders
Answers the question "How to do patching on all available SAEs on GPT-2?". Official implementation of the paper "Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small"
Collection of autoencoder models in Tensorflow
Explore visualization tools for understanding Transformer-based large language models (LLMs)
Implemented semi-supervised learning for digit recognition using Sparse Autoencoder
A resource repository of sparse autoencoders for large language models
exploration WYSIWYG editor
Neural Network Architecture | ISI Kolkata
This repository contains Python code for autoencoders, sparse autoencoders, HMM, Expectation-Maximization, the Sum-product Algorithm, ANN, disparity maps, and PCA.
Interpret and control dense embeddings via sparse autoencoders.
Sparse autoencoder based on the Unsupervised Feature Learning and Deep Learning tutorial from Stanford University
Folder contains implementations of multi-layer feed-forward networks, autoencoders, sparse autoencoders and many..