Robert Kirk (RobertKirk)

PhD student at @ucl-dark. Interested in understanding LLM fine-tuning, AI safety and (super)alignment.

47 followers · 9 following

Stars

ndif-team / nnsight

The nnsight package enables interpreting and manipulating the internals of deep learned models.

Jupyter Notebook 462 41 Updated Jan 15, 2025

lucasmaystre / choix

Inference algorithms for models based on Luce's choice axiom

Jupyter Notebook 164 28 Updated Dec 4, 2024

steering-vectors / steering-vectors

Steering vectors for transformer language models in PyTorch / Hugging Face

Python 81 7 Updated Nov 21, 2024

RobertKirk / tinystories-wrappers

Code for the TinyStories experiments from "Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks".

Jupyter Notebook 5 1 Updated Dec 18, 2023

facebookresearch / rlfh-gen-div

Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity".

Python 40 6 Updated Jan 19, 2024

jordwest / news-feed-eradicator

A browser extension that deletes your news feed and replaces it with a nice quote

TypeScript 1,234 287 Updated Nov 28, 2024

Farama-Foundation / chatarena

ChatArena (or Chat Arena) is a set of multi-agent language game environments for LLMs. The goal is to develop the communication and collaboration capabilities of AIs.

Python 1,402 135 Updated May 27, 2024

wofr06 / lesspipe

lesspipe - display more with less

Perl 497 51 Updated Jan 4, 2025

dbalatero / VimMode.spoon

Adds vim keybindings to all OS X inputs

Lua 712 33 Updated Apr 11, 2023

adamjermyn / toy_model_interpretability

Python 11 3 Updated Nov 21, 2022

TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models

Python 1,768 319 Updated Jan 22, 2025

huggingface / simulate

🎢 Creating and sharing simulation environments for embodied and synthetic data research

Python 190 13 Updated Oct 19, 2023

allenai / RL4LMs

A modular RL library to fine-tune language models to human preferences

Python 2,257 192 Updated Mar 1, 2024

CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Python 4,566 473 Updated Jan 8, 2024

wattlebird / ranking

Python 86 6 Updated Jun 1, 2023

jessevig / bertviz

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

Python 7,088 793 Updated Aug 24, 2023

TomFrederik / unseal

Mechanistic Interpretability for Transformer Models

Python 49 6 Updated Jun 1, 2022

openai / lm-human-preferences

Code for the paper Fine-Tuning Language Models from Human Preferences

Python 1,269 163 Updated Jul 25, 2023

anthropics / hh-rlhf

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

1,668 131 Updated Sep 19, 2023

ddebode / tmux-ram

Forked from RobertKirk/tmux-ram

Plug and play RAM percentage and icon indicator for Tmux

Shell 2 Updated Apr 15, 2022

XuehaiPan / nvitop

matthewsot / docs-vim

huggingface / trl

facebookresearch / moolib

google-research / rliable

nicklashansen / dmcontrol-generalization-benchmark

google-research / arxiv-latex-cleaner

facebookresearch / minihack

ucl-dark / paired

yuchenlin / rebiber