Mais Alheraki pr-Mais

👾

Software engineer

584 followers · 45 following

@invertase
Dammam, Saudi Arabia
g.dev/mais
@pr_Mais
https://mais.codes

Achievements

x3 x3

Lists (1)

Flutter

1 repository

Starred repositories

voidful / TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

Python 547 59 Updated May 9, 2024

danijar / dreamerv3

Mastering Diverse Domains through World Models

Python 1,431 235 Updated Dec 7, 2024

huggingface / trl

Train transformer language models with reinforcement learning.

Python 10,428 1,345 Updated Dec 23, 2024

facebookresearch / schedule_free

Schedule-Free Optimization in PyTorch

Python 2,022 69 Updated Dec 2, 2024

KhoomeiK / LlamaGym

Fine-tune LLM agents with online reinforcement learning

Python 1,025 46 Updated Mar 19, 2024

ContextualAI / HALOs

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

Python 769 47 Updated Dec 26, 2024

eric-mitchell / direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Python 2,282 189 Updated Aug 11, 2024

fastapi / fastapi

FastAPI framework, high performance, easy to learn, fast to code, ready for production

Python 78,942 6,761 Updated Dec 25, 2024

DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Python 9,397 1,734 Updated Dec 21, 2024

jordanbaird / Ice

Powerful menu bar manager for macOS

Swift 15,265 282 Updated Oct 29, 2024

ksaa-nlp / balsam-eval

Python 1 Updated Nov 25, 2024

lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

Python 7,734 671 Updated Dec 24, 2024

WooooDyy / LLM-Reverse-Curriculum-RL

Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" presented by Zhiheng Xi et al.

Python 80 5 Updated Feb 9, 2024

raghavc / LLM-RLHF-Tuning-with-PPO-and-DPO

Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various c…

Python 126 11 Updated Mar 18, 2024