  • AI Singapore
  • Singapore

Stars

Dataset and Benchmark

20 repositories

Code and data for XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence

C · 68 stars · 3 forks · Updated Jan 21, 2025

Large-scale multi-document summarization dataset and code

Python · 282 stars · 52 forks · Updated May 8, 2023

Annotated Enron Subject Line Corpus (AESLC)

25 stars · 9 forks · Updated Feb 2, 2023

Materials related to our Sinn und Bedeutung 23 paper

R · 38 stars · 11 forks · Updated May 28, 2020

Evaluating Cross-lingual Sentence Representations

449 stars · 44 forks · Updated Aug 30, 2021

NLP Datasets for Indonesian

Python · 112 stars · 13 forks · Updated Feb 11, 2023

Dataset Catalogue Homepage for Indonesian Languages

JavaScript · 7 stars · 8 forks · Updated Feb 19, 2024

Measuring Massive Multitask Language Understanding | ICLR 2021

Python · 1,330 stars · 100 forks · Updated May 28, 2023

Facebook Low Resource (FLoRes) MT Benchmark

Python · 722 stars · 125 forks · Updated Nov 20, 2023

Expanding natural instructions

Python · 980 stars · 191 forks · Updated Dec 11, 2023

117 stars · 9 forks · Updated Dec 22, 2023

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

1,694 stars · 137 forks · Updated Sep 19, 2023

Official repository for "Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems"

Python · 69 stars · 27 forks · Updated Jan 26, 2022

Library Genesis (libgen) db dumps mirror on ipfs

HTML · 47 stars · 1 fork · Updated Jul 2, 2024

Large datasets for conversational AI

Python · 1,325 stars · 173 forks · Updated Nov 16, 2019

Data preparation code for Amber 7B LLM

Python · 85 stars · 10 forks · Updated May 10, 2024

Bhinneka Korpus: A Collection of Multilingual Parallel Datasets for 5 Indonesian Local Languages

1 star · Updated Dec 21, 2023

Proxy server to bypass Cloudflare protection

Python · 8,808 stars · 752 forks · Updated Mar 4, 2025

A large-scale multilingual dataset for Information Retrieval, with thorough human annotations across 18 diverse languages.

176 stars · 4 forks · Updated Jul 31, 2024