This project provides a framework for generating and evaluating datasets for natural language processing (NLP) mapping tasks using large language models (LLMs). It supports tasks such as string case conversion and RNA-to-protein translation, and is designed for extensibility and reproducibility.
- `src/`: Main source code directory
  - `generate_data.py`: Generates datasets for various mapping tasks and saves them in JSONL format.
  - `run_eval.py`: Evaluates LLMs on the generated datasets and prints results.
  - `utils/`: Utility modules for mapping and string comparison.
    - `llm.py`: Wrapper for interacting with Google GenAI models.
  - `evaluation.py`: Defines mapping classes and evaluation logic.
  - `data/`: Prompt templates and generated datasets.
- `requirements.txt`: Minimal set of Python packages required to run the project.
- `environment.yaml`: Conda environment specification (optional, for full-featured development).
- `run_subtask.py`: Unified CLI for generating data and running evaluation.
Install dependencies with pip:

```bash
pip install -r requirements.txt
```
Or create the full environment with conda:

```bash
conda env create -f environment.yaml
conda activate nlp-final
```
Generate datasets for all tasks (default):

```bash
python src/generate_data.py
```
Or specify tasks and a dataset size:

```bash
python src/generate_data.py --tasks lowercase rna --size 50 --output ./src/data/examples.jsonl
```
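Generated datasets use the JSONL convention: one JSON object per line. As a hedged sketch of how such a file can be written and read back, the snippet below uses hypothetical field names (`task`, `input`, `target`); the actual schema emitted by `generate_data.py` may differ.

```python
import json

# Hypothetical record layout -- the field names "task", "input", and
# "target" are assumptions, not the confirmed schema of generate_data.py.
records = [
    {"task": "lowercase", "input": "Hello World", "target": "hello world"},
    {"task": "rna", "input": "AUGGCU", "target": "MA"},
]

def write_jsonl(path, rows):
    """Write one JSON object per line (the JSONL convention)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

def read_jsonl(path):
    """Read a JSONL file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Round-tripping through these helpers preserves the records exactly, which makes JSONL convenient for appending and for streaming large datasets line by line.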
Run evaluation on a generated dataset:

```bash
python src/run_eval.py
```
Or use the unified CLI:

```bash
python run_subtask.py --model pro --size 100 --tasks lowercase rna --eval
```
- 🔑 You must provide a valid Google GenAI API key in `src/key.secret`.
- 📄 Generated datasets are saved in JSONL format at the specified output path.
- 📝 Evaluation prints results with timestamps and confidence scores.
- ⚡ Model-specific behavior:
  - For `flash` (Gemini 2.5 Flash), thinking is disabled (`thinking_budget=0`).
  - For `pro` (Gemini 2.5 Pro), minimum thinking is enabled (`thinking_budget=128`).
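The alias-to-model resolution above can be sketched as a small lookup table. This is a minimal illustration, not the actual code in `utils/llm.py` (whose names and structure may differ); only the model names and budget values are taken from this README.

```python
# Sketch of how CLI model aliases might map to full model names and
# thinking budgets. The real table lives in utils/llm.py and may differ.
MODEL_ALIASES = {
    "flash": {"model": "gemini-2.5-flash", "thinking_budget": 0},
    "pro": {"model": "gemini-2.5-pro", "thinking_budget": 128},
}

def resolve_model(alias):
    """Return (full model name, thinking budget) for a CLI alias."""
    try:
        cfg = MODEL_ALIASES[alias]
    except KeyError:
        raise ValueError(f"Unknown model alias: {alias!r}")
    return cfg["model"], cfg["thinking_budget"]
```

Keeping the mapping in one dictionary means a new alias is a one-line addition rather than a change to the call sites.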
- ➕ Add new mapping tasks by implementing new classes in `evaluation.py` and updating `generate_data.py`.
- 🛠️ Add new model aliases in `utils/llm.py` as needed.
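As a hedged sketch of what a new mapping task could look like, the class below implements a string-reversal task. The class name and method names (`sample_input`, `apply`, `is_correct`) are assumptions for illustration; the actual base class and interface in `evaluation.py` may differ.

```python
import string

class ReverseMapping:
    """Hypothetical example task: map a string to its reversal.

    The interface here (sample_input / apply / is_correct) is an
    assumption; match whatever base class evaluation.py defines.
    """

    name = "reverse"

    def sample_input(self, rng, length=8):
        # Draw a random lowercase string to use as the task input.
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

    def apply(self, text):
        # The ground-truth mapping the LLM's answer is scored against.
        return text[::-1]

    def is_correct(self, text, prediction):
        # Exact match after trimming whitespace; the project's string
        # comparison utilities may apply a more lenient check.
        return prediction.strip() == self.apply(text)
```

Once such a class exists, `generate_data.py` would need to know about it (e.g. by registering it under its `name`) so that `--tasks reverse` can select it.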
For further details, see comments in the source files.