AI Language Model Comparison Project

This project compares AI language models on answering questions in Japanese. Each model-generated answer is scored against a reference answer using embedding-based similarity metrics.

Project Overview

The main goal of this project is to evaluate how closely each model's answers to a set of predefined questions match the reference answers. The models being compared are:

- elyza/Llama-3-ELYZA-JP-8B
- rinna/japanese-gpt2-medium
- line-corporation/japanese-large-lm-3.6b

The project involves:

- Generating answers using different AI language models.
- Calculating the similarity between the generated answers and the reference answers.
- Comparing the performance of the models based on the similarity scores.
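
The similarity step can be illustrated with cosine similarity between two embedding vectors. This is a minimal sketch: it assumes the metric is cosine similarity, which this README does not state, and it uses hand-written toy vectors in place of real model embeddings. The function name `cosine_similarity` is hypothetical and is not taken from embedding_utils.py.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the embeddings of a generated and a reference answer.
generated = [1.0, 2.0, 3.0]
reference = [2.0, 4.0, 6.0]
score = cosine_similarity(generated, reference)
print(f"{score:.4f}")
```

Vectors that point in the same direction score close to 1, and orthogonal vectors score 0, regardless of their lengths.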

Dependencies

The project requires the following Python packages:

- torch
- transformers
- numpy

You can install these dependencies from the requirements.txt file:

pip install -r requirements.txt

Usage

Generating `requirements.txt`

To generate the requirements.txt file from your current environment, use:

pip freeze > requirements.txt

Running the Project

Set up your environment:
- Ensure you have Python installed.
- Create and activate a virtual environment (optional but recommended).
```
python -m venv venv
source venv/bin/activate  # For Unix or macOS
# or
.\venv\Scripts\activate  # For Windows
```
Install dependencies:
```
pip install -r requirements.txt
```
Run the main script:
- Make sure you have a Hugging Face API token.
- Replace `your_huggingface_api_token` in the script with your actual token.
```
python main.py
```
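
Rather than editing the token into the script, one option is to read it from an environment variable. The sketch below assumes a variable named `HF_TOKEN` and a hypothetical helper `load_hf_token`; neither is part of this project's scripts.

```python
import os


def load_hf_token(env_var="HF_TOKEN"):
    """Read the Hugging Face token from the environment, failing loudly if unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token"
        )
    return token
```

The returned value could then be used wherever the script currently contains `your_huggingface_api_token`, which keeps the token out of version control.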

Script Structure

- main.py: Main script to run the model comparison.
- embedding_utils.py: Utility functions to calculate embedding-based similarity.
- model_a.py, model_b.py, model_c.py, model_d.py: Scripts to generate answers from different models.
- requirements.txt: List of dependencies.

Generating Answers and Calculating Similarity

The main.py script performs the following steps:

- Loads each model and tokenizer.
- Generates answers for a set of predefined questions.
- Calculates the similarity of the generated answers to the reference answers.
- Outputs the results in a markdown table format.
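
The steps above can be sketched as a loop over questions. This is only an outline under stated assumptions: `generate_answer` and `score_similarity` are hypothetical stand-ins (a canned lookup and a character-overlap ratio) for the real model call and the embedding-based metric, `to_markdown_table` is a hypothetical helper, and the reference answer is illustrative rather than the project's actual data.

```python
def generate_answer(question):
    """Hypothetical stand-in for a model call; returns a canned answer."""
    canned = {"富士山の高さは？": "富士山の高さは、3,776 メートルです。"}
    return canned.get(question, "")


def score_similarity(answer, reference):
    """Hypothetical stand-in for the embedding metric: Jaccard overlap of characters."""
    a, b = set(answer), set(reference)
    return len(a & b) / len(a | b) if a | b else 0.0


def to_markdown_table(rows):
    """Render (question, answer, similarity) rows as a markdown table."""
    lines = ["| Question | Answer | Similarity |", "| --- | --- | --- |"]
    for question, answer, similarity in rows:
        lines.append(f"| {question} | {answer} | {similarity:.4f} |")
    return "\n".join(lines)


# Illustrative question and reference answer, not the project's actual data.
references = {"富士山の高さは？": "3,776 メートル"}

rows = []
for question, reference in references.items():
    answer = generate_answer(question)
    rows.append((question, answer, score_similarity(answer, reference)))

table = to_markdown_table(rows)
print(table)
```

In the real script, the two stand-in functions would be replaced by a call into one of the model scripts and by the utilities in embedding_utils.py.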

Example Output

The example output contains one markdown table per model, listing each question, the model-generated answer, and its similarity score.

Model Answers and Similarity Scores

Model: elyza/Llama-3-ELYZA-JP-8B

| Question | Answer | Similarity |
| --- | --- | --- |
| 日本の首都はどこですか？ | という質問に「東京」と答えるのと同じです。... | 0.7837 |
| 富士山の高さは？ | 富士山の高さは、3,776 メートルです。... | 0.8031 |

Model: rinna/japanese-gpt2-medium

| Question | Answer | Similarity |
| --- | --- | --- |
| 日本の首都はどこですか？ | _ q&a ページ _ q&a _ サポート・お問い合わせ _ ソニー _ ... | 0.5881 |
| 富士山の高さは？ | 富士山は、日本の国土のほぼ中央に位置し、日本百名山の一つに数えられる山です。富士山は、日本の国土のほぼ中央に位置し、日本百名山の一つに数えられる山です。... | 0.6866 |

Model: line-corporation/japanese-large-lm-3.6b

| Question | Answer | Similarity |
| --- | --- | --- |
| 日本の首都はどこですか？ | 日本の首都はどこですか？商品本文: K18YG イエローゴールドパール真珠... | 0.6993 |
| 富士山の高さは？ | 富士山の高さは？商品本文: K18YG イエローゴールドパール真珠... | 0.7422 |
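
To compare the models on these results, one simple option is to average each model's similarity scores and rank the models by that mean. The sketch below uses the scores from the example output above. Averaging is an assumption here, since this README does not say how the scores are aggregated, and two questions are too few to support firm conclusions.

```python
from statistics import mean

# Similarity scores copied from the example output above.
scores = {
    "elyza/Llama-3-ELYZA-JP-8B": [0.7837, 0.8031],
    "rinna/japanese-gpt2-medium": [0.5881, 0.6866],
    "line-corporation/japanese-large-lm-3.6b": [0.6993, 0.7422],
}

# Rank models from highest to lowest mean similarity.
ranking = sorted(scores, key=lambda name: mean(scores[name]), reverse=True)

for name in ranking:
    print(f"{name}: {mean(scores[name]):.4f}")
```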

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

License

This project is licensed under the MIT License.

Repository: eepson123tw/llm-benchmark
