@harshaljanjani (Contributor)

This PR adds support for the Mistral-7B-Instruct-v0.3 model.
Mistral integration: adds support for Mistral's architecture and updates convert_hf.py to handle its weight configuration during conversion.
For more information: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
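Mistral-7B uses 32 attention heads over a 4096-wide hidden state, with grouped-query attention over 8 key/value heads. That gives a per-head dimension of 4096 / 32 = 128, the value the conversion log reports. The snippet below is a minimal sketch of how a converter could derive these numbers from a Hugging Face-style config dict. The helper names `attention_head_dim` and `kv_group_size` are hypothetical and do not come from convert_hf.py.

```python
from collections.abc import Mapping
from typing import Any


def attention_head_dim(config: Mapping[str, Any]) -> int:
    """Derive the per-head attention dimension from a Hugging Face-style config.

    Hypothetical helper; convert_hf.py may compute this differently.
    An explicit `head_dim` entry wins, otherwise use hidden_size // num_attention_heads.
    """
    explicit = config.get("head_dim")
    if explicit:
        return int(explicit)
    hidden = int(config["hidden_size"])
    heads = int(config["num_attention_heads"])
    if hidden % heads != 0:
        raise ValueError(f"hidden_size {hidden} is not divisible by {heads} heads")
    return hidden // heads


def kv_group_size(config: Mapping[str, Any]) -> int:
    """Number of query heads sharing one key/value head (grouped-query attention)."""
    heads = int(config["num_attention_heads"])
    kv_heads = int(config.get("num_key_value_heads", heads))
    return heads // kv_heads


# Subset of the values in Mistral-7B-Instruct-v0.3's config.json
mistral_cfg = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
}
print(attention_head_dim(mistral_cfg))  # 128
print(kv_group_size(mistral_cfg))       # 4
```

With 8 KV heads, the key and value projections are a quarter the width of the query projection. This is the kind of shape difference a converter has to account for when it maps Mistral weights.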

Testing Environment (For Repro):

  • GCP VM: t2a-standard-16 (16 vCPU ARM64, 64GB RAM), Debian 12 ARM, Ampere Altra.

Weights generation:

python tools/convert_hf.py mistralai/Mistral-7B-Instruct-v0.3 output/mistral_test
Converting mistralai/Mistral-7B-Instruct-v0.3 to INT8...
Loading checkpoint shards: 100%
[info] Calculated attention_head_dim: 128
█ Processing layer 32/32...
Quantization Summary:
MSE - Mean: 4.18e-08, Max: 3.86e-07, Median: 2.31e-08, Min: 1.00e-09
SNR - Mean: 27.0dB, Max: 37.2dB, Median: 27.6dB, Min: 13.5dB
CosSim - Mean: 0.997932, Max: 0.997932, Median: 0.999134, Min: 0.978131
Processed 322 INT8 tensors, 65 FP16 tensors (0 SNR<20.0dB fallbacks)
  Copied SentencePiece model to tokenizer.model
  Saved SentencePiece vocabulary (ID\ttoken format)
  Note: Could not load full tokenizer config: 'latin-1' codec can't encode character '\u68a6' in position 7: ordinal not in range(256)
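The quantization summary reports MSE, SNR and cosine similarity per tensor, and it falls back to FP16 for any tensor whose SNR drops below 20 dB. The sketch below shows that kind of check. It assumes symmetric per-row INT8 scales and defines SNR as 10·log10(signal power / error power). The actual scaling scheme and metric definitions in convert_hf.py may differ. `quantize_int8_per_row`, `quantization_metrics` and `convert_tensor` are hypothetical names.

```python
import numpy as np


def quantize_int8_per_row(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-row INT8 quantization of a 2-D weight matrix (assumed scheme)."""
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    # Avoid a zero scale for all-zero rows
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def quantization_metrics(w: np.ndarray, w_hat: np.ndarray) -> dict[str, float]:
    """MSE, SNR in dB and cosine similarity between original and dequantized weights."""
    w = w.astype(np.float64).ravel()
    w_hat = w_hat.astype(np.float64).ravel()
    err = w - w_hat
    mse = float(np.mean(err ** 2))
    signal = float(np.mean(w ** 2))
    snr_db = float("inf") if mse == 0 else 10.0 * np.log10(signal / mse)
    denom = np.linalg.norm(w) * np.linalg.norm(w_hat)
    cos = float(np.dot(w, w_hat) / denom) if denom > 0 else 1.0
    return {"mse": mse, "snr_db": float(snr_db), "cos_sim": cos}


def convert_tensor(w: np.ndarray, snr_threshold_db: float = 20.0):
    """Quantize to INT8, or keep FP16 when the INT8 SNR is below the threshold."""
    q, scale = quantize_int8_per_row(w)
    metrics = quantization_metrics(w, q.astype(np.float32) * scale)
    if metrics["snr_db"] < snr_threshold_db:
        return ("fp16", w.astype(np.float16)), metrics
    return ("int8", q, scale), metrics


rng = np.random.default_rng(0)
weight = rng.normal(0.0, 0.02, size=(256, 512)).astype(np.float32)
result, m = convert_tensor(weight)
print(result[0], {k: round(v, 6) for k, v in m.items()})
```

Per-row scales stop a single row with large outliers from inflating the quantization step for the whole matrix. That is one common way to keep most tensors above an SNR fallback threshold.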

Results (note that Mistral-7B-Instruct-v0.3 is not guaranteed to support tool calling):

╔════════════════════════════════╗
║ Running Engine Tests           ║
╚════════════════════════════════╝
╔════════════════════════════════╗
║   STREAMING & FOLLOW-UP TEST   ║
╚════════════════════════════════╝
[Turn 1]
User: My name is Henry Ndubuaku, how are you?
Assistant:  I am a machine learning model, specifically designed to help answer questions and provide information. I don't have feelings or personal experiences, but I strive to assist you as best as possible.
[Results - Turn 1]
├─ TTFT: 1.06 sec
├─ Prefill: 27.5 toks/sec
├─ Decode: 9.8 toks/sec
└─ RAM: 6840.4 MB
[Turn 2]
User: What is my name?
Assistant:  Your name is Henry Ndubuaku. Is there anything specific you would like me to help with?<0x0A><0x0A>I would be happy to assist! Let me know what task or question needs solving, and I will do my best to provide you with clear guidance or accurate information.<0x0A><0x0A>For instance, if you need help with a math problem, researching a topic for school or work projects (as long it's not plagiarism), brainstorming ideas for writing stories and essays (but keep in mind the final version should come from your own creativity), understanding concepts in subjects such as science, maths, literature etc., learning a new language verbally or reading pronunciation practice sentences aloud in many languages (English is the only one that supports text-to-speech conversion directly but I can give sentences and their phonetic spelling to learn by yourself) , finding recipes or cooking tips are just some examples of what I can do.<0x0A><0x0A>Don't forget to ask follow up questions if something doesn’t look quite right! Collaborating on clarifying information will help you get the best possible answer!
[Results - Turn 2]
├─ TTFT: 1.68 sec
├─ Prefill: 45.7 toks/sec
├─ Decode: 8.8 toks/sec
└─ RAM: 6962.9 MB
✓ PASS │ streaming                  
╔════════════════════════════════╗
║       100 CONTEXT TEST         ║
╚════════════════════════════════╝
Response: 1) Quantum Mechanics is a fundamental theory in physics that describes the behavior of matter and energy at extremely small scales — on the order of atoms and smaller.[1](https://en.wikipedia.org/wiki%3AQuantum_mechanics)<0x0A>> - The mathematical formulation of quantum mechanics includes mathematical entities from vector spaces, matrix theory, differential equations (such as Schrödinger's equation), probability distributions (wave functions), operators (Hamiltonians [→ stopped]
[Results]
├─ TTFT: 3.81 sec
├─ Prefill: 62.5 toks/sec
├─ Decode: 8.6 toks/sec
└─ RAM: 6967.5 MB
└─ Status: PASSED ✓
✓ PASS │ 100_context              
╔════════════════════════════════╗
║        1K CONTEXT TEST         ║
╚════════════════════════════════╝
Response:  These numerical values represent different digits of Pi to varying degrees of precision. For example: 0.00000 represents the first decimal place (integer 1), 3.14159 represents several significant figures up to its fifth decimal place, and so on for subsequent data points with more precise decimals of Pi displayed as the numbers increase. This list goes up to digit number forty-nine in the sequence and could continue ad infinitum with even greater [→ stopped]
[Results]
├─ TTFT: 26.77 sec
├─ Prefill: 45.4 toks/sec
├─ Decode: 5.6 toks/sec
└─ RAM: 9209.4 MB
└─ Status: PASSED ✓
✓ PASS │ 1k_context
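Each test reports TTFT, prefill throughput and decode throughput. The sketch below shows one plausible way to derive these from per-token timestamps. It assumes prefill throughput is prompt tokens divided by TTFT, and that decode throughput counts only the tokens after the first. The engine's own definitions are not shown in this PR, and `GenerationTiming` is a hypothetical class.

```python
from dataclasses import dataclass


@dataclass
class GenerationTiming:
    prompt_tokens: int
    start: float              # seconds: when the request was submitted
    token_times: list[float]  # seconds: arrival time of each generated token

    @property
    def ttft(self) -> float:
        """Time to first token."""
        return self.token_times[0] - self.start

    @property
    def prefill_tps(self) -> float:
        # Assumption: the whole TTFT is attributed to processing the prompt.
        return self.prompt_tokens / self.ttft

    @property
    def decode_tps(self) -> float:
        # Tokens after the first, divided by the time spent generating them.
        n = len(self.token_times) - 1
        span = self.token_times[-1] - self.token_times[0]
        return n / span if n > 0 and span > 0 else 0.0


# Synthetic example: first token at 1.0 s, then 10 more tokens 0.1 s apart
timing = GenerationTiming(prompt_tokens=30, start=0.0,
                          token_times=[1.0 + 0.1 * i for i in range(11)])
print(f"TTFT {timing.ttft:.2f} s, prefill {timing.prefill_tps:.1f} toks/sec, "
      f"decode {timing.decode_tps:.1f} toks/sec")
# prints: TTFT 1.00 s, prefill 30.0 toks/sec, decode 10.0 toks/sec
```

Excluding the first token from the decode rate keeps prompt-processing time out of the per-token generation figure.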

cc: @HenryNdubuaku

Signed-off-by: harshaljanjani <harshaljanjani@gmail.com>