Same here. Also, the model on HuggingFace uses different names for the w1/w2/w3 weights (up_proj, down_proj, gate_proj), and this isn't documented anywhere. The HF model presumably works with the Transformers library, but both this implementation and the MLX one need fixing to load it.
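For anyone else hitting this, here's a minimal sketch of the kind of key remapping that would be needed, assuming the usual LLaMA-style feed-forward correspondence (gate_proj -> w1, down_proj -> w2, up_proj -> w3). The exact key paths in a given checkpoint may differ, so inspect the state dict before relying on this:

import torch

# Assumed correspondence between HF Transformers names and the
# reference implementation's names; verify against your checkpoint.
HF_TO_REF = {
    "gate_proj": "w1",
    "down_proj": "w2",
    "up_proj": "w3",
}

def remap_keys(state_dict):
    """Rename HF-style feed-forward weight keys to w1/w2/w3."""
    remapped = {}
    for key, tensor in state_dict.items():
        new_key = key
        for hf_name, ref_name in HF_TO_REF.items():
            new_key = new_key.replace(hf_name, ref_name)
        remapped[new_key] = tensor
    return remapped

# Usage: load the HF shard(s), remap, then hand off to the reference loader.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
state_dict = remap_keys(state_dict)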
When you say different ones, do you mean just adding a bunch of random weights? Are those normalized between 0 and 1?
By the way, this thread offers at least a proposed set of params, in case anyone lands here looking for an answer: https://github.com/vikhyat/mixtral-inference/issues/3
FileNotFoundError: [Errno 2] No such file or directory: 'Mistral-7B-Instruct-v0.2/params.json'
I renamed params.json to config.json, but several parameters are still missing, such as the kv_cache dimension.
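One possible workaround, sketched below: fall back to the HF config.json and translate its field names into the ones the reference loader expects. The key mapping here (hidden_size -> dim, num_attention_heads -> n_heads, etc.) is my assumption based on the usual HF naming for Mistral-style configs; check it against whatever your loader's ModelArgs actually wants:

import json
from pathlib import Path

# Assumed mapping from HF config.json fields to reference params.json
# names; adjust to match your loader's expected arguments.
HF_TO_PARAMS = {
    "hidden_size": "dim",
    "num_hidden_layers": "n_layers",
    "num_attention_heads": "n_heads",
    "num_key_value_heads": "n_kv_heads",
    "intermediate_size": "hidden_dim",
    "rms_norm_eps": "norm_eps",
    "vocab_size": "vocab_size",
}

def load_params(model_dir):
    """Load params.json if present, otherwise translate config.json."""
    model_dir = Path(model_dir)
    params_path = model_dir / "params.json"
    if params_path.exists():
        return json.loads(params_path.read_text())
    config = json.loads((model_dir / "config.json").read_text())
    params = {new: config[old] for old, new in HF_TO_PARAMS.items() if old in config}
    # head_dim is often derived rather than stored in config.json.
    params["head_dim"] = config["hidden_size"] // config["num_attention_heads"]
    return params

params = load_params("Mistral-7B-Instruct-v0.2")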