
(COMPATIBILITY) [v1.54 Smooth Sampling] - unknown model architecture: 'orion' #638

Closed
SabinStargem opened this issue Jan 25, 2024 · 7 comments

Comments

@SabinStargem

I was trying to use 14b Orion LongChat, but it threw an error. Presumably it is simply a new architecture. Here you go.


```
Welcome to KoboldCpp - Version 1.54
For command line arguments, please refer to --help


Attempting to use CuBLAS library for faster prompt ingestion. A compatible CuBLAS will be required.
Initializing dynamic library: koboldcpp_cublas.dll

Namespace(model=None, model_param='C:/KoboldCPP/Models/14b Orion LongChat - q6k.gguf', port=5001, port_param=5001, host='', launch=True, lora=None, config=None, threads=31, blasthreads=31, highpriority=False, contextsize=32768, blasbatchsize=512, ropeconfig=[0.0, 10000.0], smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=True, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=None, usecublas=['normal', '0', 'mmq'], gpulayers=99, tensor_split=None, onready='', multiuser=1, remotetunnel=False, foreground=False, preloadstory=None, quiet=False, ssl=None)

Loading model: C:\KoboldCPP\Models\14b Orion LongChat - q6k.gguf
[Threads: 31, BlasThreads: 31, SmartContext: False, ContextShift: True]

The reported GGUF Arch is: orion


Identified as LLAMA model: (ver 6)
Attempting to Load...

Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
llama_model_loader: loaded meta data with 21 key-value pairs and 444 tensors from C:\KoboldCPP\Models\14b Orion LongChat - q6k.gguf
error loading model: unknown model architecture: 'orion'
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "koboldcpp.py", line 2519, in <module>
  File "koboldcpp.py", line 2366, in main
  File "koboldcpp.py", line 310, in load_model
OSError: exception: access violation reading 0x0000000000000064
[28744] Failed to execute script 'koboldcpp' due to unhandled exception!

[process exited with code 1 (0x00000001)]
```

@LostRuins
Owner

I don't think the "Orion" architecture is supported; I don't see any references to it. Where and how did you get this model?

@Tangweirui2021

I have this problem, too. I converted this model manually by following this guide:
https://github.com/OrionStarAI/Orion?tab=readme-ov-file#45-inference-by-llamacpp
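
You can confirm which architecture a converted file declares without loading it in KoboldCpp. Below is a minimal sketch using the gguf package from llama.cpp's gguf-py (pip install gguf); the field-decoding details are an assumption and may vary between package versions.

```python
# Sketch: print the GGUF "general.architecture" value of a converted model.
# Assumes the gguf package from llama.cpp's gguf-py (pip install gguf);
# the exact field layout may differ between gguf versions.
from gguf import GGUFReader

reader = GGUFReader("14b Orion LongChat - q6k.gguf")
field = reader.fields["general.architecture"]
# String values are stored as raw bytes in one of the field's parts.
arch = bytes(field.parts[field.data[0]]).decode("utf-8")
print(arch)  # prints "orion"; builds without Orion support reject this
```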

@LostRuins
Owner

Ah, that makes sense. It relies on pull request ggerganov#5118, which has not yet been merged, so it won't work until that happens.
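
For context, the loader reads the architecture string from the GGUF metadata and looks it up in a table of known architectures, so a build that predates that PR simply has no entry for "orion". Here is a rough Python sketch of that dispatch; the real check lives in llama.cpp's C++ loader, and this table is hypothetical:

```python
# Illustrative sketch of the architecture dispatch that produces the error
# above. The real check is in llama.cpp's C++ loader; this table is made up.
SUPPORTED_ARCHITECTURES = {"llama", "falcon", "gpt2"}  # no "orion" pre-PR

def resolve_architecture(arch: str) -> str:
    if arch not in SUPPORTED_ARCHITECTURES:
        # The file itself is fine; the running build just has no loader
        # registered for this architecture.
        raise ValueError(f"unknown model architecture: '{arch}'")
    return arch
```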

@Tangweirui2021

Tangweirui2021 commented Jan 26, 2024

You are right. I applied the PR before converting the model. In fact, the conversion fails without this PR, and the converted model does run once the PR is applied.

@SabinStargem
Author

Here is the GGUF for Orion Longchat 14b.

https://huggingface.co/demonsu/orion-14b-longchat-gguf/tree/main

@LostRuins
Owner

Should be fixed now in v1.57, can you check?

@Tangweirui2021

Yes, it seems to work fine. Thanks for your work!
