Description
I was trying to use 14b Orion LongChat, but it threw an error. Presumably it is simply a new architecture that isn't supported yet. Here is the full log.
Welcome to KoboldCpp - Version 1.54
For command line arguments, please refer to --help
Attempting to use CuBLAS library for faster prompt ingestion. A compatible CuBLAS will be required.
Initializing dynamic library: koboldcpp_cublas.dll
Namespace(model=None, model_param='C:/KoboldCPP/Models/14b Orion LongChat - q6k.gguf', port=5001, port_param=5001, host='', launch=True, lora=None, config=None, threads=31, blasthreads=31, highpriority=False, contextsize=32768, blasbatchsize=512, ropeconfig=[0.0, 10000.0], smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=True, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=None, usecublas=['normal', '0', 'mmq'], gpulayers=99, tensor_split=None, onready='', multiuser=1, remotetunnel=False, foreground=False, preloadstory=None, quiet=False, ssl=None)
Loading model: C:\KoboldCPP\Models\14b Orion LongChat - q6k.gguf
[Threads: 31, BlasThreads: 31, SmartContext: False, ContextShift: True]
The reported GGUF Arch is: orion
Identified as LLAMA model: (ver 6)
Attempting to Load...
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
llama_model_loader: loaded meta data with 21 key-value pairs and 444 tensors from C:\KoboldCPP\Models\14b Orion LongChat - q6k.gguf
error loading model: unknown model architecture: 'orion'
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
File "koboldcpp.py", line 2519, in
File "koboldcpp.py", line 2366, in main
File "koboldcpp.py", line 310, in load_model
OSError: exception: access violation reading 0x0000000000000064
[28744] Failed to execute script 'koboldcpp' due to unhandled exception!
[process exited with code 1 (0x00000001)]
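For what it's worth, the architecture string the loader rejects comes straight from the GGUF metadata. Below is a minimal sketch (based on the public GGUF spec; the script name and file path are just examples, not part of KoboldCpp) that dumps the general.architecture key, so anyone can confirm the file really reports 'orion' rather than being a corrupted download:

```python
# read_gguf_arch.py - print the general.architecture key of a GGUF file.
# Assumes GGUF v2/v3 headers (64-bit tensor/KV counts), little-endian.
import struct
import sys

SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
STRING, ARRAY = 8, 9

def read_string(f):
    # GGUF string: uint64 length followed by UTF-8 bytes.
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8", errors="replace")

def skip_value(f, vtype):
    # Skip over a metadata value we don't care about.
    if vtype in SCALAR_SIZES:
        f.seek(SCALAR_SIZES[vtype], 1)
    elif vtype == STRING:
        (length,) = struct.unpack("<Q", f.read(8))
        f.seek(length, 1)
    elif vtype == ARRAY:
        elem_type, count = struct.unpack("<IQ", f.read(12))
        for _ in range(count):
            skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")

def read_architecture(path):
    with open(path, "rb") as f:
        magic, version, n_tensors, n_kv = struct.unpack("<4sIQQ", f.read(24))
        assert magic == b"GGUF", "not a GGUF file"
        for _ in range(n_kv):
            key = read_string(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            if key == "general.architecture" and vtype == STRING:
                return read_string(f)
            skip_value(f, vtype)
    return None

if __name__ == "__main__":
    print(read_architecture(sys.argv[1]))
```

On this file it should print orion, matching the "The reported GGUF Arch is: orion" line above, i.e. the GGUF itself is fine and the loader just doesn't know the architecture yet.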