Closed as not planned
Description
While running this example, my program crashes with the following error:
Generating answer...
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: freq_base = 1000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 288.00 MiB
llama_new_context_with_model: KV self size = 288.00 MiB, K (f16): 144.00 MiB, V (f16): 144.00 MiB
llama_new_context_with_model: CUDA_Host input buffer size = 18.57 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 217.00 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 1.50 MiB
llama_new_context_with_model: graph splits (measure): 2
Unhandled exception. System.IndexOutOfRangeException: Index was outside the bounds of the array.
at LLama.LLamaContext.ApplyPenalty(Int32 logits_i, IEnumerable`1 lastTokens, Dictionary`2 logitBias, Int32 repeatLastTokensCount, Single repeatPenalty, Single alphaFrequency, Single alphaPresence, Boolean penalizeNL) in ~/LLamaSharp/LLama/LLamaContext.cs:line 361
at LLama.StatelessExecutor.InferAsync(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext() in ~/LLamaSharp/LLama/LLamaStatelessExecutor.cs:line 109
at LLama.StatelessExecutor.InferAsync(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
at Microsoft.KernelMemory.Search.SearchClient.AskAsync(String index, String question, ICollection`1 filters, Double minRelevance, CancellationToken cancellationToken)
at Microsoft.KernelMemory.Search.SearchClient.AskAsync(String index, String question, ICollection`1 filters, Double minRelevance, CancellationToken cancellationToken)
at ProgramHelper.AnswerQuestion(IKernelMemory memory, String question) in ~/MLBackend/ProgramHelper.cs:line 110
at Program.<Main>$(String[] args) in ~/MLBackend/Program.cs:line 32
at Program.<Main>(String[] args)
I don't believe this was an issue when I was using Mistral; it started happening when I switched over to the embedding model, specifically the F32 variant.
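For reference, here is a simplified sketch of the helper that appears in the stack trace; the KernelMemory/LLamaSharp setup that builds `memory` is omitted and the body is illustrative rather than my exact code:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.KernelMemory;

public static class ProgramHelper
{
    // Simplified sketch of the AnswerQuestion helper referenced in the stack
    // trace; the model/memory configuration is omitted.
    public static async Task AnswerQuestion(IKernelMemory memory, string question)
    {
        Console.WriteLine("Generating answer...");

        // KernelMemory's SearchClient.AskAsync ends up driving LLamaSharp's
        // StatelessExecutor.InferAsync, which is where ApplyPenalty throws.
        MemoryAnswer answer = await memory.AskAsync(question);

        Console.WriteLine(answer.Result);
    }
}
```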