FIR-714: Updated the SDK Release r0.1.3 #5

atrivedi-tsavoritesi · 2025-06-05T05:06:38Z

Made the following changes to support the r0.1.3 SDK:

Replace txe_blob with txe_function.
The mlir_ciface_* APIs have changed and now need the _host keyword in order to be used.
Add the appropriate libraries or .o files to be included so that the system can be loaded.

root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin# ./run_platform_test.sh
Check if tnApcMgr is running; if it is not, uncomment below line and execute the run_platform_test.sh script.
Running on v0.1.1.tsv30_05_24_2025
[2018-03-09 13:09:07.705099] 274:275 [ info] :: </proj/work/dmohapatra/ubitest/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:129> TXE resource allocation request processed successfully.
[2018-03-09 13:09:08.747] [info] [llama.cpp:56] Execution time: 1023 ms
[2018-03-09 13:09:08.755944] 24054:24054 [ info] [LlamaForCausalLM_Random v. 2] TestBase.h:154: Model executed successfully. Validating result...
[2018-03-09 13:09:08.789347] 24054:24054 [ info] [LlamaForCausalLM_Random v. 2] TestBase.h:193: PASS [relative err=0.000000, relTol=1.000000e-05]
[2018-03-09 13:09:08.814639] 274:275 [ info] :: </proj/work/dmohapatra/ubitest/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:145> TXE resource release request processed successfully.

Profiling Results (LlamaForCausalLM_Random):

Calls Total(ms) T/call Self(ms) Function

243 498.000 2.049 0.000 [44%] RuntimeHostShim::awaitCommandListCompletion
84 200.874 2.391 200.874 └─ [18%] [ txe_blob_1 ]
32 76.613 2.394 76.613 └─ [ 7%] [ txe_blob_6 ]
16 55.484 3.468 55.484 └─ [ 5%] [ txe_blob_12 ]
8 31.886 3.986 31.886 └─ [ 3%] [ txe_blob_10 ]
8 31.344 3.918 31.344 └─ [ 3%] [ txe_blob_7 ]
8 31.151 3.894 31.151 └─ [ 3%] [ txe_blob_8 ]
8 27.685 3.461 27.685 └─ [ 2%] [ txe_blob_9 ]
17 25.976 1.528 25.976 └─ [ 2%] [ txe_blob_2 ]
17 25.918 1.525 25.918 └─ [ 2%] [ txe_blob_5 ]
17 25.910 1.524 25.910 └─ [ 2%] [ txe_blob_3 ]
17 25.807 1.518 25.807 └─ [ 2%] [ txe_blob_4 ]
8 23.993 2.999 23.993 └─ [ 2%] [ txe_blob_11 ]
3 6.017 2.006 6.017 └─ [ 1%] [ txe_blob_0 ]
1 35.000 35.000 35.000 [ 3%] RuntimeHostShim::finalize
188 34.000 0.181 34.000 [ 3%] RuntimeHostShim::copy
1 16.000 16.000 16.000 [ 1%] RuntimeHostShim::initialize
13 1.000 0.077 1.000 [ 0%] RuntimeHostShim::loadBlob
573 0.000 0.000 0.000 [ 0%] RuntimeHostShim::allocate
573 0.000 0.000 0.000 [ 0%] RuntimeHostShim::deallocate
243 0.000 0.000 0.000 [ 0%] RuntimeHostShim::createCommandList
922 0.000 0.000 0.000 [ 0%] RuntimeHostShim::getShmemManager
243 0.000 0.000 0.000 [ 0%] RuntimeHostShim::launchBlob
243 0.000 0.000 0.000 [ 0%] RuntimeHostShim::addCommandToList
243 0.000 0.000 0.000 [ 0%] RuntimeHostShim::finalizeCommandList
13 0.000 0.000 0.000 [ 0%] RuntimeHostShim::unloadBlob
33 0.000 0.000 0.000 [ 0%] RuntimeHostShim::stridedCopy

3532 1121.000 0.317 1121.000 [100%] TOTAL
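The columns in the table above can be cross-checked with a few lines of Python. This is only a sketch: `parse_profile_row` is a hypothetical helper written for this note, not part of the SDK or the profiler, and it assumes that T/call is Total(ms) divided by Calls and that the bracketed percentage is the row's share of the TOTAL line.

```python
def parse_profile_row(line):
    """Split one profiler row into (calls, total_ms, per_call_ms, self_ms, name)."""
    tokens = line.split()
    calls = int(tokens[0])
    total_ms, per_call_ms, self_ms = (float(t) for t in tokens[1:4])
    # The name is the last token, or the one before a closing "]" for "[ name ]" rows.
    name = tokens[-2] if tokens[-1] == "]" else tokens[-1]
    return calls, total_ms, per_call_ms, self_ms, name


# Row copied from the table above.
row = "84 200.874 2.391 200.874 └─ [18%] [ txe_blob_1 ]"
calls, total_ms, per_call_ms, self_ms, name = parse_profile_row(row)

recomputed_per_call = total_ms / calls       # assumed definition of T/call
share_of_total = 100.0 * total_ms / 1121.0   # 1121.000 ms is the TOTAL row

print(name, round(recomputed_per_call, 3), round(share_of_total))
```

Under those assumptions the recomputed values agree with the 2.391 ms per call and 18% printed for txe_blob_1.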

register_backend: registered backend Tsavorite (1 devices)
register_device: registered device Tsavorite (txe)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (CPU)
load_backend: failed to find ggml_backend_init in /usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin/tsi-ggml/libggml-tsavorite.so
load_backend: failed to find ggml_backend_init in /usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin/tsi-ggml/libggml-cpu.so
build: 5472 (88c19dd) with gcc (GCC) 13.3.0 for x86_64-pc-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any

TXE Device MEMORY Summary total 134217728 and free 134217728
llama_model_load_from_file_impl: using device Tsavorite (txe) - 128 MiB free
llama_model_loader: loaded meta data with 38 key-value pairs and 201 tensors from /tsi/akapoor/ggml/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Tiny Llama v0.3 FP32
llama_model_loader: - kv 3: general.size_label str = 1.1B
llama_model_loader: - kv 4: general.license str = apache-2.0
llama_model_loader: - kv 5: general.dataset.count u32 = 3
llama_model_loader: - kv 6: general.dataset.0.name str = SlimPajama 627B
llama_model_loader: - kv 7: general.dataset.0.organization str = Cerebras
llama_model_loader: - kv 8: general.dataset.0.repo_url str = https://huggingface.co/cerebras/SlimP...
llama_model_loader: - kv 9: general.dataset.1.name str = Starcoderdata
llama_model_loader: - kv 10: general.dataset.1.organization str = Bigcode
llama_model_loader: - kv 11: general.dataset.1.repo_url str = https://huggingface.co/bigcode/starco...
llama_model_loader: - kv 12: general.dataset.2.name str = Oasst_Top1_2023 08 25
llama_model_loader: - kv 13: general.dataset.2.version str = 08-25
llama_model_loader: - kv 14: general.dataset.2.organization str = OpenAssistant
llama_model_loader: - kv 15: general.dataset.2.repo_url str = https://huggingface.co/OpenAssistant/...
llama_model_loader: - kv 16: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 17: llama.block_count u32 = 22
llama_model_loader: - kv 18: llama.context_length u32 = 2048
llama_model_loader: - kv 19: llama.embedding_length u32 = 2048
llama_model_loader: - kv 20: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 21: llama.attention.head_count u32 = 32
llama_model_loader: - kv 22: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv 23: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 24: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 25: general.file_type u32 = 0
llama_model_loader: - kv 26: llama.vocab_size u32 = 32003
llama_model_loader: - kv 27: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 28: tokenizer.ggml.model str = llama
llama_model_loader: - kv 29: tokenizer.ggml.pre str = default
llama_model_loader: - kv 30: tokenizer.ggml.tokens arr[str,32003] = ["", "~~", "~~", "<0x00>", "<...
llama_model_loader: - kv 31: tokenizer.ggml.scores arr[f32,32003] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 32: tokenizer.ggml.token_type arr[i32,32003] = [3, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 33: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 34: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 35: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 36: tokenizer.ggml.padding_token_id u32 = 32000
llama_model_loader: - kv 37: general.quantization_version u32 = 2
llama_model_loader: - type f32: 201 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = all F32
print_info: file size = 4.10 GiB (32.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 6
load: token to piece cache size = 0.1684 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 2048
print_info: n_embd = 2048
print_info: n_layer = 22
print_info: n_head = 32
print_info: n_head_kv = 4
print_info: n_rot = 64
print_info: n_swa = 0
print_info: n_swa_pattern = 1
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 8
print_info: n_embd_k_gqa = 256
print_info: n_embd_v_gqa = 256
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 5632
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 2048
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 1B
print_info: model params = 1.10 B
print_info: general.name = Tiny Llama v0.3 FP32
print_info: vocab type = SPM
print_info: n_vocab = 32003
print_info: n_merges = 0
print_info: BOS token = 1 ''
print_info: EOS token = 2 ''
print_info: EOT token = 32002 '<|im_end|>'
print_info: UNK token = 0 ''
print_info: PAD token = 32000 '[PAD]'
print_info: LF token = 13 '<0x0A>'
print_info: EOG token = 2 ''
print_info: EOG token = 32002 '<|im_end|>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)

TXE Device MEMORY Summary total 134217728 and free 134217728
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/23 layers to GPU
load_tensors: CPU_Mapped model buffer size = 4196.40 MiB
..........................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 12288
llama_context: n_ctx_per_seq = 12288
llama_context: n_batch = 1024
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (12288) > n_ctx_train (2048) -- possible training context overflow
[2018-03-09 13:09:30.379734] 274:275 [ info] :: </proj/work/dmohapatra/ubitest/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:129> TXE resource allocation request processed successfully.
llama_context: CPU output buffer size = 0.12 MiB
llama_kv_cache_unified: CPU KV buffer size = 264.00 MiB
llama_kv_cache_unified: size = 264.00 MiB ( 12288 cells, 22 layers, 1 seqs), K (f16): 132.00 MiB, V (f16): 132.00 MiB
ggml_backend_tsavorite_buffer_type_alloc_buffer is called from llama data Loader

ANoop Allocating memory from tsi_alloc with size 15732736

Allocating memory from tsi_alloc with size 15732736 starting memory 0xfffe77cb3080

Address of Newly Created BUffer 0xfffe77cb3080 and size 15732736
llama_context: tsavorite compute buffer size = 15.00 MiB
llama_context: CPU compute buffer size = 808.01 MiB
llama_context: graph nodes = 798
llama_context: graph splits = 223 (with bs=512), 137 (with bs=1)
common_init_from_params: setting dry_penalty_last_n to ctx_size = 12288
main: llama threadpool init, n_threads = 4
main: model was trained on only 2048 context tokens (12288 specified)
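The 264.00 MiB KV buffer reported by llama_kv_cache_unified can be reproduced from the numbers in this log. The sketch below assumes the usual layout of one K and one V entry of n_embd_k_gqa / n_embd_v_gqa (256) elements per cell and per layer, stored as f16 (2 bytes per element). `kv_cache_mib` is a hypothetical helper for this note, not a llama.cpp function.

```python
def kv_cache_mib(n_cells, n_layers, n_embd_gqa, bytes_per_elem):
    """Size in MiB of one half (K or V) of the cache under the assumed layout."""
    return n_cells * n_layers * n_embd_gqa * bytes_per_elem / (1024 * 1024)


# Values taken from the log: 12288 cells, 22 layers, n_embd_k_gqa = n_embd_v_gqa = 256, f16.
k_mib = kv_cache_mib(12288, 22, 256, 2)
v_mib = kv_cache_mib(12288, 22, 256, 2)
total_mib = k_mib + v_mib

# Same formula at the training context length of 2048 tokens, for comparison.
total_mib_at_2048 = 2 * kv_cache_mib(2048, 22, 256, 2)

print(k_mib, v_mib, total_mib, total_mib_at_2048)
```

With n_ctx = 12288 this gives 132 MiB each for K and V, matching the log. Under the same formula, a context of 2048 tokens (the training length the warning mentions) would need 44 MiB.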

sampler seed: 3917328361
sampler params:
repeat_last_n = 5, repeat_penalty = 1.500, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 12288
top_k = 50, top_p = 0.900, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.000
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 12288, n_batch = 1024, n_predict = 10, n_keep = 1

my cat’s name is Luna.
I’m a software

llama_perf_sampler_print: sampling time = 231.69 ms / 16 runs ( 14.48 ms per token, 69.06 tokens per second)
llama_perf_context_print: load time = 104768.16 ms
llama_perf_context_print: prompt eval time = 82625.01 ms / 6 tokens (13770.84 ms per token, 0.07 tokens per second)
llama_perf_context_print: eval time = 565691.32 ms / 9 runs (62854.59 ms per token, 0.02 tokens per second)
llama_perf_context_print: total time = 670759.03 ms / 15 tokens
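The per-token figures in the llama_perf lines follow directly from the elapsed time and the token count. A minimal sketch, where `per_token_stats` is a hypothetical helper name and not part of llama.cpp:

```python
def per_token_stats(total_ms, n_tokens):
    """Return (milliseconds per token, tokens per second)."""
    ms_per_token = total_ms / n_tokens
    return ms_per_token, 1000.0 / ms_per_token


# Inputs copied from the llama_perf lines above.
sampling = per_token_stats(231.69, 16)
prompt_eval = per_token_stats(82625.01, 6)
decode = per_token_stats(565691.32, 9)

for label, (ms, tps) in [("sampling", sampling), ("prompt eval", prompt_eval), ("eval", decode)]:
    print(f"{label}: {ms:.2f} ms per token, {tps:.2f} tokens per second")
```

The results agree with the log to the printed precision: prompt evaluation takes roughly 13.8 s per token and generation roughly 62.9 s per token in this run.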

TXE_ADD Operation, total tensor: 10 Number of Kernel Call: 320 Number of tensor got spilt: 10 Min Num of Elem 2048 Max Num of Elem 2048

TXE_SUB Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

TXE_MULT Operation, total tensor: 450 Number of Kernel Call: 21280 Number of tensor got spilt: 450 Min Num of Elem 2048 Max Num of Elem 12288

TXE_DIV Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

TXE_SQRT Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

TXE_NEG Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

TXE_ABS Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

TXE_SIN Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

TXE_SIGMOID Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

TXE_SILU Operation, total tensor: 220 Number of Kernel Call: 28600 Number of tensor got spilt: 220 Min Num of Elem 5632 Max Num of Elem 33792
[2018-03-09 13:20:21.984970] 274:275 [ info] :: </proj/work/dmohapatra/ubitest/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:145> TXE resource release request processed successfully.

GGML Tsavorite Profiling Results:

Calls Total(ms) T/call Self(ms) Function

50200 53240.000 1.061 0.000 [ 8%] RuntimeHostShim::awaitCommandListCompletion
28600 45154.256 1.579 45154.256 └─ [ 7%] [ txe_silu ]
21280 32959.676 1.549 32959.676 └─ [ 5%] [ txe_mult ]
320 498.163 1.557 498.163 └─ [ 0%] [ txe_add ]
50200 0.386 0.000 0.386 └─ [ 0%] TXE 0 Idle
1 127.000 127.000 18.000 [ 0%] GGML Tsavorite
1 109.000 109.000 109.000 └─ [ 0%] RuntimeHostShim::initialize
1 64.000 64.000 64.000 [ 0%] RuntimeHostShim::finalize
50200 46.000 0.001 46.000 [ 0%] RuntimeHostShim::loadBlob
50200 7.000 0.000 7.000 [ 0%] RuntimeHostShim::createCommandList
50200 5.000 0.000 5.000 [ 0%] RuntimeHostShim::finalizeCommandList
50200 3.000 0.000 3.000 [ 0%] RuntimeHostShim::addCommandToList
50201 2.000 0.000 2.000 [ 0%] RuntimeHostShim::allocate
172200 0.000 0.000 0.000 [ 0%] RuntimeHostShim::getShmemManager
50200 0.000 0.000 0.000 [ 0%] RuntimeHostShim::launchBlob
50200 0.000 0.000 0.000 [ 0%] RuntimeHostShim::unloadBlob
50200 0.000 0.000 0.000 [ 0%] RuntimeHostShim::deallocate

624003 651694.000 1.044 651694.000 [100%] TOTAL
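As a consistency check, the per-operation kernel call counts printed before this table can be summed and compared with the 50200 calls recorded for RuntimeHostShim::awaitCommandListCompletion. The sketch below parses only the three non-zero operation lines. The regular expression and the `kernel_calls` helper are written for this note and assume the log format shown above.

```python
import re

# Matches lines such as "TXE_ADD Operation, total tensor: 10 Number of Kernel Call: 320 ..."
OP_LINE = re.compile(r"TXE_(\w+) Operation, total tensor: (\d+) Number of Kernel Call: (\d+)")


def kernel_calls(log_text):
    """Map each operation name to its kernel call count."""
    return {m.group(1): int(m.group(3)) for m in OP_LINE.finditer(log_text)}


# Lines copied from the log above (the non-zero operations only).
sample = """
TXE_ADD Operation, total tensor: 10 Number of Kernel Call: 320 Number of tensor got spilt: 10 Min Num of Elem 2048 Max Num of Elem 2048
TXE_MULT Operation, total tensor: 450 Number of Kernel Call: 21280 Number of tensor got spilt: 450 Min Num of Elem 2048 Max Num of Elem 12288
TXE_SILU Operation, total tensor: 220 Number of Kernel Call: 28600 Number of tensor got spilt: 220 Min Num of Elem 5632 Max Num of Elem 33792
"""

calls = kernel_calls(sample)
total_calls = sum(calls.values())
print(calls, total_calls)
```

The sum is 320 + 21280 + 28600 = 50200, the same as the call count of the txe_add, txe_mult and txe_silu rows combined and of awaitCommandListCompletion in the table.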

root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin#
Terminating...
Thanks for using picocom
atrivedi@fpga2:~$

as follows
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: /proj/work/atrivedi/workspace/06_02_2025/llama.cpp/ggml-tsi-kernel/fpga/host/host_abs.o: in function `txe_abs_host':
LLVMDialectModule:(.text+0x18): undefined reference to `tsi_alloc'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x24): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x30): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x3c): undefined reference to `tsi_create_command_list'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x58): undefined reference to `tsi_load_blob'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x64): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x70): undefined reference to `tsi_launch_blob'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x7c): undefined reference to `tsi_add_command_to_list'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x84): undefined reference to `tsi_finalize_command_list'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x8c): undefined reference to `tsi_wait'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x94): undefined reference to `tsi_unload_blob'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0xa0): undefined reference to `tsi_dealloc'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: /proj/work/atrivedi/workspace/06_02_2025/llama.cpp/ggml-tsi-kernel/fpga/host/host_add.o: in function `txe_add_host':
LLVMDialectModule:(.text+0x20): undefined reference to `tsi_alloc'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x2c): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x38): undefined reference to `tsi_shmem_handle_from_ptr'


akapoor3518

lgtm

mmankal · 2025-06-05T16:23:59Z

lgtm

FIR-714: Updated the SDK Release r0.1.3

…gml-org#16038) Initializing RESERVED_NAME in is_reserved_name() is not thread safe and leads to corrupted memory when used from multiple threads as can be seen in the asan trace below. This fixes the initialization to make it thread-safe.

#0 0x000100abd018 in std::__1::pair<std::__1::__hash_iterator<std::__1::__hash_node<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, void*>*>, bool> std::__1::__hash_table<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>::__emplace_unique_key_args<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) __hash_table:1565
#1 0x000100ab0320 in SchemaConverter::visit(nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) json-schema-to-grammar.cpp:802
#2 0x000100aafc48 in std::__1::__function::__func<build_grammar(std::__1::function<void (common_grammar_builder const&)> const&, common_grammar_options const&)::$_2, std::__1::allocator<build_grammar(std::__1::function<void (common_grammar_builder const&)> const&, common_grammar_options const&)::$_2>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void> const&)>::operator()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void> const&) function.h:319
#3 0x000100a2c938 in std::__1::__function::__func<common_chat_params_init_llama_3_x(minja::chat_template const&, templates_params const&, bool)::$_0::operator()(common_grammar_builder const&) const::'lambda'(nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void> const&), std::__1::allocator<common_chat_params_init_llama_3_x(minja::chat_template const&, templates_params const&, bool)::$_0::operator()(common_grammar_builder const&) const::'lambda'(nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void> const&)>, void (nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void> const&)>::operator()(nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void> const&) function.h:319
#4 0x000100a139f8 in foreach_function(nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void> const&, std::__1::function<void (nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void> const&)> const&) chat.cpp:762
#5 0x000100a2a7f4 in std::__1::__function::__func<common_chat_params_init_llama_3_x(minja::chat_template const&, templates_params const&, bool)::$_0, std::__1::allocator<common_chat_params_init_llama_3_x(minja::chat_template const&, templates_params const&, bool)::$_0>, void (common_grammar_builder const&)>::operator()(common_grammar_builder const&) function.h:319
#6 0x000100aa98f4 in build_grammar(std::__1::function<void (common_grammar_builder const&)> const&, common_grammar_options const&) json-schema-to-grammar.cpp:982
#7 0x0001009c9314 in common_chat_params_init_llama_3_x(minja::chat_template const&, templates_params const&, bool) chat.cpp:1110
#8 0x0001009b8afc in common_chat_templates_apply_jinja(common_chat_templates const*, common_chat_templates_inputs const&) chat.cpp:1992
#9 0x0001009b533c in common_chat_templates_apply(common_chat_templates const*, common_chat_templates_inputs const&) chat.cpp:2074
#10 0x000100810120 in llamacpp_apply_chat_template+0x724 (predict_oai-98384e17fb94e863:arm64+0x100090120)
...

==45482==Register values:
x[0] = 0x00006020004147f8 x[1] = 0x00006080000013c8 x[2] = 0x0000000000000000 x[3] = 0x0000604006289738
x[4] = 0x0000000000000002 x[5] = 0x0000000000000001 x[6] = 0x04034000004b4000 x[7] = 0x0000000000000001
x[8] = 0xbebebebebebebebe x[9] = 0x17d7d7d7d7d7d7d7 x[10] = 0x00000c04000828ff x[11] = 0x0000000000000001
x[12] = 0x000000002018d383 x[13] = 0x0000000000000000 x[14] = 0xfa0000000000fafa x[15] = 0x000010700001ffff
x[16] = 0x000000019dc012c0 x[17] = 0x00000001021284f8 x[18] = 0x0000000000000000 x[19] = 0x00000001700acdc0
x[20] = 0x0000000000000002 x[21] = 0x000000002018d384 x[22] = 0x16dd16fd2e731151 x[23] = 0x0000007000020000
x[24] = 0x0000000100c69c08 x[25] = 0x0000000100c69c20 x[26] = 0x00006080000013c7 x[27] = 0x0000000100c69c00
x[28] = 0x00000001700acd60 fp = 0x00000001700aceb0 lr = 0x0000000100abce30 sp = 0x00000001700acd60
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV __hash_table:1565 in std::__1::pair<std::__1::__hash_iterator<std::__1::__hash_node<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, void*>*>, bool> std::__1::__hash_table<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>::__emplace_unique_key_args<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)

Thread T5 created by T0 here:
#0 0x0001020b99d4 in pthread_create+0x5c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x359d4)
#1 0x000100873910 in std::sys::pal::unix::thread::Thread::new::h77254fdd87a28e05+0x118 (predict_oai-98384e17fb94e863:arm64+0x1000f3910)
#2 0x0001007c7a1c in test::run_test::haeb3c2bcd5ed6cf6+0x76c (predict_oai-98384e17fb94e863:arm64+0x100047a1c)
#3 0x0001007aedb0 in test::console::run_tests_console::he9d142d704f3a986+0x149c (predict_oai-98384e17fb94e863:arm64+0x10002edb0)
#4 0x0001007c5758 in test::test_main::hf86a5e20735245b9+0x118 (predict_oai-98384e17fb94e863:arm64+0x100045758)
#5 0x0001007c5da0 in test::test_main_static::h61ee9c8fd30abca0+0x54 (predict_oai-98384e17fb94e863:arm64+0x100045da0)
...
==45482==ABORTING

atrivedi-tsavoritesi added 5 commits June 3, 2025 11:04

@FIR-714: Updated SDK version to r0.1.3 version

9459c0c

@FIR-714: Updated TLIBS to be passed to llama_build function

c18585c

@FIR-714: Updated to use 1.30 external dependencies

47ceff0

@FIR-714: Fixed the issues of not finding fpga libs using

cea50af

runtime/utils/lib/ path

atrivedi-tsavoritesi requested review from DashingR, LewisLui777, akapoor3518, brnorris03, dineshReddy6381, dmpatra, gkethamallax, mmankal, reach2shaunak and sh1r1sh June 5, 2025 05:07

atrivedi-tsavoritesi self-assigned this Jun 5, 2025

akapoor3518 approved these changes Jun 5, 2025


atrivedi-tsavoritesi merged commit a7b7e46 into master Jun 5, 2025

atrivedi-tsavoritesi deleted the FIR-714 branch June 5, 2025 14:41

atrivedi-tsavoritesi added a commit that referenced this pull request Jun 5, 2025

Merge pull request #5 from tsisw/FIR-714

1dab233

FIR-714: Updated the SDK Release r0.1.3

atrivedi-tsavoritesi added a commit that referenced this pull request Jun 5, 2025

Merge pull request #5 from tsisw/FIR-714

78e8749

FIR-714: Updated the SDK Release r0.1.3

atrivedi-tsavoritesi added a commit that referenced this pull request Jun 6, 2025

Merge pull request #5 from tsisw/FIR-714

d4484c5

FIR-714: Updated the SDK Release r0.1.3
