[UR] Bump UMF to v0.11.0-dev3 #17188
045f182 to 3bb9708
3bb9708 to a80ad8b
changing to draft, as
Compute Benchmarks level_zero run (with params: ):
Benchmarks level_zero run ():
Compute Benchmarks level_zero run (with params: ):
Benchmarks level_zero run ():
Summary (emphasized values are the best results)
Performance change in benchmark groups:
- Velocity Bench: relative perf in group Other (8)
- SYCL-Bench: relative perf in group Other (53)
- llama.cpp bench: relative perf in group Other (6)
Details: benchmark details (environment, command)

- Velocity-Bench Hashtable
  Command: /home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify
- Velocity-Bench Bitcracker
  Command: /home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000
- Velocity-Bench CudaSift
  Command: /home/test-user/llvm_bench_workdir/cudaSift/cudaSift
- Velocity-Bench QuickSilver
  Command: /home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
  Environment Variables: QS_DEVICE=GPU
- Velocity-Bench Sobel Filter
  Command: /home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5
  Environment Variables: OPENCV_IO_MAX_IMAGE_PIXELS=1677721600
- Velocity-Bench dl-cifar
  Command: /home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl
- Velocity-Bench dl-mnist
  Command: /home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO
  Environment Variables: NEOReadDebugKeys=1
- Velocity-Bench svm
  Command: /home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m
- Runtime_IndependentDAGTaskThroughput_SingleTask, Runtime_IndependentDAGTaskThroughput_BasicParallelFor, Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor, Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
- Runtime_DAGTaskThroughput_SingleTask, Runtime_DAGTaskThroughput_BasicParallelFor, Runtime_DAGTaskThroughput_HierarchicalParallelFor, Runtime_DAGTaskThroughput_NDRangeParallelFor
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/DAGTaskThroughput_multi.csv --size=327680
- MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous, MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous, MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous, MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous, MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous, MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous, MicroBench_HostDeviceBandwidth_1D_H2D_Strided, MicroBench_HostDeviceBandwidth_2D_H2D_Strided, MicroBench_HostDeviceBandwidth_3D_H2D_Strided, MicroBench_HostDeviceBandwidth_1D_D2H_Strided, MicroBench_HostDeviceBandwidth_2D_D2H_Strided, MicroBench_HostDeviceBandwidth_3D_D2H_Strided
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/HostDeviceBandwidth_multi.csv --size=512
- MicroBench_LocalMem_int32_4096, MicroBench_LocalMem_fp32_4096
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/LocalMem_multi.csv --size=10240000
- Pattern_Reduction_NDRange_int32, Pattern_Reduction_Hierarchical_int32
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/Pattern_Reduction_multi.csv --size=10240000
- ScalarProduct_NDRange_int32, ScalarProduct_NDRange_int64, ScalarProduct_NDRange_fp32, ScalarProduct_Hierarchical_int32, ScalarProduct_Hierarchical_int64, ScalarProduct_Hierarchical_fp32
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/ScalarProduct_multi.csv --size=102400000
- Pattern_SegmentedReduction_NDRange_int16, Pattern_SegmentedReduction_NDRange_int32, Pattern_SegmentedReduction_NDRange_int64, Pattern_SegmentedReduction_NDRange_fp32, Pattern_SegmentedReduction_Hierarchical_int16, Pattern_SegmentedReduction_Hierarchical_int32, Pattern_SegmentedReduction_Hierarchical_int64, Pattern_SegmentedReduction_Hierarchical_fp32
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
- USM_Allocation_latency_fp32_device, USM_Allocation_latency_fp32_host, USM_Allocation_latency_fp32_shared
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000
- USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch, USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch, USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch, USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/USM_Instr_Mix_multi.csv --size=8192
- VectorAddition_int32, VectorAddition_int64, VectorAddition_fp32
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/VectorAddition_multi.csv --size=102400000
- Polybench_2mm
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/2mm.csv --size=512
- Polybench_3mm
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/3mm.csv --size=512
- Polybench_Atax
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/Atax.csv --size=8192
- Kmeans_fp32
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/Kmeans.csv --size=700000000
- MolecularDynamics
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/MolecularDynamics.csv --size=8196
- llama.cpp Prompt Processing Batched 128, llama.cpp Text Generation Batched 128, llama.cpp Prompt Processing Batched 256, llama.cpp Text Generation Batched 256, llama.cpp Prompt Processing Batched 512, llama.cpp Text Generation Batched 512
  Command (same for each): /home/test-user/llvm_bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/test-user/llvm_bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf
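The SYCL-Bench entries above all use one command template. Each starts with a binary under `sycl-bench-build/`, followed by `--warmup-run`, `--num-runs=3`, an `--output` CSV in the work directory, and a `--size`. The sketch below rebuilds those command lines from the three values that vary. `sycl_bench_argv` and `SYCL_BENCH_RUNS` are hypothetical names introduced for illustration; they are not part of sycl-bench or the CI benchmark scripts. The binary names, CSV names, and sizes are copied from the log. The helper only builds argument lists and does not launch anything.

```python
import shlex

# Hypothetical helper, not part of sycl-bench or the CI harness.
# It rebuilds the sycl-bench command lines shown in the benchmark details.
WORKDIR = "/home/test-user/llvm_bench_workdir"

# binary name -> (output CSV name, --size value), copied from the log above
SYCL_BENCH_RUNS = {
    "dag_task_throughput_independent": ("IndependentDAGTaskThroughput_multi.csv", 32768),
    "dag_task_throughput_sequential": ("DAGTaskThroughput_multi.csv", 327680),
    "host_device_bandwidth": ("HostDeviceBandwidth_multi.csv", 512),
    "vec_add": ("VectorAddition_multi.csv", 102400000),
    "2mm": ("2mm.csv", 512),
    "mol_dyn": ("MolecularDynamics.csv", 8196),
}


def sycl_bench_argv(binary: str, csv_name: str, size: int, num_runs: int = 3) -> list[str]:
    """Return the argv list for one sycl-bench binary, mirroring the logged flags."""
    return [
        f"{WORKDIR}/sycl-bench-build/{binary}",
        "--warmup-run",
        f"--num-runs={num_runs}",
        f"--output={WORKDIR}/{csv_name}",
        f"--size={size}",
    ]


if __name__ == "__main__":
    for binary, (csv_name, size) in SYCL_BENCH_RUNS.items():
        # shlex.join builds a shell-safe string; these arguments need no quoting
        print(shlex.join(sycl_bench_argv(binary, csv_name, size)))
```

Keeping only the varying parts in a table makes it easy to compare runs. For example, only the level_zero_v2 run lists `lin_reg_coeff`.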
Compute Benchmarks level_zero_v2 run (with params: ):
Benchmarks level_zero_v2 run ():
Summary (emphasized values are the best results)
Performance change in benchmark groups:
- Compute Benchmarks: relative perf in group SubmitKernel (4)
- Compute Benchmarks: relative perf in group Other (6)
- Compute Benchmarks: relative perf in group SinKernelGraph 5 (3)
- Compute Benchmarks: relative perf in group SinKernelGraph 100 (3)
- Velocity Bench: relative perf in group Other (8)
- SYCL-Bench: relative perf in group Other (54)
- llama.cpp bench: relative perf in group Other (6)
Details: benchmark details (environment, command)

- api_overhead_benchmark_l0 SubmitKernel out of order
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
- api_overhead_benchmark_l0 SubmitKernel in order
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
- api_overhead_benchmark_sycl SubmitKernel out of order
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
- api_overhead_benchmark_sycl SubmitKernel in order
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
- memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100
- memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100
- memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024
- api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024
- api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024
- miscellaneous_benchmark_sycl VectorSum
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256
- graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:5
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0
- graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0
- graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:5
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0
- graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:5
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=5 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0
- graph_api_benchmark_l0 SinKernelGraph graphs:0, numKernels:100
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=0 --withCopyOffload=1 --immediateAppendCmdList=0
- graph_api_benchmark_l0 SinKernelGraph graphs:1, numKernels:100
  Command: /home/test-user/llvm_bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_l0 --test=SinKernelGraph --csv --noHeaders --iterations=10000 --numKernels=100 --withGraphs=1 --withCopyOffload=1 --immediateAppendCmdList=0
- Velocity-Bench Hashtable
  Command: /home/test-user/llvm_bench_workdir/hashtable/hashtable_sycl --no-verify
- Velocity-Bench Bitcracker
  Command: /home/test-user/llvm_bench_workdir/bitcracker/bitcracker -f /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/llvm_bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000
- Velocity-Bench CudaSift
  Command: /home/test-user/llvm_bench_workdir/cudaSift/cudaSift
- Velocity-Bench QuickSilver
  Command: /home/test-user/llvm_bench_workdir/QuickSilver/qs -i /home/test-user/llvm_bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
  Environment Variables: QS_DEVICE=GPU
- Velocity-Bench Sobel Filter
  Command: /home/test-user/llvm_bench_workdir/sobel_filter/sobel_filter -i /home/test-user/llvm_bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5
  Environment Variables: OPENCV_IO_MAX_IMAGE_PIXELS=1677721600
- Velocity-Bench dl-cifar
  Command: /home/test-user/llvm_bench_workdir/dl-cifar/dl-cifar_sycl
- Velocity-Bench dl-mnist
  Command: /home/test-user/llvm_bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO
  Environment Variables: NEOReadDebugKeys=1
- Velocity-Bench svm
  Command: /home/test-user/llvm_bench_workdir/svm/svm_sycl /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/test-user/llvm_bench_workdir/velocity-bench-repo/svm/SYCL/a.m
- Runtime_IndependentDAGTaskThroughput_SingleTask, Runtime_IndependentDAGTaskThroughput_BasicParallelFor, Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor, Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
- Runtime_DAGTaskThroughput_SingleTask, Runtime_DAGTaskThroughput_BasicParallelFor, Runtime_DAGTaskThroughput_HierarchicalParallelFor, Runtime_DAGTaskThroughput_NDRangeParallelFor
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/DAGTaskThroughput_multi.csv --size=327680
- MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous, MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous, MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous, MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous, MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous, MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous, MicroBench_HostDeviceBandwidth_1D_H2D_Strided, MicroBench_HostDeviceBandwidth_2D_H2D_Strided, MicroBench_HostDeviceBandwidth_3D_H2D_Strided, MicroBench_HostDeviceBandwidth_1D_D2H_Strided, MicroBench_HostDeviceBandwidth_2D_D2H_Strided, MicroBench_HostDeviceBandwidth_3D_D2H_Strided
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/HostDeviceBandwidth_multi.csv --size=512
- MicroBench_LocalMem_int32_4096, MicroBench_LocalMem_fp32_4096
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/LocalMem_multi.csv --size=10240000
- Pattern_Reduction_NDRange_int32, Pattern_Reduction_Hierarchical_int32
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/Pattern_Reduction_multi.csv --size=10240000
- ScalarProduct_NDRange_int32, ScalarProduct_NDRange_int64, ScalarProduct_NDRange_fp32, ScalarProduct_Hierarchical_int32, ScalarProduct_Hierarchical_int64, ScalarProduct_Hierarchical_fp32
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/ScalarProduct_multi.csv --size=102400000
- Pattern_SegmentedReduction_NDRange_int16, Pattern_SegmentedReduction_NDRange_int32, Pattern_SegmentedReduction_NDRange_int64, Pattern_SegmentedReduction_NDRange_fp32, Pattern_SegmentedReduction_Hierarchical_int16, Pattern_SegmentedReduction_Hierarchical_int32, Pattern_SegmentedReduction_Hierarchical_int64, Pattern_SegmentedReduction_Hierarchical_fp32
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
- USM_Allocation_latency_fp32_device, USM_Allocation_latency_fp32_host, USM_Allocation_latency_fp32_shared
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000
- USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch, USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch, USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch, USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/USM_Instr_Mix_multi.csv --size=8192
- VectorAddition_int32, VectorAddition_int64, VectorAddition_fp32
  Command (same for each): /home/test-user/llvm_bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/VectorAddition_multi.csv --size=102400000
- Polybench_2mm
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/2mm.csv --size=512
- Polybench_3mm
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/3mm.csv --size=512
- Polybench_Atax
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/Atax.csv --size=8192
- Kmeans_fp32
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/Kmeans.csv --size=700000000
- LinearRegressionCoeff_fp32
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/LinearRegressionCoeff.csv --size=1638400000
- MolecularDynamics
  Command: /home/test-user/llvm_bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=3 --output=/home/test-user/llvm_bench_workdir/MolecularDynamics.csv --size=8196
- llama.cpp Prompt Processing Batched 128, llama.cpp Text Generation Batched 128, llama.cpp Prompt Processing Batched 256
  Command (same for each): /home/test-user/llvm_bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/test-user/llvm_bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf
- llama.cpp Text Generation Batched 256
  Command: /home/test-user/llvm_bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model
/home/test-user/llvm_bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf llama.cpp Prompt Processing Batched 512Command:/home/test-user/llvm_bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/test-user/llvm_bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf llama.cpp Text Generation Batched 512Command:/home/test-user/llvm_bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/test-user/llvm_bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf |
From now on, disjoint_pool is part of libumf instead of being a separate library.
Compute Benchmarks level_zero run (with params: --compare baseline):
Benchmarks level_zero run (--compare baseline): Summary (emphasized values are the best results)
Performance change in benchmark groups:
- Compute Benchmarks: relative perf in groups SubmitKernel (4), Other (6), SinKernelGraph 5 (3), SinKernelGraph 100 (3)
- Velocity Bench: relative perf in group Other (8)
- SYCL-Bench: relative perf in group Other (54)
- llama.cpp bench: relative perf in group Other (6)
Replaced with #17468.
Replaced with a newer version.