
llama : refactor graph build code #3837

Merged: 21 commits, Nov 1, 2023

Commits (all changes):
- 8b2420d (ggerganov, Oct 28, 2023): llama : factor out ggml-alloc from graph build functions
- 5946d98 (ggerganov, Oct 28, 2023): metal : disable kernel load log
- 38aca9e (ggerganov, Oct 28, 2023): llama : factor out tensor offloading outside the build call (wip)
- 83d2c43 (ggerganov, Oct 28, 2023): llama : offload rest of the models
- 3af8771 (ggerganov, Oct 28, 2023): llama : update offload log messages to print node index
- 51c4f9e (ggerganov, Oct 28, 2023): llama : comments
- 4e98897 (ggerganov, Oct 29, 2023): llama : support offloading result_norm + comments
- 0dc05b8 (ggerganov, Oct 29, 2023): llama : factor graph input into a function
- e14aa46 (ggerganov, Oct 29, 2023): llama : do tensor offload only with CUDA
- 7961790 (ggerganov, Oct 29, 2023): llama : fix res_norm offloading
- b4ad03b (ggerganov, Oct 29, 2023): llama : try to optimize offloading code
- 25cfbf6 (ggerganov, Oct 29, 2023): llama : fix non-CUDA build
- 739b85c (ggerganov, Oct 29, 2023): llama : try to fix build
- da93618 (ggerganov, Oct 29, 2023): llama : move refact in correct place + optimize graph input
- 1e9c544 (ggerganov, Oct 29, 2023): llama : refactor tensor offloading as callback
- 8925cf9 (ggerganov, Oct 29, 2023): llama : add layer index to all tensor names
- 7610879 (ggerganov, Oct 29, 2023): llama : add functional header
- 79ad734 (ggerganov, Oct 29, 2023): llama : comment
- 210e6e5 (ggerganov, Oct 29, 2023): llama : remove obsolete map for layer counting
- 5baefef (ggerganov, Oct 31, 2023): llama : add llm_build helper functions (#3848)
- afb3929 (ggerganov, Oct 31, 2023): Merge branch 'master' into llama-refactor
ggml-metal.m (11 changes: 7 additions & 4 deletions)

```diff
@@ -238,14 +238,17 @@ static void ggml_metal_log(enum ggml_log_level level, const char * format, ...){
     // load kernels
     {
         NSError * error = nil;
+
+        /*
+        GGML_METAL_LOG_INFO("%s: loaded %-32s %16p | th_max = %4d | th_width = %4d\n", __func__, "kernel_"#name, (void *) ctx->pipeline_##name, \
+                (int) ctx->pipeline_##name.maxTotalThreadsPerThreadgroup, \
+                (int) ctx->pipeline_##name.threadExecutionWidth); \
+        */
 #define GGML_METAL_ADD_KERNEL(name) \
         ctx->function_##name = [ctx->library newFunctionWithName:@"kernel_"#name]; \
         ctx->pipeline_##name = [ctx->device newComputePipelineStateWithFunction:ctx->function_##name error:&error]; \
-        GGML_METAL_LOG_INFO("%s: loaded %-32s %16p | th_max = %4d | th_width = %4d\n", __func__, "kernel_"#name, (void *) ctx->pipeline_##name, \
-                (int) ctx->pipeline_##name.maxTotalThreadsPerThreadgroup, \
-                (int) ctx->pipeline_##name.threadExecutionWidth); \
         if (error) { \
             GGML_METAL_LOG_ERROR("%s: error: load pipeline error: %s\n", __func__, [[error description] UTF8String]); \
             return NULL; \
         }
```
ggml.h (2 changes: 1 addition & 1 deletion)

```diff
@@ -709,7 +709,7 @@ extern "C" {
     // Context tensor enumeration and lookup
     GGML_API struct ggml_tensor * ggml_get_first_tensor(struct ggml_context * ctx);
     GGML_API struct ggml_tensor * ggml_get_next_tensor (struct ggml_context * ctx, struct ggml_tensor * tensor);
-    GGML_API struct ggml_tensor * ggml_get_tensor(struct ggml_context * ctx, const char * name);
+    GGML_API struct ggml_tensor * ggml_get_tensor (struct ggml_context * ctx, const char * name);

     GGML_API struct ggml_tensor * ggml_set_zero(struct ggml_tensor * tensor);
     GGML_API struct ggml_tensor * ggml_set_i32 (struct ggml_tensor * tensor, int32_t value);
```

The only change is whitespace: it aligns the `ggml_get_tensor` declaration with the neighboring prototypes.