ggml : remove GGML_MAX_NODES limit #567

ggerganov · 2023-10-10T11:07:55Z

We should be able to create graphs with large number of nodes. One application is support for LSTM networks:

The exact approach for implementing this is not currently clear, but some ideas and discussion is available in the linked issue above

datduonguva · 2023-10-11T18:01:29Z

my approach for this would be to create the graph for just 1 cell. For each time step, will simply update the input and states, call the ggml_graph_compute_with_ctx and get the output. That way, the graph's size doesn't grow with the number of timestep.

Example:

https://github.com/datduonguva/ggml-experiments/blob/master/rnn_text_gen/rnn_text_generation.cpp

ggerganov · 2023-10-11T18:30:12Z

my approach for this would be to create the graph for just 1 cell.

cc @PABannier

PABannier · 2023-10-11T18:54:32Z

Yes we discussed that in #467 ! But for large sequences, isn't the overhead of building the graph per cell slowing this down dramatically?

@datduonguva Have you tried by building a general graph by setting GGML_MAX_NODES to 100,000 for instance?

ggerganov · 2023-10-11T19:06:55Z

@PABannier They build the graph only once and then for each iteration, they put different input in the input tensor and read the output from the output tensor and so on. I don't know if this is valid for all LSTM models, but this specific one has a fixed graph that does not change it's shape with time.

This approach does not have the overhead of building the graph each time, but it does have an overhead of starting and stopping the thread pool for every iteration.

ggerganov · 2023-10-19T15:18:08Z

Let's fix the ggml_cgraph MAX_NODES limit. We should implement the second option from #586 (comment)

If we take the ideas from #467 and drop the requirement for having graphs on the stack.
So basically have:

    struct ggml_cgraph {
        int n_nodes;
        int n_leafs;

        // by default we work on the stack
        struct ggml_tensor * nodes;
        struct ggml_tensor * grads;
        struct ggml_tensor * leafs;

        void ** visited_hash_table;

        // performance
        int     perf_runs;
        int64_t perf_cycles;
        int64_t perf_time_us;
    };

We allow creation of new graphs only through ggml_new_graph(). Extend the interface with number of nodes:

GGML_API struct ggml_cgraph * ggml_new_graph        (struct ggml_context * ctx, int n_nodes_max);
GGML_API struct ggml_cgraph * ggml_build_forward_ctx(struct ggml_context * ctx, struct ggml_tensor * tensor, int n_nodes_max);

I.e. all graphs are always create inside a ggml_context - is this a good idea?

Deprecate ggml_build_forward (struct ggml_tensor * tensor);. Instead, we should use:

// old
gf = ggml_build_forward(x);

// new
gf = ggml_build_forward_ctx(ctx, x, n_nodes_max);

The proposed struct ggml_cgraph can be used as a view to a graph allocated in a context.
Add API for creating views:

struct ggml_cgraph ggml_graph_view(struct ggml_cgraph * cgraph, int n0, int n1);

I think leafs always have to be "viewed in full", i.e only the nodes/grads can be split? The hash table can be reused by the views I think? Probably OK to return graph views on the stack?

Probably need to extend:

GGML_API size_t ggml_graph_overhead(int n_nodes_max);

slaren · 2023-10-19T15:49:50Z

Deprecate ggml_build_forward (struct ggml_tensor * tensor);

Deprecate usually means that there would be a transition period where both APIs would work, but I am not sure that would be possible here. Do you actually mean that, or just removing ggml_build_forward entirely?

I think leafs always have to be "viewed in full", i.e only the nodes/grads can be split?

For the purposes of ggml-backend, I don't care about leafs at all. I don't think that backends ever use them while computing a graph either. So I would be fine with just setting it to NULL. I don't know in what cases a list of leaf is required.

The hash table can be reused by the views I think?

The hashtable is only used to avoid revisiting nodes while expanding the graph, and views shouldn't be allowed to be expanded since they would just overwrite the parent's memory. I think it would be ok to just set the hash table to NULL.

Probably OK to return graph views on the stack?

It's probably OK, but it doesn't really solve anything and it may reduce options in the future. Ie. we may need to change something that would require dynamic memory allocation even on views. So I think the safer route would be to also allocate views in a context.

Should we do something to avoid allocating the list of grads on forward-only graphs? It is just wasted memory during inference.

ggerganov · 2023-10-19T15:59:52Z

Do you actually mean that, or just removing ggml_build_forward entirely?

Remove it entirely. We will add comment in the header how to migrate so it is easier for 3rd party projects.

I don't know in what cases a list of leaf is required.

Hm, interesting. leafs do seem like an unnecessary categorization. I guess I must have been thinking about some use case when I separated the nodes and leafs, but as it stands - this seems not needed, just doubling the logic of handling nodes. We should remove the leafs from the graph - can do it in a separate issue.

So I think the safer route would be to also allocate views in a context.

OK

Should we do something to avoid allocating the list of grads on forward-only graphs? It is just wasted memory during inference.

Yes, probably a flag that the user sets when they know that backward passes will be computed.

slaren · 2023-10-20T21:05:29Z

Most of the changes to ggml.c/h are done, but I have not updated the tests and examples yet because I am not entirely sure about the interface. ggml_new_graph should be easy to use and shouldn't have too many parameters for basic cases, and these parameters should not need to be replicated in ggml_build_xxx_ctx.

I propose keeping a simple ggml_new_graph function that doesn't take any parameters (other than a context). More complex initialization can be done with another function, such as ggml_new_graph_custom. The ggml_build_xxx_ctx functions are removed. The complete interface would be something like this:

// Graph interface
GGML_API void ggml_build_forward_expand (struct ggml_cgraph * cgraph, struct ggml_tensor * tensor);
GGML_API void ggml_build_backward_expand(struct ggml_context * ctx, struct ggml_cgraph * gf, struct ggml_cgraph * gb, bool keep);

GGML_API struct ggml_cgraph * ggml_new_graph(struct ggml_context * ctx); // = ggml_new_graph_custom(GGML_GRAPH_DEFAULT_SIZE, true);
GGML_API struct ggml_cgraph * ggml_new_graph_custom(struct ggml_context * ctx, size_t size, bool grads);

GGML_API void ggml_graph_reset(struct ggml_cgraph * graph);

GGML_API struct ggml_cgraph * ggml_graph_view(struct ggml_context * ctx, struct ggml_cgraph * cgraph, int i0, int i1);
// Previously, graphs could be copied simply with the operator =, but this does not work anymore.
// Restore this functionality with these functions:
GGML_API void ggml_graph_cpy (struct ggml_cgraph * src, struct ggml_cgraph * dst);
GGML_API struct ggml_cgraph * ggml_graph_dup (struct ggml_context * ctx, struct ggml_cgraph * src); // ggml_graph_cpy(src, ggml_new_graph(...));

GGML_API size_t ggml_graph_overhead(); // = ggml_graph_overhead_custom(GGML_GRAPH_DEFAULT_SIZE, true);
GGML_API size_t ggml_graph_overhead_custom(size_t size, bool grads);

// Removed:
// GGML_API struct ggml_cgraph * ggml_build_forward_ctx (struct ggml_context * ctx, struct ggml_tensor * tensor, size_t size);
// GGML_API struct ggml_cgraph * ggml_build_backward_ctx(struct ggml_context * ctx, struct ggml_cgraph * gf, bool keep);

// Instead:
// struct ggml_cgraph * gf = ggml_new_graph(ctx);
// ggml_graph_reset(gf); // ggml_build_forward() also resets the graph, add this function to support this
// ggml_build_forward_expand(gf, tensor);
//
// structr ggml_cgraph * gb = ggml_new_graph(ctx);
// ggml_build_backward_expand(ctx, gf, gb, keep);

I noticed that the graph reset functionality in ggml_build_forward isn't really working currently because the hash table is not reset, only n_nodes and n_leafs, so I imagine nobody depends on this behavior at the moment. Nonetheless, a ggml_graph_reset function would be easy to implement and could be useful in some cases.

ggerganov · 2023-10-21T08:12:23Z

GGML_API struct ggml_cgraph * ggml_new_graph(struct ggml_context * ctx); // = ggml_new_graph_custom(GGML_GRAPH_DEFAULT_SIZE, true);

By default, we probably don't want to have grads as 99% of the use-cases are currently just forward pass, so maybe change to false.

For the graph resetting in ggml_build_forward we probably need a new ggml_graph_reset or ggml_graph_clear that zeros the nodes and clears the hash table.

Below is a short description I wrote about the existing ggml_graph_reset(), before I realized you were referring to the resetting in ggml_build_forward(), so not very relevant, but keeping it for FYI because I already wrote it anyway:

GGML_API void ggml_graph_reset(struct ggml_cgraph * graph);

The purpose of this function is to just set the gradient contents to 0. The test1.c test demonstrates sample use-cases. We first set the input against which we will do automatic differentiation with ggml_set_param and then we build the forward graph. This will cause each node in the graph that depends directly or indirectly on any of the inputs to have a grad tensor allocated. For a given forward graph, we can build the backward graph. Here we have 2 options - keep == true means that the backward graph will make a copy of the original grad tensors and if keep == false it will reuse them. In the latter option, we save memory, but we can no longer build a backward graph of the backward graph (if we wanted to compute the second derivative for example).
Back to ggml_graph_reset, lets assume we have already build the forward and backward graph and now we want to compute the gradients for different values of the input. Before each ggml_graph_compute, we zero-out the gradients and set the values of the inputs. This mechanism allows us to evaluate the function (i.e. forward graph) and it's derivatives (i.e. backward graph) many times without recreating the graphs.
In that sense, probably we should rename the function to ggml_graph_reset_grads.

Other than that, I think the API is good.

slaren · 2023-10-21T12:51:56Z

I didn't realize that there is already a function called ggml_graph_reset, sorry for the confusion.

ggerganov added the refactoring Refactoring label Oct 10, 2023

ggerganov mentioned this issue Oct 19, 2023

[wip] ggml-backend v2 : add ggml_backend_sched #586

Merged

slaren linked a pull request Oct 21, 2023 that will close this issue

[wip] ggml-backend v2 : add ggml_backend_sched #586

Merged

ggerganov closed this as completed in #586 Oct 30, 2023

fann1993814 mentioned this issue Oct 31, 2023

Update new GGML for GGML_MAX_NODES limit? RWKV/rwkv.cpp#142

Open

slaren mentioned this issue Feb 16, 2024

Baby-llama.cpp report bus error ggerganov/llama.cpp#4830

Closed

NawafAlansari mentioned this issue Feb 18, 2024

Fixed the baby-llama issue (see issue #4830) ggerganov/llama.cpp#5573

Merged

ggerganov mentioned this issue Mar 1, 2024

Differences between cgraph->leafs and cgraph->nodes? ggerganov/llama.cpp#5791

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ggml : remove GGML_MAX_NODES limit #567

ggml : remove GGML_MAX_NODES limit #567

ggerganov commented Oct 10, 2023

datduonguva commented Oct 11, 2023

ggerganov commented Oct 11, 2023

PABannier commented Oct 11, 2023

ggerganov commented Oct 11, 2023 •

edited

Loading

ggerganov commented Oct 19, 2023 •

edited

Loading

slaren commented Oct 19, 2023 •

edited

Loading

ggerganov commented Oct 19, 2023

slaren commented Oct 20, 2023

ggerganov commented Oct 21, 2023

slaren commented Oct 21, 2023

ggml : remove GGML_MAX_NODES limit #567

ggml : remove GGML_MAX_NODES limit #567

Comments

ggerganov commented Oct 10, 2023

datduonguva commented Oct 11, 2023

ggerganov commented Oct 11, 2023

PABannier commented Oct 11, 2023

ggerganov commented Oct 11, 2023 • edited Loading

ggerganov commented Oct 19, 2023 • edited Loading

slaren commented Oct 19, 2023 • edited Loading

ggerganov commented Oct 19, 2023

slaren commented Oct 20, 2023

ggerganov commented Oct 21, 2023

slaren commented Oct 21, 2023

ggerganov commented Oct 11, 2023 •

edited

Loading

ggerganov commented Oct 19, 2023 •

edited

Loading

slaren commented Oct 19, 2023 •

edited

Loading