Hi!
When a graph has weight buffers on several backends, the scheduler breaks the graph into several splits (ggml-backend.c#L1593):
```c
if (src_backend_id != cur_backend_id && !supported) {
    // create a copy of the input in the split's backend
    const size_t id = hash_id(src);
    if (sched->tensor_copies[id][cur_backend_id][0] == NULL) {
        ggml_backend_t backend = sched->backends[cur_backend_id];
        for (int c = 0; c < sched->n_copies; c++) {
            struct ggml_tensor * tensor_copy = ggml_dup_tensor_layout(sched->ctx, src);
            ggml_format_name(tensor_copy, "%s#%s#%d", ggml_backend_name(backend), src->name, c);
            if (sched->n_copies > 1) {
                ggml_set_input(tensor_copy);
                ggml_set_output(tensor_copy); // prevent ggml-alloc from overwriting the tensor
            }
            sched->tensor_copies[id][cur_backend_id][c] = tensor_copy;
            SET_CAUSE(tensor_copy, "4.cpy");
        }
        int n_inputs = split->n_inputs++;
        GGML_ASSERT(n_inputs < GGML_SCHED_MAX_SPLIT_INPUTS);
        split->inputs[n_inputs] = src;
    }
    node->src[j] = sched->tensor_copies[id][cur_backend_id][sched->cur_copy];
}
```
If a node's backend_id differs from that of one of its srcs, the scheduler creates a new split, along with the tensor_copies and the bookkeeping needed to copy data between splits later.
However, this code also modifies the original cgraph, making the node's src point to the tensor_copy.
But ggml_backend_sched_reset doesn't reset this:
```c
void ggml_backend_sched_reset(ggml_backend_sched_t sched) {
    // reset state for the next run
    if (!sched->is_reset) {
        size_t hash_size = sched->hash_set.size;
        memset(sched->hash_set.keys,      0, sizeof(sched->hash_set.keys[0])      * hash_size); // NOLINT
        memset(sched->tensor_backend_id, -1, sizeof(sched->tensor_backend_id[0]) * hash_size);
        memset(sched->tensor_copies,      0, sizeof(sched->tensor_copies[0])      * hash_size);

        sched->is_reset = true;
    }
    sched->is_alloc = false;
}
```
As a result, the next time ggml_backend_sched_graph_compute is called, there are two possible situations:

1. The allocated tensor_copy is still there: the node then no longer has a different backend_id from its srcs, so the copy between splits doesn't happen again.
2. The allocated tensor_copy has been freed: an access violation (or some other memory error) may happen.
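Here's a simple program to reproduce this issue (a minimal sketch: it assumes a build with both the CPU and CUDA backends, and the shapes and the explicit tensor placement via ggml_backend_sched_set_tensor_backend are just illustrative):

```c
#include "ggml.h"
#include "ggml-backend.h"
#include "ggml-cuda.h"

int main(void) {
    // two backends, so tensors on different backends force the scheduler to split
    ggml_backend_t backends[2] = { ggml_backend_cuda_init(0), ggml_backend_cpu_init() };

    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 8 + ggml_graph_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true, // tensor data is allocated by the scheduler
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_cgraph * graph = ggml_new_graph(ctx);
    ggml_build_forward_expand(graph, ggml_mul_mat(ctx, a, b));

    // NULL bufts = default buffer type of each backend; no pipeline parallelism
    ggml_backend_sched_t sched = ggml_backend_sched_new(backends, NULL, 2, GGML_DEFAULT_GRAPH_SIZE, false);

    // pin the srcs to different backends so the mul_mat node needs a cross-split copy
    ggml_backend_sched_set_tensor_backend(sched, a, backends[0]);
    ggml_backend_sched_set_tensor_backend(sched, b, backends[1]);

    ggml_backend_sched_graph_compute(sched, graph); // 1st run: OK, but rewrites the graph's srcs
    ggml_backend_sched_reset(sched);                // clears tensor_copies, not the graph
    ggml_backend_sched_graph_compute(sched, graph); // 2nd run: stale srcs -> wrong result or crash

    ggml_backend_sched_free(sched);
    ggml_free(ctx);
    ggml_backend_free(backends[0]);
    ggml_backend_free(backends[1]);
    return 0;
}
```

Is this a bug? Feel free to point it out if I've misunderstood something :D
Thanks in advance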
As you noted, ggml_backend_sched_alloc_graph modifies the graph in a way that prevents it from being used again after ggml_backend_sched_reset is called. You should still be able to evaluate the same graph multiple times by calling ggml_backend_sched_graph_compute only. I would consider this a limitation rather than a bug.
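A minimal sketch of the supported pattern (reusing the hypothetical sched and graph from the program above):

```c
// supported: evaluate the same graph any number of times with compute only
ggml_backend_sched_graph_compute(sched, graph);
ggml_backend_sched_graph_compute(sched, graph); // OK: the tensor copies are still in place

// not supported: reset clears the scheduler state, but 'graph' still points
// at the old tensor copies through its rewritten srcs
ggml_backend_sched_reset(sched);
ggml_backend_sched_graph_compute(sched, graph); // may compute with stale or freed copies
```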
It seems that you are right :P
Thanks a lot for your help!