
llama : model-based max number of graph nodes #8622

Merged — 2 commits merged Jul 27, 2024
Conversation

ggerganov (Owner)

Fixes #8615

Proposes determining the max number of graph nodes from the model info (arch, hparams, etc.) rather than the fixed LLAMA_MAX_NODES constant.
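The patch itself isn't shown in the thread, but the idea can be sketched as follows. This is a minimal illustration, not the merged code: the struct and function names are hypothetical, and while the n_layer > 400 threshold mirrors the check discussed below, both budget values are assumptions.

```cpp
#include <cstdint>

// Hypothetical subset of the model hyper-parameters; field names are
// illustrative, not the exact llama.cpp definitions.
struct hparams_sketch {
    uint32_t n_layer;
};

// Sketch of a model-based node budget: keep a fixed default for typical
// models, but grow the graph for very deep ones (e.g. a >400-layer
// 405B configuration). Concrete values are illustrative only.
static int32_t max_nodes_sketch(const hparams_sketch & hp) {
    if (hp.n_layer > 400) {
        return 32768; // larger budget for very deep models (illustrative)
    }
    return 8192; // default budget (illustrative)
}
```

Making the budget a function of the model keeps small models on the small default, which matters because (as noted in the review below) some per-node bookkeeping overhead scales with the node count.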

@ggerganov requested a review from slaren on July 22, 2024
@mofosyne added the Review Complexity : Low label (trivial changes to code that most beginner devs, or those who want a break, can tackle; e.g. a UI fix) on Jul 22, 2024
slaren (Collaborator) left a comment

This should solve the immediate problem. It would be good to detect the necessary number of nodes automatically and always use the lowest possible value, because there is an overhead from clearing some buffers that is proportional to the number of nodes. I have been doing some work to reduce this overhead, but it is not ready yet.

ggerganov (Owner, Author)

OK, I will merge this after the 405B model is released and the need for this change is confirmed. The proposed n_layer > 400 check would likely have to be updated, because that number seems too big to me.

ceddybi commented Jul 26, 2024

@ggerganov what's left at this point?

ggerganov (Owner, Author)

I haven't noticed any reports of the 405B model failing, so I removed the increased max-nodes limit for now and plan to merge just the new llama_model_max_nodes function.
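With the 405B path dropped, the merged shape reduces to a trivial function of the model. Only the existence of llama_model_max_nodes is confirmed by this thread; the signature, the stand-in model type, and the 8192 value below are assumptions for illustration.

```cpp
#include <cstddef>

// Illustrative stand-in for the model type; the real function would
// take a reference to the actual llama_model.
struct llama_model_sketch {};

// Sketch of the simplified function: with the deep-model path disabled,
// it can simply return a fixed default (8192 is an assumed value).
static constexpr size_t llama_model_max_nodes_sketch(const llama_model_sketch & /*model*/) {
    return 8192;
}
```

Keeping the function (rather than a macro) preserves the hook, so a model-dependent budget can be reintroduced later without touching call sites.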

@ggerganov ggerganov merged commit 92090ec into master Jul 27, 2024
59 checks passed
@ggerganov ggerganov deleted the gg/custom-max-nodes branch July 27, 2024 11:59
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 27, 2024
…anov#8622)

* llama : model-based max number of graph nodes

ggml-ci

* llama : disable 405B max_nodes path due to lack of complaints

ggml-ci
Development

Successfully merging this pull request may close these issues:

- Bug: LLAMA_MAX_NODES must be increased to run 405B Mega merge

4 participants