
Conversation

Contributor

@gtygo gtygo commented Aug 9, 2024

  • I have read the contributing guidelines

  • Self-reported review complexity: Low
  • Description
    This pull request addresses a memory leak in retrieval.cpp that occurs when the example continuously accepts query inputs. The leak stems from how the llama_batch is initialized and cleared.

  • Problem
    The llama_batch_init function allocates heap memory for the batch. However, the current implementation calls llama_batch_clear to reset the batch, which only sets the token count back to 0 and does not free the allocated heap memory. As a result, memory usage grows continuously while the process runs.

  • Solution
    The solution involves ensuring that the allocated memory for llama_batch is properly freed after each query is processed. This prevents the memory leak and stabilizes the memory usage of the process.

  • Changes
    Replaced llama_batch_clear with llama_batch_free to ensure proper memory deallocation.
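The leak described above can be sketched with a minimal stand-in for llama_batch. The `toy_batch` type and functions below are hypothetical illustrations, not the real llama.cpp API (the real struct holds several heap buffers: token, pos, seq_id, and so on), but they mirror the clear-vs-free distinction the PR fixes:

```cpp
#include <cstdlib>
#include <cstdint>

// Hypothetical stand-in for llama_batch, for illustration only.
struct toy_batch {
    int32_t *token;    // heap buffer, as allocated by llama_batch_init
    int32_t  n_tokens; // how many tokens are currently in the batch
};

static toy_batch toy_batch_init(int32_t capacity) {
    toy_batch b;
    b.token    = (int32_t *) std::malloc(capacity * sizeof(int32_t));
    b.n_tokens = 0;
    return b;
}

// Mirrors llama_batch_clear: only the count is reset;
// the heap buffer stays allocated.
static void toy_batch_clear(toy_batch & b) {
    b.n_tokens = 0;
}

// Returns true if the allocation is still live after a clear,
// i.e. clearing alone cannot release the memory that init acquired.
bool clear_does_not_release_memory() {
    toy_batch b = toy_batch_init(512);
    b.n_tokens  = 10;            // pretend we batched some tokens
    toy_batch_clear(b);
    const bool still_allocated = (b.token != nullptr && b.n_tokens == 0);
    std::free(b.token);          // what llama_batch_free would do
    return still_allocated;
}
```

Calling clear once per query while never calling free is what made the process's memory footprint grow without bound.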

Member

@ggerganov ggerganov left a comment


It would be better to init and free the batch once outside the while loop, and clear it inside the loop

Contributor Author

gtygo commented Aug 9, 2024

It would be better to init and free the batch once outside the while loop, and clear it inside the loop

Yes, this also reduces frequent memory allocations.
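The structure the reviewer suggests can be sketched as follows, again with a hypothetical `toy_batch` stand-in rather than the real llama.cpp API (the `g_allocs` counter is added purely to make the allocation behavior observable):

```cpp
#include <cstdlib>
#include <cstdint>

static int g_allocs = 0; // number of batch allocations performed

// Hypothetical stand-in for llama_batch, for illustration only.
struct toy_batch { int32_t *token; int32_t n_tokens; };

static toy_batch toy_batch_init(int32_t capacity) {
    ++g_allocs;
    return { (int32_t *) std::malloc(capacity * sizeof(int32_t)), 0 };
}
static void toy_batch_clear(toy_batch & b) { b.n_tokens = 0; }
static void toy_batch_free(toy_batch & b)  { std::free(b.token); b.token = nullptr; }

// Process n_queries with a single batch: init once before the loop,
// clear per iteration to reuse the buffers, free once after the loop.
// Returns the number of allocations made.
int run_queries(int n_queries) {
    g_allocs = 0;
    toy_batch query_batch = toy_batch_init(512); // once, outside the loop
    for (int i = 0; i < n_queries; ++i) {
        toy_batch_clear(query_batch);            // reuse the same buffers
        query_batch.n_tokens = 10;               // pretend we tokenized a query
        // ... decode the batch here ...
    }
    toy_batch_free(query_batch);                 // once, after the loop
    return g_allocs;
}
```

With this shape there is exactly one allocation and one deallocation no matter how many queries are processed, which fixes the leak and avoids the per-query malloc/free churn of freeing inside the loop.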

@gtygo gtygo requested a review from ggerganov August 9, 2024 17:47
@mofosyne added the "Review Complexity : Low" and "bugfix" labels Aug 10, 2024
@ggerganov ggerganov merged commit 4b9afbb into ggml-org:master Aug 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* retrieval

* Reuse querybatch to reduce frequent memory allocation

* delete unused white space

Labels

  • bugfix — fixes an issue or bug
  • examples
  • Review Complexity : Low — Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix
