Memory tests #13669


Open · gabe-l-hart wants to merge 2 commits into master from MemoryTests

Conversation

gabe-l-hart
Contributor

Description

This PR ports the scaffold for unit tests around the memory hierarchy from #12331. I haven't yet added tests for the new iSWA memory, since that was added after I started writing tests. My thought is to land these basic tests to enable further development towards the hybrid cache implementation, in parallel with more robust tests.

github-actions bot added the testing (Everything test related) label on May 20, 2025
These only test the basics so far, but should allow for more expansive
tests to come.

Branch: MemoryTests

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
These tests use private headers, so they won't build on Windows

Branch: MemoryTests

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
@ggerganov
Member

As it is, this mostly tests llama_batch construction, which is not very helpful. We should think about whether there is some useful way to exercise the functionality of the memory objects.

Relying on symbols that are not explicitly exported from libllama is also not great because it prevents use in some environments like Windows.

An alternative, likely better, approach is to run tests using just the public API. But for that, we need some way to create dummy models. So I am wondering whether we should look into adding a very basic experimental API for constructing tiny models for a user-specified arch, which we could then use for testing. Such functionality could be useful not only for testing the KV caches, but for other things too.
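
For illustration, such an experimental entry point might look something like the header sketch below. Nothing here exists in llama.h today; llama_model_init_dummy and its parameters are hypothetical, while LLAMA_API and llama_model_params are the existing conventions:

```cpp
// Hypothetical sketch only -- llama_model_init_dummy does not exist.
// The idea: build a tiny in-memory model for the named architecture,
// with tensors allocated and filled with placeholder data, so tests can
// drive the KV cache through the normal public API without a GGUF file.
LLAMA_API struct llama_model * llama_model_init_dummy(
        const char * arch,     // e.g. "llama", "mamba"
        int32_t      n_layer,  // e.g. 2
        int32_t      n_embd,   // e.g. 16
        struct llama_model_params params);
```

A memory test could then create a context from the returned model and exercise the cache entirely through public calls such as llama_decode.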

I think @ngxson had some ideas in this direction.

@gabe-l-hart
Contributor Author

@ggerganov Makes sense. I had planned to add more tests yesterday, but ended up spending my time getting the other branches in the chain updated for the recent changes on master.

I can try to refactor this to only use the public APIs as you suggest. I think the biggest hang-up will be somehow mocking tiny models that represent specific caching types in create_memory. I've been writing too much Python lately, so my brain goes to some kind of patching solution, but I don't think that would play well here.

One solution that could have interesting consequences would be to declare many of the members in llama_model as virtual and then create a child class for testing that explicitly overrides key methods. This would also allow users to define their own "extension models" outside of the core codebase, but it would probably violate a large number of assumptions. It also looks like there are comments referring to a new llm_arch_model_i interface, so it seems like you may already be working towards further extensibility in the model interface?

My day is a bit of a meeting wall, but if I get some time in the editor, I'll see if I can push any further on making these tests useful.

@gabe-l-hart
Contributor Author

gabe-l-hart commented May 22, 2025

@ggerganov @ngxson I think there's a fairly simple way to enable custom model extensions in unit tests, but I'm guessing it conflicts with plans you already have in place, so I want to spell it out here for comment before going too deep.

Basically, each member of llama_model that holds model-specific logic could be declared virtual, allowing a derived child class to override it instead of relying on modifying a big switch statement on arch. The methods that I think fall into this category are (a sketch follows the list):

  • load_hparams: This would allow us to hand-craft hparams for a test
  • create_memory: This would allow us to be explicit about the type of memory a toy model needs
  • build_graph: This would allow us to build a toy graph with just the pieces we want to poke at in the test
  • load_tensors: This would allow the test to fill the tensors with dummy data without needing to actually load anything from disk
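
A minimal, self-contained sketch of the shape this could take; all types below are stand-ins rather than the real llama.cpp classes, and the signatures are illustrative only:

```cpp
#include <cstdint>
#include <memory>

// Stand-in for the memory interface (llama_memory_i in the real tree).
struct test_memory_i {
    virtual ~test_memory_i() = default;
};

// Stand-in for one concrete cache type a test might want to force.
struct test_unified_cache : test_memory_i {};

// Stand-in for llama_model with the load-time methods declared virtual.
struct test_model {
    uint32_t n_layer = 0;
    uint32_t n_embd  = 0;

    virtual ~test_model() = default;
    virtual void load_hparams() = 0; // real version parses GGUF metadata
    virtual void load_tensors() = 0; // real version loads tensor data from disk
    virtual std::unique_ptr<test_memory_i> create_memory() = 0;
    // build_graph omitted: it needs a graph context to be meaningful
};

// A toy model for tests: no GGUF file, no arch switch statement involved.
struct toy_model : test_model {
    void load_hparams() override { n_layer = 2; n_embd = 16; }
    void load_tensors() override { /* fill tensors with dummy data */ }
    std::unique_ptr<test_memory_i> create_memory() override {
        // Be explicit about the memory type under test
        return std::make_unique<test_unified_cache>();
    }
};

int main() {
    toy_model m;
    m.load_hparams();
    m.load_tensors();
    auto mem = m.create_memory();
    (void) mem; // ... exercise `mem` through the memory interface here ...
    return 0;
}
```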

Going down this route would also open the door to building model-architecture extension modules for real model architectures, by eliminating the need for new architectures to extend the llm_arch enum and then integrate into the corresponding switch statements. If we did go down this route, there may be other methods that should be made virtual as well (e.g. load_*) so that derived models could customize loading.

One key property of all methods that would be made virtual is that they would be load-time-only in order to avoid any performance impact at runtime.

Another key implication of going down this route is that the llama-model.h interfaces would need to be semi-public so that deriving modules could use them. This would probably lead to two "classes" of public API: one for usage, one for extension. It would likely also have knock-on effects, like requiring that llama_graph_context be extension-public as well so that derived models could create their own subclasses for their build_graph implementations.
