Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add e2e tests that model switching costs and add func arg lowering for the air pipeline #566

Merged
merged 8 commits into from
Jul 18, 2024

Conversation

nirvedhmeshram
Copy link
Contributor

@nirvedhmeshram nirvedhmeshram commented Jul 17, 2024

Another change we are making and will have in all iree-compile commands running something beyond a single dispatch is the use of --iree-scheduling-optimize-bindings=false . This is needed because there is currently no way to pass runtime constants to generated kernel. In a typical iree backend they are used to pass two things

  1. buffer offsets
  2. dynamic shapes

with the above flag we wont generate offseted buffers and we dont support dynamic shapes yet so for now this flag allows us to run models where such offsets would otherwise occur.

It was also discovered that the lowering used by the air pipeline does not respect the binding number for the hal.binding ops so we lower them before the AIRRtToNpuPass pass can do it.

@nirvedhmeshram nirvedhmeshram force-pushed the nm_add_pdi_switch_e2e_tests branch 2 times, most recently from 96adfab to 576f8e0 Compare July 17, 2024 17:08
@newling
Copy link
Contributor

newling commented Jul 17, 2024

There are 2 tests which repeat the same matmul N times. Can you update the comments in the tests specifying what they're testing, so that it's clear they're both useful? Have you run this locally with success? CI has numerical errors which aren't zeros... is the offset thing not working?

FYI I created a task which is slightly related (compiler side though, not runtime) #541

I think these are useful numerical tests, so happy include them here. But for benchmarking runtime overheads, should we also have some separate code, something along the lines of #415 ?

@nirvedhmeshram
Copy link
Contributor Author

There are 2 tests which repeat the same matmul N times. Can you update the comments in the tests specifying what they're testing, so that it's clear they're both useful? Have you run this locally with success? CI has numerical errors which aren't zeros... is the offset thing not working?

Yes I can update the comments, what they are testing is that what happens if you dont switch and just keep calling the same kernel, so they establish the baseline for the switching case, Ya the CI caught an error that I didnt locally becuase I was using uniform inputs, its a fun one where we are wrongly swapping argument locations on the hal.interface.bindings (kind of stuff that is tickled with these "serious" models :) )

FYI I created a task which is slightly related (compiler side though, not runtime) #541

I think these are useful numerical tests, so happy include them here. But for benchmarking runtime overheads, should we also have some separate code, something along the lines of #415 ?

The way I am using these tests are with Tracy profiles which give me ns precision timing of all runtime functions,. Of course #415 is useful for the whole model performance benchmarking though, so we should have that for all these "models" too.

@newling
Copy link
Contributor

newling commented Jul 18, 2024

image
Just an FYI: I'm working on changing the script to allow golden values from other sources (arbitrary python) which might be useful if we don't fully trust the llvm-cpu backend.

@nirvedhmeshram nirvedhmeshram changed the title Add e2e tests that model switching costs Add e2e tests that model switching costs and add func arg lowering for the air pipeline Jul 18, 2024
@nirvedhmeshram
Copy link
Contributor Author

nirvedhmeshram commented Jul 18, 2024

Just an FYI: I'm working on changing the script to allow golden values from other sources (arbitrary python) which might be useful if we don't fully trust the llvm-cpu backend.

so far havent seen any issues but thats a good feature!

@newling
Copy link
Contributor

newling commented Jul 18, 2024

Thanks, looks good to me. I've left only minor suggestions, so accepting in advance.

It was also discovered that the lowering used by the air pipeline does not respect the binding number for the hal.binding ops so we lower them before the AIRRtToNpuPass pass can do it.

Nice fix! I would probably have done this as a separate PR, but they're both pretty small so fine.

@nirvedhmeshram nirvedhmeshram merged commit 2280d55 into main Jul 18, 2024
2 checks passed
@nirvedhmeshram nirvedhmeshram deleted the nm_add_pdi_switch_e2e_tests branch July 18, 2024 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants