
Passing arguments through iron.jit decorator #2213


Merged
merged 27 commits into main from ypapadop/decorator-arg-passthrough
Apr 25, 2025

Conversation

ypapadop-amd
Collaborator

@ypapadop-amd ypapadop-amd commented Apr 18, 2025

This PR allows arguments to be passed through the iron.jit decorator to the called function.

Closes #2211
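
For readers new to the pattern, here is a tiny, self-contained Python sketch of what "passing arguments through the decorator" means. This is a toy stand-in, not the actual iron.jit implementation (which also builds, compiles, and launches the design on the device).

import functools

def jit(fn):
    # Toy stand-in for iron.jit: whatever the caller passes is forwarded
    # unchanged to the wrapped function.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # The real decorator would compile the design here before launching;
        # this sketch only demonstrates the argument flow.
        return fn(*args, **kwargs)
    return wrapper

@jit
def vector_vector_add(input0, input1, output):
    for i, (a, b) in enumerate(zip(input0, input1)):
        output[i] = a + b

out = [0, 0, 0]
vector_vector_add([1, 2, 3], [4, 5, 6], out)
print(out)  # [5, 7, 9]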

@ypapadop-amd ypapadop-amd requested a review from mawad-amd April 18, 2025 18:42
@ypapadop-amd ypapadop-amd added the enhancement New feature or request label Apr 18, 2025
@jgmelber
Collaborator

I like this!

@mawad-amd
Collaborator

mawad-amd commented Apr 18, 2025

I don't think this will work, unfortunately. The operator () (i.e., vector_vector_add(device_map[args.device], input0, input1, output)) expects Tensors that match what is inside the runtime sequence, because these will be passed to the kernel. I need to add some unit tests.

def __call__(self, *args):
"""
Allows the kernel to be called as a function with the provided arguments.
Parameters:
args (IRON Tensors): Arguments to pass to the kernel.
"""
opcode = 3
kernel_args = []
for tensor in args:
if not hasattr(tensor, "buffer_object"):
raise TypeError(
f"Expected Tensor with .buffer_object(), got {type(tensor)}"
)
kernel_args.append(tensor.buffer_object())
h = self.__kernel(opcode, self.__insts_buffer_bo, self.__n_insts, *kernel_args)
r = h.wait()
if r != xrt.ert_cmd_state.ERT_CMD_STATE_COMPLETED:
raise Exception(f"Kernel returned {r}")

@mawad-amd
Collaborator

mawad-amd commented Apr 18, 2025

I like the iron.jit though (w/o arguments). I tried to implement it previously but had issues.

@ypapadop-amd
Collaborator Author

I don't think this will work, unfortunately. The operator () (i.e., vector_vector_add(device_map[args.device], input0, input1, output)) expects Tensors that match what is inside the runtime sequence, because these will be passed to the kernel. I need to add some unit tests.


You mean because of the device arg? That can be fixed if we always assume that the first argument is the device (or make it positional).

@mawad-amd
Collaborator

mawad-amd commented Apr 19, 2025

Particularly for the device, I think it's best if we do things like:

@iron.jit
def kernel():
    device = iron.get_current_device()

iron.set_device(device)

Or via contexts:

@iron.jit
def kernel():
    device = iron.get_current_device()

with iron.device('npu'):
    kernel()
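
As a minimal, plain-Python sketch of how such a current-device API could be implemented (a module-level default plus a context-manager override); the names mirror the hypothetical iron.set_device / iron.get_current_device / iron.device calls above, and this is an assumption for illustration, not IRON's actual implementation:

import contextlib

_current_device = "npu"  # assumed default

def set_device(dev):
    global _current_device
    _current_device = dev

def get_current_device():
    return _current_device

@contextlib.contextmanager
def device(dev):
    # Temporarily override the current device, restoring it on exit.
    global _current_device
    prev, _current_device = _current_device, dev
    try:
        yield
    finally:
        _current_device = prev

with device("npu2"):
    print(get_current_device())  # npu2
print(get_current_device())      # npu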

But it's likely we will need to pass more arguments to the kernel that we can't set as globals. We could possibly start adding annotations (e.g., iron.kernel_argument) or something like that.

I think it's best if we just meet and discuss to keep the momentum going. I will schedule something. We can talk more on Tuesday.

@jgmelber
Collaborator

We can talk more on Tuesday.

Sounds good to me

@ypapadop-amd
Collaborator Author

I can see us controlling not just which device, but also how many tiles to use, etc., which I think makes it equivalent to what numba does with kernel invocation (https://numba.readthedocs.io/en/stable/cuda/kernels.html).

But I'd like us to think about what's appropriate for our devices, not how the GPU world converged around CUDA.

@ypapadop-amd ypapadop-amd force-pushed the ypapadop/decorator-arg-passthrough branch from 7dacfb7 to f6b8e2a Compare April 21, 2025 16:05
@mawad-amd
Collaborator

I like the syntax of kernel[configs](kernel_arg) because then users won't be confused about kernel arguments, their order, and how they relate to the runtime sequence. Eventually, we want the user to get a nice error when they mess that up (the asserts you added are great too, but should eventually be done by the compiler).
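
A rough, self-contained sketch of how a kernel[configs](kernel_arg) call syntax can be expressed in Python via __getitem__; this is illustrative only, with made-up config contents, and not necessarily what was merged:

class Kernel:
    def __init__(self, fn):
        self.fn = fn
        self.config = {}

    def __getitem__(self, config):
        # kernel[configs] attaches launch configuration and returns the kernel,
        # keeping configuration separate from the runtime-sequence arguments.
        self.config = config
        return self

    def __call__(self, *args):
        print(f"launching with config={self.config}")
        return self.fn(*args)

@Kernel
def vector_vector_add(a, b):
    return [x + y for x, y in zip(a, b)]

print(vector_vector_add[{"device": "npu"}]([1, 2], [3, 4]))  # [4, 6]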

I also like your config argument, although I would prefer it to be the last argument. I wonder if we can get rid of tensor_ty = np.ndarray[(num_elements,), np.dtype[dtype]] and other numpy arrays and reuse the input tensors too (I'm not sure how the ObjectFifos would work with that, though).

@ypapadop-amd
Collaborator Author

Here's how numba does it: https://numba.pydata.org/numba-doc/dev/roc/examples.html

Triton: https://triton-lang.org/main/getting-started/tutorials/03-matrix-multiplication.html#sphx-glr-getting-started-tutorials-03-matrix-multiplication-py (I think they infer the device from the inputs). In Triton, functions decorated with @triton.jit can also be passed as parameters to other functions, which is very convenient.

@ypapadop-amd
Collaborator Author

Here's the separation I have in my head:

  1. Function arguments should be passed to function parameters, e.g., def vector_vector_add(input0, input1, output) should be called with vector_vector_add(input0, input1, output).
  2. Attributes that affect how the code is compiled, e.g., is_placed, should go as decorator arguments, e.g., @iron.jit(is_placed=True). I think that also includes any type constraints, like numba does (@roc.jit('(float32[:,:], float32[:,:], float32[:,:])') in https://numba.pydata.org/numba-doc/dev/roc/examples.html) but I'm not sure if we need something like this right now.
  3. Attributes that affect run-time behaviour, e.g., whether or not to use a cached version, where the cache is, which device we target, etc., should be passed some other way, e.g., through the config variable, a singleton that is visible upon import, etc.

The distinction between 2 and 3 is that 2 has to do with the code that follows (think of C++ strong typing), whereas 3 is more about where and how that code will be called.
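
To make the separation concrete, here is a toy, self-contained sketch in plain Python. The names (jit, is_placed, set_default_device) are modelled on the discussion above, not the merged API: compile-time attributes go to the decorator, run-time behaviour lives in module-level state, and function arguments pass straight through.

import functools

_default_device = "npu"            # (3) run-time state, set outside the call

def set_default_device(dev):
    global _default_device
    _default_device = dev

def jit(is_placed=False):          # (2) compile-time attribute as a decorator argument
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):        # (1) function arguments pass straight through
            print(f"compiling for {_default_device}, is_placed={is_placed}")
            return fn(*args)
        return wrapper
    return decorate

@jit(is_placed=True)
def vector_vector_add(a, b):
    return [x + y for x, y in zip(a, b)]

set_default_device("npu2")
print(vector_vector_add([1, 2], [3, 4]))  # [4, 6]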

@ypapadop-amd ypapadop-amd force-pushed the ypapadop/decorator-arg-passthrough branch from 6ad02a4 to e69c83e Compare April 24, 2025 14:20
@ypapadop-amd
Collaborator Author

I added an iron.set_default_device() / iron.get_default_device() pair that eliminates the config parameter. However, column_id is still a problem to express via a config.

@jgmelber
Collaborator

jgmelber commented Apr 24, 2025

I added an iron.set_default_device() / iron.get_default_device() pair that eliminates the config parameter. However, column_id is still a problem to express via a config.

column_id should be removed. The NPU runtime firmware handles this well. This is an artifact from supporting vck5000 in this example previously.

@jgmelber
Collaborator

😄 5737615

@mawad-amd
Collaborator

This is looking awesome; thanks, Yiannis, for the improvements. And thanks for fixing the tensor random initialization bug too. I suggest adding a comment in the code on why the column id is 0, for readers (I didn't know about that). Other than that, the PR is looking great.

@jgmelber
Collaborator

the PR is looking great

Agreed!

dev = NPU2()

# Define tensor types
# Define tensor shape
data_height = 3
Collaborator Author

This is something I haven't yet figured out. The tensors are 1D, yet they are described as 2D here, and I don't have a mechanism to pass additional arguments.

This is something we discussed today with @SamuelBayliss and @erwei-xilinx for a different use-case.

Collaborator

I think the pointer is the only thing that really matters. My understanding is that the tensor shape is only used by a verifier, checking whether the wrap-and-stride access pattern goes out of bounds. The amount of data being copied is encoded in the TensorAccessPattern's sizes.

Collaborator Author

Still, we lack the mechanism to pass this information in. Another example would be a function that can generate either vector-add or vector-sub depending on one extra argument. That argument can't be part of the call arguments, since those are supposed to be tensors only. How do we pass it in?
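
To make the question concrete, here is a plain-Python sketch of such a generator. It is illustrative only; binding the extra, non-tensor argument with a closure is just one generic possibility, not something adopted in this PR.

import operator

def make_vector_op(op):
    # One generator, two designs: the extra, non-tensor argument `op` selects
    # whether the produced function adds or subtracts.
    def vector_op(input0, input1, output):
        for i, (a, b) in enumerate(zip(input0, input1)):
            output[i] = op(a, b)
    return vector_op

vector_add = make_vector_op(operator.add)
vector_sub = make_vector_op(operator.sub)

out = [0, 0]
vector_add([4, 5], [1, 2], out)
print(out)  # [5, 7]
vector_sub([4, 5], [1, 2], out)
print(out)  # [3, 3]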

ypapadop-amd and others added 6 commits April 24, 2025 21:34
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@ypapadop-amd ypapadop-amd force-pushed the ypapadop/decorator-arg-passthrough branch from 758cd3e to 14fd5ff Compare April 25, 2025 02:34
Co-authored-by: Muhammad Awad <112003944+mawad-amd@users.noreply.github.com>
@ypapadop-amd ypapadop-amd marked this pull request as ready for review April 25, 2025 17:49
Co-authored-by: Yiannis Papadopoulos <Yiannis.Papadopoulos@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Muhammad Awad <112003944+mawad-amd@users.noreply.github.com>
@ypapadop-amd
Collaborator Author

This is looking awesome; thanks, Yiannis, for the improvements. And thanks for fixing the tensor random initialization bug too. I suggest adding a comment in the code on why the column id is 0, for readers (I didn't know about that). Other than that, the PR is looking great.

What should the comment say here? That 0 implies the runtime figures out placement? @jgmelber

@jgmelber
Collaborator

I thought column_id was removed?

@ypapadop-amd
Collaborator Author

I thought column_id was removed?

It was; I'm just asking if an additional comment is needed.

@jgmelber
Collaborator

It was; I'm just asking if an additional comment is needed.

I don't think so

@mawad-amd mawad-amd left a comment

Thanks, Yiannis!

@ypapadop-amd ypapadop-amd added this pull request to the merge queue Apr 25, 2025
Merged via the queue into main with commit ba43849 Apr 25, 2025
51 checks passed
@ypapadop-amd ypapadop-amd deleted the ypapadop/decorator-arg-passthrough branch April 25, 2025 20:33
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Passing arguments to the JIT'd function
4 participants