-
Thanks for the note. The issue is still being able to write graph kernels in a single library that can work with any backend. Right now it doesn't seem Awkward can do that. Am I missing something?
-
Hi @swamidass, could you please give a simple example of such a function that performs a graph calculation using ragged tensors? Does it rely on representing the graphs as ragged tensors?
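For context, here is a minimal sketch of what "representing a graph as a ragged tensor" can mean: each node's variable-length neighbor list is one row, stored as a flat buffer plus offsets. This is an illustrative example in plain NumPy (not the answer given in the thread, and not any particular library's API):

```python
import numpy as np

# A 4-node graph as a ragged "list of neighbors per node".
# Node i's neighbors are neighbors[offsets[i]:offsets[i+1]].
neighbors = np.array([1, 2, 0, 0, 3, 2])  # flattened adjacency lists
offsets = np.array([0, 2, 3, 5, 6])       # row boundaries for 4 nodes

# A trivial "graph kernel": per-node degree is just the row lengths.
degree = np.diff(offsets)

# One message-passing step: sum a per-node feature over each node's neighbors.
feature = np.array([10.0, 20.0, 30.0, 40.0])
gathered = feature[neighbors]                     # feature of every neighbor
summed = np.add.reduceat(gathered, offsets[:-1])  # segment-sum per node
```

The same content/offsets layout is what ragged-tensor types in the libraries discussed below wrap, so kernels written against it translate directly.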
-
I'm moving a question from pydata/xarray#7988 (reply in thread) into this forum because that thread is about ragged xarrays.
Here's the comment from @swamidass:
I'll start with a response because I think Awkward Array already covers this—you can tell me if I'm still missing something.
The strict (can't-be-installed-without) dependencies for Awkward Array are `numpy` and `packaging`, plus `importlib_metadata` and `typing_extensions` if the Python version is not the latest. It's deliberately a small list. The flip side of that is that if you use even basic functionality like writing to Parquet, Awkward will complain that `pyarrow` isn't installed, so the workflow of trying something, finding out that you need to install something else, then trying it again may be annoying. But the alternative would be to make Awkward difficult to install for some users, and we chose the conservative approach.

Depending on what you mean by metadata, that may be a new feature: #2757 (in `main`) added a top-level `attrs` dict (that gets propagated through all operations) and #2794 (still-open PR) adds per-field attributes. This was inspired by an issue that compared Awkward with xarray (#1391). I said there that we're not attempting to displace xarray (or any array library for rectilinear data), but sometimes you'll get data from a metadata-rich source and at least want to preserve that metadata through pre-processing to the next step, which could be xarray.

dask-awkward is a Dask container type, like `dask.array` and `dask.bag`, but for Awkward Arrays. Everything is lazy up to the `compute()` call. ✔️
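To make "lazy up to `compute()`" concrete, here is a toy sketch of the deferred-evaluation model. This is purely illustrative, not dask-awkward's actual machinery:

```python
# Toy deferred computation: building the graph does no work;
# only compute() walks the dependencies and evaluates them.
class Lazy:
    def __init__(self, fn, *deps):
        self.fn = fn
        self.deps = deps

    def compute(self):
        args = [d.compute() if isinstance(d, Lazy) else d for d in self.deps]
        return self.fn(*args)

data = Lazy(lambda: [1, 2, 3])                         # nothing runs yet
doubled = Lazy(lambda xs: [2 * x for x in xs], data)   # still nothing
result = doubled.compute()                             # now both steps run
```

In dask-awkward the same idea applies per-partition, so large ragged datasets can be processed out of core.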
#1466 is for the set of functions that convert between Awkward and TensorFlow RaggedTensor and Torch NestedTensor. If it would be helpful to add that, we can get back to it. I think I saw that TensorFlow exposes the offsets and content views, so it can be an easy O(1) function in both directions.
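The O(1) claim follows from a shared layout: Awkward's list type and TensorFlow's `RaggedTensor` both describe a ragged array as a flat contents buffer plus an offsets (row-splits) buffer, so conversion only reinterprets buffers. A pure-NumPy sketch of that layout (the `tf.RaggedTensor.from_row_splits` call mentioned in the comment is a real TensorFlow API; the rest is illustrative):

```python
import numpy as np

# The ragged array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] as two flat buffers.
content = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
offsets = np.array([0, 3, 3, 5])  # list i spans content[offsets[i]:offsets[i+1]]

def to_lists(content, offsets):
    """Materialize the nested lists (O(n); only needed for display)."""
    return [content[start:stop].tolist()
            for start, stop in zip(offsets[:-1], offsets[1:])]

# A conversion to TensorFlow would hand over the same two buffers, e.g.
#   tf.RaggedTensor.from_row_splits(values=content, row_splits=offsets)
# which is why it can be O(1) in both directions.
```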
That's the default.
Awkward has a backend for JAX, specifically for the purpose of supporting autodiff. It is experimental—requested for autodiff in particle physics (https://github.com/gradhep), but not widely used yet.
For JAX's JIT-compilation, there doesn't seem to be a way to support it. Even with PyTrees, we run into issues in which we need to create arrays whose shapes are determined by values in other arrays, and that's forbidden in the XLA model. We have Numba and cppyy (and soon Julia) for compiled backends, but not JAX.
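The "shapes determined by values" restriction is easy to see with a boolean mask, whose output length depends on the data. This example uses plain NumPy (where it's allowed) and describes the JAX behavior in comments:

```python
import numpy as np

x = np.array([3, -1, 4, -1, 5])
positives = x[x > 0]  # output length depends on the *values* in x

# Under jax.jit, every array shape must be known at trace time, so XLA
# rejects this kind of boolean-mask indexing. JAX's workaround is to ask
# for a static size up front (e.g. jnp.where(cond, size=...)), which is
# exactly the kind of constraint that ragged, data-dependent structures
# cannot satisfy in general.
```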
Ah, I just realized that you meant for the buffers in an Awkward Array to be backed by TensorFlow or Torch, which is not what #1466 will do—it's for conversions. For backing arrays (see `ak.to_backend`), we only have NumPy for main memory and CuPy for GPUs because once that choice is made, any Python library can view the data without copying.
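The "view without copying" point rests on the buffer protocol: any library that understands it can see the same memory. A small NumPy-only illustration:

```python
import numpy as np

a = np.arange(5, dtype=np.int64)

# Reinterpret the same memory through the buffer protocol, the way
# another library could; np.frombuffer makes no copy.
b = np.frombuffer(memoryview(a), dtype=np.int64)

a[0] = 99  # mutate through the original array...
# ...and the view observes the change, confirming the buffer is shared.
```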
We don't advertise the CuPy backend yet because we have not implemented the full API on GPUs. Support for Awkward Arrays in `@numba.cuda.jit`-compiled functions is complete (and was presented in a tutorial), but not the `ak.*` functions, and we need those to consider Awkward Arrays feature-complete on GPUs. This project should be finished next summer. (There's a fixed set of cpu-kernels to rewrite as cuda-kernels.)[^1]

Each `ak.Array` has its own backend, so it wouldn't be a global context switch. CPU calculations on NumPy-backed Awkward Arrays can be happening at the same time as CUDA calculations on CuPy-backed Awkward Arrays.

A lot of the discussion about ragged arrays and xarray has focused on keeping the xarray interface, which I'm in favor of—xarray users should have a familiar interface, even if that means restricting to only ragged arrays, not the full typesystem. But it sounds like your needs are different, and I don't know of any blockers to using Awkward Array for your task.
Footnotes

[^1]: I'm being cagey about the distinction between GPUs and CUDA because we pass this handling, down to the compilation itself, on to CuPy. I don't know if CuPy has or will have the capability to cross-compile to ROCm, etc. We're writing very generic CUDA in the hope that auto-translation will become possible.