
getNeighborPairs() supports periodic boundary conditions #70

Merged · 5 commits into master · Jan 12, 2023

Conversation

peastman
Member

I've implemented the CPU version but not the CUDA version so far. Please take a look and see if the API and implementation look OK.

@peastman
Member Author

I added the CUDA implementation. It mostly works, but test_neighbor_grads() fails with an error I'm not sure of the best way to handle:

RuntimeError: function torch::autograd::CppNode returned an incorrect number of gradients (expected 4, got 3)

Since I added box_vectors as a fourth argument, autograd expects backward() to return a gradient with respect to that argument. But we don't calculate it, and I'm not sure it would even make sense: any gradient with respect to the box vectors would be full of discontinuities.

@raimis
Contributor

raimis commented Nov 14, 2022

Just return an empty tensor Tensor() to indicate that the argument is not differentiable.
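
For illustration, here is a minimal sketch of that pattern with the C++ autograd API (a toy Function, not this PR's actual code; the argument order positions, cutoff, max_num_neighbors, box_vectors is assumed):

#include <torch/torch.h>

using torch::Tensor;
using torch::autograd::AutogradContext;
using torch::autograd::Function;
using torch::autograd::tensor_list;

// Toy example: a custom autograd Function whose forward takes four inputs
// but is only differentiable with respect to the first one.
class ExampleFunction : public Function<ExampleFunction> {
public:
    static Tensor forward(AutogradContext* ctx, const Tensor& positions, const Tensor& cutoff,
                          const Tensor& max_num_neighbors, const Tensor& box_vectors) {
        ctx->save_for_backward({positions});
        return positions.sum();
    }
    static tensor_list backward(AutogradContext* ctx, tensor_list grad_outputs) {
        const Tensor positions = ctx->get_saved_variables()[0];
        const Tensor grad_positions = grad_outputs[0] * torch::ones_like(positions);
        // One entry per forward input; an undefined Tensor() marks that input
        // as non-differentiable, which is what box_vectors needs here.
        return {grad_positions, Tensor(), Tensor(), Tensor()};
    }
};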

@peastman
Member Author

Thanks! I made the change and the test now passes.

When I run the complete TestNeighbors.py suite, I still get errors in the CUDA version of test_periodic_neighbors():

RuntimeError: CUDA error: device-side assert triggered

After some debugging, I figured out it doesn't really have anything to do with that test. It's actually caused by test_too_many_neighbors(), which intentionally triggers an assertion. The error condition somehow isn't getting cleared, so every CUDA test that runs after it fails. You can observe this by simply adding the line pt.cuda.synchronize() to the end of test_too_many_neighbors(), which makes that test always fail.

@peastman peastman changed the title [WIP] getNeighborPairs() supports periodic boundary conditions getNeighborPairs() supports periodic boundary conditions Nov 14, 2022
@peastman
Member Author

Any suggestions about what to do with test_too_many_neighbors()? As far as I can tell torch.cuda doesn't provide any way to reset the device. Once an assert has been triggered, there's no way to clear it and any further CUDA operation in that process will fail.

The obvious solution is not to run that test on CUDA.

@raimis
Contributor

raimis commented Dec 7, 2022

One option is to call cudaDeviceReset() using ctypes (https://docs.python.org/3/library/ctypes.html).

@peastman
Member Author

peastman commented Dec 7, 2022

If PyTorch doesn't provide a safe way to reset the device, going behind its back to call a CUDA function directly is likely to cause errors as well: resetting the device invalidates all of PyTorch's existing handles to GPU resources, but PyTorch doesn't know they've been invalidated.

For the moment, I've limited that test to CPU. It's not ideal, but I don't have a better solution.

@peastman peastman mentioned this pull request Dec 7, 2022
@raimis
Contributor

raimis commented Dec 9, 2022

Each test runs in a separate process, so after the device is reset, PyTorch will reinitialize normally in the next test.

@peastman
Member Author

peastman commented Dec 9, 2022

You're welcome to see if you can figure out a way to get it to work. But in the meantime, let's not hold up a useful feature over a broken unit test that isn't even related to the new feature.

@raimis
Contributor

raimis commented Dec 12, 2022

OK! Let's disable the test for now. What else is missing to finish this PR?

@peastman
Member Author

It's all ready for review.

@raimis raimis self-requested a review December 12, 2022 16:52
@raimis
Contributor

raimis commented Dec 12, 2022

Great! I'll look at it.

Contributor

@raimis raimis left a comment

I think all the requirements on box_vectors can be checked. It would prevent some invalid simulations, and the performance impact would be minimal or zero (if CUDA Graphs are used).

@@ -25,6 +28,11 @@ static tuple<Tensor, Tensor, Tensor> forward(const Tensor& positions,

TORCH_CHECK(cutoff.to<double>() > 0, "Expected \"cutoff\" to be positive");

if (box_vectors.size(0) != 0) {
Contributor

It could check here if all the requirements are satisfied.

Member Author

I added the checks in the CPU version.
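
For reference, a sketch of the kind of CPU-side validation being discussed; the specific requirements shown here (a (3, 3) tensor in reduced triclinic form, with each box dimension at least twice the cutoff) are illustrative assumptions, not quoted from the PR:

#include <torch/torch.h>

// Hedged sketch: validate the box on the CPU path, where reading element
// values is cheap. Assumes a CPU tensor; the exact requirements are assumed.
static void checkBoxVectors(const torch::Tensor& box_vectors, double cutoff) {
    TORCH_CHECK(box_vectors.dim() == 2 && box_vectors.size(0) == 3 && box_vectors.size(1) == 3,
                "Expected \"box_vectors\" to have shape (3, 3)");
    const auto box = box_vectors.to(torch::kFloat64).contiguous();
    const auto b = box.accessor<double, 2>();
    TORCH_CHECK(b[0][1] == 0 && b[0][2] == 0 && b[1][2] == 0,
                "Expected \"box_vectors\" to be in reduced form");
    TORCH_CHECK(b[0][0] >= 2 * cutoff && b[1][1] >= 2 * cutoff && b[2][2] >= 2 * cutoff,
                "The periodic box is too small for the given cutoff");
}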

@@ -100,6 +114,12 @@ public:
TORCH_CHECK(max_num_neighbors_ > 0 || max_num_neighbors_ == -1,
"Expected \"max_num_neighbors\" to be positive or equal to -1");

const bool use_periodic = (box_vectors.size(0) != 0);
if (use_periodic) {
Contributor

It could check here if all the requirements are satisfied too.

Member Author

Is there any way to check it efficiently? The shape of the tensor is known on the CPU, but the values of its elements are stored on the GPU. If we want to throw an exception based on the element values, that requires device-to-host data transfers and adds significant latency every time it's called.

Contributor

Yes, you're right, that wouldn't be efficient, so the checks have to be done on the GPU.

The only problem is how to abort a kernel elegantly. assert does the job, but afterwards the GPU needs a reset. I don't see anything better in the CUDA docs.

Member Author

GPU asserts aren't really effective as a way of catching user errors. Besides the fact that you can't recover from them and have to reset the whole device, they don't provide any useful information to the user. They just get a cryptic "device-side assert triggered" message that says nothing about what the problem was or how to fix it. The user will usually conclude that your library is broken.

Contributor

I agree, we need to choose between two evils:

  • Users get cryptic messages and need to reset the GPU.
  • Users get incorrect results silently.

@raimis
Contributor

raimis commented Jan 12, 2023

@RaulPPelaez how do you handle kernel errors in your code?

We need a solution that:

  • Is compatible with CUDA Graphs
  • Has low overhead
  • Gives clear error messages to users

@RaulPPelaez
Contributor

RaulPPelaez commented Jan 12, 2023

AFAIK there is no clean way to assert within CUDA. As you mentioned, a device-side assert leaves the CUDA context in an unusable state.
What I normally end up doing is keeping an errorState value/array in device (or managed) memory. A kernel thread that encounters an error atomically writes to this errorState and returns as quickly as possible, and checking the value for errors is delayed as long as possible.
If you at least have a record of the error state, the user can query it manually with some kind of checkErrorState() when they notice the results are incorrect (unless the code just crashes, that is).
I have never found a clean way to do this without some kind of synchronization (a device-to-host copy or a stream sync).

If you think about it, this is how errors work in CUDA itself: you have to synchronize manually to query the current error state, e.g. auto err = cudaDeviceSynchronize();. So if they haven't figured out a better way...
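
A minimal sketch of that error-flag pattern (kernel and function names are hypothetical, not this PR's API): the kernel records the problem in device memory instead of calling assert(), so the CUDA context stays usable, and the host only pays for a synchronization when it explicitly checks the flag.

#include <cuda_runtime.h>

// Each thread handles one candidate pair; a pair within the cutoff increments
// the neighbor count, and exceeding the limit sets the error flag instead of
// asserting, leaving the CUDA context usable.
__global__ void countNeighbors(const float* distances2, int numPairs, float cutoff2,
                               int maxNeighbors, int* neighborCount, int* errorFlag) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPairs)
        return;
    if (distances2[i] < cutoff2) {
        const int count = atomicAdd(neighborCount, 1);
        if (count >= maxNeighbors)
            atomicExch(errorFlag, 1); // record "too many neighbors" and keep going
    }
}

// Host side: reading the flag implies a device-to-host copy, so call this only
// when a synchronization is acceptable (e.g. outside CUDA graph capture).
bool checkErrorState(const int* errorFlagDevice) {
    int flag = 0;
    cudaMemcpy(&flag, errorFlagDevice, sizeof(int), cudaMemcpyDeviceToHost);
    return flag != 0;
}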

@RaulPPelaez
Contributor

Any suggestions about what to do with test_too_many_neighbors()? As far as I can tell torch.cuda doesn't provide any way to reset the device. Once an assert has been triggered, there's no way to clear it and any further CUDA operation in that process will fail.

The obvious solution is not to run that test on CUDA.

What is the intended way of using this functionality?
A priori one does not know the total number of pairs, right? I understand it is required, or at least useful, to control the maximum number of neighbours per particle from outside, but how does one use it in practice?
In the past I have done something like this: set a maximum of 32 neighbours, and if building fails because that is too low, increase it by 32 and retry until it no longer fails (a sketch of this retry pattern follows below).

If something like that is the case here, an extra parameter could be passed to choose whether to synchronize and check a tooManyNeighbours error flag, so the maximum number of neighbours can be found as a precomputation. When constructing the CUDA graph, this check would be omitted.
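
A sketch of that retry pattern, with the neighbor-list build abstracted behind a callable (the names are hypothetical, not this PR's API):

#include <functional>

// tryBuild(maxNeighbors) runs the neighbor-list build with error checking
// enabled and returns false if the limit was exceeded. The expensive
// synchronize-and-check only happens in this precomputation, not in the
// steady-state (CUDA graph) path.
int findMaxNeighbors(const std::function<bool(int)>& tryBuild, int step = 32) {
    int maxNeighbors = step;
    while (!tryBuild(maxNeighbors))
        maxNeighbors += step; // limit was too small: grow it and retry
    return maxNeighbors;
}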

@peastman
Member Author

What I normally end up doing is to have some errorState array/value in device (or managed) memory.

That sounds like a good approach.

Let's merge this now and add error checking along those lines in a separate PR. That will require some significant design work to figure out an efficient error-reporting mechanism.

@raimis raimis self-requested a review January 12, 2023 15:54
@raimis
Contributor

raimis commented Jan 12, 2023

@peastman let's merge this!

@RaulPPelaez could you open a dedicated issue to discuss and design the error check?

@peastman peastman merged commit 8b2d427 into master Jan 12, 2023
@peastman peastman deleted the periodic branch January 12, 2023 15:56