
getNeighborPairs() supports periodic boundary conditions #70

Merged · 5 commits into master · Jan 12, 2023

Conversation

peastman
Member

I've implemented the CPU version but not the CUDA version so far. Please take a look and see if the API and implementation look OK.

@peastman
Member Author

I added the CUDA implementation. It mostly works, but test_neighbor_grads() fails with an error I'm not sure of the best way to handle:

RuntimeError: function torch::autograd::CppNode returned an incorrect number of gradients (expected 4, got 3)

Since I added box_vectors as a fourth argument, autograd expects backward() to return a gradient with respect to that argument. But we don't calculate it, and I'm not sure it would even make sense: any gradient with respect to the box vectors would be full of discontinuities.

@raimis
Contributor

raimis commented Nov 14, 2022

Just return an empty tensor Tensor() to indicate that the argument is not differentiable.
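
For illustration, here is a minimal sketch of that pattern with the C++ autograd API (a toy Function, not this PR's actual code; the argument order positions, cutoff, max_num_neighbors, box_vectors is assumed):

#include <torch/torch.h>

using torch::Tensor;
using torch::autograd::AutogradContext;
using torch::autograd::Function;
using torch::autograd::tensor_list;

// Toy example: a custom autograd Function whose forward takes four inputs
// but is only differentiable with respect to the first one.
class ExampleFunction : public Function<ExampleFunction> {
public:
    static Tensor forward(AutogradContext* ctx, const Tensor& positions, const Tensor& cutoff,
                          const Tensor& max_num_neighbors, const Tensor& box_vectors) {
        ctx->save_for_backward({positions});
        return positions.sum();
    }
    static tensor_list backward(AutogradContext* ctx, tensor_list grad_outputs) {
        const Tensor positions = ctx->get_saved_variables()[0];
        const Tensor grad_positions = grad_outputs[0] * torch::ones_like(positions);
        // One entry per forward input; an undefined Tensor() marks that input
        // as non-differentiable, which is what box_vectors needs here.
        return {grad_positions, Tensor(), Tensor(), Tensor()};
    }
};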

@peastman
Member Author

Thanks! I made the change and the test now passes.

When I run the complete TestNeighbors.py suite, I still get errors in the CUDA version of test_periodic_neighbors():

RuntimeError: CUDA error: device-side assert triggered

After some debugging, I figured out it doesn't really have anything to do with that test. It's actually caused by test_too_many_neighbors(), which intentionally triggers an assertion. The error condition somehow isn't getting cleared, so every CUDA test that runs after it fails. You can observe this by simply adding the line pt.cuda.synchronize() to the end of test_too_many_neighbors(), which makes that test always fail.

@peastman peastman changed the title [WIP] getNeighborPairs() supports periodic boundary conditions getNeighborPairs() supports periodic boundary conditions Nov 14, 2022
@peastman
Member Author

Any suggestions about what to do with test_too_many_neighbors()? As far as I can tell torch.cuda doesn't provide any way to reset the device. Once an assert has been triggered, there's no way to clear it and any further CUDA operation in that process will fail.

The obvious solution is not to run that test on CUDA.

@raimis
Contributor

raimis commented Dec 7, 2022

One option is to call cudaDeviceReset() using ctypes (https://docs.python.org/3/library/ctypes.html).

@peastman
Member Author

peastman commented Dec 7, 2022

If PyTorch doesn't provide a safe way to reset the device, going behind its back to call a CUDA function directly is likely to cause errors as well: resetting the device invalidates all of PyTorch's existing handles to GPU resources, but PyTorch doesn't know they've been invalidated.

For the moment, I've limited that test to CPU. It's not ideal, but I don't have a better solution.

@peastman peastman mentioned this pull request Dec 7, 2022
@raimis
Contributor

raimis commented Dec 9, 2022

Each test runs in a separate process, so after the device is reset, PyTorch will reinitialize normally in the next test.

@peastman
Member Author

peastman commented Dec 9, 2022

You're welcome to see if you can figure out a way to get it to work. But in the meantime, let's not hold up a useful feature over a broken unit test that isn't even related to the new feature.

@raimis
Contributor

raimis commented Dec 12, 2022

OK! Let's disable the test for now. What else is missing to finish this PR?

@peastman
Member Author

It's all ready for review.

@raimis raimis self-requested a review December 12, 2022 16:52
@raimis
Contributor

raimis commented Dec 12, 2022

Great! I'll look at it.

Contributor

@raimis raimis left a comment

I think all the requirements on box_vectors can be checked. It would prevent some invalid simulations, and the performance impact would be minimal or zero (if CUDA Graphs are used).

@@ -25,6 +28,11 @@ static tuple<Tensor, Tensor, Tensor> forward(const Tensor& positions,

TORCH_CHECK(cutoff.to<double>() > 0, "Expected \"cutoff\" to be positive");

if (box_vectors.size(0) != 0) {
Contributor

It could check here if all the requirements are satisfied.

Member Author

I added the checks in the CPU version.
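
For reference, a sketch of the kind of CPU-side validation being discussed; the specific requirements shown here (a (3, 3) tensor in reduced triclinic form, with each box dimension at least twice the cutoff) are illustrative assumptions, not quoted from the PR:

#include <torch/torch.h>

// Hedged sketch: validate the box on the CPU path, where reading element
// values is cheap. Assumes a CPU tensor; the exact requirements are assumed.
static void checkBoxVectors(const torch::Tensor& box_vectors, double cutoff) {
    TORCH_CHECK(box_vectors.dim() == 2 && box_vectors.size(0) == 3 && box_vectors.size(1) == 3,
                "Expected \"box_vectors\" to have shape (3, 3)");
    const auto box = box_vectors.to(torch::kFloat64).contiguous();
    const auto b = box.accessor<double, 2>();
    TORCH_CHECK(b[0][1] == 0 && b[0][2] == 0 && b[1][2] == 0,
                "Expected \"box_vectors\" to be in reduced form");
    TORCH_CHECK(b[0][0] >= 2 * cutoff && b[1][1] >= 2 * cutoff && b[2][2] >= 2 * cutoff,
                "The periodic box is too small for the given cutoff");
}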

@@ -100,6 +114,12 @@ public:
TORCH_CHECK(max_num_neighbors_ > 0 || max_num_neighbors_ == -1,
"Expected \"max_num_neighbors\" to be positive or equal to -1");

const bool use_periodic = (box_vectors.size(0) != 0);
if (use_periodic) {
Contributor

It could check here if all the requirements are satisfied too.

Member Author

Is there any way to check it efficiently? The shape of the tensor is known on the CPU, but the values of its elements are stored on the GPU. If we want to throw an exception based on the element values, that requires device-to-host data transfers and adds significant latency every time it's called.

Contributor

Yes, you're right, that wouldn't be efficient, so the checks have to be done on the GPU.

The only problem is how to abort a kernel elegantly. assert does the job, but afterwards the GPU needs a reset. I don't see anything better in the CUDA docs.

Member Author

GPU asserts aren't really effective as a way of catching user errors. Besides the fact that you can't recover from them and have to reset the whole device, they don't provide any useful information to the user. They just get a cryptic "device-side assert triggered" message that says nothing about what the problem was or how to fix it. The user will usually conclude that your library is broken.

Contributor

I agree, we need to choose between two evils:

  • Users get cryptic messages and need to reset the GPU.
  • Users get incorrect results silently.

@raimis
Contributor

raimis commented Jan 12, 2023

@RaulPPelaez how do you handle kernel errors in your code?

We need a solution that:

  • Is compatible with CUDA Graphs
  • Has low overhead
  • Gives clear error messages to users

@RaulPPelaez
Contributor

RaulPPelaez commented Jan 12, 2023

AFAIK there is no clean way to assert within CUDA. As you mentioned, a device-side assert leaves the CUDA context in an unusable state.
What I normally end up doing is keeping an errorState value/array in device (or managed) memory. A kernel thread that encounters an error atomically writes to this errorState and returns as quickly as possible, and checking the value for errors is delayed as long as possible.
If you at least have a record of the error state, the user can query it manually with some kind of checkErrorState() when they notice the results are incorrect (unless the code just crashes, that is).
I have never found a clean way to do this without some kind of synchronization (a device-to-host copy or a stream sync).

If you think about it, this is how errors work in CUDA itself: you have to synchronize manually to query the current error state, e.g. auto err = cudaDeviceSynchronize();. So if they haven't figured out a better way...
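
A minimal sketch of that error-flag pattern (kernel and function names are hypothetical, not this PR's API): the kernel records the problem in device memory instead of calling assert(), so the CUDA context stays usable, and the host only pays for a synchronization when it explicitly checks the flag.

#include <cuda_runtime.h>

// Each thread handles one candidate pair; a pair within the cutoff increments
// the neighbor count, and exceeding the limit sets the error flag instead of
// asserting, leaving the CUDA context usable.
__global__ void countNeighbors(const float* distances2, int numPairs, float cutoff2,
                               int maxNeighbors, int* neighborCount, int* errorFlag) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPairs)
        return;
    if (distances2[i] < cutoff2) {
        const int count = atomicAdd(neighborCount, 1);
        if (count >= maxNeighbors)
            atomicExch(errorFlag, 1); // record "too many neighbors" and keep going
    }
}

// Host side: reading the flag implies a device-to-host copy, so call this only
// when a synchronization is acceptable (e.g. outside CUDA graph capture).
bool checkErrorState(const int* errorFlagDevice) {
    int flag = 0;
    cudaMemcpy(&flag, errorFlagDevice, sizeof(int), cudaMemcpyDeviceToHost);
    return flag != 0;
}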

@RaulPPelaez
Contributor

Any suggestions about what to do with test_too_many_neighbors()? As far as I can tell torch.cuda doesn't provide any way to reset the device. Once an assert has been triggered, there's no way to clear it and any further CUDA operation in that process will fail.

The obvious solution is not to run that test on CUDA.

What is the intended way of using this functionality?
A priori one does not know the total number of pairs, right? I understand it is required, or at least useful, to control the maximum number of neighbours per particle from outside, but how does one use it in practice?
In the past I have done something like this: set a maximum of 32 neighbours, and if building fails because that is too low, increase it by 32 and retry until it no longer fails (a sketch of this retry pattern follows below).

If something like that is the case here, an extra parameter could be passed to choose whether to synchronize and check a tooManyNeighbours error flag, so the maximum number of neighbours can be found as a precomputation. When constructing the CUDA graph, this check would be omitted.
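
A sketch of that retry pattern, with the neighbor-list build abstracted behind a callable (the names are hypothetical, not this PR's API):

#include <functional>

// tryBuild(maxNeighbors) runs the neighbor-list build with error checking
// enabled and returns false if the limit was exceeded. The expensive
// synchronize-and-check only happens in this precomputation, not in the
// steady-state (CUDA graph) path.
int findMaxNeighbors(const std::function<bool(int)>& tryBuild, int step = 32) {
    int maxNeighbors = step;
    while (!tryBuild(maxNeighbors))
        maxNeighbors += step; // limit was too small: grow it and retry
    return maxNeighbors;
}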

@peastman
Member Author

What I normally end up doing is to have some errorState array/value in device (or managed) memory.

That sounds like a good approach.

Let's merge this now and add error checking along those lines in a separate PR. That will require some significant design work to figure out an efficient error-reporting mechanism.

@raimis raimis self-requested a review January 12, 2023 15:54
@raimis
Contributor

raimis commented Jan 12, 2023

@peastman let's merge this!

@RaulPPelaez could you open a dedicated issue to discuss and design the error check?

@peastman peastman merged commit 8b2d427 into master Jan 12, 2023
@peastman peastman deleted the periodic branch January 12, 2023 15:56