[JOSS review] benchmark on Apple silicon #16

KedoKudo · 2023-11-28T20:27:46Z

This is part of the review feedback for JOSS submission (openjournals/joss-reviews#6024)

It would be interesting to see how the software performs on Apple silicon when running as a CPU process and using the mps backend.


AndySAnker · 2023-12-20T10:28:07Z

Thank you for the suggestion.

The short answer:

We do not support acceleration on MPS GPUs yet, and running on the MPS device with CPU fallback is slower than running the calculation on the CPU without the MPS backend.

The longer answer:

To install torch with MPS backend, I have followed the guide here: https://developer.apple.com/metal/pytorch/
As described in the guide, I get the output: tensor([1.], device='mps:0') meaning that it is correctly installed.

We can allow DebyeCalculator to run calculations on the MPS device. However, the torch.pdist function is not yet supported on MPS, which results in the following error message:
NotImplementedError: The operator 'aten::_pdist_forward' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
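The fallback named in the error message can also be switched on from within Python. The following is a minimal sketch, assuming the variable should be in place before torch is imported (setting it later may have no effect); the guarded import and the `mps_ok` name are illustrative only and are not part of DebyeCalculator.

```python
import os

# Enable the CPU fallback for operators that MPS does not implement yet.
# Assumption: this is done before torch is imported so the flag is already
# set when the MPS backend is initialised.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

try:
    import torch

    # torch.backends.mps is only present in recent torch versions.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
except ImportError:
    # torch is not installed, so no MPS device can be used.
    mps_ok = False

print("MPS available:", mps_ok)
```

With the flag set, unsupported operators such as pdist run on the CPU, which is why the error message warns that this is slower than running natively on MPS.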

I have set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1, meaning that the rate-limiting step (pdist) of DebyeCalculator is performed on the CPU. I then benchmark the MPS device (with CPU fallback) against the plain CPU using the following code:

from debyecalculator import DebyeCalculator
import timeit
import torch
import random
import matplotlib.pyplot as plt
 
# Define the number of atoms
num_atoms_list = [10, 100, 1000, 10000]
 
# Define the devices
devices = ['cpu', 'mps']
 
# Store the times for each device and number of atoms
times = {device: [] for device in devices}
 
for device in devices:
    for num_atoms in num_atoms_list:
        # Generate random coordinates for the atoms
        coordinates = [[random.random() for _ in range(3)] for _ in range(num_atoms)]
 
        # Create the structure tuple
        structure_tuple = (["Fe"] * num_atoms, torch.tensor(coordinates))
 
        # Initialise calculator object
        calc = DebyeCalculator(qmin=1.0, qmax=8.0, qstep=0.01, device=device)
 
        # Setup the timeit function
        setup = 'from __main__ import calc, structure_tuple'
 
        # Time the calc.iq function
        elapsed_time = timeit.timeit('calc.iq(structure_source=structure_tuple)', setup=setup, number=10)
 
        # Convert the total time for the 10 calls to milliseconds and store it
        elapsed_time_ms = elapsed_time * 1000
        times[device].append(elapsed_time_ms)
 
# Plot the times
for device, device_times in times.items():
    plt.plot(num_atoms_list, device_times, 'o--', label=device)
 
plt.xlabel('Number of atoms')
plt.ylabel('Time (ms)')
plt.legend()
plt.show()

The result is as follows:

We conclude that, for now, we cannot offer any acceleration on the MPS device. We hope that the torch.pdist function will be implemented for the MPS device. Apple seems to be putting some effort into this area: https://github.com/ml-explore/mlx

If you have any ideas on how we can offer MPS acceleration, please let us know :-)

AndySAnker · 2024-12-19T08:55:33Z

With #42, we now offer MPS calculations, meaning that 'mps' is allowed as an input for device.

However, the software is not optimised for MPS and therefore does not give speed-ups. On a Mac with an M3 chip, it is about 10 % slower on MPS than on CPU. It does mean that, when calculating scattering patterns from many structures, the CPU and MPS can be used in parallel, giving a speed-up of about 15 %.
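One way to split many structures across two devices is a shared work queue with one worker thread per device. The sketch below is an assumption about how this could be arranged, not the method used to obtain the 15 % figure. `run_on_devices` and `compute_by_device` are hypothetical names, and the stand-in compute functions would in practice be replaced by something like `lambda s: calc.iq(structure_source=s)` with one DebyeCalculator per device.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Empty, Queue


def run_on_devices(structures, compute_by_device):
    """Let each device pull structures from a shared queue until it is empty."""
    jobs = Queue()
    for index, structure in enumerate(structures):
        jobs.put((index, structure))
    results = [None] * len(structures)

    def worker(device):
        compute = compute_by_device[device]
        while True:
            try:
                index, structure = jobs.get_nowait()
            except Empty:
                return  # no work left for this device
            # Store the result at the original position, tagged with the device.
            results[index] = (device, compute(structure))

    with ThreadPoolExecutor(max_workers=len(compute_by_device)) as pool:
        futures = [pool.submit(worker, device) for device in compute_by_device]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return results


# Stand-in compute functions (hypothetical); they only count the atoms.
compute_by_device = {"cpu": len, "mps": len}
structures = [[0.0] * n for n in range(1, 6)]
results = run_on_devices(structures, compute_by_device)
print(results)
```

Because each device takes the next structure as soon as it is free, the faster device naturally processes more of them. Whether threads give a real speed-up depends on how much of the calculation releases the Python interpreter lock, which is not verified here.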

Note: MPS does not work with Python 3.7. It works with Python >=3.8.
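A script that should run on both old and new setups could guard the device choice accordingly. This is a minimal sketch: `choose_device` is a hypothetical helper, not part of DebyeCalculator, and it falls back to 'cpu' whenever the Python version, the torch installation, or the hardware rules out MPS.

```python
import sys


def choose_device(prefer="mps"):
    """Return 'mps' only when it can plausibly be used, otherwise 'cpu'."""
    if prefer != "mps":
        return prefer
    # MPS is reported not to work with Python 3.7.
    if sys.version_info < (3, 8):
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is only present in recent torch versions.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    return "cpu"


device = choose_device()
print("Using device:", device)
```

The returned string can then be passed as the device argument, as in the benchmark above.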

KedoKudo mentioned this issue Nov 28, 2023

[REVIEW]: A GPU-Accelerated Open-Source Python Package for Calculating Powder Diffraction, Small-Angle-, and Total Scattering with the Debye Scattering Equation openjournals/joss-reviews#6024

Closed

AndySAnker self-assigned this Nov 29, 2023

AndySAnker closed this as completed Dec 20, 2023
