[AIR] Support array output from Torch model #24902
Conversation
Overall LGTM. If we can't use TensorArray here, then I think this should be good to merge.
# these cannot be used as values in a Pandas Dataframe.
# We have to convert the outermost dimension to a python list (but the values
# in the list can still be Numpy arrays).
return pd.DataFrame({"predictions": list(prediction)}, columns=["predictions"])
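A minimal sketch of why this conversion is needed (the array contents here are made up for illustration):

```python
import numpy as np
import pandas as pd

prediction = np.arange(6).reshape(3, 2)  # e.g. per-sample logits, shape (N, 2)

# A 2-D ndarray cannot be used directly as a single DataFrame column;
# pandas requires each per-column array to be 1-dimensional.
try:
    pd.DataFrame({"predictions": prediction})
    raised = False
except ValueError:
    raised = True

# Converting the outermost dimension to a Python list works; each cell
# still holds a NumPy array.
df = pd.DataFrame({"predictions": list(prediction)}, columns=["predictions"])
```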
Can we use TensorArray here now that #24752 is merged?
Suggested change:
- return pd.DataFrame({"predictions": list(prediction)}, columns=["predictions"])
+ return pd.DataFrame({"predictions": TensorArray(prediction)}, columns=["predictions"])
I didn't want to have a ray.data dependency in Predictors. Once we do the data type conversion cleanup and probably move TensorArray to a common utils file, I definitely agree, though, that we should use TensorArray here!
unsqueeze (bool): If set to True, the features tensors will be unsqueezed
    (reshaped to (N, 1)) before being concatenated into the final features
    tensor. Otherwise, they will be left as is, that is (N, ).
    Defaults to True.
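For reference, a small illustration of the behavior the docstring describes (the tensor values are made up):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])  # one feature column, shape (N,) == (3,)
b = torch.tensor([4.0, 5.0, 6.0])  # another feature column, shape (3,)

# With unsqueeze=True: reshape each (N,) column to (N, 1), then
# concatenate the columns into the final (N, num_features) tensor.
features = torch.cat([a.unsqueeze(1), b.unsqueeze(1)], dim=1)
```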
Nit: out of scope for this PR, but this explanation doesn't really make sense for multi-dimensional inputs.
We could change the description, yeah. We are now calling unsqueeze, which should work with an arbitrary number of dimensions and not just 2, like the view call we were using before.
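A quick sketch of the difference being discussed, with made-up shapes: unsqueeze inserts a new axis regardless of the input's rank, while a view(-1, 1)-style reshape flattens multi-dimensional inputs.

```python
import torch

x = torch.arange(6.0).reshape(3, 2)  # a multi-dimensional feature column

unsqueezed = x.unsqueeze(1)  # inserts an axis: (3, 2) -> (3, 1, 2)
flattened = x.view(-1, 1)    # flattens everything: (3, 2) -> (6, 1)
```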
Agreed, the semantics here need to be changed, though that is out of scope for this PR. This PR just updates the docstring to match the argument name, fixing a mistake we had before.
def test_predict_array_output():
    """Tests if predictor works if model outputs an array instead of single value."""

    predictor = TorchPredictor(model=model, preprocessor=preprocessor)
Nit: this is definitely out of scope for this PR, but it might be better to define model and preprocessor in the test because:
- Less confusion when reading the tests, since all of the code is visible from within each test.
- Less chance of sharing state between tests, which creates unwanted dependencies between them.
Related advice from .NET's unit testing best practices.
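A hypothetical sketch of the suggested pattern, with a plain torch model standing in for the fixtures in the actual test file (the model and shapes here are illustrative, not from the PR):

```python
import torch


def test_predict_array_output():
    """Each test defines its own model, so no state is shared across tests."""
    # Hypothetical model: maps 1 input feature to 2 outputs, so the
    # prediction for each sample is an array rather than a single value.
    model = torch.nn.Linear(1, 2, bias=False)
    with torch.no_grad():
        predictions = model(torch.tensor([[1.0], [2.0], [3.0]]))
    assert predictions.shape == (3, 2)


test_predict_array_output()
```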
Updated!
Just for my own understanding, why do we need to use a preprocessor here?
I believe it is to check that it's applied, though that could be covered by other tests and not this one specifically.
In this case I'd vote to leave it out of this test. We should ideally only test one thing.
Sounds good, I updated the tests! PTAL!
Thanks, this looks good. +1 to Balaji's comment about using a TensorArray once that's possible, though we probably don't need to block on that.
Generally looks good once @bveeramani's comments are addressed
LGTM
Why are these changes needed?

The previous implementation of TorchPredictor failed when the model output an array for each sample (for example, logits). This is because Pandas DataFrames cannot hold NumPy arrays of more than one element as values. We have to convert the outermost NumPy array returned by the model to a list so that it can be stored in a Pandas DataFrame. The newly added test fails under the old implementation.

Related issue number

Checks

- I've run scripts/format.sh to lint the changes in this PR.
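As a sanity check on the fix described above, a small sketch (the array contents are made up) showing that storing per-sample arrays as a list roundtrips cleanly:

```python
import numpy as np
import pandas as pd

logits = np.arange(12.0).reshape(4, 3)  # one 3-class logit vector per sample

# Store per-sample arrays as a list so the DataFrame accepts them.
df = pd.DataFrame({"predictions": list(logits)})

# The original (N, C) array can be recovered by stacking the column.
recovered = np.stack(df["predictions"].to_numpy())
```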