Skip to content

Code changes to add PluggableDevice support for Inference #605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 6, 2025

Conversation

rjauhari2
Copy link
Contributor

Enable TF_LoadPluggableDeviceLibrary API for PluggableDevices

Enable TF_LoadPluggableDeviceLibrary API for PluggableDevices

Signed-off-by: rjauhari2 <rjauhari@amd.com>
@Craigacp
Copy link
Collaborator

Craigacp commented Mar 7, 2025

Adding this line should have generated more code which needs to be checked in, and we've had issues with the experimental C API in the past. Does this build correctly?

@rjauhari2
Copy link
Contributor Author

rjauhari2 commented Mar 11, 2025

Hi @Craigacp, I have added the code generated form c_api_experimental.h file. Yes, I was able to build it successfully. I have also tested loading our plugin solution, and tried running few models.

@karllessard
Copy link
Collaborator

I also recall trying to enable this experimental API in the past and we were facing compilation issues, maybe it has been cleaned up? But if it now works, that's amazing! I'll run a full build on all supported platforms before merging

@karllessard karllessard added the CI build Triggers a full native build on a pull request label Mar 14, 2025
@karllessard karllessard removed the CI build Triggers a full native build on a pull request label Mar 14, 2025
@karllessard
Copy link
Collaborator

karllessard commented Mar 14, 2025

The standard (quick) build is failing. It looks like the symbols that are coming from the experimental layer are not present in the TensorFlow binaries we compile with (tensorflow_cc).

No wait I was wrong, what is missing is the generated classes under this package. @rjauhari2 , can you please commit them as well? It complains about three missing classes: TF_AttrBuilder, TF_ShapeAndTypeList and TF_CheckpointReader

Also, what command did you run to compile it successfully and what's your platform?

 - Add TF_AttrBuilder, TF_ShapeAndTypeList, TF_CheckpointReader classes.
@rjauhari2
Copy link
Contributor Author

Hi @karllessard, I used "mvn clean install -Pgenerating" command to build TF-Java. I have tested this on "linux-x86_64" machine. Now, I have added the generated class in "c_api" folder. With these changes I am able to build with "mvn clean install". Thanks for your review and guidance.

@karllessard karllessard added the CI build Triggers a full native build on a pull request label Mar 19, 2025
@karllessard
Copy link
Collaborator

Thanks a lot @rjauhari2. Unfortunately, the build failed on Windows, similar to what we've been observing in the past. We'll have to debug this on our side (unless you have access to a Windows machine), so we might want to push changes to your branch. To be continued...

@rjauhari2
Copy link
Contributor Author

Thanks for your quick response. I am able to reproduce the error in your CI (GitHub actions). We shall try from our side.

@rjauhari2
Copy link
Contributor Author

Hi @Craigacp, @karllessard i have added commit for windows build fix. Can you trigger your CI runs again to validate this PR?

@Craigacp
Copy link
Collaborator

Please move that stub out into a separate file, we don't want to mix up the different stubs.

@rjauhari2
Copy link
Contributor Author

Hi @Craigacp, i have moved Windows stub into a separate file "tfe_serverdef_stub.cc". Kindly review.

@Craigacp
Copy link
Collaborator

And create a separate header file rather than putting the function definition in an existing unrelated file.

@rjauhari2
Copy link
Contributor Author

Hi @Craigacp, I have added the header file and included it in the "tensorflow.java" file.

@rjauhari2
Copy link
Contributor Author

Hi @Craigacp and @karllessard, Thank you very much for all your comments, and I have updated the PR based on your comments. I request you to help me on the further steps to get this PR merged. Our solution with TensorFlow is based on pluggabledevice. This PR will enable our plugin to seamlessly work with TensorFlow-Java. Thank you!

Copy link
Collaborator

@karllessard karllessard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @rjauhari2 !

@karllessard karllessard merged commit 3f94211 into tensorflow:master Jun 6, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI build Triggers a full native build on a pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants