Skip to content

Updated the redistributed version of Tensorflow to 1.14 #3929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jul 8, 2019
Merged

Updated the redistributed version of Tensorflow to 1.14 #3929

merged 17 commits into from
Jul 8, 2019

Conversation

harishsk
Copy link
Contributor

@harishsk harishsk commented Jun 27, 2019

fixes #3962

@harishsk harishsk requested review from zeahmed and codemzs June 27, 2019 22:58
Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Any reason why some of the tests are failing?

@harishsk
Copy link
Contributor Author

I have probable cause for why Mac tests are failing. The 1.13 package for Mac used to have .so files that we would rename to .dylib files in our build process. As part of the rename, it looks like we were also registering the path in the load library search path.
The new 1.14 packages already have .dylib files so the rename isn't necessary. But when I remove the code for renaming, the tests aren't finding the relevant libraries in their path. So I need to do something different.

I don't yet know whey the Linux tests are failing.

@harishsk harishsk requested a review from eerhardt July 1, 2019 21:55
@harishsk harishsk self-assigned this Jul 1, 2019
@harishsk
Copy link
Contributor Author

harishsk commented Jul 1, 2019

A longer explanation of the fix is necessary.
In v1.13.1 of Tensorflow, both linux and mac binaries are named libtensorflow.so and libtensorflow_framework.so.

However, in v1.14.0 the naming scheme and archive contents have changed as follows:

  • Linux
    • libtensorflow.so.1.14.0
    • libtensorflow.so.1
    • libtensorflow.so
    • libtensorflow_framework.so.1.14.0
    • libtensorflow_framework.so.1
    • libtensorflow_framework.so
  • Mac
    • libtensorflow.1.14.0.dylib
    • libtensorflow.1.dylib
    • libtensorflow.dylib
    • libtensorflow_framework.1.14.0.dylib
    • libtensorflow_framework.1.dylib
    • libtensorflow_framework.dylib

In the list above, the files without a version number, sym-link to the files with a version. The fix has been complicated by the linux versions not using the .so suffix for their files and using the version number instead.

If we only copy the files with version numbers, the tests fail because they are looking for a file with the standard extension. If we copy only the files with standard extension, the tests fail because the libtensorflow.so file attempts to load the libtensorflow_framework.so.1 file internally and doesn't find it. So we have to copy all the files.

And further, all these files have to be copied to the directories that the tests run from so that they can find them locally.

The ideal fix is to extract the files from the archive and just add that directory to the system's library search path. This is easier on Linux and Mac but a bit more complicated on Windows. I am stopping this here, because the current proposal is to migrate tensorflow redist package to the TensorFlow.NET repo.

<TensorFlowConfig Include="darwin" FileExtension=".tar.gz" FilesFromArchive="lib\libtensorflow.so;lib\libtensorflow_framework.so;include\tensorflow\c\LICENSE" Runtime="osx-x64" />

<AdditionalDownloadFile Include="https://raw.githubusercontent.com/tensorflow/tensorflow/master/LICENSE" DownloadFile="$(IntermediateOutputPath)LICENSE" />
<TensorFlowConfig Include="linux" FileExtension=".tar.gz"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an FYI -

The tests in the repo do not run against the produced NuGet packages. Instead, you need to manually test that the NuGet packages are created correctly.

Can you build the packages locally, and then manually smoke test them to ensure they are created correctly?

@@ -18,9 +18,13 @@
<ItemGroup>
<NativeAssemblyReference>
<FullAssemblyPath>$(NativeOutputPath)$(LibPrefix)%(NativeAssemblyReference.Identity)$(LibExtension)</FullAssemblyPath>
<!-- Tensorflow has a different naming scheme for v1.14.0. Those binaries need to be copied along with the standard names -->
<AssemblyPathWithMajorVersion Condition="'$(OS)' == 'Windows_NT'">$(NativeOutputPath)$(LibPrefix)%(NativeAssemblyReference.Identity)$(LibExtension)</AssemblyPathWithMajorVersion>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different than the FullAssemblyPath above it? It looks like we'd be trying to copy the same file twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for Windows, it ends up copying the same file twice. Without that line, I get an error in Windows builds that AssemblyPathMajorVersion is undefined.
I guess the other way to fix this is to make the optional copy conditional on the OS being linux or mac. Let me try that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, you can try adding another section:

      <NativeAssemblyReference Condition="'%(NativeAssemblyReference.MajorVersion)' != ''">
        <!-- Tensorflow has a different naming scheme for v1.14.0. Those binaries need to be copied along with the standard names -->
        <AssemblyPathWithMajorVersion Condition="'$(OS)' != 'Windows_NT'">$(NativeOutputPath)$(LibPrefix)%(NativeAssemblyReference.Identity)$(LibExtension).%(NativeAssemblyReference.MajorVersion)</AssemblyPathWithMajorVersion>
        <AssemblyPathWithMajorVersion Condition="$([MSBuild]::IsOSPlatform('osx'))">$(NativeOutputPath)$(LibPrefix)%(NativeAssemblyReference.Identity).%(NativeAssemblyReference.MajorVersion)$(LibExtension)</AssemblyPathWithMajorVersion>
      </NativeAssemblyReference>

And then lower in your copy statement:

 Condition="'%(AssemblyPathWithMajorVersion)' != '' AND Exists('%(AssemblyPathWithMajorVersion)')">

@harishsk harishsk requested a review from artidoro July 3, 2019 18:01
Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to test the produced NuGet packages separately to ensure they still work correctly. Just a simple smoke test that loads up a TF model and executes it should suffice.

@harishsk harishsk merged commit 78bfecb into dotnet:master Jul 8, 2019
@harishsk
Copy link
Contributor Author

harishsk commented Jul 8, 2019

I have tested the Nuget packages separately by having the dotnet/machinelearning samples reference the locally built nuget packages.

@harishsk harishsk deleted the TFVersionUpdate branch July 8, 2019 18:26
codemzs pushed a commit that referenced this pull request Jul 8, 2019
* Fixed build errors resulting from upgrade to VS2019 compilers

* Added additional message describing the previous fix

* Updated Tensorflow to version 1.14

* Removed rename rules for Mac

* Updated names of files for Mac

* Attempting to fix test errors on Mac

* Attempting to fix Mac test errors

* Changed list of files to copy from archive for Linux and Mac

* Fixed CopyNativeAssemblies to copy the tensorflow binaries that have version number appended

* Fixed windows build

* Another attempt at fixing both Windows and Linux builds

* Changed copying version-suffixed binaries to occur only for non-Windows platforms

* Changed copying of versioned assemblies to use HasMetadata
Dmitry-A pushed a commit to Dmitry-A/machinelearning that referenced this pull request Jul 24, 2019
* Fixed build errors resulting from upgrade to VS2019 compilers

* Added additional message describing the previous fix

* Updated Tensorflow to version 1.14

* Removed rename rules for Mac

* Updated names of files for Mac

* Attempting to fix test errors on Mac

* Attempting to fix Mac test errors

* Changed list of files to copy from archive for Linux and Mac

* Fixed CopyNativeAssemblies to copy the tensorflow binaries that have version number appended

* Fixed windows build

* Another attempt at fixing both Windows and Linux builds

* Changed copying version-suffixed binaries to occur only for non-Windows platforms

* Changed copying of versioned assemblies to use HasMetadata
@ghost ghost locked as resolved and limited conversation to collaborators Mar 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Version mismatch for Microsoft.ML.TensorFlow.Redist
3 participants