Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The build system could use some ❤️ #717

Open
aaronmondal opened this issue Sep 23, 2021 · 4 comments
Open

The build system could use some ❤️ #717

aaronmondal opened this issue Sep 23, 2021 · 4 comments

Comments

@aaronmondal
Copy link
Contributor

aaronmondal commented Sep 23, 2021

Ok so I just built TFT from source and found some potential issues.


The currently recommended run_build.sh casually runs a configure.sh script, that could be run manually beforehand similar to how the main TF code does is with ./configure. Maybe it would be nice to change the recommended way of building TFT to something like the snippet below?

./configure
bazel build [--config=option] //tensorflow_text/tools/pip_package:build_pip_package

I feel like that would take away some of the complexity of the build process but I may be wrong here oO.


There are currently open pull requests open (#680 and #505) fixing the bazel version. Without these PRs merged the repo will not build via bazelisk (or a bazel other than than 3.7.2, probably).


  • Many of the kernels will not build because of a missing @com_google_absl//absl/container:flat_hash_map dependency. The fix is either adding it to all failing kernels manually or adding it to the tf_cc_library rule, e.g. in oss_deps and removing it from //tensorflow_text_core/kernels:sentencepiece_kernels

Edit: Sent a PR changing this: #718


Installing the resulting pip_package does not work with TF built against head. That TF has version 2.7.0 and will trigger uninstallation of the custom version and will install TF 2.6.0. An undocumented workaround would be

IS_NIGHTLY=nightly ./bazel-bin/oss_scripts/pip_package/build_pip_package .

but even then the tft.__version__ is 2.6.0 (while the tf version is 2.7.0 on custom builds). Probably takes some work to change this behaviour though 😞 .


Before sending a PR I would like to know whether anyone else had the same issues, or if the above mentioned problems do not occur generally, in which case the problem would lie with my own setup.


As a sidenote: Are all the # tf:lib tensorflow dep lines in the BUILD files still required? It seems to me that some preprocessor would put something there, but I could not find such a parser anywhere and the build ran fine when I randomly removed those lines in random places 🤣

@broken
Copy link
Member

broken commented Sep 23, 2021

Thanks for the feedback! I'll try to respond to each individually.


Regarding run_build.sh, the thought here was that one command would be easier than three, but I can see the value in having it more explicit and more closely match that of TF, even if it is more steps. I'll run it by some folks.


The expectation here has been that in the build steps, the first step in "Building TF", you would need to set up your environment to be able to build with it. In this way, whatever Bazel version TF defined at the time (along with other environment requirements that may change over time), we would work with them also. It was an attempt to remove some of the cognitive load of setup step maintenance, but doesn't appear to have worked so well.

It looks like the import process for #680 failed, so we weren't notified about the PR. I'll get somebody to look into the failure, make sure there aren't any other similar PRs lying about, and find a way to prevent this in the future. Regarding #505, it looks like this was dropped by the author (so I just closed it).


The oss_deps is an odd thing. Core changes in TF sometimes adds deps like this to all of our ops. I've tracked down the TF changes before, and don't see any missing dependency declarations in the original change. Recently, we noticed another group of deps broke our builds, but it was subsequently fixed before we updated the oss_deps with it. I've been meaning to look at this "fix" on the TF side to give a clue to what is actually going on, but I've been out for a couple weeks. Could your build have happened with a version of TF in this timeframe (sep 4 - 7)?


Great point regarding the version numbers. Nobody has pointed this out before, and since as you noticed nightly removes the version restriction, it hasn't ever come up. I'll ask that these get updated to the next version whenever the previous version's branch is cut or stable version is released. We'll try to match TF more closely.


I feel there are less people building once we got the nightly builds available, but these have all been valid points.


Those comments are for some internal builds when we build directly into TF. This allows for the removal of the TF library as a dependency. None of our package builds use it though, so they can safely be ignored.

@aaronmondal
Copy link
Contributor Author

@broken Thank you for your quick and elaborate response!

Indeed the TF was built at commit hash 194212937adfec263f2eee9c8459fab52b640cd6 which was apparently 16 days ago (Sep 7ish?). I will rebuild from master tomorrow and let you know what I find. Maybe the oss_deps issue magically resolved itself ✨

@aaronmondal
Copy link
Contributor Author

Ok I just rebuilt TF from master.

  • When using a TF wheel built via
./bazel-bin/tensorflow/tools/pip_package/build_pip_package --nightly_flag /tmp/tensorflow_pkg

the TFT build will fail. The issue seems to be with

def _norm_path(path):
"""Returns a path with '/' and remove the trailing slash."""
path = path.replace("\\", "/")
if path[-1] == "/":
path = path[:-1]
return path

not being able to handle empty paths. I tried changing that but got different (OOM) errors. May be a bazel bug. Not worth to fixing as upgrading bazel may fix that with magically anyways.

  • Using a non-nightly wheel (same TF build, just tagged differently) build via
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

lets the TFT build succeed, but one has to manually call

IS_NIGHTLY=nightly ./oss_scripts/configure.sh

in TFT to ensure the creation of a tensorflow_text_nightly-REPLACE_ME-cp39-cp39-linux_x86_64.whl that works with the custom build. Ommitting the IS_NIGHTLY=nightly will create a tensorflow_text-2.6.0-cp39-cp39-linux_x86_64.whl regardless of build-TF version. This wrongly tagged wheel will autoinstall tensorflow==2.6.0.


To summarize, for anyone interested in running a TFF build against a custom TF (only on linux, tested with CUDA support and clang as CUDA compiler for TF, but I think the TFT toolchain defaulted back to gcc(and nvcc?)).

# Build tensorflow
./configure
bazel build //tensorflow/tools/pip_package:build_pip_package

# Tag the wheel as non-nightly, i.e. omit the --nightly_flag.
# This way the installed tensorflow will not be called tf-nightly.
# TFT seems to struggle with the tf-nighty name.
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

# > install the tensorflow wheel. It should show up as `tensorflow` in pip. NOT as tf-nightly.

# Build TFT via
IS_NIGHTLY=nightly ./oss_scripts/run_build.sh

# Optionally, test whether you have all the dependencies required by TFT.
# See issue #719 for what should pass/fail at the time of writing.
bazel test --test_output=errors --keep_going --jobs=1 tensorflow_text:all

# > install the tensorflow_text_nightly wheel.

# > You should now have tensorflow and tensorflow_text_nightly in your pip list.

@broken Would you like me to send a PR with something similar to the above as documentation? Renaming a nightly wheel may also work but all of these solutions feel kinda hacky. I tried building TFT against the official tf-nightly binary distro as well but that does not work either because of the wrong name.

As you suspected the absl dependency that I added in #718 is obsolete with a newer version of the TF sources. Closing that PR.

@broken
Copy link
Member

broken commented Sep 24, 2021

The required version of TF is set by setup.py. We don't have this set for the nightly version of the package since we assume anybody using nightly accepts risks of using it against whatever version of TF they have installed. While this is important to have set for the released pip packages, it probably makes sense to not have it set for manual builds.

If we kept the run_build.sh, this could clear out the value before building, or we could just point it out if the user wants to build against a different version of TF. I would prefer one of these options over building as nightly since there can be pitfalls with it - like the version being set to REPLACE_ME.


I'm surprised you received an error when building against a nightly custom build of TF. We do this daily for our nightly automated builds so I wouldn't expect you to have a problem. What is the error you receive?


We'll happily accept PRs to upgrade our build documentation! I have just two initial comments:

  • There is a lot of setup to the TF install that is omitted here. I think linking to their documentation would be better so new users don't skip those steps and we don't need to stay on top of any changes they make to their documentation.
  • Discussed above, but IS_NIGHTLY env var shouldn't be a requirement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants