Exploring a baseline Action build #48421
Conversation
As expected, we had a GitHub Actions timeout on the TensorFlow build step, since GitHub Actions currently run on a Standard_DS2_v2 machine. As we already know, this is a real bottleneck for the average external TF contributor (episodic or not), because we ask them to reproduce these steps on their own local machine just to prepare an occasional code PR. I think it is important to continuously monitor this Action over time, so that we can expect it to complete in a time that seems reasonable for an episodic/average TF contributor. Some solutions were proposed to enable this Action, in order of preference.
Just in case we want to explore the first option, with the self-hosted runners:
There is also a Terraform GitHub Self-Hosted Runners on GKE repo, maintained by Google Cloud members (/cc @bharathkkb), at https://github.com/terraform-google-modules/terraform-google-github-actions-runners
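If we want to kick the tires on that module, something along these lines might work; the example directory and the variable name below are assumptions, not verified against the module:

```bash
# Illustrative only: trying the GKE self-hosted runner module locally.
git clone https://github.com/terraform-google-modules/terraform-google-github-actions-runners.git
cd terraform-google-github-actions-runners/examples/gh-runner-gke   # assumed example path
terraform init
terraform plan -var "project_id=my-gcp-project"                     # assumed required variable
```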
/cc @perfinion, in case we can coordinate some steps on this together.
Update: we discussed a pilot plan with @perfinion yesterday on the SIG-Build Gitter.
I would add one more difficulty: even with a local cache, it seems to be invalidated each time I pull commits from upstream (I think LLVM-related commits like 17e6dc2 are the culprits).
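For what it's worth, a quick way to check whether an upstream pull touched the third-party pins (and therefore likely invalidated large parts of the cache) is to diff the previous HEAD against the new one; a minimal sketch, assuming the update was done with a plain `git pull`:

```bash
# Illustrative sketch: after a `git pull`, list which third_party files changed
# (e.g. the pinned LLVM commit), which would explain the cache invalidation.
git diff --stat HEAD@{1} HEAD -- third_party/
```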
What cache command are you using?
I am using
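For reference, a minimal sketch of a local Bazel disk cache (the cache path and the `.bazelrc` placement here are just illustrative assumptions):

```bash
# Illustrative only: keep a persistent local disk cache across Bazel builds.
bazel build --config=opt \
  --disk_cache=~/.cache/tf-bazel-cache \
  //tensorflow/tools/pip_package:build_pip_package

# Or persist the flag so every build picks it up:
echo "build --disk_cache=~/.cache/tf-bazel-cache" >> ~/.bazelrc
```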
I found that in #40505 (comment) @mihaimaruseac said the same thing. Do you have any problem regarding this issue, @bhack?
We are waiting to have a bootstrapped GCS cache for this action, produced with a fresh master build.
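Once that cache is available, the Action (or a contributor) could consume it read-only; a minimal sketch, where the bucket name and the credentials handling are assumptions:

```bash
# Illustrative only: read a prebuilt Bazel cache hosted on GCS without
# uploading local results back to it. The bucket name is a placeholder.
bazel build --config=opt \
  --remote_cache=https://storage.googleapis.com/tf-bazel-cache-bucket \
  --remote_upload_local_results=false \
  --google_default_credentials \
  //tensorflow/tools/pip_package:build_pip_package
```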
If the LLVM sync totally invalidates the remote Bazel cache, we cannot use GitHub-hosted Actions and will instead need self-hosted GitHub Actions runners, as suggested in #48421 (comment).
@bhack This PR is in draft; is there any update on this? Thanks!
@gbaned It is a draft because, as you can see, the introduced Action times out on GitHub.
Just for reference, it is timing out on this class of HW resources.
Closing this in favor of #57630.
With this I want to explore a new testing baseline with GitHub Actions and our official CPU `tensorflow/tensorflow:devel` image. The idea is to test in the CI the (more or less) episodic contributor journey of contributing code to TensorFlow, at least on CPU.
This is the proposed list of steps (a rough end-to-end sketch follows the list):
1. `tensorflow/tensorflow:devel` image rebuild (or Dockerhub pull?)
2. `ci_sanity.sh` selected steps (`--pylint`, ...; see Supersed pylint_allowlist #48294)
3. `./configure`
4. `bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package`
5. `bazel test //tensorflow/`
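A minimal end-to-end sketch of these steps, assuming they run inside the official devel container; the container working directory, the `ci_sanity.sh` location, and the test target pattern are assumptions for illustration, not the exact CI configuration:

```bash
# Illustrative contributor-journey sketch (paths and flags are assumptions).
docker pull tensorflow/tensorflow:devel
docker run -it -w /tensorflow_src -v "$PWD":/mnt tensorflow/tensorflow:devel bash

# Inside the container, roughly:
tensorflow/tools/ci_build/ci_sanity.sh --pylint    # selected sanity checks only
./configure                                        # accept CPU-only defaults
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
bazel test //tensorflow/...                        # test scope is an assumption
```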
As the average user is already experiencing, this will probably require a Bazel cache (on GCS, like for TF/IO?) to achieve reasonable compilation times.
I think that the reproducibility and timing of these build steps will let us monitor the experience of an episodic TensorFlow contribution.
/cc @angerson @mihaimaruseac @theadactyl @joanafilipa