
120: Add use examples: EfficientNet fine-tuning on CIFAR-100 #123

Open
wants to merge 24 commits into base: main
Conversation

sashakolpakov
Collaborator

Added EfficientNet (V2, small variant) fine-tuning on CIFAR-100 using Cerebros, provided as both an .ipynb notebook and a .py script.
@sashakolpakov sashakolpakov added labels on Oct 31, 2023: kind/documentation (Improvements or additions to documentation), status/ready-pending-tests (Ready to make pull request once tests pass), triage/high-priority, triage/required, kind/usability, audience/technical (Issue primarily for technical review and service)
@sashakolpakov sashakolpakov linked an issue Oct 31, 2023 that may be closed by this pull request
@david-thrower
Owner

I added a CICD test for this benchmark. Let's hope it runs on the GitHub test server in a workable time. If not, we may need to make a miniaturized version of it for the CICD demos. https://github.com/david-thrower/cerebros-core-algorithm-alpha/pull/123/files#diff-cc8c65daed8907e6bb50ac1769d49c05f5f48bdbe8b5cfd3b24b7c5e56ceb8dc

@sashakolpakov
Collaborator Author

sashakolpakov commented Oct 31, 2023 via email

@sashakolpakov
Collaborator Author

sashakolpakov commented Oct 31, 2023 via email

@david-thrower
Owner

Here is what I did: I added tensorflow-datasets to a separate requirements file, which I should also do later on with tensorflow-text and other ancillary requirements ... I want to avoid bloating the core by separating the use-case-specific packages from the core packages.
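The split described above might look something like this. The file name and contents here are illustrative, not the repo's actual layout:

```
# use-case-requirements.txt (hypothetical file name)
# Ancillary packages needed only by the use-case examples,
# kept out of the core requirements.txt:
tensorflow-datasets
```

Anyone running the examples would then install both files, e.g. `pip install -r requirements.txt -r use-case-requirements.txt`, while a plain install of the core pulls in only requirements.txt.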

@sashakolpakov
Collaborator Author

sashakolpakov commented Oct 31, 2023 via email

@david-thrower
Owner

To reply to your question: "Also complains no CUDA. Please instruct the course of action."
- This should be only a warning, which TensorFlow will throw whenever it is running on a CPU-only machine.
- By default, it will JIT-compile (except on text classification). This will speed it up on CPUs almost as much as an inexpensive GPU would. It leverages XLA, which lets tandem linear algebra operations complete in one step (e.g. a fused multiply-add executed as a single operation that takes both the add and multiply operands concurrently).

https://www.tensorflow.org/xla

Since we are poor, this approach is preferable to GPUs anyway.

https://keras.io/api/models/model_training_apis/

jit_compile: If True, compile the model training step with [XLA](https://www.tensorflow.org/xla). XLA is an optimizing compiler for machine learning. jit_compile is not enabled by default. Note that jit_compile=True may not necessarily work for all models. For more information on supported operations, please refer to the [XLA documentation](https://www.tensorflow.org/xla). Also refer to [known XLA issues](https://www.tensorflow.org/xla/known_issues) for more details.
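In practice, the flag from the Keras docs quoted above is just an argument to `model.compile`. A minimal sketch (toy model and shapes for illustration, not the Cerebros code):

```python
import tensorflow as tf

# Tiny stand-in model; the real use case fine-tunes EfficientNetV2-S.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

model.compile(
    optimizer="adam",
    loss="mse",
    jit_compile=True,  # compile the train step with XLA; not all ops are supported
)
```

With `jit_compile=True`, the first training step pays a one-time XLA compilation cost; subsequent steps run the compiled program, which is where the CPU speedup comes from.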

@david-thrower
Owner

@sashakolpakov

No problem, I was loading everything into the same requirements.txt as well. This commit just happened to be the one where I caught on to the fact that I need to stop adding more and more to it. Once I package this and put it on PyPI, I think the requirements will install automatically with a pip install, so for that reason, I need to separate them ... and I need to package this for PyPI ...

A non-PEFT retraining run was added for comparison. This has no bearing on the Cerebros efficiency.
@sashakolpakov
Collaborator Author

sashakolpakov commented Nov 1, 2023 via email

@david-thrower
Owner

@sashakolpakov , This is what I had to do on the EfficientNet CIFAR-10 example. For showcase examples, full-scale is definitely awesome, but for the CICD tests, the test must complete in a timeframe that fits.

What I think would be a good solution to this problem is to make an environment variable like CICD_TEST, then make all the Python scripts look for it, defaulting to False if the variable does not exist.

If the execution environment the script runs in has the environment variable CICD_TEST set to true, then only a small subset of the data is run in the training jobs. If the variable is absent or set to false, then the full data set runs.
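A minimal sketch of that pattern (function names like `select_training_data` and the `subset_size` default are illustrative, not existing Cerebros code):

```python
import os

def is_cicd_test() -> bool:
    # Only an explicit "true"/"1" enables CICD-test mode; an absent
    # variable or any other value defaults to False, per the proposal.
    return os.environ.get("CICD_TEST", "false").strip().lower() in ("true", "1")

def select_training_data(x_train, y_train, subset_size=500):
    # In CICD-test mode, train on a small slice so the job finishes
    # within the GitHub runner's time budget; otherwise use everything.
    if is_cicd_test():
        return x_train[:subset_size], y_train[:subset_size]
    return x_train, y_train
```

The CICD workflow would then just export the variable (e.g. `env: CICD_TEST: "true"` in the workflow YAML) without any change to how the scripts are invoked locally.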

@david-thrower
Owner

Approved to merge this use case in, but given the scale of compute required, it may be infeasible to have as a routine CICD test for now.

Successfully merging this pull request may close these issues:

Add use examples: EfficientNet fine-tuning on Cifar-100