Skip to content

Comments

Add converter script to keras#567

Merged
s314cy merged 5 commits intodevelopfrom
480-DeepBreathConversion
Mar 23, 2023
Merged

Add converter script to keras#567
s314cy merged 5 commits intodevelopfrom
480-DeepBreathConversion

Conversation

@walidabn
Copy link
Collaborator

@walidabn walidabn commented Feb 13, 2023

Fixes issue @480 in the following sense :

This PR is not meant to be merged on develop, but essentially brings the DeepBreath model to JS. In more details :

  • The converted DeepBreath model in JS is in /tfjs_model
  • The Tensorflow saved_model is in /saved_model
  • Three Python scripts were written, the first one called convertDB.py is a conversion from Pytorch to ONNX, a second one called onnx2TensorFlow.py is a rewritting from the ONNX format to tensorflow/TensorflowJS, but by comparing it to the original Pytorch model, some layers had to be dropped due to incompatibilites between Pytorch and ONNX. The third script is an accurate rewritting from the Pytorch model to Tensorflow/TFJS, called torchToTFModel.py. Torch is not used in it, inside we write a TF/Keras model and then save it.

A few notes : When converting from Pytorch to Tensorflow/TFJS, unless the model is trivial, we should not rely on automatic conversion tools like pytorch2keras, or onnx2keras, as these libraries are out of date and don't support many layers/essential functionalities. Redevelopping the model manually on Tensorflow is the best way to go for these kind of tasks.

@martinjaggi
Copy link
Member

in the description/readme could you clarify the 3rd way in some more details? i guess you mean to write a model in python TF/keras and then saving. this is in torchToTFModel.py while torch is not used in it, right?

can you link to our documentation somehow how to take python TF/keras models and convert it to TFjs?

how about the preprocessing pipeline in this case? i guess it remains in python but how did you abstract it?

were you able to test the resulting JS model if it does the same as the pytorch python one? maybe with and without preprocessing?

@morganridel
Copy link
Collaborator

Did the 70MB+ files "output" needed to be pushed in Git? @s314cy worked a lot to clean the Git history to get a light repo

@s314cy
Copy link
Contributor

s314cy commented Feb 20, 2023

Did the 70MB+ files "output" needed to be pushed in Git? @s314cy worked a lot to clean the Git history to get a light repo

yes, such heavy static files should not be pushed to the repo, even if not merged in dev/prod, since the repo will keep track of the files as long as they exist in some commit in some remote branch (so it won't be enough to remove them and commit)

a good solution would be for the branch to only contain the source code that generated the static model files that were pushed; the latter can be gitignored (note that logs and other situational files should not be pushed either, thus they can be gitignored as well)

technically, this means adding all the log, onnx and bin files to the gitignore, git rm them, commit, squash the git rm commit with the commit that added the files, and finally push force

@walidabn walidabn force-pushed the 480-DeepBreathConversion branch from 491040e to e6cb7df Compare February 27, 2023 11:43
@walidabn
Copy link
Collaborator Author

  • For Martin's first point, it's exactly that. I edited the message to clarify that point.

  • I added and clarified the documentation for model conversion in FAQ.md and adapted the one in TASK.md. I think it's a bit clearer now.

  • Adding some details on the preprocessing pipeline :
    The preprocessing pipeline goes as follows :
    convert the raw audio file into a text file using the python preprocessing pipeline abstracted in preprocess.py into a text file with numeric values. In order to do so, we used the following preprocessing pipeline that can be found [here] (https://github.com/epfl-iglobalhealth/DeepBreath-App-Giorgio/blob/main/models/train_models_for_mobile/train_model_disease_classification.py).

Running the script yields a .txt file containing a certain amount of floating point values, which will be a multiple of 32. In order to use the TensorflowJS model on the data, read values from the text file by loading them in a float tensor of size [32,-1].

  • Comparing performance : As weights cannot be transfered from Pytorch to TensorflowJS, assessing and comparing the performance of the two models cannot be done.

@s314cy
Copy link
Contributor

s314cy commented Mar 16, 2023

@walidabn should I merge this once the .py files are removed? or would you prefer to add the deepbreath task in this PR as well?

@s314cy s314cy force-pushed the 480-DeepBreathConversion branch from 1826b55 to 516539d Compare March 16, 2023 13:02
@s314cy s314cy force-pushed the 480-DeepBreathConversion branch from 516539d to cae03a4 Compare March 16, 2023 13:04
@s314cy s314cy added the documentation Improvements or additions to documentation label Mar 23, 2023
@s314cy s314cy merged commit 6be90bb into develop Mar 23, 2023
@s314cy s314cy deleted the 480-DeepBreathConversion branch March 23, 2023 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants