Add Agile Machine Learning API #224

Merged

jaketf merged 7 commits into GoogleCloudPlatform:master from luotigerlsx:master

Apr 5, 2019

Member

luotigerlsx commented Mar 29, 2019

This API makes it easier for non-developers to use machine learning. It handles the basic but important aspects of machine learning so that model accuracy can improve on the baseline models. It also handles deployment and versioning of trained models, so that when new data arrives users can adjust part of the data and retrain the model. The API keeps track of all versions of training runs on the same data, and it supports prediction from a saved model.

During training, the API first cleans the data and then calculates all the relevant features used in training. The training data is saved to temporary files on disk. These files are fed into the TensorFlow Dataset API, which builds the input function for the TensorFlow estimators. The estimators fall into two groups, custom and canned, depending on the model the user chooses to run on the data. During training, all metrics are calculated and shown in the logs. When stored as checkpoints, these logs can be used to preview the model in TensorBoard.

Trained models are saved as separate versions, identified by the names you give them. Version control is important because frequent changes to the data or hyperparameters each produce a different model.

Functionalities of the API

All of the functionalities below can be used with a simple POST request to the respective API, with only basic background knowledge:

- Train various ML models on GCP CMLE
- Deploy the trained model on GCP CMLE
- Predict with the deployed model, on a batch or on a single data point, using GCP CMLE
- Visualize the predicted results using LIME


          add swift machine learning api

13acdee

googlebot added the cla: yes label

yiliangZhao approved these changes

jaketf suggested changes

tools/swift-machine-learning-api/README.md Outdated



		Machine Learning is a vast subject and to use a basic model of linear or logistic regression can be a difficult task for non-developers. This API has been created to make it easier for
		non-developers to use machine learning. The API takes care of all the basic and important aspects of machine learning so that the accuracy of the model can be improved from the baseline models. The API also takes care of the deployment of the trained models and model versioning so that users upon getting new data can choose to tweak some part of data and retrain the model. The API takes care of all the versions of the training on the same data. The API also provides you with a feature to do prediction on your saved model.

jaketf Apr 3, 2019

Let's clean this up into a bulleted list. For example:

The API takes care of:
- The basic and important aspects of machine learning so that the accuracy of the model can be improved from the baseline models.
- Deployment of the trained models and model versioning so that users upon getting new data can choose to tweak some part of data and retrain the model.
- Versioning of training runs on the same data.
- Prediction based on a saved model.

Member Author

luotigerlsx Apr 5, 2019

Modified as suggested.

tools/swift-machine-learning-api/README.md Outdated

+              Machine Learning is a vast subject and to use a basic model of linear or logistic regression can be a difficult task for non-developers. This API has been created to make it easier for
+              non-developers to use machine learning. The API takes care of all the basic and important aspects of machine learning so that the accuracy of the model can be improved from the baseline models. The API also takes care of the deployment of the trained models and model versioning so that users upon getting new data can choose to tweak some part of data and retrain the model. The API takes care of all the versions of the training on the same data. The API also provides you with a feature to do prediction on your saved model.
+              In training, the API first clean the data and then calculate all the relevant features that will be  used in the training. This training data is then saved into temporary files on the disk. These temporary files are then fed into the tensorflow dataset API which makes the input function of the tensorflow estimators which are divided into two parts, custom and canned based on the choice of model users want to run on the data. In the process of training,  all the metrics are called calculated and shown in the logs. These logs when stored as checkpoint can be used to preview the model in Tensorboard.

jaketf Apr 3, 2019

clean -> cleans
calculate -> calculates

Member Author

luotigerlsx Apr 5, 2019

Modified as suggested.

tools/swift-machine-learning-api/README.md Outdated


		In training, the API first clean the data and then calculate all the relevant features that will be used in the training. This training data is then saved into temporary files on the disk. These temporary files are then fed into the tensorflow dataset API which makes the input function of the tensorflow estimators which are divided into two parts, custom and canned based on the choice of model users want to run on the data. In the process of training, all the metrics are called calculated and shown in the logs. These logs when stored as checkpoint can be used to preview the model in Tensorboard.

		The trained model are saved into different versions based on the name you give to these versions.Version control is important as there will be frequent data changes or hyperparameters updates, which will then create a different model.

jaketf Apr 3, 2019

Add a space after period in first sentence.

Member Author

luotigerlsx Apr 5, 2019

Modified as suggested.

tools/swift-machine-learning-api/README.md Outdated

+              ## Functionalities of the API
+              All the below functionalities can be used with a simple post request to the respective APIs with very basic background knowledge.
+               - To train various ML models on GCP CMLE

jaketf Apr 3, 2019

Remove leading "To" from each of these bullets.

Member Author

luotigerlsx Apr 5, 2019

Modified as suggested.

tools/swift-machine-learning-api/README.md Outdated

+                  |__ lime_utils.py  (utils functions for LIME functionalities)
+                  |__ update.sh  (bash script for updating trainer package)
+                  |__ config
+                      |__ train.yaml  (yaml file for train configurations)

jaketf Apr 3, 2019

Remove "yaml" from the parenthetical descriptions; it is implied by the file extension.

Member Author

luotigerlsx Apr 5, 2019

Modified as suggested.

tools/swift-machine-learning-api/codes/trainer/__init__.py Outdated

		@@ -0,0 +1 @@
		"""Empty Initialization file"""

jaketf Apr 4, 2019

Unnecessary docstring; remove it to make this an empty __init__.py.

Member Author

luotigerlsx Apr 5, 2019

Modified.

tools/swift-machine-learning-api/codes/trainer/input_pipeline_dask.py Outdated

+                      (say csv, json etc) and other variables.
+                      Arguments:
+                              csv_path : string, Path of the csv files whether local or on remote storage

jaketf Apr 4, 2019

Please run pylint and respect the 80-character line length.

Member Author

luotigerlsx Apr 5, 2019

Modified.

tools/swift-machine-learning-api/codes/trainer/input_pipeline_dask.py Outdated

+                      if self.gcs_path:
+                          if isinstance(self.csv_path, list):
+                              for index, path in enumerate(self.csv_path):
+                                  bucket = path.split('//')[1].split('/')[0]

jaketf Apr 4, 2019

Refactor this to use regex matching; it is much more readable.

Member Author

luotigerlsx Apr 5, 2019

Modified to use urlparse.
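
As a rough illustration of the urlparse approach mentioned in this reply, the bucket and object path can be read from the parsed URI instead of chained `split` calls. This is a sketch only, not the code from the PR: `split_gcs_path` is a hypothetical helper name, and the Python 3 `urllib.parse` import is an assumption (the thread suggests the code was on Python 2 at the time, where the module is named `urlparse`).

```python
from urllib.parse import urlparse


def split_gcs_path(gcs_uri):
    """Split a gs:// URI into (bucket, object_path).

    Hypothetical helper for illustration; not taken from the PR.
    """
    parsed = urlparse(gcs_uri)
    if parsed.scheme != 'gs' or not parsed.netloc:
        raise ValueError('Expected a gs://bucket/object URI, got %r' % gcs_uri)
    # netloc holds the bucket name; path keeps a leading slash, so strip it.
    return parsed.netloc, parsed.path.lstrip('/')


bucket, blob_path = split_gcs_path('gs://my-bucket/data/train.csv')
print(bucket, blob_path)  # prints: my-bucket data/train.csv
```

Compared with `path.split('//')[1].split('/')[0]`, this fails loudly on inputs that are not `gs://` URIs rather than raising an `IndexError` or returning the wrong segment.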

tools/swift-machine-learning-api/codes/trainer/launch_demo.py Outdated

+              if __name__ == '__main__':
+                  parser = argparse.ArgumentParser()
+                  parser.add_argument(

jaketf Apr 4, 2019

This is a lot of parameters. Consider refactoring this into a config so CLI invocation would be just passing a reference to the config. This will make it easier for the user to have consistent invocations.

Member Author

luotigerlsx Apr 5, 2019

launch_demo is not supposed to be invoked by the user directly. It is actually triggered by a POST request with the parameters passed in the body. Therefore, CLI invocation is not needed.
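
For reference, the config-based invocation suggested in the review comment could look roughly like the sketch below. Everything here is an assumption for illustration: the `--config` flag, the `load_params` name, and the JSON format are hypothetical and do not come from the PR.

```python
import argparse
import json


def load_params(argv=None):
    """Read all training parameters from one JSON config file.

    Hypothetical sketch: the only CLI argument is the config path.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--config',
        required=True,
        help='Path to a JSON file holding the training parameters')
    args = parser.parse_args(argv)
    with open(args.config) as config_file:
        return json.load(config_file)


# Example call (the file name is a placeholder):
# params = load_params(['--config', 'train_config.json'])
```

The same dictionary could equally be built from a POST request body, which is how the author describes the script being triggered, so a single config schema would serve both entry points.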

tools/swift-machine-learning-api/codes/trainer/utils/__init__.py Outdated

		@@ -0,0 +1 @@
		"""Empty Initialization file"""

jaketf Apr 4, 2019

Remove this unnecessary docstring. An empty __init__.py will suffice.

Member Author

luotigerlsx Apr 5, 2019

Modified.

jaketf commented Apr 4, 2019

Please rename this to avoid confusion with Swift programming language.

Upgrading to Python 3 would be nice to have.

luotigerlsx added 5 commits

April 5, 2019 11:10


          Modify readme as recommended

76466e7


          style modification

d43365a


          use urlparse to parse bucket name and path

0c984ca


          Modify a typo in readme

0fa1bee


          change name to agile machine learning API

8da7b8c

Member Author

luotigerlsx commented Apr 5, 2019

Hi Jacob, I have done the necessary modification to address your comments. Please take a look. Thanks.

luotigerlsx changed the title ~~Add Swift Machine Learning API~~ Add Agile Machine Learning API

jaketf approved these changes


          Merge branch 'master' into master

06eaa61

jaketf commented Apr 5, 2019

LGTM.

jaketf merged commit ff75daf into GoogleCloudPlatform:master

monobinab pushed a commit to monobinab/professional-services that referenced this pull request


          Add Agile Machine Learning API (GoogleCloudPlatform#224)

9405d73

* add swift machine learning api

* Modify readme as recommended

* style modification

* use urlparse to parse bucket name and path

* Modify a typo in readme

* change name to agile machine learning API
