Add Agile Machine Learning API #224

Merged: 7 commits, Apr 5, 2019

Conversation

luotigerlsx
Member

This API has been created to make it easier for non-developers to use machine learning. It handles the basic but important aspects of machine learning so that model accuracy can be improved over baseline models. It also handles deployment of the trained models and model versioning, so that when new data arrives users can tweak part of the data and retrain the model. The API keeps track of all versions trained on the same data, and it also lets you run predictions with a saved model.

In training, the API first cleans the data and then calculates all the relevant features that will be used in training. This training data is then saved to temporary files on disk. These temporary files are fed into the TensorFlow Dataset API, which builds the input function for the TensorFlow estimators. The estimators are divided into two kinds, custom and canned, depending on the model the user chooses to run on the data. During training, all the metrics are calculated and shown in the logs. When stored as checkpoints, these logs can be used to preview the model in TensorBoard.

The trained models are saved as different versions based on the names you give them. Version control is important because frequent data changes or hyperparameter updates will each produce a different model.

Functionalities of the API

All of the functionalities below can be used with a simple POST request to the respective APIs, with only very basic background knowledge.

- Train various ML models on GCP CMLE
- Deploy the trained model on GCP CMLE
- Predict results with the deployed model, on either a batch or a single data point, using GCP CMLE
- Visualize the predicted results using LIME functionality

googlebot added the "cla: yes" label (All committers have signed a CLA) on Mar 29, 2019


Machine Learning is a vast subject and to use a basic model of linear or logistic regression can be a difficult task for non-developers. This API has been created to make it easier for
non-developers to use machine learning. The API takes care of all the basic and important aspects of machine learning so that the accuracy of the model can be improved from the baseline models. The API also takes care of the deployment of the trained models and model versioning so that users upon getting new data can choose to tweak some part of data and retrain the model. The API takes care of all the versions of the training on the same data. The API also provides you with a feature to do prediction on your saved model.

Let's clean this up into a bulleted list. For example:

The API takes care of:
- The basic and important aspects of machine learning so that the accuracy of the model can be improved from the baseline models.
- Deployment of the trained models and model versioning so that users upon getting new data can choose to tweak some part of data and retrain the model.
- Versioning of training runs on the same data.
- Prediction based on a saved model. 

Member Author

Modified as suggested.

In training, the API first clean the data and then calculate all the relevant features that will be used in the training. This training data is then saved into temporary files on the disk. These temporary files are then fed into the tensorflow dataset API which makes the input function of the tensorflow estimators which are divided into two parts, custom and canned based on the choice of model users want to run on the data. In the process of training, all the metrics are called calculated and shown in the logs. These logs when stored as checkpoint can be used to preview the model in Tensorboard.

clean -> cleans
calculate -> calculates

Member Author

Modified as suggested.


The trained model are saved into different versions based on the name you give to these versions.Version control is important as there will be frequent data changes or hyperparameters updates, which will then create a different model.

Add a space after period in first sentence.

Member Author

Modified as suggested.

## Functionalities of the API

All the below functionalities can be used with a simple post request to the respective APIs with very basic background knowledge.
- To train various ML models on GCP CMLE

Remove leading "To" from each of these bullets.

Member Author

Modified as suggested.

|__ lime_utils.py (utils functions for LIME functionalities)
|__ update.sh (bash script for updating trainer package)
|__ config
|__ train.yaml (yaml file for train configurations)

Remove "yaml" from the parenthetical descriptions; it is implied by the file extension.

Member Author

Modified as suggested.

@@ -0,0 +1 @@
"""Empty Initialization file"""

Unnecessary docstring; remove it to make this an empty __init__.py.

Member Author

Modified.

(say csv, json etc) and other variables.

Arguments:
csv_path : string, Path of the csv files whether local or on remote storage

Please run pylint and respect the 80-character line length.

Member Author

Modified.

if self.gcs_path:
    if isinstance(self.csv_path, list):
        for index, path in enumerate(self.csv_path):
            bucket = path.split('//')[1].split('/')[0]

Refactor this to use regex matching; it is much more readable.

Member Author

Modified to use urlparse.
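
For readers following the thread, here is a minimal sketch of what urlparse-based parsing of a GCS path might look like. It assumes Python 3, where the function lives in `urllib.parse` (in Python 2 it was the `urlparse` module). The helper name `split_gcs_path` and the example path are hypothetical and are not taken from the PR.

```python
from urllib.parse import urlparse


def split_gcs_path(path):
    """Split 'gs://bucket/some/object' into (bucket, object_name)."""
    parsed = urlparse(path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("not a GCS path: {!r}".format(path))
    # netloc holds the bucket name; path starts with a slash that is
    # not part of the object name, so strip it.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical example path, for illustration only.
bucket, blob_name = split_gcs_path("gs://my-bucket/data/train.csv")
print(bucket, blob_name)
```

Compared with chained `split` calls, this rejects paths that do not use the `gs` scheme instead of silently returning a wrong bucket name.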

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument(

This is a lot of parameters. Consider refactoring this into a config so CLI invocation would be just passing a reference to the config. This will make it easier for the user to have consistent invocations.

Member Author

launch_demo is not supposed to be invoked by the user directly. It is actually triggered by a POST request with the parameters passed in the body. Therefore, CLI invocation is not needed.
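
To illustrate how the two positions in this exchange can meet, here is a rough sketch of mapping a JSON POST body onto a single typed config object, which gives consistent invocations without a CLI. Apart from `csv_path`, which appears in a docstring elsewhere in this review, every name here (`TrainConfig`, `config_from_body`, `model_name`, `train_steps`, and the default values) is hypothetical; the actual parameters of launch_demo are not shown in this thread.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical fields and defaults, for illustration only.
    csv_path: str
    model_name: str = "linearclassifier"
    train_steps: int = 1000


def config_from_body(body):
    """Parse a JSON request body into a TrainConfig, rejecting unknown keys."""
    payload = json.loads(body)
    allowed = {f.name for f in fields(TrainConfig)}
    unknown = set(payload) - allowed
    if unknown:
        raise ValueError("unknown parameters: {}".format(sorted(unknown)))
    # Missing required fields (here csv_path) raise TypeError.
    return TrainConfig(**payload)


cfg = config_from_body('{"csv_path": "gs://my-bucket/train.csv", "train_steps": 500}')
print(cfg)
```

Keeping the defaults in one place means a request only has to send the values it wants to change.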

@@ -0,0 +1 @@
"""Empty Initialization file"""

Remove this unnecessary docstring. An empty __init__.py will suffice.

Member Author

Modified.

jaketf commented Apr 4, 2019

Please rename this to avoid confusion with the Swift programming language.

Upgrading to Python 3 would be nice to have.

luotigerlsx
Member Author

Hi Jacob, I have made the necessary modifications to address your comments. Please take a look. Thanks.

luotigerlsx changed the title from "Add Swift Machine Learning API" to "Add Agile Machine Learning API" on Apr 5, 2019
jaketf commented Apr 5, 2019

LGTM.

jaketf merged commit ff75daf into GoogleCloudPlatform:master on Apr 5, 2019
monobinab pushed a commit to monobinab/professional-services that referenced this pull request Sep 18, 2019
* add swift machine learning api

* Modify readme as recommended

* style modification

* use urlparse to parse bucket name and path

* Modify a typo in readme

* change name to agile machine learning API