Add Agile Machine Learning API #224
Machine Learning is a vast subject and to use a basic model of linear or logistic regression can be a difficult task for non-developers. This API has been created to make it easier for non-developers to use machine learning. The API takes care of all the basic and important aspects of machine learning so that the accuracy of the model can be improved from the baseline models. The API also takes care of the deployment of the trained models and model versioning so that users upon getting new data can choose to tweak some part of data and retrain the model. The API takes care of all the versions of the training on the same data. The API also provides you with a feature to do prediction on your saved model.
Let's clean this up into a bulleted list. For example:
The API takes care of:
- The basic and important aspects of machine learning so that the accuracy of the model can be improved from the baseline models.
- Deployment of the trained models and model versioning so that users upon getting new data can choose to tweak some part of data and retrain the model.
- Versioning of training runs on the same data.
- Prediction based on a saved model.
Modified as suggested.
In training, the API first clean the data and then calculate all the relevant features that will be used in the training. This training data is then saved into temporary files on the disk. These temporary files are then fed into the tensorflow dataset API which makes the input function of the tensorflow estimators which are divided into two parts, custom and canned based on the choice of model users want to run on the data. In the process of training, all the metrics are called calculated and shown in the logs. These logs when stored as checkpoint can be used to preview the model in Tensorboard.
clean -> cleans
calculate -> calculates
Modified as suggested.
The trained model are saved into different versions based on the name you give to these versions.Version control is important as there will be frequent data changes or hyperparameters updates, which will then create a different model.
Add a space after period in first sentence.
Modified as suggested.
## Functionalities of the API

All the below functionalities can be used with a simple post request to the respective APIs with very basic background knowledge.
- To train various ML models on GCP CMLE
Remove leading "To" from each of these bullets.
Modified as suggested.
|__ lime_utils.py (utils functions for LIME functionalities)
|__ update.sh (bash script for updating trainer package)
|__ config
    |__ train.yaml (yaml file for train configurations)
Remove "yaml" from the parenthetical descriptions; it is implied by the file extension.
Modified as suggested.
@@ -0,0 +1 @@
"""Empty Initialization file"""
Unnecessary docstring; remove it to make this an empty `__init__.py`.
Modified.
(say csv, json etc) and other variables.

Arguments:
csv_path : string, Path of the csv files whether local or on remote storage
please run pylint and respect 80 char line length.
Modified.
if self.gcs_path:
    if isinstance(self.csv_path, list):
        for index, path in enumerate(self.csv_path):
            bucket = path.split('//')[1].split('/')[0]
refactor this to use regex matching, it is much more readable.
Modified to use urlparse.
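For readers following the thread, a minimal sketch of what a `urlparse`-based replacement for the chained `split` calls might look like. This is an illustration, not the code that was merged: the helper name `split_gcs_path` is hypothetical, and it uses the Python 3 module `urllib.parse` (in Python 2 the same function lives in the `urlparse` module).

```python
from urllib.parse import urlparse


def split_gcs_path(path):
    """Split 'gs://bucket/dir/file.csv' into (bucket, object_name).

    Hypothetical helper for illustration; not taken from the PR.
    """
    parsed = urlparse(path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("Expected a gs://bucket/object path, got: %r" % path)
    # netloc holds the bucket name; the parsed path keeps a leading '/'
    # that is not part of the object name, so strip it.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, blob = split_gcs_path("gs://my-bucket/data/train.csv")
print(bucket, blob)
```

Unlike `path.split('//')[1].split('/')[0]`, this version rejects local paths explicitly instead of failing with an `IndexError`.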
if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument(
This is a lot of parameters. Consider refactoring this into a config so CLI invocation would be just passing a reference to the config. This will make it easier for the user to have consistent invocations.
The launch_demo is not supposed to be invoked by the user directly. It is actually triggered by a POST request with the parameters passed in the body. Therefore, CLI invocation is not needed.
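The config-based invocation suggested in the review could be sketched roughly as follows. This is an assumption-laden illustration rather than anything from the PR: the `--config` flag and the `load_params` helper are hypothetical, and JSON is used only to keep the sketch within the standard library (the repository's `train.yaml` suggests YAML would be the natural choice there).

```python
import argparse
import json


def load_params(argv=None):
    """Read all run parameters from one config file instead of many flags.

    Hypothetical helper; the flag name and file format are assumptions.
    """
    parser = argparse.ArgumentParser(
        description="Launch a run from a single config file.")
    parser.add_argument(
        "--config", required=True,
        help="Path to a JSON file holding every parameter")
    args = parser.parse_args(argv)
    with open(args.config) as handle:
        # The resulting dict plays the role of the parsed argparse namespace.
        return json.load(handle)
```

Because the script is triggered by a POST request, the same dictionary could equally be built from the request body, which keeps both entry points consistent.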
@@ -0,0 +1 @@
"""Empty Initialization file"""
Remove this unnecessary docstring. An empty `__init__.py` will suffice.
Modified.
Please rename this to avoid confusion with the Swift programming language. Upgrading to Python 3 is a nice-to-have.
Hi Jacob, I have made the necessary modifications to address your comments. Please take a look. Thanks.
LGTM.
* add swift machine learning api
* Modify readme as recommended
* style modification
* use urlparse to parse bucket name and path
* Modify a typo in readme
* change name to agile machine learning API
This API has been created to make it easier for non-developers to use machine learning. The API takes care of all the basic and important aspects of machine learning so that the accuracy of the model can be improved from the baseline models. The API also takes care of the deployment of the trained models and model versioning so that users upon getting new data can choose to tweak some part of data and retrain the model. The API takes care of all the versions of the training on the same data. The API also provides you with a feature to do prediction on your saved model.
In training, the API first cleans the data and then calculates all the relevant features that will be used in training. This training data is then saved to temporary files on disk. These temporary files are fed into the TensorFlow Dataset API, which builds the input function for the TensorFlow estimators. The estimators are divided into two kinds, custom and canned, depending on the model the user wants to run on the data. During training, all the metrics are calculated and shown in the logs. These logs, when stored as checkpoints, can be used to preview the model in TensorBoard.
The trained models are saved as different versions based on the names you give to these versions. Version control is important because frequent data changes or hyperparameter updates will each produce a different model.
## Functionalities of the API

All of the functionalities below can be used with a simple POST request to the respective APIs and only very basic background knowledge.
- Train various ML models on GCP CMLE
- Deploy the trained model on GCP CMLE
- Predict results from the deployed model, on either a batch or a single data point, using GCP CMLE
- Visualize the predicted results using LIME functionality
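As a rough idea of what "a simple POST request" might look like from the caller's side, here is a sketch that only builds the request without sending it. Everything specific in it is an assumption for illustration: the `/train` route, the host, and the `name` field are hypothetical, and only `csv_path` appears as an argument in the reviewed code.

```python
import json
from urllib.request import Request


def build_train_request(base_url, payload):
    """Build (but do not send) a JSON POST request for a training run.

    The '/train' route is a hypothetical endpoint, not taken from the PR.
    """
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/train",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_train_request(
    "http://localhost:8080",
    {"csv_path": "gs://my-bucket/data/train.csv", "name": "v1"},
)
print(request.get_method(), request.full_url)
```

Passing the object to `urllib.request.urlopen` would send it; the actual routes and body fields are defined by the API's own documentation.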