Commit

updated readme
kushagragpt99 authored Mar 15, 2018
1 parent 9c13317 commit 3371968
Showing 1 changed file with 17 additions and 1 deletion.
18 changes: 17 additions & 1 deletion README.md
@@ -1 +1,17 @@
#Ripe
# RIPE CUCUMBERS ML PICKLE RECIPE

This repository contains the web app of Ripe Cucumbers, an app that rates movies that are about to be released, i.e. it looks into the future. It reads the data given as input and predicts the probability of the film being a 'Hit', 'Average' or 'Flop'. It uses tools offered by Microsoft, such as Azure ML Studio and PROSE, to make the life of developers easier.

# Explanation of the ML algorithm

We import the data as a CSV (comma-separated values) file from the source code of the website (http://home.iitk.ac.in/~kushgpt/) using the 'Import Data' feature. The data is then processed to form a proper dataset using the 'Convert to Dataset' feature. The 'Edit Metadata' feature lets us transform the data as per our requirements. For this app, we have converted the release month from numbers into strings. We have also converted other text inputs such as keywords1, keywords2, actor1, actor2 etc. into categorical string inputs. One recurring problem of scraping data from various sources is that some columns often remain empty. This is solved by 'Clean Missing Data', a feature which lets us substitute missing values with either '0' or a statistic of the non-empty data in that column, such as the mean, median or mode. 'Select Columns in Dataset' lets us exclude features that we believe are not that important for predicting the outcome of the movie, for example its original language. An important tool for any ML algorithm is normalization, which rescales the values in a column so that all columns have roughly the same spread, so the trained model is not biased towards columns with bigger values.
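
The two numeric steps described above, filling missing values and normalizing a column, can be sketched in plain Python. This is only an illustration of the idea, not what Azure ML Studio runs internally: the `budget` column and both function names are hypothetical, and min-max scaling is assumed here as one possible normalization method.

```python
from statistics import mean

def fill_missing_with_mean(column):
    """Replace None entries with the mean of the non-missing values."""
    present = [v for v in column if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in column]

def min_max_normalize(column):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical column (not from the real dataset) with one missing value.
budget = [10.0, None, 30.0, 50.0]
filled = fill_missing_with_mean(budget)   # the gap becomes 30.0
scaled = min_max_normalize(filled)        # [0.0, 0.5, 0.5, 1.0]
```

After scaling, every numeric column spans the same range, so no single column dominates training purely because of its units.
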
Hyper-parameters are the parameters given to the program before execution, and they affect how the model learns. The module builds and tests multiple models using different combinations of settings, and compares metrics over all models to find the best combination of settings. The terms parameter and hyperparameter can be confusing. The model's parameters are what you set in the properties pane. Basically, this module performs a parameter sweep over the specified parameter settings and learns an optimal set of hyperparameters, which might be different for each specific decision tree, dataset, or regression method. The process of finding the optimal configuration is sometimes called tuning, and here it is done with a parameter sweep. When you set up a parameter sweep, you define the scope of your search: either a finite number of parameter combinations selected randomly, or an exhaustive search over a parameter space you define. Accuracy is used as the metric for tuning.
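
The difference between an exhaustive sweep and a random sweep can be sketched as follows. This is a minimal, self-contained illustration and not the Azure ML Studio implementation: the search space, the function names, and `toy_accuracy` (a stand-in for "train a model and measure its accuracy") are all assumptions made for the example.

```python
import itertools
import random

def entire_grid(space):
    """Yield every combination of settings in the search space."""
    names = sorted(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))

def random_sweep(space, n_runs, seed=0):
    """Pick n_runs distinct combinations at random from the grid."""
    grid = list(entire_grid(space))
    rng = random.Random(seed)
    return rng.sample(grid, min(n_runs, len(grid)))

def sweep(candidates, evaluate):
    """Return the candidate settings with the highest score."""
    return max(candidates, key=evaluate)

# Hypothetical search space for a tree-based model.
space = {"max_depth": [2, 4, 8], "num_trees": [10, 50]}

def toy_accuracy(settings):
    # Stand-in for training and scoring a model; it peaks at
    # max_depth=4, num_trees=50 purely for demonstration.
    return (1.0
            - abs(settings["max_depth"] - 4) / 10
            - abs(settings["num_trees"] - 50) / 100)

best = sweep(entire_grid(space), toy_accuracy)
```

An exhaustive sweep is guaranteed to find the best point on the grid but grows multiplicatively with each added setting; a random sweep caps the number of runs at the cost of possibly missing the best combination.
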
What makes hyper-parameter tuning possible is 'Partition and Sample'. Sampling is an important tool in machine learning because it lets you reduce the size of a dataset while maintaining the same ratio of values. It works by dividing your data into multiple subsections of the same size, separating data into groups and then working with data from a specific group, sampling (you can extract a percentage of the data, apply random sampling, or choose a column to use for balancing the dataset and perform stratified sampling on its values), and finally creating a smaller dataset for testing. Hyper-parameter tuning assisted by 'Partition and Sample' therefore helps get the best accuracy out of the chosen optimization algorithm. A column is selected to be treated as the label for 'Train Model', and 'Score Model' then scores the algorithm on the testing data. After training is finished, the model is stored as a trained model, which is then used directly after importing and pre-processing the input data. This time around, however, the label column is also excluded. The input of the web service is given to 'Score Model', and its output goes to another column selector, which returns the columns that you need in the output.
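
Stratified sampling, mentioned above, keeps the ratio of label values the same in the smaller dataset. The sketch below shows the idea in plain Python; the rows, the 20/30/50 label split, and the function name are hypothetical and only serve the illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(rows, label_of, fraction, seed=0):
    """Sample the same fraction from every label group."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    rng = random.Random(seed)
    sample = []
    for members in groups.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical dataset: 20 'Hit', 30 'Average' and 50 'Flop' movies.
rows = ([("movie%d" % i, "Hit") for i in range(20)]
        + [("movie%d" % i, "Average") for i in range(20, 50)]
        + [("movie%d" % i, "Flop") for i in range(50, 100)])

sample = stratified_sample(rows, lambda row: row[1], fraction=0.2)
counts = Counter(label for _, label in sample)
# The 2:3:5 ratio of the full dataset is preserved in the sample.
```

A plain random sample of the same size could by chance under-represent the rarest class, which is why a stratified split is often preferred when the labels are imbalanced.
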

[api url](https://ussouthcentral.services.azureml.net/workspaces/8d9305016d584c0e8502a915a50de700/services/92e9367b9263492eaf12479462cb1a47/execute?api-version=2.0&details=true)

[Request Response API Documentation for code.fun.do ripe cucumbers](https://studio.azureml.net/apihelp/workspaces/8d9305016d584c0e8502a915a50de700/webservices/f2acd7929ae34cbe852e3069f6abf5eb/endpoints/b53afeb937b0492bbf07df4b66b1ff46/score)
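
A request to the web service could be built roughly as shown below. This is a hedged sketch, not the app's actual client code: it assumes the usual Azure ML Studio Request-Response body layout (`Inputs`, `ColumnNames`, `Values`, `GlobalParameters`), assumes the web service input is called `input1`, and uses placeholder column names and values. The real schema and the API key should be taken from the API documentation linked above. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = ("https://ussouthcentral.services.azureml.net/workspaces/"
           "8d9305016d584c0e8502a915a50de700/services/"
           "92e9367b9263492eaf12479462cb1a47/execute"
           "?api-version=2.0&details=true")

def build_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request for the scoring endpoint."""
    body = {
        "Inputs": {
            # "input1" is an assumed input name; check the API help page.
            "input1": {"ColumnNames": column_names, "Values": rows},
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers)

# Placeholder columns and values, for illustration only.
columns = ["release_month", "actor1", "actor2", "keywords1", "keywords2"]
row = ["March", "Actor A", "Actor B", "space", "adventure"]
request = build_request(API_URL, "YOUR_API_KEY", columns, [row])

# To actually call the service (requires a valid key):
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```
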



