adding readme details

CNuge · Mar 13, 2020 · 547932c
1 parent 06dc708
Showing 2 changed files with 9 additions and 11 deletions.
diff --git a/README.md b/README.md
@@ -8,7 +8,7 @@ Alfie is an alignment-free, kingdom level taxonomic classifier for DNA barcode d
 
 Alfie can be deployed from the command line for rapid file-to-file classification of sequences. This is an effective means of separating contaminant sequences in a DNA metabarcoding or environmental DNA data set from sequences of interest. 
 
-For increased control, alfie can also be deployed as a module from within Python. The alfie package also contains functions that can aid a user in the training and application of a custom alignment-free classifier, which allows the program to be applied to different DNA barcodes (or genes), as a binary classifier, or on different taxonomic levels. 
+For increased control, alfie can also be deployed as a module from within Python. The alfie package also contains functions that can [aid a user in the training and application of a custom alignment-free classifier](https://github.com/CNuge/alfie/blob/master/example/custom_alfie_demo.ipynb), which allows the program to be applied to different DNA barcodes (or genes), as a binary classifier, or on different taxonomic levels. 
 
 
 ## Installation
@@ -53,7 +53,7 @@ For very large files (order of millions), the input sequence file may need to be
 alfie -f alfie/data/example_data.fastq -b 100
 ```
 
-By default, alignment free classification is performed using the default feature set (4mer frequencies) and the corresponding pre-trained neural network (trained on `COI-5P` sequence fragments of varying lengths). A user can pass an alternative neural network to make predictions using the `-m` flag. If this option is exercised and the model has not been trained on 4mers, then the `-k` flag must be used to ensure the proper set of kmer features are generated to match the neural network input structure (see the [example notebook](https://github.com/CNuge/alfie/blob/master/example/custom_alfie_demo.ipynb) for more info on making and using custom neural networks with alfie).
+By default, alignment free classification is performed using the default feature set (4mer frequencies) and the corresponding pre-trained neural network (trained on `COI-5P` sequence fragments of varying lengths). A user can pass an alternative machine learning model (a neural network or another type of algorithm) to make predictions using the `-m` flag. If this option is exercised and the model has not been trained on 4mers, then the `-k` flag must be used to ensure the proper set of kmer features is generated to match the model's input structure (see the [example notebook](https://github.com/CNuge/alfie/blob/master/example/custom_alfie_demo.ipynb) for more info on making and using custom models with alfie).
 
 ```
 #example using the 6mer model that ships with alfie, note the -k 6 option is required

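Note on the `-k` requirement described in the README hunk above: the number of input features depends on k, so a model trained on 6mers cannot accept 4mer input. The sketch below illustrates this in plain Python. It is not alfie's own feature code; the function name `kmer_frequencies`, the skipping of ambiguous windows, and the normalisation to frequencies are all assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(seq, k=4, alphabet="ACGT"):
    """Return the frequency of every possible k-mer, in a fixed order.

    Hypothetical helper for illustration only, not part of alfie.
    """
    # Enumerate all len(alphabet)**k possible k-mers in lexicographic order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Assumption: windows containing N or other ambiguity codes are skipped.
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


features_4 = kmer_frequencies("ACGTACGTAC", k=4)
features_6 = kmer_frequencies("ACGTACGTAC", k=6)
print(len(features_4), len(features_6))  # 256 4096
```

A model fitted on 4096 features would reject a 256-feature vector, which is why the k value used at prediction time has to match the one used in training.
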
diff --git a/example/non_neural_net_model_example.py b/example/non_neural_net_model_example.py
@@ -3,9 +3,8 @@
 
 Here this is demonstrated by training a support vector machine 
 on the same set of data used in the example jupyter notebook (part 2),
-
+and then simulating deployment on a new dataset
 """
-
 import numpy as np
 import pandas as pd
 
@@ -23,7 +22,7 @@
 data = pd.read_csv('alfie_small_train_example.tsv', sep = '\t')
 
 #####
-# conduct the train test split
+# conduct the train test split with alfie
 #####
 train, test = stratified_taxon_split(data, class_col = 'class', test_size = 0.3, )
 
@@ -46,15 +45,14 @@
 y_train = tax_encoder.fit_transform(y_train_raw)
 y_test = tax_encoder.transform(y_test_raw)
 
-
+# this reshape flattens the labels into the 1-D shape that LinearSVC requires
 y_train = np.reshape(y_train, len(y_train))
 y_test = np.reshape(y_test, len(y_test))
 
 
 #####
-# train a demo SVM model
+# train a custom SVM model
 #####
-
 svm_params = {'C': 100.0, 'loss': 'squared_hinge', 'max_iter': 10000}
 
 svm_ann_demo = LinearSVC(**svm_params) 
@@ -64,7 +62,6 @@
 #####
 # generate simulated fasta input from the test data
 #####
-
 #new list of dictionaries
 test_simulated_fasta = []
 
@@ -81,10 +78,11 @@
 
 
 #####
-# use the model to make predictions via the classify records function
+# deploy the custom model to make predictions via alfie's classify_records function
 #####
+# note that the kmer features are generated and the predictions are made in a single function call.
 
-# note the parameter argmax = False, this is passed because the LinearSVC already
+# note the parameter argmax = False is passed because the LinearSVC already
 # outputs non-one hot encoded outputs.
 test_out, test_predictions = classify_records(test_simulated_fasta, 
 												model = svm_ann_demo, argmax = False)
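
Note on `argmax = False`: the comment in the final hunk says the flag is passed because LinearSVC already returns labels that are not one-hot encoded. The sketch below shows that distinction in plain Python. It is not alfie's `classify_records` code; `to_class_indices` is a hypothetical helper, and the example outputs are made up for illustration.

```python
def to_class_indices(model_output, argmax=True):
    """Convert raw model output to a flat list of class indices.

    Hypothetical helper for illustration only, not part of alfie.
    """
    if argmax:
        # One row of per-class scores per sequence (e.g. neural network
        # output): take the position of the largest score in each row.
        return [max(range(len(row)), key=row.__getitem__) for row in model_output]
    # The model already returns one class label per sequence
    # (e.g. the output of a support vector machine's predict method).
    return [int(label) for label in model_output]


# Made-up outputs for two sequences and three classes.
nn_output = [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]]
svm_output = [1, 0]

print(to_class_indices(nn_output))                 # [1, 0]
print(to_class_indices(svm_output, argmax=False))  # [1, 0]
```

Taking an argmax over labels that are already class indices would fail or give meaningless results, which is the situation the flag avoids.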