adding readme details
CNuge committed Mar 13, 2020
1 parent 06dc708 commit 547932c
Showing 2 changed files with 9 additions and 11 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -8,7 +8,7 @@ Alfie is an alignment-free, kingdom level taxonomic classifier for DNA barcode d

Alfie can be deployed from the command line for rapid file-to-file classification of sequences. This is an effective means of separating contaminant sequences in a DNA metabarcoding or environmental DNA data set from sequences of interest.

For increased control, alfie can also be deployed as a module from within Python. The alfie package also contains functions that can aid a user in the training and application of a custom alignment-free classifier, which allows the program to be applied to different DNA barcodes (or genes), as a binary classifier, or on different taxonomic levels.
For increased control, alfie can also be deployed as a module from within Python. The alfie package also contains functions that can [aid a user in the training and application of a custom alignment-free classifier](https://github.com/CNuge/alfie/blob/master/example/custom_alfie_demo.ipynb), which allows the program to be applied to different DNA barcodes (or genes), as a binary classifier, or on different taxonomic levels.


## Installation
@@ -53,7 +53,7 @@ For very large files (order of millions), the input sequence file may need to be
alfie -f alfie/data/example_data.fastq -b 100
```

By default, alignment free classification is performed using the default feature set (4mer frequencies) and the corresponding pre-trained neural network (trained on `COI-5P` sequence fragments of varying lengths). A user can pass an alternative neural network to make predictions using the `-m` flag. If this option is exercised and the model has not been trained on 4mers, then the `-k` flag must be used to ensure the proper set of kmer features are generated to match the neural network input structure (see the [example notebook](https://github.com/CNuge/alfie/blob/master/example/custom_alfie_demo.ipynb) for more info on making and using custom neural networks with alfie).
By default, alignment free classification is performed using the default feature set (4mer frequencies) and the corresponding pre-trained neural network (trained on `COI-5P` sequence fragments of varying lengths). A user can pass an alternative machine learning model (a neural network or another algorithm) to make predictions using the `-m` flag. If this option is exercised and the model has not been trained on 4mers, then the `-k` flag must be used to ensure the proper set of kmer features are generated to match the neural network input structure (see the [example notebook](https://github.com/CNuge/alfie/blob/master/example/custom_alfie_demo.ipynb) for more info on making and using custom neural networks with alfie).
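
To illustrate why the `-k` value has to match the model, here is a minimal sketch of k-mer frequency features. It is not alfie's own implementation: the function name `kmer_frequencies`, the alphabetical feature ordering, and the handling of ambiguous bases are all assumptions made for this example. The point is only that the feature vector length depends on k (4^k entries), so a model trained on 4mers cannot accept 6mer input and vice versa.

```python
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return the frequency of every possible DNA k-mer in seq.

    Hypothetical helper for illustration; features are ordered
    alphabetically over the alphabet ACGT, giving 4**k values.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # windows containing ambiguous bases (e.g. N) are skipped
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


features_4 = kmer_frequencies("ACGTACGTGGCA", k=4)
features_6 = kmer_frequencies("ACGTACGTGGCA", k=6)
print(len(features_4), len(features_6))  # 256 4096
```

A model fit on the 256-value vectors would reject the 4096-value ones, which is the mismatch the `-k` flag is there to prevent.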

```
#example using the 6mer model that ships with alfie, note the -k 6 option is required
16 changes: 7 additions & 9 deletions example/non_neural_net_model_example.py
@@ -3,9 +3,8 @@
Here this is demonstrated by training a support vector machine
on the same set of data used in the example jupyter notebook (part 2),
and then simulating deployment on a new dataset
"""

import numpy as np
import pandas as pd

@@ -23,7 +22,7 @@
data = pd.read_csv('alfie_small_train_example.tsv', sep = '\t')

#####
# conduct the train test split
# conduct the train test split with alfie
#####
train, test = stratified_taxon_split(data, class_col = 'class', test_size = 0.3, )

@@ -46,15 +45,14 @@
y_train = tax_encoder.fit_transform(y_train_raw)
y_test = tax_encoder.transform(y_test_raw)


#this reshape interfaces with the LinearSVC input shape requirements
y_train = np.reshape(y_train, len(y_train))
y_test = np.reshape(y_test, len(y_test))


#####
# train a demo SVM model
# train a custom SVM model
#####

svm_params = {'C': 100.0, 'loss': 'squared_hinge', 'max_iter': 10000}

svm_ann_demo = LinearSVC(**svm_params)
@@ -64,7 +62,6 @@
#####
# generate simulated fasta input from the test data
#####

#new list of dictionaries
test_simulated_fasta = []

@@ -81,10 +78,11 @@


#####
# use the model to make predictions via the classify records function
# deploy the custom model, making predictions via alfie's classify_records function
#####
#note the kmer data are generated and predictions made all in the one function call.

# note the parameter argmax = False, this is passed because the LinearSVC already
# note the parameter argmax = False is passed because the LinearSVC already
# outputs non-one hot encoded outputs.
test_out, test_predictions = classify_records(test_simulated_fasta,
model = svm_ann_demo, argmax = False)
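
The `argmax = False` note above can be illustrated with a small, self-contained sketch. It uses made-up arrays rather than alfie or scikit-learn, and how `classify_records` applies argmax internally is an assumption here. A neural network typically returns one score per class for each sequence, so an argmax over the class axis is needed to get labels; a model that already predicts integer labels (as `LinearSVC.predict` does) should be left as-is.

```python
import numpy as np

# Hypothetical neural-network style output: one row per sequence,
# one column of class scores per taxonomic class.
nn_output = np.array([[0.1, 0.7, 0.2],
                      [0.8, 0.1, 0.1]])
# argmax across the class axis turns scores into label indices
nn_labels = np.argmax(nn_output, axis=1)

# Hypothetical label-style output, as a predict() call on a
# label-predicting model would give: already one label per sequence.
svm_output = np.array([1, 0])
svm_labels = svm_output  # no argmax step required

print(nn_labels.tolist(), svm_labels.tolist())  # [1, 0] [1, 0]
```

Applying argmax to the second array would collapse it to a single index instead of one label per sequence, which is why the step is switched off for this kind of model.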