Use this directory to make predictions on audio, text, image, video, and/or CSV files.
Drag and drop sample data into this folder, and predictions are made with the models in the ./models directory.
To get started, put some files in the ./load_dir folder and run load.py from the models directory:
cd /Users/jim/desktop/allie
cd models
python3 load.py
This produces featurized folders with model predictions in .JSON format that follow the standard dictionary, as shown below:
{"sampletype": "audio", "transcripts": {"audio": {"deepspeech_dict": "this is some testator testing once you three i am testing for ale while machinery framework to see his accretion the new sample test"}, "text": {}, "image": {}, "video": {}, "csv": {}}, "features": {"audio": {"librosa_features": {"features": [48.0, 204.70833333333334, 114.93240011947118, 396.0, 14.0, 216.5, 103.359375, 1.8399430940147428, 1.9668646890049444, 11.385597904924593, 0.0, 1.0559244294941723, 1.0, 0.0, 1.0, 1.0, 1.0, 0.7548547717034835, 0.010364651324331484, 0.7781876333922462, 0.7413898270731765, 0.7531146388573189, 0.5329470301378001, 0.017993121900389288, 0.5625446261639591, 0.49812177879164954, 0.5362475050227643, 0.5019325152826378, 0.014070606086479512, 0.523632811868841, 0.46894673564752315, 0.5016733890463346, 0.47905132092405744, 0.02472744944944913, 0.5170032241543769, 0.408032583763636, 0.481510183943566, 0.47211244429901056, 0.018043067999864118, 0.4986083755424441, 0.4084943475419884, 0.47487594610615297, 0.4698145764425497, 0.02481760404009747, 0.5249353704873062, 0.4293713399428569, 0.4722612320098678, 0.45261508487773017, 0.026497663310172545, 0.49848564789957234, 0.40880741566998524, 0.4507269108715201, 0.4486803104555413, 0.058559460166888094, 0.5292193402660791, 0.3401144705267203, 0.44999945618536435, 0.47497774707770735, 0.06545659069313127, 0.5736851049778624, 0.37925421500129, 0.4734915617563768, 0.4650337731799947, 0.06320658864729298, 0.5675856011170606, 0.3828128296325481, 0.45491284769941215, 0.4336677569640048, 0.06580364398487831, 0.5229786825561087, 0.3254973876934075, 0.4435446804048719, 0.4510229261935718, 0.0716424867984984, 0.5607997826027251, 0.3319068941555564, 0.45336899240905365, -378.4693712461592, 123.45005738361948, -131.02074973363048, -645.6119532302674, -365.0407612849682, 108.01722016743142, 78.5850621057939, 244.19279156346005, -109.89544987268641, 113.87757464191944, -18.990339871317058, 38.97227759803155, 80.46313291288668, 
-113.14922433281748, -19.5478460234633, 25.85348830525823, 36.66801973350443, 140.72102808980202, -59.74682246793187, 18.3196627309548, 25.890819294565695, 28.110070916600474, 109.71209190044716, -32.50655086525428, 24.126562365562382, -12.77779324195114, 25.980150189124338, 37.34024720564918, -89.18596268298815, -14.092855104596493, -14.213402047550273, 17.851386217883952, 24.416921204215857, -53.80916251929509, -15.616460366626296, -11.056262156053059, 18.131479541957944, 25.019042211813467, -65.95011982036516, -10.115261093647717, -2.111560667454096, 11.800353875032327, 33.815281150727785, -35.047615612670526, -2.4632489045982657, -12.855548041442455, 13.841955462451525, 26.49950045235625, -54.65905146286438, -12.258563565004795, -5.991988010947961, 11.560727147262314, 26.699383611419385, -46.86210002294128, -5.08389478450145, -11.905883972886778, 13.110884275285521, 18.96898208296976, -55.222181120197234, -8.889351847151506, -13.282554457300717, 9.363802595261776, 13.125079504552438, -42.40351688080857, -12.904730673116855, -6.647081175227956e-05, 6.790962221819154e-05, 3.898538767970233e-05, -0.0003530532719088282, -5.176161063821292e-05, 0.5775604310470552, 0.5363262958114443, 3.0171051694951547, 0.005876029108677461, 0.447613631105005, 2196.6402149427804, 1460.1082170800585, 6848.122696727527, 474.45532202867423, 1779.7575344580457, 1879.6573011499802, 758.0548156953982, 3968.436183431614, 710.7057371268927, 1783.9133839417857, 25.057721821734972, 7.417488037600184, 48.54069273302066, 7.980294433517432, 26.382808285840404, 0.02705797180533409, 0.049401603639125824, 0.22989588975906372, 2.3204531316878274e-05, 0.0016842428594827652, 3896.30511090472, 2618.9936438064337, 9829.9072265625, 484.4970703125, 2993.115234375, 0.13837594696969696, 0.11062751539644003, 0.62060546875, 0.01220703125, 0.10009765625, 0.025540588423609734, 0.02010413259267807, 0.09340725094079971, 0.00015651443391107023, 0.02306547947227955], "labels": ["onset_length", "onset_detect_mean", 
"onset_detect_std", "onset_detect_maxv", "onset_detect_minv", "onset_detect_median", "tempo", "onset_strength_mean", "onset_strength_std", "onset_strength_maxv", "onset_strength_minv", "onset_strength_median", "rhythm_0_mean", "rhythm_0_std", "rhythm_0_maxv", "rhythm_0_minv", "rhythm_0_median", "rhythm_1_mean", "rhythm_1_std", "rhythm_1_maxv", "rhythm_1_minv", "rhythm_1_median", "rhythm_2_mean", "rhythm_2_std", "rhythm_2_maxv", "rhythm_2_minv", "rhythm_2_median", "rhythm_3_mean", "rhythm_3_std", "rhythm_3_maxv", "rhythm_3_minv", "rhythm_3_median", "rhythm_4_mean", "rhythm_4_std", "rhythm_4_maxv", "rhythm_4_minv", "rhythm_4_median", "rhythm_5_mean", "rhythm_5_std", "rhythm_5_maxv", "rhythm_5_minv", "rhythm_5_median", "rhythm_6_mean", "rhythm_6_std", "rhythm_6_maxv", "rhythm_6_minv", "rhythm_6_median", "rhythm_7_mean", "rhythm_7_std", "rhythm_7_maxv", "rhythm_7_minv", "rhythm_7_median", "rhythm_8_mean", "rhythm_8_std", "rhythm_8_maxv", "rhythm_8_minv", "rhythm_8_median", "rhythm_9_mean", "rhythm_9_std", "rhythm_9_maxv", "rhythm_9_minv", "rhythm_9_median", "rhythm_10_mean", "rhythm_10_std", "rhythm_10_maxv", "rhythm_10_minv", "rhythm_10_median", "rhythm_11_mean", "rhythm_11_std", "rhythm_11_maxv", "rhythm_11_minv", "rhythm_11_median", "rhythm_12_mean", "rhythm_12_std", "rhythm_12_maxv", "rhythm_12_minv", "rhythm_12_median", "mfcc_0_mean", "mfcc_0_std", "mfcc_0_maxv", "mfcc_0_minv", "mfcc_0_median", "mfcc_1_mean", "mfcc_1_std", "mfcc_1_maxv", "mfcc_1_minv", "mfcc_1_median", "mfcc_2_mean", "mfcc_2_std", "mfcc_2_maxv", "mfcc_2_minv", "mfcc_2_median", "mfcc_3_mean", "mfcc_3_std", "mfcc_3_maxv", "mfcc_3_minv", "mfcc_3_median", "mfcc_4_mean", "mfcc_4_std", "mfcc_4_maxv", "mfcc_4_minv", "mfcc_4_median", "mfcc_5_mean", "mfcc_5_std", "mfcc_5_maxv", "mfcc_5_minv", "mfcc_5_median", "mfcc_6_mean", "mfcc_6_std", "mfcc_6_maxv", "mfcc_6_minv", "mfcc_6_median", "mfcc_7_mean", "mfcc_7_std", "mfcc_7_maxv", "mfcc_7_minv", "mfcc_7_median", "mfcc_8_mean", "mfcc_8_std", "mfcc_8_maxv", 
"mfcc_8_minv", "mfcc_8_median", "mfcc_9_mean", "mfcc_9_std", "mfcc_9_maxv", "mfcc_9_minv", "mfcc_9_median", "mfcc_10_mean", "mfcc_10_std", "mfcc_10_maxv", "mfcc_10_minv", "mfcc_10_median", "mfcc_11_mean", "mfcc_11_std", "mfcc_11_maxv", "mfcc_11_minv", "mfcc_11_median", "mfcc_12_mean", "mfcc_12_std", "mfcc_12_maxv", "mfcc_12_minv", "mfcc_12_median", "poly_0_mean", "poly_0_std", "poly_0_maxv", "poly_0_minv", "poly_0_median", "poly_1_mean", "poly_1_std", "poly_1_maxv", "poly_1_minv", "poly_1_median", "spectral_centroid_mean", "spectral_centroid_std", "spectral_centroid_maxv", "spectral_centroid_minv", "spectral_centroid_median", "spectral_bandwidth_mean", "spectral_bandwidth_std", "spectral_bandwidth_maxv", "spectral_bandwidth_minv", "spectral_bandwidth_median", "spectral_contrast_mean", "spectral_contrast_std", "spectral_contrast_maxv", "spectral_contrast_minv", "spectral_contrast_median", "spectral_flatness_mean", "spectral_flatness_std", "spectral_flatness_maxv", "spectral_flatness_minv", "spectral_flatness_median", "spectral_rolloff_mean", "spectral_rolloff_std", "spectral_rolloff_maxv", "spectral_rolloff_minv", "spectral_rolloff_median", "zero_crossings_mean", "zero_crossings_std", "zero_crossings_maxv", "zero_crossings_minv", "zero_crossings_median", "RMSE_mean", "RMSE_std", "RMSE_maxv", "RMSE_minv", "RMSE_median"]}}, "text": {}, "image": {}, "video": {}, "csv": {}}, "models": {"audio": {"males": [{"sample type": "audio", "created date": "2020-08-03 12:55:08.238841", "device info": {"time": "2020-08-03 12:55", "timezone": ["EST", "EDT"], "operating system": "Darwin", "os release": "19.5.0", "os version": "Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64", "cpu data": {"memory": [8589934592, 3035197440, 64.7, 4487892992, 379949056, 2523181056, 2408304640, 1964711936], "cpu percent": 66.0, "cpu times": [14797.03, 0.0, 9385.82, 76944.46], "cpu count": 4, "cpu stats": [153065, 479666, 89106680, 587965], "cpu swap": 
[2147483648, 1174405120, 973078528, 54.7, 30354079744, 203853824], "partitions": [["/dev/disk1s6", "/", "apfs", "ro,local,rootfs,dovolfs,journaled,multilabel"], ["/dev/disk1s5", "/System/Volumes/Data", "apfs", "rw,local,dovolfs,dontbrowse,journaled,multilabel"], ["/dev/disk1s4", "/private/var/vm", "apfs", "rw,local,dovolfs,dontbrowse,journaled,multilabel"], ["/dev/disk1s1", "/Volumes/Macintosh HD - Data", "apfs", "rw,local,dovolfs,journaled,multilabel"]], "disk usage": [499963174912, 10985529344, 320581328896, 3.3], "disk io counters": [1283981, 844586, 35781873664, 17365774336, 850754, 779944], "battery": [100, -2, true], "boot time": 1596411904.0}, "space left": 320.581328896}, "session id": "867c4358-d5a9-11ea-8720-acde48001122", "classes": ["males", "females"], "problem type": "classification", "model name": "gender_tpot_classifier.pickle", "model type": "tpot", "metrics": {"accuracy": 0.8947368421052632, "balanced_accuracy": 0.8944444444444444, "precision": 0.9, "recall": 0.9, "f1_score": 0.9, "f1_micro": 0.8947368421052632, "f1_macro": 0.8944444444444444, "roc_auc": 0.8944444444444444, "roc_auc_micro": 0.8944444444444444, "roc_auc_macro": 0.8944444444444444, "confusion_matrix": [[8, 1], [1, 9]], "classification_report": " precision recall f1-score support\n\n males 0.89 0.89 0.89 9\n females 0.90 0.90 0.90 10\n\n accuracy 0.89 19\n macro avg 0.89 0.89 0.89 19\nweighted avg 0.89 0.89 0.89 19\n"}, "settings": {"version": "1.0.0", "augment_data": false, "balance_data": true, "clean_data": false, "create_csv": true, "default_audio_augmenters": ["augment_tsaug"], "default_audio_cleaners": ["clean_mono16hz"], "default_audio_features": ["librosa_features"], "default_audio_transcriber": ["deepspeech_dict"], "default_csv_augmenters": ["augment_ctgan_regression"], "default_csv_cleaners": ["clean_csv"], "default_csv_features": ["csv_features"], "default_csv_transcriber": ["raw text"], "default_dimensionality_reducer": ["pca"], "default_feature_selector": ["rfe"], 
"default_image_augmenters": ["augment_imaug"], "default_image_cleaners": ["clean_greyscale"], "default_image_features": ["image_features"], "default_image_transcriber": ["tesseract"], "default_outlier_detector": ["isolationforest"], "default_scaler": ["standard_scaler"], "default_text_augmenters": ["augment_textacy"], "default_text_cleaners": ["remove_duplicates"], "default_text_features": ["nltk_features"], "default_text_transcriber": ["raw text"], "default_training_script": ["tpot"], "default_video_augmenters": ["augment_vidaug"], "default_video_cleaners": ["remove_duplicates"], "default_video_features": ["video_features"], "default_video_transcriber": ["tesseract (averaged over frames)"], "dimension_number": 2, "feature_number": 20, "model_compress": false, "reduce_dimensions": false, "remove_outliers": true, "scale_features": true, "select_features": true, "test_size": 0.1, "transcribe_audio": true, "transcribe_csv": true, "transcribe_image": true, "transcribe_text": true, "transcribe_video": true, "visualize_data": false, "transcribe_videos": true}, "transformer name": "gender_tpot_classifier_transform.pickle", "training data": ["gender_all.csv", "gender_train.csv", "gender_test.csv", "gender_all_transformed.csv", "gender_train_transformed.csv", "gender_test_transformed.csv"], "sample X_test": [0.19491584410160165, 2.278239927977625, 1.9809968520802117, 0.01621731265879942, 0.15713016963065518, 0.6373734371406007, 0.5565326177000756, 0.21607641781209055, 1.5729652666810199, 0.4175324163804035, 0.25821087005791604, 1.688251084321436, 0.641181793964938, 0.8245062752279405, 3.328186152340374, -3.566702513086108, -0.7896923143197454, -0.33315803775179953, -0.9991381480355723, 3.3414140426072754], "sample y_test": 1}]}, "text": {}, "image": {}, "video": {}, "csv": {}}, "labels": ["load_dir"], "errors": [], "settings": {"version": "1.0.0", "augment_data": false, "balance_data": true, "clean_data": false, "create_csv": true, "default_audio_augmenters": 
["augment_tsaug"], "default_audio_cleaners": ["clean_mono16hz"], "default_audio_features": ["librosa_features"], "default_audio_transcriber": ["deepspeech_dict"], "default_csv_augmenters": ["augment_ctgan_regression"], "default_csv_cleaners": ["clean_csv"], "default_csv_features": ["csv_features"], "default_csv_transcriber": ["raw text"], "default_dimensionality_reducer": ["pca"], "default_feature_selector": ["rfe"], "default_image_augmenters": ["augment_imaug"], "default_image_cleaners": ["clean_greyscale"], "default_image_features": ["image_features"], "default_image_transcriber": ["tesseract"], "default_outlier_detector": ["isolationforest"], "default_scaler": ["standard_scaler"], "default_text_augmenters": ["augment_textacy"], "default_text_cleaners": ["remove_duplicates"], "default_text_features": ["nltk_features"], "default_text_transcriber": ["raw text"], "default_training_script": ["tpot"], "default_video_augmenters": ["augment_vidaug"], "default_video_cleaners": ["remove_duplicates"], "default_video_features": ["video_features"], "default_video_transcriber": ["tesseract (averaged over frames)"], "dimension_number": 2, "feature_number": 20, "model_compress": false, "reduce_dimensions": false, "remove_outliers": true, "scale_features": true, "select_features": true, "test_size": 0.1, "transcribe_audio": true, "transcribe_csv": true, "transcribe_image": true, "transcribe_text": true, "transcribe_video": true, "visualize_data": false, "transcribe_videos": true}}
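
The dictionary above is easier to work with once the feature labels are paired with their values and the model results are flattened. The sketch below is one way to do that. It assumes that each key under `models` for a sample type (here `males`) is the predicted class and that its value is a list of model sessions. The helper names `named_features` and `model_predictions` are hypothetical and not part of this repository, and the record is a trimmed copy of the example above.

```python
import json


def named_features(record, sample_type="audio", feature_set="librosa_features"):
    """Pair each feature label with its value for one feature set."""
    block = record["features"].get(sample_type, {}).get(feature_set, {})
    return dict(zip(block.get("labels", []), block.get("features", [])))


def model_predictions(record, sample_type="audio"):
    """Return (predicted_class, model_name) pairs for one sample type.

    Assumption: keys under record["models"][sample_type] are predicted classes.
    """
    results = []
    for predicted_class, sessions in record["models"].get(sample_type, {}).items():
        for session in sessions:
            results.append((predicted_class, session.get("model name")))
    return results


# Trimmed record with the same shape as the example above (two features only).
record = json.loads("""
{"sampletype": "audio",
 "features": {"audio": {"librosa_features": {
     "features": [48.0, 103.359375],
     "labels": ["onset_length", "tempo"]}}},
 "models": {"audio": {"males": [{"model name": "gender_tpot_classifier.pickle"}]}}}
""")

print(named_features(record))
print(model_predictions(record))
```

For a real file, replace the inline record with `json.load(open(path))`, where `path` points at one of the generated .JSON files.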
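Click the .GIF below to follow along with this example in video format.

The table below lists the supported file types, their extensions, and the recommended format for each.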
File type | Extension | Recommended format |
---|---|---|
audio file | .WAV, .MP3, .M4A | .WAV |
text file | .TXT | .TXT |
image file | .PNG, .JPG | .PNG |
video file | .MP4 | .MP4 |
CSV file | .CSV | .CSV |
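
Before dropping files into ./load_dir, it can help to check which sample type each file would fall under. The snippet below is a minimal sketch built only from the extensions in the table above. The `SAMPLE_TYPES` mapping and the `sample_type` helper are hypothetical and are not how load.py itself detects file types.

```python
from pathlib import Path

# Extension-to-type mapping taken from the table above (an illustration only).
SAMPLE_TYPES = {
    ".wav": "audio", ".mp3": "audio", ".m4a": "audio",
    ".txt": "text",
    ".png": "image", ".jpg": "image",
    ".mp4": "video",
    ".csv": "csv",
}


def sample_type(filename):
    """Return the sample type for a file name, or None if the extension is not listed."""
    return SAMPLE_TYPES.get(Path(filename).suffix.lower())


for name in ["clip.WAV", "notes.txt", "slides.docx"]:
    print(name, "->", sample_type(name))
```

Files that map to `None` are not covered by the table and may need converting to one of the listed formats first.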
There are a few other scripts in this folder. The table below describes what each of these scripts does and how to call them.
Script name | What it does | How to call |
---|---|---|
clean.py | Removes everything from the current directory except the .py files needed for core functionality; useful because the folder can get cluttered during modeling | python3 clean.py |
create_csv.py | Creates a nicely formatted .CSV file with the file paths and class labels for regression modeling | python3 create_csv.py [folderpathA] [folderpathB] [folderpath...N] |
create_readme.py | Creates a readme for a machine learning repository; currently used by the modeling API. | python3 create_readme.py [modelpath] |
model2csv.py | Creates an Excel sheet of all currently trained models and their performance; useful for summarizing all modeling sessions quickly; outputs to the current directory. | python3 model2csv.py |
validate.py | Counts the number of model predictions in a given class after prediction is complete, which gives an intuition for how accurate a model is on new datasets. Note that you currently have to edit this file manually for it to be useful. | python3 validate.py |
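
The idea behind counting predictions per class can be sketched in a few lines. This is not the code in validate.py. It is a minimal illustration that assumes the prediction files are .JSON files in a single folder and that the keys under `models` for a sample type are the predicted classes, as in the example dictionary. The `count_predictions` helper is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_predictions(folder, sample_type="audio"):
    """Count predicted classes across all .json files in a folder."""
    counts = Counter()
    for path in Path(folder).glob("*.json"):
        with open(path) as f:
            record = json.load(f)
        if not isinstance(record, dict):
            continue  # skip JSON files that are not prediction dictionaries
        models = record.get("models", {}).get(sample_type, {})
        for predicted_class, sessions in models.items():
            counts[predicted_class] += len(sessions)
    return counts


# Example call (the folder path is an assumption; adjust it to your setup):
# print(count_predictions("../load_dir"))
```

If every file dropped into the folder is known to belong to one class, the share of predictions that land in that class gives a rough sense of accuracy on new data.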