Real-time 26-Point MFCC & 512-Point Radix-2 FFT Generator & Visualizer on Android in Java, C++ and NEON Intrinsics
The following table and figure show the average time, in seconds, observed for MFCC generation per 400-sample frame.
- Java : Written entirely in Java.
- C++ : Written entirely in native C++ with a JNI interface.
- C++ & NEON/SSE : Written in C++ with 4-lane ARMv7 NEON/SSE SIMD intrinsics for Hamming, FFT, and DCT.
- Galaxy S9 -O0 : Galaxy S9 with 8-core Snapdragon 845, C++ compiler optimization level 0
- Galaxy S9 -O3 : Galaxy S9 with 8-core Snapdragon 845, C++ compiler optimization level 3
- Emulator (X86) -O0 : Android Emulator on a host PC with 4-core emulation, C++ compiler optimization level 0
- Emulator (X86) -O3 : Android Emulator on a host PC with 4-core emulation, C++ compiler optimization level 3
The numbers are all in seconds.
Configuration | Java | C++ | C++ & NEON/SSE |
---|---|---|---|
Galaxy S9 -O0 | 0.0016 | 0.0015 | 0.0011 |
Galaxy S9 -O3 | 0.0016 | 0.00050 | 0.00034 |
Emulator (X86) -O0 | 0.00020 | 0.00012 | 0.00012 |
Emulator (X86) -O3 | 0.00020 | 0.000049 | 0.000047 |
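The Java implementation works surprisingly well. On the test target (Galaxy S9), it takes less than 2[ms] (about 1.6[ms] in the table) to process one frame. Since a new frame is due every 10[ms] (the frame shift), processing runs more than 5 times faster than real time, assuming one of the cores is available all the time.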
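The margin quoted above can be reproduced from the table by dividing the 10[ms] frame shift by the measured per-frame time. The following is a minimal sketch. The helper name `realtimeMargin` is hypothetical and not part of the project.

```cpp
// Hypothetical helper: how many times faster than real time a frame is processed.
// frameShiftSec: interval between consecutive frames (0.010 s in this project).
// procTimeSec:   measured average processing time per frame, in seconds.
constexpr double realtimeMargin(double frameShiftSec, double procTimeSec) {
    return frameShiftSec / procTimeSec;
}
```

With the values from the table, `realtimeMargin(0.010, 0.0016)` is about 6.25 for Java on the Galaxy S9, and `realtimeMargin(0.010, 0.00034)` is about 29 for C++ with NEON at -O3.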
The author is not capable of further tuning with assembler beyond the intrinsics, but further performance improvement by CPU-specific assembler-level optimization may be possible.
- Download the contents and open with Android Studio.
- Copy NEON_2_SSE.h from https://github.com/intel/ARM_NEON_2_x86_SSE into app/src/main/cpp/.
- Build.
It was tested with the following environment.
- Android Studio 3.5.1
- Min SDK Version 21
- Virtual device API 29, Android 10.0, x86
If it does not work, check the app's microphone permission on the device. Also, try changing RECORDING_RATE in AudioReceiver.
This project was originally motivated by a wish to measure the real-time performance of audio signal processing on Android devices. It is a study implementation that serves as a benchmark for a native C++ implementation with 4-lane ARM NEON SIMD intrinsics.
- Audio input: 16 kHz monaural linear PCM taken from AudioRecord
- Frame size 400 samples (25[ms]), frame shift 160 samples (10[ms])
- Pre-emphasis (tap 0.96)
- Hamming window per frame
- 512-point Radix-2 Cooley-Tukey recursive FFT
- Mel filterbank, 26 banks, top 8 kHz, bottom 300 Hz, with flooring at 1.0
- DCT into 26-point MFCC [quefrency] with DC.
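As an illustration of the filterbank step in the list above, the 26 triangular filters between 300 Hz and 8 kHz need 28 edge frequencies spaced evenly on the mel scale. The sketch below assumes the common mel formula 2595·log10(1 + f/700) and a natural logarithm for the floored log energy. The source states neither, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hz <-> mel conversion (assumed variant; the project may use another one).
inline double hzToMel(double hz) { return 2595.0 * std::log10(1.0 + hz / 700.0); }
inline double melToHz(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Returns numBanks + 2 edge frequencies in Hz, equally spaced on the mel scale.
// Filter i rises from edges[i] to edges[i + 1] and falls back to zero at edges[i + 2].
std::vector<double> melEdgesHz(int numBanks, double lowHz, double highHz) {
    const double lowMel = hzToMel(lowHz);
    const double highMel = hzToMel(highHz);
    std::vector<double> edges(numBanks + 2);
    for (int i = 0; i < numBanks + 2; ++i) {
        const double mel = lowMel + (highMel - lowMel) * i / (numBanks + 1);
        edges[i] = melToHz(mel);
    }
    return edges;
}

// Index of the FFT bin that contains a frequency, for an fftSize-point FFT.
inline int hzToBin(double hz, int fftSize, double sampleRate) {
    return static_cast<int>(std::floor(hz * fftSize / sampleRate));
}

// Log energy with flooring at 1.0, so the result is never negative.
inline double flooredLog(double energy) { return std::log(std::max(energy, 1.0)); }
```

Calling `melEdgesHz(26, 300.0, 8000.0)` yields the 28 edges, and `hzToBin(edge, 512, 16000.0)` maps each one onto the 512-point spectrum.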
The upper part is the 26-point MFCC. The lower part is the 256-point spectrum taken from the 512-point FFT. Please click the thumbnails to enlarge.
- 440Hz Sine : 440 Hz sine wave with some background noise.
- 'Android audio' : Me saying "Android audio".
- 'Blah Blah Blah' : Me saying "Blah Blah Blah...".
- 'Wir sind alle programmiert.' : Me saying "Wir sind alle programmiert."
- Piano Single Tones : Grand piano single tones C3 and then C4.
- Piano Chord : Grand piano chord C maj9
- Synth Lead Melody : Synth lead tracing a line of Spain by Chick Corea (B->E->G->Gb->D->B->E).
- Cory Wong Jamming : Cory Wong slowly jamming with his guitar. (https://youtu.be/i14pnaRzflU?t=208)
- Cory Henry & Nick Semrad @NAMM : Cory Henry and Nick Semrad playing Gospel at the NAMM show. (https://www.youtube.com/watch?v=8TwhLplrFNo)
- Jaco Portrait of Tracy : Tried to capture the incredible harmonics of Portrait of Tracy, but not much can be seen. (https://www.youtube.com/watch?v=nsZ_1mPOuyk)
- Naturally 7 Human Voices : Successfully captured the incredible grooves/vibrato of their voices. (https://www.youtube.com/watch?v=AF-KagTq7qY)
- Vulfpeck : Theo Katzman's singing voice. (https://www.youtube.com/watch?v=6HUkbf44iAA)
- Tori Kelly : Tori Kelly's incredible voice visualized. (https://www.youtube.com/watch?v=Jv8IqJm6q7w)
- HammingWindowJava: Pre-emphasis & Hamming for a 400-sample frame
- FFT512Java: 512-point Radix-2 Cooley-Tukey recursive FFT with pre-calculated Twiddle table
- MelFilterBanksJava: Generates MelFilterBanks log energy coefficients with Bins and precalculated table.
- DCTJava: 26-point DCT with a pre-calculated table.
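The recursive radix-2 scheme with a precomputed twiddle table can be sketched as follows. It is written in C++ and is not a copy of FFT512Java. The names `makeTwiddles` and `fftRecursive` are hypothetical, and double precision is an assumption.

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Twiddle table W[k] = exp(-2*pi*i*k/n) for k = 0 .. n/2 - 1, computed once.
std::vector<cplx> makeTwiddles(int n) {
    const double pi = std::acos(-1.0);
    std::vector<cplx> w(n / 2);
    for (int k = 0; k < n / 2; ++k) {
        w[k] = std::polar(1.0, -2.0 * pi * k / n);
    }
    return w;
}

// Recursive radix-2 decimation-in-time FFT, in place.
// x.size() must be a power of two that divides fullSize, the size the
// twiddle table w was built for.
void fftRecursive(std::vector<cplx>& x, const std::vector<cplx>& w, int fullSize) {
    const int n = static_cast<int>(x.size());
    if (n <= 1) return;

    // Split into even-indexed and odd-indexed samples.
    std::vector<cplx> even(n / 2), odd(n / 2);
    for (int i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    fftRecursive(even, w, fullSize);
    fftRecursive(odd, w, fullSize);

    // Butterflies. The full-size table is reused at every level via a stride.
    const int stride = fullSize / n;
    for (int k = 0; k < n / 2; ++k) {
        const cplx t = w[k * stride] * odd[k];
        x[k] = even[k] + t;
        x[k + n / 2] = even[k] - t;
    }
}
```

For a 400-sample frame, the input would presumably be zero-padded to 512 points before calling `fftRecursive(x, makeTwiddles(512), 512)`. The first 256 output bins then cover 0 to 8 kHz at a 16 kHz sampling rate.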
All the parts related to NEON intrinsics are enclosed by #ifdef HAVE_NEON ... #endif.
- mfcc_impl01.cpp: This file contains the following classes and some JNI glue code.
  - class HamminwWindow: Pre-emphasis & Hamming for a 400-sample frame. It utilizes NEON for the float mult loop.
  - class FFT512: 512-point Radix-2 Cooley-Tukey recursive FFT with pre-calculated Twiddle table. It utilizes NEON for the even-odd splitting and the butterfly calculations.
  - class MelFilterBanks: Generates MelFilterBanks log energy coefficients with Bins and precalculated table. It does not utilize NEON.
  - class DCT: 26-point DCT with a pre-calculated table. It utilizes NEON in the inner loop of mult-add.
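To give an idea of how a NEON float multiply loop can sit behind the HAVE_NEON guard, here is a minimal sketch of pre-emphasis followed by Hamming windowing. It is not the project's code. The function names, the handling of the sample preceding the frame, and the exact Hamming formula (0.54 and 0.46 coefficients) are assumptions.

```cpp
#include <cmath>
#include <vector>
#ifdef HAVE_NEON
#include <arm_neon.h>  // on the x86 emulator, NEON_2_SSE.h would take this role
#endif

// Precomputed Hamming coefficients: 0.54 - 0.46 * cos(2*pi*i / (n - 1)).
std::vector<float> makeHamming(int n) {
    const double pi = std::acos(-1.0);
    std::vector<float> w(n);
    for (int i = 0; i < n; ++i) {
        w[i] = static_cast<float>(0.54 - 0.46 * std::cos(2.0 * pi * i / (n - 1)));
    }
    return w;
}

// Pre-emphasis y[i] = x[i] - a * x[i-1], then element-wise windowing.
// prev is the sample just before the frame (assumed 0 for the very first frame).
void preEmphasisAndWindow(const float* in, float prev, const float* window,
                          float* out, int n, float a = 0.96f) {
    // Pre-emphasis has a serial dependency on the previous sample; kept scalar here.
    for (int i = 0; i < n; ++i) {
        const float cur = in[i];
        out[i] = cur - a * prev;
        prev = cur;
    }

    int i = 0;
#ifdef HAVE_NEON
    // Four float lanes per iteration; 400 is a multiple of 4, so no tail is left.
    for (; i + 4 <= n; i += 4) {
        const float32x4_t v = vld1q_f32(out + i);
        const float32x4_t c = vld1q_f32(window + i);
        vst1q_f32(out + i, vmulq_f32(v, c));
    }
#endif
    // Scalar fallback, which also handles any leftover tail elements.
    for (; i < n; ++i) {
        out[i] *= window[i];
    }
}
```

Without HAVE_NEON defined, the same function compiles to the plain scalar loop, which makes it easy to compare the two variants on the same input.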
Visualization
- ScrollingHeatMapView: ImageView for real-time scrolling spectrum visualization.
Others
- AudioReceiver: receives audio with android.media.AudioRecord in chunks in real time.
- AudioChunkAggregator: arranges the audio data into 400-sample (25[ms]) frames with a 10[ms] frame shift.
- cpu_features : linked to the binary to obtain processor info for convenience. Apache License.
- NEON_2_SSE : used to compile NEON intrinsics for the X86 Android Emulator. It converts ARM NEON intrinsics to their equivalents in Intel SSE. It comes under Intel's own license, but it is basically distributable as long as the original copyright notice is retained.
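The framing performed by AudioChunkAggregator (400-sample frames advanced by 160 samples) can be pictured with the following C++ sketch. It is an illustration rather than the project's Java class. The class name `FrameAggregator` and its interface are hypothetical, and it assumes the frame shift is not larger than the frame size.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Collects arbitrarily sized PCM chunks and emits overlapping fixed-size frames.
class FrameAggregator {
public:
    FrameAggregator(std::size_t frameSize, std::size_t frameShift)
        : frameSize_(frameSize), frameShift_(frameShift) {}

    // Appends a chunk and returns every complete frame that is now available.
    std::vector<std::vector<std::int16_t>> push(const std::vector<std::int16_t>& chunk) {
        buffer_.insert(buffer_.end(), chunk.begin(), chunk.end());
        std::vector<std::vector<std::int16_t>> frames;
        while (buffer_.size() >= frameSize_) {
            const auto frameEnd = buffer_.begin() + static_cast<std::ptrdiff_t>(frameSize_);
            frames.emplace_back(buffer_.begin(), frameEnd);
            // Drop only frameShift samples so consecutive frames overlap.
            buffer_.erase(buffer_.begin(),
                          buffer_.begin() + static_cast<std::ptrdiff_t>(frameShift_));
        }
        return frames;
    }

private:
    std::size_t frameSize_;
    std::size_t frameShift_;
    std::deque<std::int16_t> buffer_;
};
```

With `FrameAggregator agg(400, 160)`, pushing a 1000-sample chunk yields four frames that start at samples 0, 160, 320 and 480. The remaining 360 samples stay buffered until the next chunk arrives.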
- J. S. Bridle and M. D. Brown (1974), "An Experimental Automatic Word-Recognition System", JSRU Report No. 1003, Joint Speech Research Unit, Ruislip, England.
- "Digital Signal Processing" by Proakis and Manolakis, 4th edition, Chap. 8: Efficient Computation of the DFT: Fast Fourier Transform
- Mel Frequency Cepstral Coefficient (MFCC) tutorial : Nice tutorial.
- libmfcc : C implementation.
- MFCC.cpp : another nice C implementation
For technical and commercial inquiries, please contact: Shoichiro Yamanishi