Add audio tagging APIs for node-addon-api (#875)
csukuangfj authored May 14, 2024
1 parent 388e6a9 commit d19f50b
Showing 12 changed files with 520 additions and 16 deletions.
2 changes: 1 addition & 1 deletion .github/scripts/node-addon/run.sh
@@ -18,7 +18,7 @@ fi
SHERPA_ONNX_VERSION=$(grep "SHERPA_ONNX_VERSION" ./CMakeLists.txt | cut -d " " -f 2 | cut -d '"' -f 2)
echo "SHERPA_ONNX_VERSION $SHERPA_ONNX_VERSION"

# SHERPA_ONNX_VERSION=1.0.21
# SHERPA_ONNX_VERSION=1.0.22

if [ -z "$owner" ]; then
owner=k2-fsa
16 changes: 16 additions & 0 deletions .github/scripts/test-nodejs-addon-npm.sh
@@ -6,6 +6,22 @@ d=nodejs-addon-examples
echo "dir: $d"
cd $d

echo "----------audio tagging----------"

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
tar xvf sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
rm sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2

node ./test_audio_tagging_zipformer.js
rm -rf sherpa-onnx-zipformer-small-audio-tagging-2024-04-15

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
tar xvf sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
rm sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2

node ./test_audio_tagging_ced.js
rm -rf sherpa-onnx-ced-mini-audio-tagging-2024-04-19

echo "----------speaker identification----------"
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-recongition-models/3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx

5 changes: 5 additions & 0 deletions .github/workflows/npm-addon-macos.yaml
@@ -33,6 +33,11 @@ jobs:
with:
python-version: ${{ matrix.python-version }}

- name: Update pip
shell: bash
run: |
pip install -U pip
- uses: actions/setup-node@v4
with:
registry-url: 'https://registry.npmjs.org'
2 changes: 1 addition & 1 deletion .github/workflows/npm-addon.yaml
@@ -55,7 +55,7 @@ jobs:
SHERPA_ONNX_VERSION=$(grep "SHERPA_ONNX_VERSION" ./CMakeLists.txt | cut -d " " -f 2 | cut -d '"' -f 2)
echo "SHERPA_ONNX_VERSION $SHERPA_ONNX_VERSION"
# SHERPA_ONNX_VERSION=1.0.21
# SHERPA_ONNX_VERSION=1.0.22
src_dir=.github/scripts/node-addon
sed -i.bak s/SHERPA_ONNX_VERSION/$SHERPA_ONNX_VERSION/g $src_dir/package.json
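The extraction and substitution above can be tried locally. The sketch below uses stand-in files under `/tmp` with a hypothetical version line; the file names and the version value are placeholders, not the repository's real files:

```shell
# Stand-in CMakeLists.txt with a hypothetical version line.
printf 'set(SHERPA_ONNX_VERSION "1.0.22")\n' > /tmp/CMakeLists.demo.txt

# Same extraction as the workflow: field 2 split on spaces, then strip the quotes.
SHERPA_ONNX_VERSION=$(grep "SHERPA_ONNX_VERSION" /tmp/CMakeLists.demo.txt | cut -d " " -f 2 | cut -d '"' -f 2)
echo "SHERPA_ONNX_VERSION $SHERPA_ONNX_VERSION"

# Substitute the placeholder token in a template, as done for package.json.
printf '{"version": "SHERPA_ONNX_VERSION"}\n' > /tmp/package.demo.json
sed -i.bak s/SHERPA_ONNX_VERSION/$SHERPA_ONNX_VERSION/g /tmp/package.demo.json
cat /tmp/package.demo.json
```

The `-i.bak` suffix keeps a backup copy, which works on both GNU and BSD sed — the reason the workflow uses it.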
123 changes: 109 additions & 14 deletions nodejs-addon-examples/README.md
@@ -27,7 +27,82 @@ export LD_LIBRARY_PATH=$PWD/node_modules/sherpa-onnx-linux-x64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$PWD/node_modules/sherpa-onnx-linux-arm64:$LD_LIBRARY_PATH
```

# Voice Activity detection (VAD)
# Examples

The following tables list the examples in this folder.

## Voice activity detection (VAD)

|File| Description|
|---|---|
|[./test_vad_microphone.js](./test_vad_microphone.js)| VAD with a microphone. It uses [silero-vad](https://github.com/snakers4/silero-vad)|

## Speaker identification

|File| Description|
|---|---|
|[./test_speaker_identification.js](./test_speaker_identification.js)| Speaker identification from a file|

## Spoken language identification

|File| Description|
|---|---|
|[./test_vad_spoken_language_identification_microphone.js](./test_vad_spoken_language_identification_microphone.js)|Spoken language identification from a microphone using a multi-lingual [Whisper](https://github.com/openai/whisper) model|

## Audio tagging

|File| Description|
|---|---|
|[./test_audio_tagging_zipformer.js](./test_audio_tagging_zipformer.js)| Audio tagging with a Zipformer model|
|[./test_audio_tagging_ced.js](./test_audio_tagging_ced.js)| Audio tagging with a [CED](https://github.com/RicherMans/CED) model|

## Streaming speech-to-text from files

|File| Description|
|---|---|
|[./test_asr_streaming_transducer.js](./test_asr_streaming_transducer.js)| Streaming speech recognition from a file using a Zipformer transducer model|
|[./test_asr_streaming_ctc.js](./test_asr_streaming_ctc.js)| Streaming speech recognition from a file using a Zipformer CTC model with greedy search|
|[./test_asr_streaming_ctc_hlg.js](./test_asr_streaming_ctc_hlg.js)| Streaming speech recognition from a file using a Zipformer CTC model with HLG decoding|
|[./test_asr_streaming_paraformer.js](./test_asr_streaming_paraformer.js)|Streaming speech recognition from a file using a [Paraformer](https://github.com/alibaba-damo-academy/FunASR) model|

## Streaming speech-to-text from a microphone

|File| Description|
|---|---|
|[./test_asr_streaming_transducer_microphone.js](./test_asr_streaming_transducer_microphone.js)| Streaming speech recognition from a microphone using a Zipformer transducer model|
|[./test_asr_streaming_ctc_microphone.js](./test_asr_streaming_ctc_microphone.js)| Streaming speech recognition from a microphone using a Zipformer CTC model with greedy search|
|[./test_asr_streaming_ctc_hlg_microphone.js](./test_asr_streaming_ctc_hlg_microphone.js)|Streaming speech recognition from a microphone using a Zipformer CTC model with HLG decoding|
|[./test_asr_streaming_paraformer_microphone.js](./test_asr_streaming_paraformer_microphone.js)| Streaming speech recognition from a microphone using a [Paraformer](https://github.com/alibaba-damo-academy/FunASR) model|

## Non-streaming speech-to-text from files

|File| Description|
|---|---|
|[./test_asr_non_streaming_transducer.js](./test_asr_non_streaming_transducer.js)|Non-streaming speech recognition from a file with a Zipformer transducer model|
|[./test_asr_non_streaming_whisper.js](./test_asr_non_streaming_whisper.js)| Non-streaming speech recognition from a file using [Whisper](https://github.com/openai/whisper)|
|[./test_asr_non_streaming_nemo_ctc.js](./test_asr_non_streaming_nemo_ctc.js)|Non-streaming speech recognition from a file using a [NeMo](https://github.com/NVIDIA/NeMo) CTC model with greedy search|
|[./test_asr_non_streaming_paraformer.js](./test_asr_non_streaming_paraformer.js)|Non-streaming speech recognition from a file using [Paraformer](https://github.com/alibaba-damo-academy/FunASR)|

## Non-streaming speech-to-text from a microphone with VAD

|File| Description|
|---|---|
|[./test_vad_asr_non_streaming_transducer_microphone.js](./test_vad_asr_non_streaming_transducer_microphone.js)|VAD + Non-streaming speech recognition from a microphone using a Zipformer transducer model|
|[./test_vad_asr_non_streaming_whisper_microphone.js](./test_vad_asr_non_streaming_whisper_microphone.js)|VAD + Non-streaming speech recognition from a microphone using [Whisper](https://github.com/openai/whisper)|
|[./test_vad_asr_non_streaming_nemo_ctc_microphone.js](./test_vad_asr_non_streaming_nemo_ctc_microphone.js)|VAD + Non-streaming speech recognition from a microphone using a [NeMo](https://github.com/NVIDIA/NeMo) CTC model with greedy search|
|[./test_vad_asr_non_streaming_paraformer_microphone.js](./test_vad_asr_non_streaming_paraformer_microphone.js)|VAD + Non-streaming speech recognition from a microphone using [Paraformer](https://github.com/alibaba-damo-academy/FunASR)|

## Text-to-speech

|File| Description|
|---|---|
|[./test_tts_non_streaming_vits_piper_en.js](./test_tts_non_streaming_vits_piper_en.js)| Text-to-speech with a [piper](https://github.com/rhasspy/piper) English model|
|[./test_tts_non_streaming_vits_coqui_de.js](./test_tts_non_streaming_vits_coqui_de.js)| Text-to-speech with a [coqui](https://github.com/coqui-ai/TTS) German model|
|[./test_tts_non_streaming_vits_zh_ll.js](./test_tts_non_streaming_vits_zh_ll.js)| Text-to-speech with a Chinese model using [cppjieba](https://github.com/yanyiwu/cppjieba)|
|[./test_tts_non_streaming_vits_zh_aishell3.js](./test_tts_non_streaming_vits_zh_aishell3.js)| Text-to-speech with a Chinese TTS model|


### Voice activity detection (VAD)

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
@@ -39,7 +114,27 @@ npm install naudiodon2
node ./test_vad_microphone.js
```

## Streaming speech recognition with Zipformer transducer
### Audio tagging with Zipformer

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
tar xvf sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
rm sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2

node ./test_audio_tagging_zipformer.js
```

### Audio tagging with CED

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
tar xvf sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
rm sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2

node ./test_audio_tagging_ced.js
```
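Both tagging scripts print the events returned by `at.compute()` as probability/name pairs. That formatting step can be sketched on its own, with no model or native addon; the `events` array below is made-up data in the `{name, prob}` shape used by the test files, not real model output:

```javascript
// Hypothetical audio-tagging result: {name, prob} events, mirroring the
// shape printed by test_audio_tagging_zipformer.js and test_audio_tagging_ced.js.
const events = [
  {name: 'Music', prob: 0.042},
  {name: 'Speech', prob: 0.913},
  {name: 'Dog', prob: 0.011},
];

// Sort by descending probability and render one line per event.
function formatEvents(events) {
  return [...events]
      .sort((a, b) => b.prob - a.prob)
      .map((e) => `${e.prob.toFixed(3)}\t${e.name}`);
}

console.log('Probability\tName');
for (const line of formatEvents(events)) {
  console.log(line);
}
```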

### Streaming speech recognition with Zipformer transducer

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
@@ -54,7 +149,7 @@ npm install naudiodon2
node ./test_asr_streaming_transducer_microphone.js
```

## Streaming speech recognition with Zipformer CTC
### Streaming speech recognition with Zipformer CTC

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2
@@ -73,7 +168,7 @@ node ./test_asr_streaming_ctc_microphone.js
node ./test_asr_streaming_ctc_hlg_microphone.js
```

## Streaming speech recognition with Paraformer
### Streaming speech recognition with Paraformer

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
@@ -88,7 +183,7 @@ npm install naudiodon2
node ./test_asr_streaming_paraformer_microphone.js
```

## Non-streaming speech recognition with Zipformer transducer
### Non-streaming speech recognition with Zipformer transducer

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-en-2023-04-01.tar.bz2
Expand All @@ -102,7 +197,7 @@ npm install naudiodon2
node ./test_vad_asr_non_streaming_transducer_microphone.js
```

## Non-streaming speech recognition with Whisper
### Non-streaming speech recognition with Whisper

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
Expand All @@ -116,7 +211,7 @@ npm install naudiodon2
node ./test_vad_asr_non_streaming_whisper_microphone.js
```

## Non-streaming speech recognition with NeMo CTC models
### Non-streaming speech recognition with NeMo CTC models

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k.tar.bz2
Expand All @@ -130,7 +225,7 @@ npm install naudiodon2
node ./test_vad_asr_non_streaming_nemo_ctc_microphone.js
```

## Non-streaming speech recognition with Paraformer
### Non-streaming speech recognition with Paraformer

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
Expand All @@ -144,7 +239,7 @@ npm install naudiodon2
node ./test_vad_asr_non_streaming_paraformer_microphone.js
```

## Text-to-speech with piper VITS models (TTS)
### Text-to-speech with piper VITS models (TTS)

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_GB-cori-medium.tar.bz2
Expand All @@ -154,7 +249,7 @@ rm vits-piper-en_GB-cori-medium.tar.bz2
node ./test_tts_non_streaming_vits_piper_en.js
```

## Text-to-speech with piper Coqui-ai/TTS models (TTS)
### Text-to-speech with Coqui-ai/TTS models

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-coqui-de-css10.tar.bz2
Expand All @@ -164,7 +259,7 @@ rm vits-coqui-de-css10.tar.bz2
node ./test_tts_non_streaming_vits_coqui_de.js
```

## Text-to-speech with vits Chinese models (1/2)
### Text-to-speech with vits Chinese models (1/2)

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-vits-zh-ll.tar.bz2
Expand All @@ -174,7 +269,7 @@ rm sherpa-onnx-vits-zh-ll.tar.bz2
node ./test_tts_non_streaming_vits_zh_ll.js
```

## Text-to-speech with vits Chinese models (2/2)
### Text-to-speech with vits Chinese models (2/2)

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-icefall-zh-aishell3.tar.bz2
Expand All @@ -184,7 +279,7 @@ rm vits-icefall-zh-aishell3.tar.bz2
node ./test_tts_non_streaming_vits_zh_aishell3.js
```

## Spoken language identification with Whisper multi-lingual models
### Spoken language identification with Whisper multi-lingual models

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.tar.bz2
Expand All @@ -202,7 +297,7 @@ npm install naudiodon2
node ./test_vad_spoken_language_identification_microphone.js
```

## Speaker identification
### Speaker identification

You can find more models at
<https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models>
63 changes: 63 additions & 0 deletions nodejs-addon-examples/test_audio_tagging_ced.js
@@ -0,0 +1,63 @@
// Copyright (c) 2024 Xiaomi Corporation
const sherpa_onnx = require('sherpa-onnx-node');

// Please download model files from
// https://github.com/k2-fsa/sherpa-onnx/releases/tag/audio-tagging-models
function createAudioTagging() {
const config = {
model: {
ced: './sherpa-onnx-ced-mini-audio-tagging-2024-04-19/model.int8.onnx',
numThreads: 1,
debug: true,
},
labels:
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/class_labels_indices.csv',
topK: 5,
};
return new sherpa_onnx.AudioTagging(config);
}

const at = createAudioTagging();

const testWaves = [
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/1.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/2.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/3.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/4.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/5.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/6.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/7.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/8.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/9.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/10.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/11.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/12.wav',
'./sherpa-onnx-ced-mini-audio-tagging-2024-04-19/test_wavs/13.wav',
];

console.log('------');

for (let filename of testWaves) {
const start = performance.now();
const stream = at.createStream();
const wave = sherpa_onnx.readWave(filename);
stream.acceptWaveform({sampleRate: wave.sampleRate, samples: wave.samples});
const events = at.compute(stream);
const stop = performance.now();

const elapsed_seconds = (stop - start) / 1000;
const duration = wave.samples.length / wave.sampleRate;
const real_time_factor = elapsed_seconds / duration;

console.log('input file:', filename);
console.log('Probability\t\tName');
for (let e of events) {
console.log(`${e.prob.toFixed(3)}\t\t\t${e.name}`);
}
console.log('Wave duration', duration.toFixed(3), 'seconds');
console.log('Elapsed', elapsed_seconds.toFixed(3), 'seconds');
console.log(
    `RTF = ${elapsed_seconds.toFixed(3)}/${duration.toFixed(3)} =`,
    real_time_factor.toFixed(3));
console.log('------');
}
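The RTF printed in the loop above is just wall-clock seconds divided by audio seconds. The arithmetic can be isolated in a small self-contained helper; the numbers below are illustrative, not measured:

```javascript
// Real-time factor: processing time divided by audio duration.
// RTF < 1 means faster than real time.
function realTimeFactor(elapsedMs, numSamples, sampleRate) {
  const elapsedSeconds = elapsedMs / 1000;       // performance.now() is in ms
  const durationSeconds = numSamples / sampleRate;
  return elapsedSeconds / durationSeconds;
}

// E.g. 250 ms to process 2 s of 16 kHz audio.
const rtf = realTimeFactor(250, 32000, 16000);
console.log('RTF =', rtf.toFixed(3));  // prints "RTF = 0.125"
```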
