This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Unable to capture audio from microphone on Cloud Speech API #13

Closed
abepadilla opened this issue Jan 10, 2018 · 6 comments
Assignees
Labels
api: speech Issues related to the googleapis/nodejs-speech API. type: question Request for information or clarification. Not an issue.

Comments

@abepadilla

When I run node recognize.js listen, it stops immediately without displaying any error message. The input should come from a microphone; the goal is to transcribe microphone audio using Node.js.

A. Details

  • OS: Windows 7
  • Node js version: 4.5.0
  • npm version: 2.15.9
  • google-cloud/speech version: 1.0.0 (taken from node_modules/@google-cloud/speech/package.json)

B. Steps to reproduce

  1. Create project in Google Cloud Console
  2. Enable Google Cloud Speech API service
  3. Create the service account
  4. Download the private key as JSON
  5. Download and install the Google Cloud SDK
  6. Clone the repository: https://github.com/googleapis/nodejs-speech
  7. In the Google Cloud SDK shell, set the path to the samples folder that contains recognize.js.
  8. Install the required libraries with the command sudo apt-get install sox libsox-fmt-all.
  9. Set the credentials with the following command: export GOOGLE_APPLICATION_CREDENTIALS="path/to/service_account.json"
  10. In recognize.js, change the recordProgram parameter value to sox; if that doesn't work, change it to rec or arecord.
  11. Run the code with node recognize.js listen.
  12. It stops right away and displays only: Listening, press Ctrl+C to stop.
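
The credentials step above can be sanity-checked from Node.js before running the sample. This is a minimal sketch that uses only Node's built-in fs module; checkCredentials is a hypothetical helper name, not part of the sample. Note that export is a Unix shell command, so in a plain Windows command prompt the variable would have to be set another way (for example with set).

```javascript
const fs = require('fs');

// Hypothetical helper: reports whether GOOGLE_APPLICATION_CREDENTIALS
// is set and points at a file that actually exists.
function checkCredentials(env) {
  const keyPath = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!keyPath) {
    return { ok: false, message: 'GOOGLE_APPLICATION_CREDENTIALS is not set' };
  }
  if (!fs.existsSync(keyPath)) {
    return { ok: false, message: 'Key file not found: ' + keyPath };
  }
  return { ok: true, message: 'Using key file: ' + keyPath };
}

console.log(checkCredentials(process.env).message);
```

If this prints "not set" in the same shell that runs recognize.js, the export in step 9 did not take effect there.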

Thank you so much!

@stephenplusplus stephenplusplus added type: question Request for information or clarification. Not an issue. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jan 10, 2018
@kadoyau

kadoyau commented Jan 11, 2018

I have almost the same problem on Windows 10; it works fine on macOS.
It's probably due to a bug in the node-record-lpcm16 module on Windows.
FYI: gillesdemey/node-record-lpcm16#8

@gabrielreisn

The 'rec' option was removed from sox on Windows, so you need to fix the file directly in node_modules. I found the solution here: gillesdemey/node-record-lpcm16#8

@stephenplusplus
Contributor

@abepadilla @kadoyau could you try this out and report back?

@ghost ghost removed the priority: p2 Moderately-important priority. Fix may not be included in next release. label Feb 16, 2018
@atifsajjad

I'm having the same issue as @abepadilla and @kadoyau.
The solution at gillesdemey/node-record-lpcm16#8 only helps with the "Error: spawn rec ENOENT" error. After applying it, that error went away, but now I'm in exactly the situation @abepadilla describes: the script stops immediately without returning any error or transcript.
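
To tell a missing recorder binary apart from a recorder that starts and then exits quietly, each candidate program can be probed on its own. The following is a rough sketch using Node's child_process.spawnSync; probeCommand is a hypothetical helper, and passing --version to each program is an assumption about what these binaries accept.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: tries to run a command and reports whether it could be
// spawned at all. A missing binary surfaces as error.code === 'ENOENT'.
function probeCommand(cmd, args) {
  const result = spawnSync(cmd, args, { encoding: 'utf8', timeout: 5000 });
  if (result.error) {
    return { ok: false, reason: result.error.code || result.error.message };
  }
  return { ok: true, status: result.status, stderr: result.stderr };
}

['rec', 'sox', 'arecord'].forEach(function (cmd) {
  const probe = probeCommand(cmd, ['--version']);
  console.log(cmd + ': ' + (probe.ok ? 'found' : 'not available (' + probe.reason + ')'));
});
```

When a program is found but recording still stops silently, the stderr field of the probe result (or the stderr stream of the spawned recorder) is the next place to look.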

@ycai003

ycai003 commented Sep 21, 2018

Hi all, @abepadilla @atifsajjad,

I have managed to make the code work on Windows. The problem is mainly in the node-record-lpcm16 module. Here are my two cents:

  1. Please check whether your node-record-lpcm16 module works before you start to worry about Google Cloud Speech. I tried this example first (https://github.com/gillesdemey/node-record-lpcm16/blob/master/examples/file.js). It didn't work on my Windows machine; it wouldn't record anything. So I tweaked index.js in the lpcm16 module (after a lot of googling), and here is the tweaked index.js code:

'use strict'

var spawn = require('child_process').spawn

var cp // Recording process

// returns a Readable stream
exports.start = function (options) {
  cp = null // Empty out possibly dead recording process

  var defaults = {
    sampleRate: 44100, // You may need to tweak this according to your system.
    channels: 1,
    compress: false,
    threshold: 0.5,
    thresholdStart: null,
    thresholdEnd: null,
    silence: '1.0',
    verbose: false,
    recordProgram: 'sox' // The original code uses "rec" and it doesn't work on my computer.
  }

  options = Object.assign(defaults, options)

  var cmd = 'sox';
  var cmdArgs = [
    '-q', // show no progress
    '-t', 'waveaudio', // audio type
    '-d', // use default recording device
    '-r', options.sampleRate.toString(), // sample rate
    '-c', '1', // channels (spawn expects string arguments)
    '-e', 'signed-integer', // sample encoding
    '-b', '16', // precision (bits)
    '-t', 'raw', // Somehow, this is very important. Without this, lpcm wouldn't record anything.
    '-', // pipe
    // end on silence
    'silence', '1', '0.1', options.thresholdStart || options.threshold + '%',
    '1', options.silence, options.thresholdEnd || options.threshold + '%'
  ];

// Capture audio stream
/*var cmd, cmdArgs, cmdOptions
switch (options.recordProgram) {
// On some Windows machines, sox is installed using the "sox" binary
// instead of "rec"
case 'sox':
var cmd = 'sox';
var cmdArgs = [
'-q', // show no progress
'-t', 'waveaudio', // audio type
'-d', // use default recording device
'-r', options.sampleRate.toString(), // sample rate
'-c', 1, // channels
'-e', 'signed-integer', // sample encoding
'-b', '16', // precision (bits)
'-', // pipe
// end on silence
'silence', '1', '0.1', options.thresholdStart || options.threshold + '%',
'1', options.silence, options.thresholdEnd || options.threshold + '%'
];
break
/*case 'rec':
default:
cmd = options.recordProgram
cmdArgs = [
'-q', // show no progress
'-r', options.sampleRate, // sample rate
'-c', options.channels, // channels
'-e', 'signed-integer', // sample encoding
'-b', '16', // precision (bits)
'-t', 'wav', // audio type
'-', // pipe
// end on silence
'silence', '1', '0.1', options.thresholdStart || options.threshold + '%',
'1', options.silence, options.thresholdEnd || options.threshold + '%'
]
break
*/
// On some systems (RasPi), arecord is the prefered recording binary
/*case 'arecord':
cmd = 'arecord'
cmdArgs = [
'-q', // show no progress
'-r', options.sampleRate, // sample rate
'-c', options.channels, // channels
'-t', 'wav', // audio type
'-f', 'S16_LE', // Sample format
'-' // pipe
]

  if (options.device) {
    cmdArgs.unshift('-D', options.device)
  }
  break

}*/

  // Spawn audio capture command
  var cmdOptions = { encoding: 'binary' }
  if (options.device) {
    cmdOptions.env = Object.assign({}, process.env, { AUDIODEV: options.device })
  }
  cp = spawn(cmd, cmdArgs, cmdOptions)
  var rec = cp.stdout

  if (options.verbose) {
    console.log('Recording', options.channels, 'channels with sample rate',
      options.sampleRate + '...')
    console.time('End Recording')

    rec.on('data', function (data) {
      console.log('Recording %d bytes', data.length)
    })

    rec.on('end', function () {
      console.timeEnd('End Recording')
    })
  }

  return rec
}

exports.stop = function () {
  if (!cp) {
    console.log('Please start a recording first')
    return false
  }

  cp.kill() // Exit the spawned process, exit gracefully
  return cp
}
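
The argument list in the tweaked index.js can also be factored into a small pure function, which makes it easy to print and inspect the exact sox command before spawning it. This is a sketch only: buildSoxArgs is a hypothetical name, the flags are copied from the code above, the channel count is taken from the options rather than hard-coded, and every value is converted to a string because child_process.spawn documents its arguments as a list of strings.

```javascript
// Hypothetical helper: builds the sox argument list used in the tweaked index.js.
function buildSoxArgs(options) {
  const opts = Object.assign(
    { sampleRate: 44100, channels: 1, threshold: 0.5, silence: '1.0' },
    options
  );
  const start = opts.thresholdStart || opts.threshold + '%';
  const end = opts.thresholdEnd || opts.threshold + '%';
  return [
    '-q',                       // show no progress
    '-t', 'waveaudio', '-d',    // Windows audio driver, default device
    '-r', opts.sampleRate,      // sample rate
    '-c', opts.channels,        // channels
    '-e', 'signed-integer',     // sample encoding
    '-b', '16',                 // precision (bits)
    '-t', 'raw', '-',           // write raw samples to stdout
    'silence', '1', '0.1', start, '1', opts.silence, end
  ].map(String);
}

console.log('sox ' + buildSoxArgs({ sampleRate: 16000 }).join(' '));
```

Running the printed command by hand in a terminal is a quick way to see whether sox itself reports an error that the Node.js wrapper hides.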

  2. After index.js is "fixed", I can record from my microphone and generate a "test.raw" file successfully. The code is as follows:

'use strict'

var record = require('../') // Note: please replace this with the correct path to your node-record-lpcm16 module.
var fs = require('fs')

var file = fs.createWriteStream('test.raw', { encoding: 'binary' })

record.start().pipe(file)

// Stop recording after five minutes (300000 ms)
setTimeout(function () {
  record.stop()
}, 300000)
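
A quick way to confirm that test.raw really contains audio is to convert its size into a duration. A minimal sketch, assuming 16-bit signed PCM as configured above (2 bytes per sample per channel); rawDurationSeconds is a hypothetical helper.

```javascript
const fs = require('fs');

// 16-bit signed PCM uses 2 bytes per sample per channel.
function rawDurationSeconds(byteLength, sampleRate, channels) {
  const bytesPerSecond = sampleRate * channels * 2;
  return byteLength / bytesPerSecond;
}

if (fs.existsSync('test.raw')) {
  const size = fs.statSync('test.raw').size;
  console.log('test.raw holds about ' +
    rawDurationSeconds(size, 44100, 1).toFixed(1) + ' s of audio');
}
```

A file of zero bytes, or a duration far shorter than the time spent speaking, suggests the recorder exited early.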

  3. Once lpcm16 works with the microphone, I can finally work on Google Cloud Speech. I use the following code to stream the recording successfully:

const record = require('path to your node-record-lpcm16-master module');

// Imports the Google Cloud client library
const speech = require('path to your @google-cloud/speech module');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
const encoding = 'LINEAR16';
const sampleRateHertz = 44100;
const languageCode = 'en-US';

const request = {
  config: {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  },
  interimResults: false, // If you want interim results, set this to true
};

// Create a recognize stream
const recognizeStream = client
  .streamingRecognize(request)
  .on('error', console.error)
  .on('data', data => {
    process.stdout.write(
      data.results[0] && data.results[0].alternatives[0]
        ? `Transcription: ${data.results[0].alternatives[0].transcript}\n`
        : `\n\nReached transcription time limit, press Ctrl+C\n`
    )
  });

// Start recording and send the microphone input to the Speech API
record
  .start({
    sampleRate: sampleRateHertz, // the tweaked index.js reads "sampleRate"
    threshold: 0,
    // Other options, see https://www.npmjs.com/package/node-record-lpcm16#options
    verbose: false,
    recordProgram: 'sox', // Try also "arecord" or "sox"
    silence: '10.0',
  })
  .on('error', console.error)
  .pipe(recognizeStream);

console.log('Listening, press Ctrl+C to stop.');

// Stop recording after five minutes (300000 ms)
setTimeout(function () {
  record.stop()
}, 300000)
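
The 'data' handler above can be pulled out into a plain function so that the formatting logic can be exercised without a microphone or network access. This is only a sketch: formatResult is a hypothetical name, and the mock payload mirrors just the fields the handler reads (results, alternatives, transcript).

```javascript
// Hypothetical helper: turns one streaming response into the text to print.
function formatResult(data) {
  const first = data && data.results && data.results[0];
  const best = first && first.alternatives && first.alternatives[0];
  return best
    ? `Transcription: ${best.transcript}\n`
    : '\n\nReached transcription time limit, press Ctrl+C\n';
}

// Mock payload shaped like the fields the handler reads.
const mock = { results: [{ alternatives: [{ transcript: 'hello world' }] }] };
process.stdout.write(formatResult(mock));
```

With this in place the stream handler reduces to .on('data', data => process.stdout.write(formatResult(data))).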

  4. As computer systems vary, you may need to use Audacity (https://www.audacityteam.org/) to check your recording's sample rate, encoding, etc.

  5. My working environment: Windows 10, Node.js v10.10.0, Node.js Google client library v1.5.0. I understand that this isn't the latest version of the client library. My recognize.js example (https://github.com/googleapis/nodejs-speech/blob/master/samples/recognize.js) only works with this version of the Google client library.
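
For the sample-rate check mentioned in the list above, a recording saved as a WAV file can also be inspected from Node.js. A minimal sketch, assuming the canonical 44-byte PCM WAV header in which the fmt chunk directly follows the RIFF header; files with extra chunks before fmt would need a proper parser. readWavFormat is a hypothetical helper, and it does not apply to headerless .raw files such as test.raw.

```javascript
// Hypothetical helper: reads the format fields from a canonical WAV header.
function readWavFormat(buffer) {
  if (buffer.length < 44 ||
      buffer.toString('ascii', 0, 4) !== 'RIFF' ||
      buffer.toString('ascii', 8, 12) !== 'WAVE' ||
      buffer.toString('ascii', 12, 16) !== 'fmt ') {
    throw new Error('Not a canonical PCM WAV header');
  }
  return {
    audioFormat: buffer.readUInt16LE(20),   // 1 means uncompressed PCM
    channels: buffer.readUInt16LE(22),
    sampleRate: buffer.readUInt32LE(24),
    bitsPerSample: buffer.readUInt16LE(34)
  };
}
```

For example, readWavFormat(require('fs').readFileSync('recording.wav')) returns the channel count and sample rate, where recording.wav is a placeholder file name; the sampleRate value is what sampleRateHertz in the request should match.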

Hope this helps!

@Fairnia

Fairnia commented Oct 12, 2018

@ycai003 that was incredibly helpful, thank you!

@google-cloud-label-sync google-cloud-label-sync bot added the api: speech Issues related to the googleapis/nodejs-speech API. label Jan 31, 2020