Topic 1 Question 282

Professional Machine Learning Engineer

You work at an organization that maintains a cloud-based communication platform that integrates conventional chat, voice, and video conferencing. The audio recordings are stored in Cloud Storage. All recordings have an 8 kHz sample rate and are more than one minute long. You need to implement a new feature in the platform that automatically transcribes voice call recordings into text for future applications, such as call summarization and sentiment analysis. How should you implement the voice call transcription feature following Google-recommended best practices?
- A. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
- B. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
- C. Upsample the audio recordings to 16 kHz, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
- D. Upsample the audio recordings to 16 kHz, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
Comments (17)
- Selected answer: D
  I went with D. The question says "following Google-recommended best practices," and the documentation recommends a sample rate of at least 16 kHz: https://cloud.google.com/speech-to-text/docs/optimizing-audio-files-for-speech-to-text#:~:text=We%20recommend%20a%20sample%20rate%20of%20at%20least%2016%20kHz%20in%20the%20audio%20files%20that%20you%20use%20for%20transcription%20with%20Speech%2Dto%2DText
  
  👍 9
  CHARLIE2108 2024/03/21
- Selected answer: B
  The recordings are 8 kHz and longer than one minute.
  
  Sample rate: https://cloud.google.com/speech-to-text/docs/best-practices-provide-speech-data says "avoid re-sampling. For example, in telephony the native rate is commonly 8000 Hz, which is the rate that should be sent to the service." -> keep 8 kHz.
  
  Recognition mode: https://cloud.google.com/speech-to-text/docs/sync-recognize says "Synchronous speech recognition returns the recognized text for short audio (less than 60 seconds). To process a speech recognition request for audio longer than 60 seconds, use Asynchronous Speech Recognition." -> asynchronous.
  
  So, the correct answer is B.
  
  👍 5
  asmgi 2024/07/17
- Selected answer: D
  Upsampling to 16 kHz: The Speech-to-Text API recommends an audio sample rate of 16 kHz for optimal transcription accuracy. Upsampling the 8 kHz recordings to 16 kHz will improve the quality of the transcription.
  
  Asynchronous Recognition: Asynchronous recognition is suitable for longer audio recordings (more than one minute). It allows you to submit the audio file and receive the transcription results later, which is more efficient for batch processing.
  
  https://cloud.google.com/speech-to-text/docs/best-practices-provide-speech-data
  
  👍 4
  tavva_prudhvi 2024/03/30
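
For reference, the choice between synchronous and asynchronous recognition discussed in the comments can be sketched as a small helper that builds the request for the Speech-to-Text v1 REST API. This is only an illustration, not an official recipe: the helper name `build_recognize_request`, the bucket path, the `LINEAR16` encoding, and the `en-US` language code are assumptions, since the question does not state the file format or language. The sample rate is a parameter; it must match the audio actually sent, whether that is the native 8 kHz or an upsampled file.

```python
import json

# The sync-recognize doc quoted in the thread limits synchronous
# recognition to audio shorter than 60 seconds.
SYNC_LIMIT_SECONDS = 60


def build_recognize_request(gcs_uri, duration_seconds, sample_rate_hertz=8000,
                            encoding="LINEAR16", language_code="en-US"):
    """Return (url, body) for a Speech-to-Text v1 REST call.

    encoding and language_code are illustrative defaults (assumptions),
    not values taken from the question.
    """
    if duration_seconds < SYNC_LIMIT_SECONDS:
        method = "recognize"             # synchronous
    else:
        method = "longrunningrecognize"  # asynchronous
    url = f"https://speech.googleapis.com/v1/speech:{method}"
    body = {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # Recordings live in Cloud Storage, so pass a gs:// URI
        # instead of inline audio bytes.
        "audio": {"uri": gcs_uri},
    }
    return url, body


# Hypothetical 95-second call recording at its native 8 kHz rate.
url, body = build_recognize_request(
    "gs://example-bucket/call-0001.wav", duration_seconds=95)
print(url)
print(json.dumps(body, indent=2))
```

Because every recording in the question is longer than one minute, the helper always selects the asynchronous endpoint for this workload; the response is a long-running operation that has to be polled for the transcript.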