Accurate voice transcription and recognition are the cornerstones of Otter.
How does Otter transcribe speech-to-text?
Otter is cloud-based and 100% AI-powered with no human transcribers involved. Our speech-to-text engine supports English and works out of the box without requiring hours of personalized training with your voice. Our speaker identification algorithm is powered by advanced machine learning that distinguishes and learns from just a few paragraphs you tag for each speaker. The result is a legible, speaker-labeled transcript with synchronized audio and text.
How accurate is Otter's speech recognition?
Otter provides the most accurate automated transcription for meetings, interviews, lectures, and other long-form conversations.
How to increase speech accuracy
To increase the accuracy of Otter's transcription, follow these best practices:
- Try to minimize background noise. Background noise is generally the biggest factor in inaccurate transcriptions or missing audio. If too much background noise prevents Otter from distinguishing what is being said, the transcript may contain inaccuracies or the audio may not be transcribed at all.
- Avoid overlapping dialog where speakers talk over one another. Otter cannot separate voices when multiple people speak at the same time, which may result in inaccuracies or the audio not being transcribed at all.
- Use an external microphone or headset rather than the built-in microphone on your computer or device. See Using a headset or headphones for the limitations of using a headset.
- Place the microphone within three feet of the person speaking, and refrain from moving or touching the device to avoid any unnecessary background noise.
- Speak clearly and naturally, at your normal conversational tone and pace.
- Teach Otter jargon, names, and any other vocabulary words to increase the accuracy of these terms.
- Train Otter to recognize your voice and automatically tag your name within the transcript.
How does Otter handle filler words such as "um"?
Fillers, interjections, and hesitation markers are sounds such as "hmm" or "um" that are spoken in conversation as a pause for thought. These sounds or words are programmatically ignored, even if you add them to your custom vocabulary.
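To make this rule concrete, here is a minimal Python sketch of how filler filtering could behave. It is an illustration only, not Otter's actual implementation: the names `FILLERS`, `drop_fillers`, and `effective_vocabulary` are hypothetical, and the filler list contains only the two examples given above.

```python
import re

# Hypothetical filler list; the two entries are the examples from this article.
FILLERS = {"hmm", "um"}


def _normalize(token):
    """Lowercase a token and strip everything except the letters a-z."""
    return re.sub(r"[^a-z]", "", token.lower())


def drop_fillers(words):
    """Return the words of a transcript with filler tokens removed."""
    return [word for word in words if _normalize(word) not in FILLERS]


def effective_vocabulary(custom_vocabulary):
    """Return the custom vocabulary minus fillers.

    This models the rule that a filler stays ignored even when a user
    adds it to their custom vocabulary.
    """
    return {term for term in custom_vocabulary if _normalize(term) not in FILLERS}
```

In this sketch, calling `drop_fillers("Um, I think hmm we should ship".split())` keeps only the non-filler words, and `effective_vocabulary({"um", "Otter"})` keeps only `"Otter"`.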