Automatic speech recognition – Whisper OpenAI

Whisper is a Transformer-based automatic speech recognition (ASR) model released by OpenAI in September 2022.

It can be used for:

🗣 Language identification

🗣 Voice activity detection

🗣 Multilingual speech recognition

🗣 Multilingual speech translation
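
As a rough illustration of three of the tasks listed above (language identification, transcription, and translation into English), here is a minimal sketch using the open-source `openai-whisper` Python package. It assumes the package is installed (`pip install openai-whisper`) and that `ffmpeg` is available on the system. The helper names `top_language` and `identify_and_transcribe` are hypothetical and chosen only for this example, as is the default model size `"base"`.

```python
from typing import Dict, Tuple


def top_language(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the (language code, probability) pair with the highest probability."""
    code = max(probs, key=probs.get)
    return code, probs[code]


def identify_and_transcribe(
    audio_path: str, model_name: str = "base", translate: bool = False
) -> dict:
    """Detect the spoken language of a clip, then transcribe or translate it.

    Assumption: the `openai-whisper` package and ffmpeg are installed.
    """
    import whisper  # imported lazily so top_language() works without the package

    model = whisper.load_model(model_name)

    # Language identification looks at a single 30-second log-Mel window.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    language, confidence = top_language(probs)

    # task="translate" yields English text; task="transcribe" keeps the source language.
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, language=language, task=task)
    return {"language": language, "confidence": confidence, "text": result["text"]}
```

Calling `identify_and_transcribe("speech.mp3", translate=True)` on a placeholder file `speech.mp3` would return the detected language code, its probability, and an English rendering of the speech. Larger checkpoints generally trade speed for accuracy, so the model size is worth adjusting to the hardware available.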

When evaluated on the ESB benchmark datasets (including LibriSpeech and Common Voice), Whisper outperformed Conformer RNN-T from NVIDIA and Wav2Vec2 from Meta.
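
The text above does not say which metric the comparison uses. ASR benchmarks such as ESB usually report word error rate (WER), so the sketch below assumes that metric: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the simple lowercase-and-split normalisation are choices made for this example, not the benchmark's official scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" counts one deleted word out of six, a WER of about 0.167. Lower is better, so "outperformed" in this setting would mean a lower WER on the same test sets.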

