Whisper is a recently released transformer-based automatic speech recognition (ASR) model from OpenAI.
It can be used for:
🗣 Language identification
🗣 Voice activity detection
🗣 Multilingual speech recognition
🗣 Multilingual speech translation
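
A minimal sketch of how the recognition, translation, and language identification tasks above might be run through the Python package in the linked repo. It assumes the package is installed and that ffmpeg is available. The helper names (`transcribe_kwargs`, `summarize`, `run`) are hypothetical and only for illustration; `whisper.load_model` and `model.transcribe` are the calls shown in the repo's README.

```python
from typing import Any, Dict


def transcribe_kwargs(translate: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for model.transcribe() (hypothetical helper)."""
    # task="transcribe" keeps the spoken language in the output text;
    # task="translate" asks the model for English output instead.
    return {"task": "translate" if translate else "transcribe"}


def summarize(result: Dict[str, Any]) -> str:
    """Format the language and text fields of a transcribe() result dict."""
    return f"[{result['language']}] {result['text'].strip()}"


def run(audio_path: str, model_name: str = "base", translate: bool = False) -> str:
    """Load a Whisper checkpoint and process one audio file."""
    # Imported lazily so the helpers above work without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    # When no language is passed, transcribe() detects it from the audio
    # and reports it under the "language" key of the result.
    result = model.transcribe(audio_path, **transcribe_kwargs(translate))
    return summarize(result)


if __name__ == "__main__":
    import sys

    # Usage (assumed file name): python whisper_demo.py speech.mp3
    if len(sys.argv) > 1:
        print(run(sys.argv[1]))
```

Passing `translate=True` to `run` switches the same call from transcription to translation into English; larger checkpoints can be chosen through `model_name`, at the cost of speed and memory.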
When evaluated on the End-to-end Speech Benchmark (ESB) datasets, which include LibriSpeech and Common Voice, Whisper outperformed NVIDIA's Conformer RNN-T and Meta's Wav2Vec2.
Link to blog: https://openai.com/blog/whisper/
Link to repo: https://github.com/openai/whisper
Link to benchmarking study: https://arxiv.org/abs/2210.13352