Whisper ASR Guide

Whisper is a general-purpose speech recognition model trained on a large dataset of diverse audio.

Read through the README before using the template: https://cloud.vast.ai/template/readme/0c0c7d65cd4ebb2b340fbce39879703b

Connecting to the Instance

1. Go to the Templates tab and search for "Whisper", or use the direct link to the template: https://cloud.vast.ai/?ref_id=62897&creator_id=62897&name=whisper%20asr%20webservice
2. After you select the template by pressing the triangle button, the next step is to choose a GPU.
3. Select a GPU offering. The template you selected gives your instance access to both Jupyter and SSH. Additionally, the Open button connects you to the Instance Portal web interface.
4. HTTP and token-based auth are both enabled by default. To avoid certificate errors in your browser, follow the instructions for installing the TLS certificate at https://docs.vast.ai/instances/jupyter#1smcz. This allows secure HTTPS connections to your instance via its IP.
5. Use the Open button to open the instance. If you are not using the Open button, the default username is `vastai` and the password is the value of the environment variable `OPEN_BUTTON_TOKEN`. You can also find the token value by opening a terminal and running `echo $OPEN_BUTTON_TOKEN`.
6. To reach the Swagger UI, click the triangle button, wait for the page to load, and then click the link labeled Swagger UI. You should see the page below. (Note: the page usually loads quickly but can take 5 to 10 minutes.)

Usage

Two POST endpoints are exposed in this template:

- `/detect-language`: use this endpoint to automatically detect the spoken language in a given audio file.
- `/asr`: use this endpoint for both transcription and translation of audio files.

Both endpoints are documented using the OpenAPI standard and can be tested in a web browser.

7. Select the `/detect-language` endpoint.
8. Click Try it out.
9. Upload an audio clip.
10. Press the Execute button.
11. Look in the response body (see below). In this example, it shows that the detected language was English.

Note: If you get an internal 500 error, the file you selected to upload is most likely too large.

For more information on topics such as configuration, additional functionality, instance logs, Cloudflared, API requests, SSH tunnels and port reference mapping, and Caddy, see the README: https://cloud.vast.ai/template/readme/0c0c7d65cd4ebb2b340fbce39879703b

Links

GitHub repository: https://github.com/ahmetoner/whisper-asr-webservice/
Docker image: https://hub.docker.com/r/onerahmet/openai-whisper-asr-webservice
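
Calling the endpoints from a script

The two endpoints described under Usage can also be called outside the browser. The following is a minimal sketch, not a definitive client. It rests on several assumptions that you should confirm in the Swagger UI of your own instance: that the upload form field is named `audio_file`, that `/asr` accepts `task` and `output` query parameters (as in the upstream whisper-asr-webservice project), that the API accepts HTTP Basic auth with the `vastai` username and the `OPEN_BUTTON_TOKEN` value as password, and that the TLS certificate has been installed so HTTPS verification succeeds. `BASE_URL`, `sample.mp3`, and all helper names are hypothetical placeholders.

```python
import base64
import mimetypes
import os
import urllib.parse
import urllib.request
import uuid

# Hypothetical placeholder: replace with your instance's IP and mapped port.
BASE_URL = "https://INSTANCE_IP:PORT"


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_url(base_url: str, endpoint: str, **params: str) -> str:
    """Join the base URL, an endpoint path, and optional query parameters."""
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def encode_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_audio(url: str, audio_path: str, password: str) -> str:
    """Upload an audio file with a POST request and return the response text."""
    with open(audio_path, "rb") as f:
        # Assumption: the service expects the upload in a field named "audio_file".
        body, content_type = encode_multipart(
            "audio_file", os.path.basename(audio_path), f.read()
        )
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", content_type)
    # Assumption: Basic auth with the default "vastai" username is accepted.
    request.add_header("Authorization", basic_auth_header("vastai", password))
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def detect_and_transcribe(audio_path: str) -> None:
    """Print the detected language, then a JSON transcription, for one file."""
    token = os.environ["OPEN_BUTTON_TOKEN"]
    print(post_audio(build_url(BASE_URL, "/detect-language"), audio_path, token))
    # Assumption: "task" and "output" are valid query parameters for /asr.
    asr_url = build_url(BASE_URL, "/asr", task="transcribe", output="json")
    print(post_audio(asr_url, audio_path, token))
```

To try it, set `BASE_URL` to your instance's address, export `OPEN_BUTTON_TOKEN` in your shell, and call `detect_and_transcribe("sample.mp3")`. If the server answers with a 500 error, try a smaller audio file first, as noted above.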