Gradio

This demo runs on a very slow CPU (free tier), so the real-time factor (RTF) of the FP32 model is around 1.5: processing takes about 1.5 times the duration of the audio.
This model might not work well with your microphone, since it was trained on a fairly clean dataset. Try to speak loudly 😃
Even if you upload a full audio file, the model still processes it in a streaming fashion.
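
As a rough illustration of the notes above, the sketch below shows how an RTF value is computed and how a fully uploaded file can still be fed to a model chunk by chunk. The helper names (`real_time_factor`, `iter_chunks`) and the chunk size are hypothetical and are not taken from this demo's code.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return RTF = processing time / audio duration (above 1.0 is slower than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def iter_chunks(samples, chunk_size):
    """Yield consecutive slices of `samples`; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


# Example: 15 s of processing for 10 s of audio gives an RTF of 1.5.
rtf = real_time_factor(15.0, 10.0)

# Example: a "full file" of 10 samples, consumed 4 samples at a time
# (a real streaming model would receive each chunk in turn).
chunks = list(iter_chunks(list(range(10)), 4))
```

With an RTF of 1.5, a 60-second recording would take roughly 90 seconds to transcribe on this hardware.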

Vietnamese Streaming RNN-T