Vietnamese Streaming RNN-T

Visit https://github.com/HKAB/vietnamese-rnnt-tutorial/ for more information.

  • This demo runs on a very slow CPU (free tier), so the real-time factor (RTF) of the FP32 model is around 1.5, meaning processing takes about 1.5 times the duration of the audio.
  • This model might not work well with your microphone, since it was trained on a fairly clean dataset. Try to speak loudly 😃
  • Even if you upload a full audio file, the model processes it in a streaming fashion.
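
To make the RTF and streaming notes above concrete, here is a minimal sketch of feeding a complete recording through a recognizer chunk by chunk while measuring RTF. It is not the code used by this demo: the names `iter_chunks`, `real_time_factor`, `stream_file`, and the `process_chunk` callback are hypothetical, and the 16 kHz sample rate and 320 ms chunk length in the usage lines are assumptions chosen only for illustration.

```python
import time
from typing import Callable, Iterator, Sequence, Tuple


def iter_chunks(samples: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-size chunks; the last one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (below 1.0 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def stream_file(
    samples: Sequence[float],
    sample_rate: int,
    chunk_ms: int,
    process_chunk: Callable[[Sequence[float]], None],
) -> Tuple[int, float]:
    """Feed a whole recording to `process_chunk` one chunk at a time.

    `process_chunk` is a hypothetical stand-in for one streaming model step.
    Returns the number of chunks processed and the measured RTF.
    """
    chunk_size = max(1, sample_rate * chunk_ms // 1000)
    started = time.perf_counter()
    n_chunks = 0
    for chunk in iter_chunks(samples, chunk_size):
        process_chunk(chunk)
        n_chunks += 1
    elapsed = time.perf_counter() - started
    return n_chunks, real_time_factor(elapsed, len(samples) / sample_rate)


# Usage: 2 seconds of silence at an assumed 16 kHz, in assumed 320 ms chunks.
audio = [0.0] * 32000
n_chunks, rtf = stream_file(audio, 16000, 320, lambda chunk: None)
```

With an RTF of 1.5, a 10-second clip would take about 15 seconds to transcribe, so the transcript will lag behind the audio on this hardware.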
Model type

The INT8 model is faster but less accurate than the FP32 model.

Cherry-picked examples