Interfaze open-sources diffusion-gemma-asr-small, a diffusion-based multilingual ASR model
Decision Brief
Interfaze has open-sourced diffusion-gemma-asr-small, a multilingual automatic speech recognition (ASR) model that transcribes through a diffusion process rather than traditional autoregressive decoding. An adapter of roughly 42M parameters feeds audio into Google's frozen DiffusionGemma model, and that single adapter covers six languages. Transcription cost is determined by the number of denoising steps rather than by the length of the transcript. The release lets developers experiment with diffusion models as an alternative to end-to-end ASR systems, especially where finer control over transcription cost matters (e.g., long audio clips). However, the model currently supports only six languages, and its details require further validation.
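To make the cost claim concrete, here is a toy sketch in Python that counts decoder forward passes under two simplified assumptions: an autoregressive decoder runs one pass per output token, while a diffusion decoder runs one pass per denoising step over all positions at once. The function names and the step count of 32 are hypothetical and not taken from the release. The sketch also ignores that a single pass over a longer sequence costs more compute.

```python
def autoregressive_passes(num_tokens: int) -> int:
    """Toy assumption: one decoder forward pass per generated token."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens


def diffusion_passes(num_tokens: int, denoising_steps: int) -> int:
    """Toy assumption: each denoising step refines every position in parallel,
    so the pass count depends only on the step budget, not on num_tokens."""
    if num_tokens < 0 or denoising_steps < 0:
        raise ValueError("arguments must be non-negative")
    return denoising_steps


if __name__ == "__main__":
    STEPS = 32  # hypothetical step budget, chosen only for illustration
    for n in (50, 500, 5000):
        print(f"tokens={n:5d}  autoregressive={autoregressive_passes(n):5d}  "
              f"diffusion={diffusion_passes(n, STEPS):3d}")
```

Under these assumptions the diffusion pass count stays flat as the transcript grows, which is why the step budget, rather than the clip length, becomes the knob for trading cost against quality.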
Sources
- MarkTechPost
Fast research-paper and ML tooling summaries, useful for infra and agent updates.