Fri, July 311:24 · Open Source · Infra & cost · Multimodal & image

Interfaze open-sources diffusion-gemma-asr-small, a diffusion-based multilingual ASR model

Decision Brief

What changed: Interfaze open-sources diffusion-gemma-asr-small, a multilingual ASR model that transcribes via diffusion instead of autoregression.

Why it matters: The diffusion architecture decouples transcription cost from text length, and a single adapter supports six languages, reducing multilingual deployment complexity.

Who should care: Open-source model users

Affected stack: No specific stack identified

Builder action: Monitor

Source confidence: Medium · Reliable media or first-hand reporting

Interfaze has open-sourced diffusion-gemma-asr-small, a diffusion-based multilingual automatic speech recognition (ASR) model. Instead of decoding text token by token as autoregressive models do, it produces the transcript through a diffusion (iterative denoising) process. A ~42M-parameter adapter feeds audio into Google's frozen DiffusionGemma model, and that single adapter covers six languages. Transcription cost is determined by the number of denoising steps rather than by transcript length. The release lets developers experiment with diffusion models as an alternative to autoregressive ASR systems, especially where finer control over transcription cost is needed (e.g., long audio clips). However, it currently supports only six languages, and the model details require further validation.
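The cost claim can be illustrated with a toy count of sequential decoder passes. This is a minimal sketch and not the model's actual implementation: the function names and the default of 16 denoising steps are hypothetical, chosen only for illustration. It also ignores the cost of each individual pass, which may still depend on sequence length.

```python
def autoregressive_passes(num_tokens: int) -> int:
    """Sequential decoder passes for an autoregressive model: one per output token."""
    return num_tokens


def diffusion_passes(num_denoising_steps: int) -> int:
    """Sequential passes for a diffusion decoder: one per denoising step,
    independent of how many tokens the transcript contains."""
    return num_denoising_steps


def compare(transcript_lengths, num_denoising_steps=16):
    """Return (length, autoregressive passes, diffusion passes) per transcript.

    num_denoising_steps=16 is an assumed value, not a figure from the release.
    """
    return [
        (n, autoregressive_passes(n), diffusion_passes(num_denoising_steps))
        for n in transcript_lengths
    ]


if __name__ == "__main__":
    for length, ar, diff in compare([50, 500, 5000]):
        print(f"tokens={length:5d}  autoregressive={ar:5d}  diffusion={diff:3d}")
```

Under this simplified model, the autoregressive pass count grows with the transcript while the diffusion pass count stays at whatever step budget is chosen, which is why long audio clips are the case where the trade-off is most worth testing.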

Summary basis: official / RSS source. Unless it says 'full article read', this summary is based only on publicly available content; it never pretends to have read restricted originals.

Sources

MarkTechPost
Fast research-paper and ML tooling summaries, useful for infra and agent updates.