AsianFin -- Xiaomi on Friday open-sourced its first native end-to-end speech model, Xiaomi-MiMo-Audio.
The model, built on an innovative pretraining architecture and trained on hundreds of millions of hours of data, is the first in the speech domain to achieve few-shot generalization through in-context learning (ICL), and it exhibits clear emergent behavior during pretraining.