AsianFin — Xiaohongshu’s hilab has open-sourced dots.vlm1, the first vision-language model in its dots family.
Built on a 1.2-billion-parameter vision encoder paired with the DeepSeek V3 large language model, dots.vlm1 underwent large-scale pretraining and fine-tuning and achieves near–state-of-the-art performance on visual perception and reasoning tasks.