Beijing’s Humanoid Robotics Innovation Center has fully open-sourced its latest vision-language model for embodied intelligence, Pelican-VL 1.0, positioning it as the most powerful open-source model of its kind to date, according to a statement published Thursday.
The model comes in 7-billion- and 72-billion-parameter versions and is described as the “largest open-source embodied multimodal model” currently available. In the center’s benchmark tests, Pelican-VL outperformed a comparably sized GPT-5 variant by 15.79% and also surpassed leading domestic systems such as Alibaba’s Qwen and Shanghai AI Lab’s InternLM-XComposer.
The open release of Pelican-VL 1.0 is expected to accelerate real-world applications of embodied intelligence, improving vision-language perception and multi-step task planning for robots in commercial services, general and heavy industry, hazardous operations and household settings.