
PixVerse on Wednesday announced the launch of PixVerse R1, a next-generation real-time world model that redefines how humans create, experience, and interact with visual content. Built on a native multimodal foundation, PixVerse R1 generates high-resolution video, up to 1080p, in real time, responding instantly to user input and transforming video from a static playback medium into a dynamic, interactive stream.

Unlike traditional generative systems limited by high latency, fixed durations, and fragmented workflows, PixVerse R1 overcomes these barriers through three core innovations:
• An Omni Native Multimodal Foundation Model that unifies text, image, audio, and video into a single token stream for end-to-end generation;
• A Consistency-aware Autoregressive Framework that enables infinite-length, temporally coherent video sequences;
• An Instantaneous Response Engine that cuts sampling from dozens of steps to just 1–4, applies guidance rectification, and replaces dense attention computation with sparse attention, enabling real-time 1080p generation.
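The interplay of the autoregressive framework and few-step sampling can be illustrated with a toy loop. This is a minimal sketch, not PixVerse's actual implementation: all names, shapes, and the stand-in "denoising" rule are hypothetical, and real latents would be high-dimensional tensors rather than scalars.

```python
# Hypothetical sketch: consistency-aware autoregressive generation with
# few-step sampling. Every detail here is illustrative, not PixVerse's code.
import random

FEW_STEPS = 4          # 1-4 sampling steps instead of dozens
CONTEXT_FRAMES = 8     # sliding window of past frames conditioning the next

def denoise_step(latent, context, step):
    # Stand-in for one sampling step: pull the noisy latent toward the
    # mean of the conditioning context, more strongly at later steps.
    window = context[-CONTEXT_FRAMES:]
    target = sum(window) / len(window)
    blend = (step + 1) / (FEW_STEPS + 1)
    return latent * (1 - blend) + target * blend

def generate_next_frame(context, rng):
    latent = rng.random()              # start from noise
    for step in range(FEW_STEPS):      # few-step sampler
        latent = denoise_step(latent, context, step)
    return latent

def stream_frames(n_frames, seed=0):
    # Autoregressive loop: each generated frame is appended to the context
    # that conditions the next one, so the stream can run indefinitely.
    rng = random.Random(seed)
    context = [0.5]                    # initial conditioning "frame"
    for _ in range(n_frames):
        frame = generate_next_frame(context, rng)
        context.append(frame)
        yield frame

frames = list(stream_frames(5))
```

Because each frame conditions on a window of its predecessors, the loop has no fixed duration, which is the property the consistency-aware autoregressive framework is described as providing at scale.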

Figure: Integrated autoregressive modeling with the Omni foundation model
"This isn't just faster video generation—it's a new kind of media. For the first time, AI can generate a persistent, physically plausible world that evolves in real time based on user intent. Traditional video is a recorded history, but PixVerse R1 ushers in a new era of real-time generation, capturing the 'present moment' as it unfolds," said Changhu Wang, Founder and CEO of PixVerse.
"Whether it's an AI-native game, an interactive film, or a generative live commerce experience, the narrative responds 'As You Think.' The boundaries between creation and consumption are blurring: viewers become co-creators, instantly shaping and generating new content as they engage. We believe intelligent media should respond in real-time to user intentions, empowering everyone to become a creator of dynamic narratives."
PixVerse R1 is designed for applications where immediacy and continuity matter most:
• Gaming: NPCs and environments adapt in real time to player actions;
• Interactive Entertainment: Viewers shape storylines through voice or gesture;
• Co-Creation: Users collaboratively generate and reshape dynamic worlds, from experimental research and scenario exploration to reimagined media and live product simulations.
The system's architecture also lays groundwork for future AI systems capable of sustained, stateful interaction with simulated environments, aligning with emerging research on world modeling as a pillar of general intelligence.
PixVerse R1 is available now, with API access for enterprise partners to follow. A technical report and demo resources are available at https://realtime.pixverse.ai/.
PixVerse is a generative AI video platform transforming how digital content is created. With intuitive, one-click video generation, users can produce cinematic-quality videos from a video, a photo, or a prompt—powered by multi-modal AI at world-leading generation speed—without any prior production experience. It empowers everyone to become the AI director of their own life.
Founded in 2023 and launched globally in 2024, PixVerse has over 100 million users worldwide, supporting 13 languages across more than 175 countries.
Since its launch, PixVerse has continuously pushed the boundaries of AI video generation while lowering the barriers for users worldwide. In October 2023, just six months after the company's founding, PixVerse V1 debuted as the industry's first AI video generation model capable of producing 4K-quality videos, ahead of the release of the Sora model. In February 2024, PixVerse V2 launched on a Diffusion Transformer (DiT) architecture, making PixVerse the world's first near real-time video generation platform. In October 2024, PixVerse V3.5 introduced template-based transformation effects that encapsulate prompt guidance; driven by these effects alone, PixVerse attracted over 10 million new users globally within two months, marking a global "ChatGPT moment" for video consumption. In December 2024, the launch of the PixVerse mobile app ushered video generation into the 10-second era.
In 2025, PixVerse continued to advance its technology. In February, PixVerse V4 enabled the generation of high-quality 360p videos in just five seconds. The platform reached 60 million global users in May and surpassed 100 million worldwide by August. In November, V5 Fast was released, allowing 1080p videos to be generated within 30 seconds.
In September and October, PixVerse raised over USD 60 million in a Series B round led by Alibaba, with participation from Antler, and by October had achieved annual recurring revenue (ARR) of over USD 40 million.
With ongoing improvements in consistency and motion-trajectory capabilities, the PixVerse V5.5 Omni model, released in December, added support for both storyboard-based creation and synchronized audio-visual generation, effectively encapsulating directorial thinking for everyday users and making cinematic-quality video production nearly effortless for everyone.


