ByteDance and Kuaishou in Fierce Battle Over AI Large Models

By  xinyue  Jul 17, 2024, 5:23 a.m. ET
AsianFin--ByteDance, the parent company of TikTok, plans to unveil major advancements in AI models this Friday. The focus will be on innovative technologies for generating long-form, high-dynamic videos.

Earlier, ByteDance announced that it would host the ByteDance AI Luminary Talks in Singapore, a forum exploring AI technologies related to World Models. At this event, ByteDance’s research scientist Zhou Daqian will present on the topic of "Continuous High-Dynamic Long Video Generation Solutions" on July 19.

An insider told TMTPost that AI large models have been designated as a top priority within ByteDance. Another source said that multiple teams within the company, including those in the Douyin division and the division behind video editing app CapCut, are working on AI video model applications, with announcements expected soon.

Meanwhile, Kuaishou, Douyin's archrival and China’s second-largest short video company, introduced several new features for its text-to-video model, Kling AI, at the World Artificial Intelligence Conference (WAIC) in Shanghai days earlier. One of the highlights was the model's ability to generate videos up to ten seconds long.

At WAIC, visitors eagerly lined up to try out the Sora-like tool, which is currently available by invitation only. Users could input simple prompts such as “a panda eating salmon” and “the Mona Lisa putting on her glasses,” and Kling AI would generate videos that almost perfectly rendered these scenarios.

These AI-generated videos quickly spread across the Chinese internet. Kling AI has been used to create clips featuring characters from historical films performing modern-day tasks, spawning numerous memes and viral content.

Kuaishou disclosed that the Kling platform, touted as the world's first user-accessible, photorealistic video generation model, had amassed over 500,000 applicants and more than 300,000 activated users, generating over seven million videos.

A member of staff from Kuaishou’s large language model team said that they were not at liberty to disclose the data used to train Kling AI, but indicated that it was open source.

The TikTok rival also announced at WAIC that its Midjourney-like model Kolors would become open source, a move Kuaishou said is intended to help the ecosystem around the text-to-image generation community thrive.

The intensifying rivalry between Douyin and Kuaishou has opened up a new battlefield in China's AI video model landscape, a natural extension of their competition in the short video platform sector.

AI is fundamentally intertwined with the functionality of short video platforms, said Gai Kun, the senior vice president of Kuaishou.

AI is essential for business operations, as users seek immersive experiences that require sophisticated machine learning to maintain engagement. Compared to e-commerce, search engines, and other scenarios, short videos are in particular need of AI technology, he added.

Gai, who now oversees AI products at Kuaishou, previously worked at ByteDance. Kuaishou launched its AI strategy in 2023, according to CEO Cheng Yixiao, who said that generative AI has a “very rich combination of business scenarios and huge value potential” for the content platform.

Over the past decade, as China’s mobile internet evolved, many products aimed to capture user attention, yet it was short video platforms like Douyin and Kuaishou that emerged as national favorites. Douyin boasts over 600 million daily active users, while Kuaishou has 394 million daily active users as of Q1 2024.

The emergence of OpenAI's Sora AI video generation model in February 2024, which can create realistic movie-like scenes from simple prompts or static images, has set a new standard in AI technology.

This capability, hailed as a significant milestone towards achieving Artificial General Intelligence (AGI), has spurred Chinese companies to develop similar models.

Gai pointed out that Kuaishou's AI technology is primarily applied in three aspects of short video: content recommendation, content production, and content understanding. In the era of large models, Kuaishou has deployed technologies such as the Kuaiyi large language model, the SIM recommendation model with trillions of parameters, the text-to-image generation model announced in May, and the Kling AI video model released in June.

Kuaishou is advancing not only in technology but also in commercialization.

Gai mentioned that leveraging the Kuaiyi large model, they have developed capabilities for generating video and livestream scripts, enhancing advertising search services, and integrating digital human technology. This has contributed to an average daily increase of 20 million in AIGC consumption.

In contrast, ByteDance, not yet publicly listed, maintains a more secretive approach to its AI endeavors. Over the past year, ByteDance has pushed AI model development, from foundational models like the Doubao large language model and the multimodal BuboGPT to AI applications such as the Flow AI department's chatbot products and AI learning tools.

At the AI application layer, ByteDance established a new AI department called Flow last November. It has since launched three AI conversational products: Doubao, Kouzi, and Cici.

Doubao is a chatbot product capable of performing tasks such as question answering, text generation, and language translation. It adapts to user demands and context to give personalized responses. Kouzi serves as an all-in-one AI bot development platform, allowing users, regardless of programming background, to swiftly create various question-and-answer bots based on AI models. These bots can handle simple queries as well as complex dialogues.

In 2024, ByteDance has intensified its product development efforts, rolling out a variety of products built on large AI models. These include AI learning partner Hippo AiXue, AI interactive storytelling product MaoXiang, AI-generated image product PicPic, multimodal digital human products, as well as AI-generated image and video products.

ByteDance has also ventured into AI hardware, focusing on wearable AI devices and handheld AI devices. The company recently acquired Oladance, a headphone brand, to explore AI-driven wearables. Additionally, ByteDance’s CapCut is developing an AI product named Jimeng.

At the infrastructure level, ByteDance’s Volcano Engine introduced a self-developed video codec chip. However, ByteDance Vice President Yang Zhenyuan clarified that the company has no plans to enter the general-purpose chip market, such as CPUs or GPUs.

Despite the rapid progress, the commercialization of AI models remains nascent. Many industry insiders acknowledge that current AI models are limited in application scope, serving primarily as productivity tools rather than comprehensive solutions.

Wang Hua, the co-founder of Sinovation Ventures, cited statistics indicating that China is currently in the early stage of an application explosion, akin to the first half of the year in the United States. Despite recent extensive product promotions and rapid user growth across many platforms, the combined daily active users (DAUs) of all applications total only about 100 million in China, a stark contrast to the country's 1.2 billion internet users. By comparison, the United States, with a population of 300 million, has tens of millions of DAUs, highlighting a significant gap.

This suggests that while the future of AI has arrived, the commercialization of models is still in its infancy and their application is only beginning to take root.

"People are still too anxious. It has only been a little over a year since large models began. The entire application development fundamentally needs to progress gradually with the maturity of models and the construction of the entire application ecosystem," Wang remarked.

ChatGPT serves as a general-purpose tool, with users averaging about seven to eight minutes of usage. However, many startups use the tool for over 150 minutes. If the cost of inference decreases by a factor of 10, tool-type applications can achieve large-scale usage for free, he said.

Therefore, for tools with large user bases, this will be achievable by the end of this year or early next year. Moving forward, applications in daily life such as food, clothing, housing, and transportation will require higher model performance and integrated business models, Wang added.

McKinsey's latest global survey indicates that 65% of respondents frequently use generative AI, a significant increase from the previous year's 33%. The service industry has seen the highest growth in AI technology adoption. Additionally, 75% of respondents predict that generative AI will bring significant or disruptive changes to their industries in the coming years.

Several sources familiar with ByteDance suggest that the company employs an internal "survival of the fittest" approach, fostering competition among teams to accelerate AI development.

According to MiniMax founder and CEO Yan Junjie, it might take another three years for China to create a "killer" AI application. The competition will drive continuous innovation in AI technologies.

As ByteDance and Kuaishou continue to integrate AI seamlessly into their platforms, they aim to enhance user experience and engagement, ultimately striving to establish themselves as leaders in the AI-driven era. The journey to monetizing AI is just beginning, and the industry eagerly anticipates the next wave of breakthroughs.