Cost of Large Model APIs Down by Over 90%: Agora CEO

By  Innovation-Insight  Oct 28, 2024, 5:27 a.m. ET

The financial report shows that in the second quarter of this year, Agora's domestic revenue was 131.9 million yuan, up 0.3% year over year. As of June 30, Shengwang had 3,774 active customers, with a net retention rate of 79% over the past 12 months.

Tony Zhao, Founder and CEO of Agora

AsianFin -- With the launch of real-time voice dialogue models such as OpenAI's GPT-4o, RTE (Real-Time Engagement) technology is once again entering a new period of development.

At the recent RTE 2024 Real-Time Internet Conference, Agora founder and CEO Tony Zhao said in his speech that OpenAI has recently cut its API call costs and prices by more than 90%. A price war is also brewing in the Chinese market, where algorithm and model innovations are emerging in rapid competition. As a result, he argued, generative AI opens up broad possibilities, and its combination with RTE and real-time interaction capabilities holds enormous technological potential.

"In the next 10-20 years, whether it's PCs or smartphones, the main evolution axis will inevitably be how to better support large model capabilities on the terminal side, as well as the improvement and maturation of inference capabilities," Zhao said. He emphasized that generative AI is driving a major transformation in the IT industry, reflected mainly in four areas: terminals, software, cloud, and human-computer interfaces. He also said Agora will collaborate with large model unicorn MiniMax to create China's first Realtime API.

Agora, founded in 2014, is a global real-time interaction cloud service provider, offering PaaS (Platform as a Service), real-time interaction cloud, and other technical services across multiple fields such as social live streaming, education, gaming and esports, IoT, AR/VR, finance, insurance, healthcare, and enterprise collaboration.

At the end of June 2020, Shengwang's parent company, Agora, Inc., was listed on NASDAQ.

In the first-quarter FY2023 earnings report, Zhao announced that, to streamline its organizational structure and improve operational efficiency, Agora, Inc. would operate as two independent companies under separate brands, SoundNet (Shengwang) and Agora. The U.S. and international businesses operate under the Agora brand, while the Chinese business operates under the SoundNet brand. "We believe this strategic restructuring will allow us to optimally focus resources on the priorities of each business: driving the growth of the Agora business and competing more effectively in the SoundNet business, while considering the unique economic and product needs of each market's customers. As new opportunities arise, this new organizational structure will also make us more agile."

In August this year, Agora, Inc.'s latest financial report showed that total revenue for the second quarter of FY2024 was $34.2 million, up 0.5% year over year. Of that, SoundNet's domestic revenue was 131.9 million yuan ($18.6 million), up 0.3% year over year, helped by increased sales in industries such as IoT.

As of June 30, 2024, Shengwang had 3,774 active customers, with a net retention rate of 79% over the past 12 months.

Now, with the global economy recovering and the technology industry shifting toward AI, the generative AI sector's prospects look broad.

According to the latest report from McKinsey, the global generative AI market was worth $67 billion in 2023 and is expected to reach $399 billion by 2027 and $1.3 trillion by 2032, a compound annual growth rate of 42% from 2023 to 2032.
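As a rough check on these growth figures, the sketch below computes the compound annual growth rate implied by the quoted endpoints. The function name `cagr` and the variable names are illustrative, not taken from the report. On these endpoints the nine-year rate works out to roughly 39%, a little below the quoted 42%, which may reflect a different base year or rounding in the source report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted in the article, in billions of US dollars.
market_2023 = 67.0
market_2027 = 399.0
market_2032 = 1300.0

rate_to_2027 = cagr(market_2023, market_2027, 2027 - 2023)
rate_to_2032 = cagr(market_2023, market_2032, 2032 - 2023)

print(f"2023-2027 implied CAGR: {rate_to_2027:.1%}")
print(f"2023-2032 implied CAGR: {rate_to_2032:.1%}")
```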

In May 2024, OpenAI launched a new flagship AI model, GPT-4o, which is free to use and can reason across audio, vision, and text in real time. It responds to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, matching human conversational response speeds. In the API, GPT-4o is priced at half the cost of GPT-4 Turbo and runs twice as fast.

OpenAI CEO Sam Altman said that the new GPT-4o is the best model OpenAI has ever created. It is intelligent, fast, natively multimodal, and available to all ChatGPT users, whether on the free version or the paid GPT-4 version.

In October this year, Agora, a real-time audio and video technology company and Shengwang's sister company, drew attention as a voice API partner in the public beta release of OpenAI's Realtime API. The last time Shengwang and Agora drew comparable attention was in early 2021, when the real-time voice app Clubhouse became a global sensation; Agora provided the real-time interaction technology behind it, and the company's market value briefly exceeded $10 billion.

Since the beginning of the year, Agora, Inc.'s stock has risen by about 20%, driven mainly by the boom in generative AI and overseas live e-commerce.

On October 25, Zhao said in a speech that over the past decade, the penetration of RTE capabilities in mobile applications and software rose from less than 1% to about 7% in 2021 and now exceeds 10%, and it continues to grow by about one percentage point per year.

Currently, Agora serves over 70 billion minutes per month (measured by frequent users).

At the conference, Agora officially released the RTE+AI capability panorama, which includes five dimensions: real-time AI infrastructure, RTE+AI ecosystem capabilities, Agora AI Agent, real-time multimodal conversational AI solutions, and RTE+AI application scenarios, showcasing the current technological capabilities and application solutions of the integration of RTE and AI.

Zhao elaborated on the changes generative AI is driving at each of the four levels. At the terminal level, large model capabilities will push PCs and phones to evolve into AI PCs and AI phones. In software, all software will be re-implemented through large models, evolving from "software with AI" to AI-native software. At the cloud level, every cloud will need large model training and inference capabilities, and the AI-native cloud will become mainstream. In human-machine interfaces, the dominant interaction method will shift from keyboard, mouse, and touch screen to the natural language user interface (LUI).

This year's RTE event also took up the "AI $600 billion spending dilemma" raised by Sequoia Capital partner David Cahn, which highlights the large gap between the massive investment in AI infrastructure and the actual revenue it generates.

In response, Lepton AI founder and CEO Jia Yangqing said he believes models of a given size will keep getting more powerful, especially through techniques such as distillation and compression. The current Llama 3.2 3B (3 billion parameters) model can even match the capabilities of the previous Llama 70B model. Apart from a few leading companies, more and more enterprises will adopt "open source + fine-tuning" to develop the next generation of models, making open source architectures increasingly common.

Jia Yangqing predicts that inference costs will drop to 1/10 of the current level within a year. Entrepreneurs can calculate costs based on the assumption that building an application will cost 1/10 of what it does now, to see if it's feasible. This includes models, hardware, and applications, all of which can reduce costs as they scale.
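The back-of-the-envelope calculation Jia describes can be sketched as follows. This is a minimal illustration, not his method: the function `monthly_margin_per_user`, the per-user figures, and the choice to scale only the inference cost are all hypothetical assumptions made for the example.

```python
def monthly_margin_per_user(revenue: float, inference_cost: float,
                            other_costs: float, cost_factor: float = 1.0) -> float:
    """Per-user monthly margin after scaling inference cost by `cost_factor`."""
    return revenue - inference_cost * cost_factor - other_costs


# Hypothetical per-user monthly figures in US dollars (not from the article).
revenue = 10.0
inference_cost_today = 30.0
other_costs = 4.0

# Margin at today's inference cost versus the projected 1/10 level.
margin_today = monthly_margin_per_user(revenue, inference_cost_today, other_costs)
margin_projected = monthly_margin_per_user(
    revenue, inference_cost_today, other_costs, cost_factor=0.1
)

print(f"Margin at today's cost: {margin_today:+.2f}")
print(f"Margin at 1/10 of today's cost: {margin_projected:+.2f}")
print("Feasible at projected cost:", margin_projected > 0)
```

With these made-up inputs, an application that loses money at today's prices turns a positive margin once inference falls to a tenth of its current cost, which is the kind of feasibility test the prediction invites.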

Hugging Face engineer Wang Tiezhen said it is premature to worry about AI replacing humans, but AI has already had negative effects in some areas, such as realistic fake videos and their impact on teenagers' mental health. Addressing these problems, he said, presents many entrepreneurial opportunities.

MiniMax partner Wei Wei emphasized that with the advent of multimodal AI, the boundaries of generative AI will continue to expand. Models for text, voice, music, and video can significantly help creators in the arts, film, and music industries improve efficiency and accelerate industry transformation.

"Over the past decade, real-time interaction has evolved from a concept into an industry. Real-time interaction technology has not only facilitated exponential growth in dozens of industries and hundreds of scenarios, such as social entertainment, online education, IoT, and enterprise services, but also supported the evolution of many internet trends. We have reason to expect the next decade to be even more dynamic and exciting, ushering in a new chapter of RTE in the era of generative AI," Zhao said at the end of his speech.

(Author | Lin Zhijia, Editor | Hu Runfeng)