AsianFin—A recent news report on how poorly many large AI models handle simple arithmetic problems has sparked heated discussion in China.
Users asked 12 AI models, including GPT-4o, "Which number is bigger, 9.11 or 9.9?" Only four models answered correctly: Alibaba's Tongyi Qianwen, Baidu's Wenxin Yiyan, MiniMax, and Tencent's Yuanbao. The other eight, including GPT-4o, gave incorrect responses.
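The correct answer is 9.9. Written to the same number of decimal places, 9.9 becomes 9.90, and 90 hundredths exceed 11 hundredths. The report does not explain why the eight models erred. One possible explanation, offered here purely as an assumption, is that a model orders the numbers as if they were version labels, comparing the digits after the point ("11" vs "9") as whole numbers. The short sketch below contrasts a correct numeric comparison with that hypothetical flawed heuristic. The function names are illustrative and do not come from the report.

```python
from decimal import Decimal


def numeric_bigger(a: str, b: str) -> str:
    """Correct approach: compare the two strings as decimal numbers."""
    return a if Decimal(a) > Decimal(b) else b


def naive_bigger(a: str, b: str) -> str:
    """Hypothetical flawed heuristic (illustration only): compare the integer
    parts, then compare the digits after the point as whole numbers, the way
    version strings such as "9.11" and "9.9" are ordered."""
    a_int, a_frac = (int(x) for x in a.split("."))
    b_int, b_frac = (int(x) for x in b.split("."))
    return a if (a_int, a_frac) > (b_int, b_frac) else b


print(numeric_bigger("9.11", "9.9"))  # 9.9
print(naive_bigger("9.11", "9.9"))    # 9.11 (wrong: 11 > 9 is not 0.11 > 0.9)
```

Using `Decimal` rather than binary floats keeps the comparison exact for decimal inputs, although floats would also give the right answer for these two values.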
The results expose significant weaknesses in the mathematical capabilities of large AI models, many of which still need to be addressed.
In an exclusive interview with TMTPost, Qi Peng, Director of the AI Large Model Center at the Shanghai-Chongqing Institute of Artificial Intelligence, noted that although large models have immense potential and can generalize to handle complex problems, their current level of intelligence is still rudimentary.
Qi likened these models to "five-year-old children," citing limitations such as insufficient computing power, inadequate text data, and problems with accuracy and reliability.
Qi holds bachelor's and master's degrees from Tsinghua University and a Ph.D. from the University of Wisconsin-Madison, and has extensive experience in data science and AI. Under his leadership, the Shanghai-Chongqing Institute of Artificial Intelligence developed the "Zhao Yan" large language model, which ranked third globally and second in China on the SuperCLUE Chinese Large Model Intelligence Benchmark in March this year.
In July, Qi and his team, including PhD student Zhuang Shaobin, also replicated the Sora text-to-video model as part of an open-source community project. Built on the Latte architecture, which decouples spatial and temporal attention, the replica can generate 16-second (128-frame) videos, a significant improvement over the previous limit of 3 seconds (24 frames).
Qi explained that the Sora model works like a new "tool" that can be applied to a wide range of problems. Beyond video generation, Sora could be used in areas such as autonomous driving and physical-world simulation. The most immediate application is video creation, where users can enter text descriptions to produce videos quickly, improving both efficiency and convenience.
Qi also observed that although large models have broad potential applications across many sectors, real-world deployment remains limited. The main obstacles are the models' mathematical and engineering shortcomings and the inherent inability of statistical methods to guarantee 100% accuracy.
Looking ahead to the development of artificial general intelligence (AGI), Qi said that humanity has reached a pivotal moment. Although current models do not yet meet the bar for AGI, he believes ChatGPT has brought humanity to a critical juncture in history.
Even as large models advance from a child's level of intelligence to that of top experts, they will still need supporting infrastructure and tools to operate and be applied effectively. Although building these facilities may be relatively inexpensive, they are crucial to the practical use and societal value of large models, Qi added.