How LUMOS Robotics is Building the Data Infrastructure for the Robot Age

By xinyue, Jan 12, 2026, 10:26 p.m. ET

In the race to build robots that can see, think and act in the physical world, the next big bottleneck isn’t chips, algorithms, or even hardware.

It’s data.

As embodied artificial intelligence moves out of research labs and into factories, warehouses and homes, companies are discovering that training robots is fundamentally different from training language models. Text is abundant. The physical world is not.

“Right now, embodied intelligence is running into a data drought,” said Yu Chao, founder and CEO of LUMOS Robotics. “And whoever solves that will control the next phase of the industry.”

If 2025 was the year hardware finally caught up with ambition, 2026 is shaping up to be the year when data becomes the decisive battlefield.

The scaling logic that transformed natural language processing now looms over robotics.

OpenAI’s early robot learning models were trained on tens of thousands of hours of physical interaction data. The next generation will require hundreds of thousands. Industry estimates suggest that by 2026, leading embodied AI systems will need millions of hours of real-world data to approach general-purpose capability.

But collecting physical data is slow, expensive and messy.

Traditional teleoperation — where humans remotely control robots to generate training examples — can cost hundreds of dollars per hour. Much of that data is unusable due to sensor drift, timing mismatches, human inconsistency and non-repeatable actions.

“You pay a lot of money to collect data, and then you throw half of it away,” Yu said.

This creates what industry insiders now call the “physical scaling wall”: AI models can only improve as fast as high-quality physical data can be generated. And that generation process does not naturally scale.

LUMOS Robotics was founded around the idea that this constraint is not a side problem — it is the core problem.

Founded in 2024, LUMOS Robotics does not position itself primarily as a robot manufacturer. Instead, it describes itself as a “super data factory” for embodied intelligence — a company whose job is to industrialize the production of high-quality physical interaction data.

“We’re not just selling robots,” Yu said. “We’re selling the fuel that makes robots intelligent.”

The company’s leadership team reflects that philosophy.

Yu is a Tsinghua University graduate who began researching robot learning in 2016 after reading Pieter Abbeel’s early work on neural network control of robots. He later led embodied robotics projects at Dreame Technology and played a key role in the mass production of Xiaomi’s CyberDog robot.

CTO Cao Junliang holds a PhD in mechanical engineering from Shanghai Jiao Tong University and previously worked on high-performance embodied systems. Co-CTO Ding Yan earned his doctorate in AI from the State University of New York and was a researcher at Shanghai AI Lab.

Together, they saw a gap emerging: models were improving rapidly, but infrastructure for training those models in the real world lagged badly behind.

“Everyone was racing to build smarter brains,” Ding said. “Almost no one was building better nervous systems.”

To formalize this thinking, Yu proposed what he calls the LUMOS Index:

Scenario Value ÷ (Data Cost × Hardware Cost)

The idea is simple: embodied intelligence is only valuable if it works in real scenarios. But those scenarios only scale if both data and hardware costs fall fast enough.

“If you improve intelligence but it costs 10x more to deploy, that intelligence doesn’t matter,” Yu said.

This index guides LUMOS’s entire strategy: reduce the cost of high-quality data, lower hardware friction, and expand scenario applicability — all at once.
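The arithmetic behind the index is easy to sketch. In the Python snippet below, `lumos_index` is a hypothetical helper, every input is a hypothetical normalized value (the article gives no units or scale), and mapping Yu's "10x more to deploy" onto the hardware-cost term is an assumption made for illustration.

```python
def lumos_index(scenario_value: float, data_cost: float, hardware_cost: float) -> float:
    """Scenario Value / (Data Cost x Hardware Cost), per Yu's formula.

    Hypothetical helper: inputs are normalized, unitless values.
    """
    if data_cost <= 0 or hardware_cost <= 0:
        raise ValueError("costs must be positive")
    return scenario_value / (data_cost * hardware_cost)


baseline = lumos_index(scenario_value=1.0, data_cost=1.0, hardware_cost=1.0)

# Cutting data cost by 80% (to 0.2 of baseline) multiplies the index by 5.
cheaper_data = lumos_index(1.0, 0.2, 1.0)

# Doubling scenario value while hardware costs 10x more (assumed stand-in
# for "costs 10x more to deploy") still leaves the index at 0.2 of baseline.
pricier_hardware = lumos_index(2.0, 1.0, 10.0)

print(cheaper_data / baseline)      # 5.0
print(pricier_hardware / baseline)  # 0.2
```

Because the two cost terms multiply, a cost reduction on either side lifts the index proportionally, which is why the strategy attacks data and hardware costs at the same time.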

At the heart of that strategy is LUMOS’s core product: FastUMI Pro, an industrial-grade system for collecting robot interaction data.

Unlike many academic or hobbyist setups, FastUMI Pro is designed for scale.

On the hardware side, it integrates custom sensors capable of recording multimodal data (vision, motion, force, and touch) at 60 Hz with millisecond-level synchronization. That precision allows physical interactions to be replayed, analyzed and learned from reliably.
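LUMOS has not published how FastUMI Pro synchronizes its sensors. One common approach is to resample every stream onto a shared clock. The sketch below assumes sorted per-sensor timestamps in seconds, builds 60 Hz frames by nearest-timestamp matching, and drops any frame in which a sensor has no sample within a hypothetical 2 ms tolerance. The function names and the tolerance are illustrative, not the company's.

```python
from bisect import bisect_left

FRAME_HZ = 60          # capture rate reported for FastUMI Pro
TOLERANCE_S = 0.002    # hypothetical sync tolerance (2 ms)


def nearest(timestamps: list[float], t: float) -> tuple[int, float]:
    """Index of the sample closest to time t, and its distance in seconds."""
    if not timestamps:
        raise ValueError("stream has no samples")
    i = bisect_left(timestamps, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    j = min(candidates, key=lambda k: abs(timestamps[k] - t))
    return j, abs(timestamps[j] - t)


def align(streams: dict[str, list[float]], start: float, duration_s: float):
    """Build frames on a 60 Hz grid, one sample index per stream.

    A frame is dropped, not padded, if any stream lacks a sample within
    TOLERANCE_S of the frame time.
    """
    frames = []
    for k in range(int(duration_s * FRAME_HZ)):
        t = start + k / FRAME_HZ
        frame = {}
        for name, ts in streams.items():
            j, err = nearest(ts, t)
            if err > TOLERANCE_S:
                break  # out of sync: skip this frame entirely
            frame[name] = j
        else:
            frames.append((t, frame))
    return frames


# Synthetic data: camera offset by 0.5 ms, a 1 kHz force sensor,
# and an IMU stream missing its sample at t = 0.5 s.
camera = [k / 60 + 0.0005 for k in range(60)]
force = [k / 1000 for k in range(1000)]
imu = [k / 60 for k in range(60) if k != 30]
frames = align({"camera": camera, "force": force, "imu": imu}, 0.0, 1.0)
print(len(frames))  # 59: the frame at t = 0.5 s is dropped
```

Dropping out-of-tolerance frames rather than interpolating them is a design choice that trades a little data volume for never storing a misaligned sample.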

On the software side, FastUMI Pro decouples data from specific robot designs. The same dataset can be used across dozens of robotic arms, breaking the silo problem that has plagued robotics for decades.

“Robots today speak different dialects,” Ding said. “We’re building the common language.”
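The article does not say how FastUMI Pro achieves this portability. A common pattern is to store demonstrations in a robot-independent form, such as end-effector positions in a task frame plus a normalized gripper opening, and translate them per robot later. The sketch below shows only the translation of units and gripper ranges for two hypothetical arms (`arm-a`, `arm-b`); a real adapter would also need each arm's inverse kinematics, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One embodiment-agnostic step: end-effector target in task space."""
    xyz_m: tuple[float, float, float]  # position in meters, task frame
    gripper: float                     # 0.0 = closed, 1.0 = fully open


@dataclass(frozen=True)
class ArmProfile:
    """Hypothetical per-robot conventions an adapter must translate into."""
    name: str
    units_per_m: float  # e.g. 1000.0 for an arm that takes millimeters
    grip_min: int       # raw gripper command when closed
    grip_max: int       # raw gripper command when fully open


def to_commands(episode: list[Step], arm: ArmProfile) -> list[dict]:
    """Translate one shared episode into a specific arm's command format."""
    cmds = []
    for s in episode:
        g = min(max(s.gripper, 0.0), 1.0)  # clamp to the valid range
        cmds.append({
            "pos": tuple(c * arm.units_per_m for c in s.xyz_m),
            "grip": round(arm.grip_min + g * (arm.grip_max - arm.grip_min)),
        })
    return cmds


# The same two-step episode, replayed on two differently configured arms.
episode = [Step((0.30, 0.00, 0.20), 1.0), Step((0.30, 0.00, 0.05), 0.0)]
arm_mm = ArmProfile("arm-a", units_per_m=1000.0, grip_min=0, grip_max=850)
arm_m = ArmProfile("arm-b", units_per_m=1.0, grip_min=-100, grip_max=100)

print(to_commands(episode, arm_mm)[1])  # {'pos': (300.0, 0.0, 50.0), 'grip': 0}
print(to_commands(episode, arm_m)[0])   # {'pos': (0.3, 0.0, 0.2), 'grip': 100}
```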

The impact is measurable:

  • Data collection time per sample reduced from 50 seconds to 10 seconds.

  • Overall data collection cost reduced by 80%.

  • Usable data rate increased from ~70% industry average to over 95%.
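The first and third figures compound. Assuming a single collection station runs continuously, the number of usable samples it yields per hour follows directly from the article's numbers (the helper name is hypothetical):

```python
def usable_samples_per_hour(seconds_per_sample: float, usable_rate: float) -> float:
    """Samples one collection station yields per hour that survive QA."""
    return 3600 / seconds_per_sample * usable_rate


before = usable_samples_per_hour(50, 0.70)  # article's "before" figures
after = usable_samples_per_hour(10, 0.95)   # article's "after" figures

print(round(before, 1))          # 50.4
print(round(after, 1))           # 342.0
print(round(after / before, 2))  # 6.79
```

If collection cost scales with operator time (an assumption), the fivefold drop in seconds per sample alone is consistent with the reported 80% cost reduction; the higher usable rate then stretches each paid hour further, to roughly 6.8 times as many usable samples.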

LUMOS attributes these gains to an eight-step industrial data-quality system that filters out unusable data at the source.

“We don’t clean data after the fact,” Ding said. “We prevent bad data from being created in the first place.”

One of LUMOS’s key conceptual contributions is its distinction between “waste data” and “dirty data.”

Waste data is behavior that looks natural but teaches robots nothing useful — such as human folding motions that don’t reveal fabric properties robots need to understand.

Dirty data is technically flawed — misaligned sensors, jitter, drift, or inconsistent timing.

Both are harmful, but in different ways. Waste data wastes scale. Dirty data corrupts learning.

“The worst thing you can do is scale bad data,” Ding said. “You’re not building intelligence. You’re building confusion.”
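Dirty data, unlike waste data, can often be caught with simple timing checks. The sketch below is not LUMOS's eight-step system, which the article does not detail. It assumes two sensors whose timestamps were meant to pair one-to-one at 60 Hz, and it flags dropped frames, jitter and clock drift against hypothetical thresholds. Waste data is harder to screen mechanically, because whether a motion teaches anything depends on the task.

```python
from statistics import pstdev

PERIOD_S = 1 / 60      # nominal 60 Hz frame period
MAX_JITTER_S = 0.001   # hypothetical: 1 ms std-dev of frame intervals
MAX_DRIFT_S = 0.002    # hypothetical: 2 ms max change in inter-sensor offset


def dirty_data_flags(cam_ts: list[float], imu_ts: list[float]) -> list[str]:
    """Flag timing defects in one episode recorded by two paired sensors.

    Assumes cam_ts[i] and imu_ts[i] were meant to be captured together
    and that each list holds at least two timestamps.
    """
    flags = []
    gaps = [b - a for a, b in zip(cam_ts, cam_ts[1:])]
    if any(g > 1.5 * PERIOD_S for g in gaps):
        flags.append("dropped_frames")
    if pstdev(gaps) > MAX_JITTER_S:
        flags.append("jitter")
    # A constant offset is harmless; an offset that grows means clock drift.
    offsets = [i - c for c, i in zip(cam_ts, imu_ts)]
    if max(offsets) - min(offsets) > MAX_DRIFT_S:
        flags.append("drift")
    return flags


clean_cam = [k * PERIOD_S for k in range(120)]
clean_imu = [t + 0.0003 for t in clean_cam]                        # fixed 0.3 ms offset
drifting_imu = [t + 0.00005 * k for k, t in enumerate(clean_cam)]  # offset keeps growing

print(dirty_data_flags(clean_cam, clean_imu))     # []
print(dirty_data_flags(clean_cam, drifting_imu))  # ['drift']
```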

In December 2025, LUMOS raised several hundred million RMB in Pre-A1 and Pre-A2 rounds from investors including CDH Investments, Nanjing Venture Capital, Jinjing Capital, Jingu Shares, and Shenneng Chengyi.

It has also formed partnerships with companies including Mitsubishi, COSCO Shipping, and DEMATIC to deploy systems in logistics, manufacturing and industrial automation.

Globally, Yu claims more than two-thirds of top embodied AI teams are now using FastUMI Pro.

LUMOS’s boldest goal is its 2026 milestone: one million hours of embodied real-world machine data.

Yu believes that reaching this threshold could trigger a step change comparable to the one seen with GPT-3, which emerged once models were trained on internet text at sufficient scale.

“We believe there is a critical mass where physical intelligence starts to generalize,” Yu said. “And that mass is measured in data hours.”
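For a sense of scale: if the entire million-hour target were recorded at FastUMI Pro's 60 Hz rate (an assumption; the article does not say how the hours will be captured), each sensor stream would contain this many synchronized frames:

```python
HOURS_TARGET = 1_000_000  # LUMOS's stated 2026 goal
FRAME_HZ = 60             # capture rate reported for FastUMI Pro

frames = HOURS_TARGET * 3600 * FRAME_HZ  # hours -> seconds -> frames
print(f"{frames:,}")  # 216,000,000,000
```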

If LUMOS is right, whoever owns that data may define the next generation of robots.

Long-term, LUMOS wants to evolve from tool provider to platform to ecosystem.

It sells hardware. It sells datasets. It builds robots. But its ambition is larger: to define the standards, protocols and infrastructure upon which the embodied AI industry runs.

“Our goal is to become the global foundation layer for embodied intelligence,” Yu said.

That means becoming indispensable — not by owning applications, but by owning the substrate.

The company faces intense competition and rapid technological change. But Yu says speed is the ultimate defense.

“If we move fast enough, external pressure won’t kill us,” he said. “Slowness will.”

In embodied AI, the arms race is no longer about who builds the smartest robot.

It’s about who builds the biggest, cleanest, fastest pipeline between the physical world and machine intelligence.

And in that race, LUMOS Robotics is betting that data — not hardware — is the real oil of the 21st-century robotics economy.
