
Scientists have created a robot that learns lip movements by watching humans rather than following preset rules. The breakthrough could help future robots feel more natural and emotionally engaging.
When people speak face to face, a surprisingly large share of attention is directed toward the movement of the lips. Robots, however, have struggled for decades to reproduce this basic part of communication. Even the most advanced humanoid machines often rely on stiff, exaggerated mouth motions that feel cartoonish, assuming they have a face at all.
Humans place enormous weight on facial expression, especially the mouth. An awkward step or clumsy hand motion is easy to ignore, but even a small error in facial movement quickly stands out. This sensitivity is part of what researchers call the “Uncanny Valley,” where robots appear unsettling instead of lifelike. Poor lip motion is a major reason many robots feel emotionless or eerie, but researchers say that barrier may finally be weakening.
A Robot That Learns Facial Motion
On January 15, researchers at Columbia Engineering announced a significant advance: a robot that learns how to move its lips for speaking and singing rather than relying on preprogrammed rules. In a study published in Science Robotics, the team showed the robot forming words in multiple languages and even singing a track from its AI-generated debut album “hello world_.”
The robot gained this ability through observation instead of instructions. It first learned how to control its own face by watching its reflection in a mirror, gradually understanding how its 26 facial motors shaped different expressions. After that, it studied hours of YouTube videos to observe how human mouths move during speech and song.
“The more it interacts with humans, the better it will get,” said Hod Lipson, James and Sally Scapa Professor of Innovation in the Department of Mechanical Engineering and director of Columbia’s Creative Machines Lab.
Scientists have taught a robot to learn lip movements by observation, much like a human learning in front of a mirror. The advance could make future humanoid robots feel warmer, more natural, and far less creepy when they talk. Credit: Yuhang Hu/Creative Machines Lab
Teaching a Robot to Watch Itself Talk
Creating believable lip motion in robots is difficult for two key reasons. It requires advanced hardware: flexible facial material and many small motors that must operate quickly and quietly together. It also demands precise coordination between sound and motion, since lip movement must track rapidly changing phonemes.
Human faces rely on dozens of muscles beneath soft skin that naturally move in sync with speech. Most humanoid robots, by contrast, use rigid faces with limited motion. Their lip movements are usually dictated by fixed rules, which produces expressions that look mechanical and unnatural.
To overcome this, the research team designed a flexible robotic face with extensive motor control and allowed the robot to learn facial movement through experimentation. The robot was placed in front of a mirror and produced thousands of random expressions and mouth movements. Over time, it learned which motor actions created specific facial appearances. This process relied on a system known as a “vision-to-action” language model (VLA).
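The self-modeling loop described above — produce random expressions, observe the result, and learn which motor actions create which facial appearances — can be illustrated with a minimal sketch. Everything here is a simplified stand-in, not the team's implementation: a fixed linear map simulates how the 26 motors shape a handful of tracked mouth landmarks, and a least-squares fit plays the role of the learned inverse model.

```python
import numpy as np

rng = np.random.default_rng(0)
N_MOTORS = 26      # the robot's 26 facial motors (from the article)
N_LANDMARKS = 8    # hypothetical number of tracked mouth landmarks

# Hypothetical stand-in for the mirror: a fixed random linear map
# simulates how motor commands shape facial landmark positions.
TRUE_MAP = rng.normal(size=(N_LANDMARKS, N_MOTORS))

def observe(motors):
    """Return the landmark positions a motor command produces."""
    return TRUE_MAP @ motors

# "Motor babbling": try thousands of random expressions and record
# which motor actions created which facial appearances.
commands = rng.uniform(-1, 1, size=(5000, N_MOTORS))
appearances = commands @ TRUE_MAP.T

# Learn the inverse model (appearance -> motors) by least squares.
inverse, *_ = np.linalg.lstsq(appearances, commands, rcond=None)

# Given a target appearance, the robot can now estimate a motor
# command that reproduces it.
target = observe(rng.uniform(-1, 1, size=N_MOTORS))
estimated = target @ inverse
print(np.allclose(observe(estimated), target, atol=1e-6))  # True
```

The real system replaces the linear map with camera observations of the robot's reflection and the least-squares fit with a learned neural model, but the structure — forward exploration followed by an inverse mapping from desired appearance to motor action — is the same.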
Learning From Human Speech and Song
Once the robot understood how its own face worked, researchers showed it videos of people speaking and singing. The AI system observed how mouths changed shape in response to different sounds. By combining this information with its self-learned facial control, the robot was able to translate audio directly into lip movement.
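The audio-to-motion step can be pictured as mapping each incoming sound to a target mouth shape (a "viseme") and smoothly blending between them. The sketch below uses a hand-made lookup table purely for illustration — the actual system learns this mapping from video rather than from rules, and all phoneme labels and shape parameters here are hypothetical.

```python
# Toy illustration of the audio-to-lip-motion step. A hand-made table
# of "visemes" (mouth shapes) stands in for what the model learns
# from video; every label and number below is hypothetical.
VISEMES = {
    "AA": {"open": 0.9, "wide": 0.4, "pucker": 0.0},  # as in "father"
    "IY": {"open": 0.2, "wide": 0.9, "pucker": 0.0},  # as in "see"
    "UW": {"open": 0.3, "wide": 0.1, "pucker": 0.9},  # as in "who"
    "B":  {"open": 0.0, "wide": 0.3, "pucker": 0.0},  # lips closed
    "W":  {"open": 0.2, "wide": 0.0, "pucker": 0.8},  # strong pucker
}

def lip_trajectory(phonemes, steps_per_phoneme=4):
    """Interpolate mouth-shape targets so the lips glide between
    sounds instead of snapping, one frame per control tick."""
    frames = []
    shapes = [VISEMES[p] for p in phonemes]
    for cur, nxt in zip(shapes, shapes[1:] + [shapes[-1]]):
        for i in range(steps_per_phoneme):
            t = i / steps_per_phoneme
            frames.append({k: (1 - t) * cur[k] + t * nxt[k]
                           for k in cur})
    return frames

traj = lip_trajectory(["B", "AA", "UW"])  # roughly "bah-oo"
print(len(traj))        # 12 frames (3 phonemes x 4 steps)
print(traj[0]["open"])  # 0.0: starts with lips closed for "B"
```

The blending step also hints at why "B" and "W" are hard, as the researchers note below: a plosive requires the lips to close fully and release at exactly the right instant, and a pucker demands coordinated motion that simple interpolation only approximates.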
The team tested the system across different languages, sounds, and settings, including music. Even without understanding the meaning of the audio, the robot was able to move its lips in time with what it heard.
The researchers acknowledge clear limitations. “We had particular difficulties with hard sounds like ‘B’ and with sounds involving lip puckering, such as ‘W’. But these abilities will likely improve with time and practice,” Lipson said.
Lip Sync as Part of Bigger Communication
The researchers emphasize that synchronized lip movement is only one part of a larger goal. Their focus is on helping robots communicate in ways that feel natural and emotionally meaningful.
“When the lip sync ability is combined with conversational AI such as ChatGPT or Gemini, the effect adds a whole new depth to the connection the robot forms with the human,” said Yuhang Hu, who led the study during his PhD work. “The more the robot watches humans conversing, the better it will get at imitating the nuanced facial gestures we can emotionally connect with.”
“The longer the context window of the conversation, the more context-sensitive these gestures will become,” he added.
Facial Expression as a Missing Capability
The research team believes emotional facial expression represents a major gap in current humanoid robotics.
“Much of humanoid robotics today is focused on leg and hand motion, for activities like walking and grasping,” Lipson said. “But facial affection is equally important for any robotic application involving human interaction.”
Lipson and Hu expect lifelike faces to become increasingly important as humanoid robots are used in entertainment, education, healthcare, and elder care. Some economists estimate that more than one billion humanoid robots could be produced over the next decade.
“There is no future where all these humanoid robots don’t have a face. And when they finally have a face, they will need to move their eyes and lips properly, or they will forever remain uncanny,” Lipson said.
“We humans are just wired that way, and we can’t help it. We are close to crossing the uncanny valley,” Hu added.
Risks and Responsible Development
This work builds on Lipson’s long-term effort to help robots form stronger connections with humans by learning facial behaviors such as smiling, eye contact, and speech. He argues these skills must be learned through observation rather than encoded through rigid programming.
“Something magical happens when a robot learns to smile or speak just by watching and listening to humans,” he said. “I’m a jaded roboticist, but I can’t help but smile back at a robot that spontaneously smiles at me.”
Hu noted that the human face is one of the most powerful tools for communication and that researchers are only beginning to understand how it works.
“Robots with this ability will clearly have a much better ability to connect with humans because such a significant portion of our communication involves facial body language, and that entire channel is still untapped,” Hu said.
The team also recognizes the ethical concerns involved in creating machines that can emotionally engage with people.
“This will be a powerful technology. We have to go slowly and carefully, so we can reap the benefits while minimizing the risks,” Lipson said.
Reference: “Learning realistic lip motions for humanoid face robots” by Yuhang Hu, Jiong Lin, Judah Allen Goldfeder, Philippe M. Wyder, Yifeng Cao, Steven Tian, Yunzhe Wang, Jingran Wang, Mengmeng Wang, Jie Zeng, Cameron Mehlman, Yingke Wang, Delin Zeng, Boyuan Chen and Hod Lipson, 14 January 2026, Science Robotics.
DOI: 10.1126/scirobotics.adx3017