Robotics
AI-Powered Robots Learn Human Lip Movement

Engineers at Columbia University have created a robot capable of mimicking and learning human lip movements during speech. The upgraded design combines advanced robotics with AI, enabling the device, named Emo, to learn from observing human expressions and replicate them when appropriate. Here's what you need to know.
Why Humanoid Robots Trigger the Uncanny Valley
Since the earliest days of robotics, engineers have pursued the dream of a convincing humanoid robot. The task is much easier said than done: despite continual strides in that direction, no one has yet built a device that looks and feels like a real human.
Anyone who has spent time around even the most basic humanoid robots can attest to the uneasiness these devices cause when they try to pass as human. The slightest inaccuracies, such as unnatural eye movements or stilted facial expressions, are enough to trigger the feeling.
The Uncanny Valley
The Japanese roboticist Masahiro Mori first described this phenomenon in 1970. His now-famous essay "Bukimi no Tani Genshō" ("the valley of eeriness") explores the concept in detail, describing how humanoid robots reach a point of sharp disconnect with their observers due to subtle flaws.
In 1978, the term entered Western scientific circles via Jasia Reichardt's book "Robots: Fact, Fiction, and Prediction," which rendered it in its now-popular English form, the "uncanny valley." The book builds on Mori's discussion, describing how the smallest imperfections can provoke adverse reactions in observers.
Human Faces are the Hardest Part of the Equation
Over the last few decades, researchers have reached several milestones on the road to humanoid robots. New technology, like LLMs, makes it possible for these devices to communicate in natural language, helping to bridge the gap. However, one of the biggest areas that still requires attention is the human face.

The human face is a complex mix of tissue, nerves, and muscle that is capable of demonstrating thousands of different expressions, many of which help to communicate feelings to others. In this way, the face is seen as the ultimate communication device.
Robotic engineers have long recognized both the importance and the difficulty of creating robotic faces that work like human ones. Through years of effort, robots have gained human-looking faces, complete with skin and expressions. Yet, despite billions in research funding, the connection with observers still falls short.
| Feature | Human Face | Traditional Humanoid Robots | Columbia AI Lip System |
|---|---|---|---|
| Muscle Complexity | 30+ facial muscles with continuous motion | Limited motors with rigid constraints | 26 motors with soft silicone articulation |
| Lip–Audio Synchronization | Naturally synchronized during speech | Predefined, often delayed movements | Learned dynamically via vision-to-action AI |
| Emotional Expression | Subtle, context-aware micro-expressions | Minimal or exaggerated expressions | Emotionally coherent lip and facial cues |
| Adaptability | Learns continuously through interaction | Static motion libraries | Self-improving through observational learning |
| Uncanny Valley Effect | None | High observer discomfort | Significantly reduced uncanny response |
The Importance of Lips in Communication
Roboticists have continually bumped up against one significant issue when creating humanoid devices—it’s nearly impossible to recreate lip movement. Your lips do more than direct the sound of your voice and help you to pronounce words.
Your lips also display emotion on a subtle level, which, through millennia of evolution, has become vital to human communication. Notably, lip motions are among the most closely watched features of your face during conversation. Consequently, your brain dedicates more processing power to these gestures than to other actions like furrowing your brow or winking.
Robots’ Lips Look Unnatural
Despite robots now looking nearly human, they still fall short on lip expression. Decades of research have failed to produce the lip-audio synchronization required for realistic behavior. As a result, robots always appear to have their conversations dubbed rather than spoken, and this dubbed-voice effect makes the devices look clumsy and lifeless.
Crucially, human faces rely on dozens of muscles to create emotional responses, and robotic lips don't yet have that level of complexity; achieving it would require a new type of design. Additionally, most robotic lip movements are predefined motions matched to certain vocal outputs rather than movements that form the words naturally. Since robots aren't actually producing sound with their lips, the motions come across as unnatural and uncanny.
Columbia Study: Teaching Robots Realistic Lip Movement
Thankfully, a team of Columbia Engineers may have figured out how to cross the uncanny valley. The “Learning realistic lip motions for humanoid face robots¹” study introduces a new type of robotic face that focuses primarily on lip movement and synchronization.
Specialized Hardware
One of the main hurdles the team had to overcome was the stiffness of today's robotic faces. While many designs provide motor-driven facial reactions, none supports the complexity needed for realistic lip movement.
To overcome this limitation, the engineers built purpose-designed silicone lips for maximum expressiveness, actuated by 26 embedded facial motors, and paired the hardware with a facial action transformer and a variational autoencoder (VAE).
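To make the VAE's role concrete, here is a minimal, hypothetical sketch of how such a model might map between a low-dimensional "expression code" and the 26 motor commands. The weights, latent dimension, and function names below are illustrative assumptions, not the study's actual architecture, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MOTORS = 26      # facial motors, per the study
LATENT_DIM = 8     # hypothetical latent size

# Hypothetical "trained" weights (random here, purely for illustration)
W_enc_mu = rng.normal(size=(N_MOTORS, LATENT_DIM))
W_enc_logvar = rng.normal(size=(N_MOTORS, LATENT_DIM))
W_dec = rng.normal(size=(LATENT_DIM, N_MOTORS))

def encode(motor_pose):
    """Map a 26-D motor pose to a latent Gaussian (mu, log-variance)."""
    return motor_pose @ W_enc_mu, motor_pose @ W_enc_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the standard VAE reparameterization trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map a latent expression code back to motor commands in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))  # sigmoid keeps commands bounded

pose = rng.uniform(size=N_MOTORS)
mu, logvar = encode(pose)
commands = decode(reparameterize(mu, logvar))
print(commands.shape)  # (26,)
```

The appeal of this kind of design is that a small latent code can summarize an entire expression, so the controller can plan in a compact space rather than steering all 26 motors independently.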
Vision-to-Action AI Model
At the core of this breakthrough is a vision-to-action AI model. With it, the robotic face can generate realistic lip motions autonomously instead of relying on predefined mechanical settings.
To build the model, the team used observational learning. This approach lets the device work out exact lip dynamics during speech in real time. The first step was to run the algorithm through a self-supervised learning pipeline.

This step required the engineers to place the robot's face in front of a mirror and instruct it to make thousands of expressions. Doing so let the algorithm map out its own facial capabilities. From there, the robot watched hours of YouTube content.
The paired audio and lip motion were carefully tracked and used to train the AI driving the robot's lips. Over a few days, it learned how its face should move by imitating human expression rather than following hand-tuned parameters. Engineers then added audio and began testing.
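The mirror-babbling step described above can be sketched in miniature. Everything here is an illustrative assumption: a toy linear "face" stands in for the silicone hardware, random motor commands stand in for the self-generated expressions, and a least-squares fit stands in for the learned inverse model that turns target lip shapes back into motor commands.

```python
import numpy as np

rng = np.random.default_rng(1)

N_MOTORS = 26
N_LANDMARKS = 10   # hypothetical number of tracked lip landmarks

# Unknown "physics" of the face: how motor commands move lip landmarks.
# In the real system this is the silicone face observed through a camera.
true_map = rng.normal(size=(N_MOTORS, N_LANDMARKS))

def observe(commands):
    """Stand-in for the mirror/camera: landmarks produced by a command."""
    return commands @ true_map + rng.normal(scale=0.01, size=N_LANDMARKS)

# Step 1: "babble" in front of the mirror -- random commands, recorded landmarks.
commands = rng.uniform(size=(2000, N_MOTORS))
landmarks = np.array([observe(c) for c in commands])

# Step 2: fit an inverse model (landmarks -> commands) by least squares.
inverse_model, *_ = np.linalg.lstsq(landmarks, commands, rcond=None)

# Step 3: given target lip landmarks (e.g. from a video), predict the
# motor commands that would reproduce them on the robot's own face.
target = observe(rng.uniform(size=N_MOTORS))
predicted = target @ inverse_model
```

The key idea, mirrored from the article, is that the robot never needs a hand-written motion library: the mapping from desired lip shapes to motor commands is recovered entirely from its own self-observation data.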
How the Lip-Sync AI Was Tested Across Languages
The team tested their system across 10 different languages and linguistic contexts. The test deliberately used languages the model had never encountered, forcing it to compute the proper facial expressions and lip movements rather than recall previously trained words. Interestingly, the test also included conversational context and songs.
Uncanny Robots Test Results
The test results showed visually coherent lip-audio synchronization across the board. Notably, the algorithm-powered robot produced realistic lip movement that accurately matched a range of audio clips. Impressively, it synchronized its lips across all 10 languages and even sang a song from its AI-generated debut album, hello world_.
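As a rough illustration of how lip-audio synchronization of this kind can be quantified, the sketch below estimates the lag between a synthetic audio loudness envelope and a lip-opening signal using normalized cross-correlation. The sampling rate, signals, and function are hypothetical assumptions for demonstration, not the study's actual evaluation method.

```python
import numpy as np

rng = np.random.default_rng(2)

FPS = 100  # hypothetical common sampling rate for both signals

# Synthetic audio loudness envelope, plus a lip-opening signal that lags it.
t = np.arange(0, 5, 1 / FPS)
audio_env = np.abs(np.sin(2 * np.pi * 1.3 * t)) + 0.05 * rng.normal(size=t.size)
TRUE_LAG = 12  # lips trail the audio by 12 frames (120 ms) in this toy example
lip_open = np.roll(audio_env, TRUE_LAG)

def estimate_lag(a, b, max_lag=50):
    """Return the shift (in frames) of b relative to a that maximizes
    normalized circular cross-correlation -- a rough lip-sync offset."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    lags = range(-max_lag, max_lag + 1)
    scores = [np.dot(a, np.roll(b, -k)) for k in lags]
    return list(lags)[int(np.argmax(scores))]

print(estimate_lag(audio_env, lip_open))  # → 12
```

A lag estimate near zero frames would correspond to the "spoken, not dubbed" impression the article describes; a large consistent offset produces the dubbed-voice effect.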
That said, the team did find some limitations. The robot could not consistently reproduce the hard lip closures in plosive words like "pop," and it struggled with puckered sounds like "whistle." The engineers note that these small imperfections should work themselves out as the algorithm improves over time. This self-learning quality is the system's best feature: it will keep improving as it captures more data from humans, opening the door to more meaningful human-machine interaction in the future.
Key Benefits of Realistic Humanoid Robotics
There are several benefits that this technology brings to the market. For one, it will allow humans to form a deeper connection with machines. Most people are unaware of just how much communication occurs via facial expressions subconsciously.
This study opens the door for lip sync tech and conversational AI to create human-like experiences that could help fight the loneliness epidemic and more. Using this technology, humanoid robots will be able to get one step closer to crossing the uncanny valley and pushing robotics to a new plateau.
Real-World Applications & Timeline
There are many applications for this technology across several industries. The most obvious is to drive humanoid robotics itself forward: the ability to put soft, warm faces on cold machines could help drive adoption. Here are some other applications to consider.
Elder Care
While not considered the most tech-savvy people, the elderly have begun to embrace robotics on an entirely new level. The elder care assistive robots market is on the rise, with statistics showing it reached $3.38B in 2025. The same reports predict it will surpass $9.85B by 2033.
The elderly would be more willing to interact and accept robots if they didn’t seem technologically complicated. As such, a robotic assistant that could communicate using speech alongside realistic facial movements could be the perfect fit. Elderly patients could find a connection alongside much-needed assistance.
Entertainment
The entertainment industry could be among the first adopters. Filmmakers already rely heavily on robotics, from animatronics at theme parks like Disney to motion-capture robots used in major films, and these devices have pushed the industry forward.
The entertainment robots sector is valued at over $4.72B today and is predicted to grow to $26.94B by 2034, powered by stronger demand for realistic CGI characters. In the near future, this technology could fill that niche, enabling actors to share their faces with characters in new and more direct ways.
Education
The educational sector is another place where this technology could flourish. Here, these devices could be set up as personalized tutors. Already, some reports have shown that students achieved a 30% boost in math comprehension using robot-adapted lessons.
Adoption Timeline
You can expect to see this technology start to filter into everyday life within the next 5-10 years. Robots are already in many factories and workplaces, with integration only predicted to rise. Roboticists understand that integrating this type of technology can help make their devices more relatable.
Key Researchers at Columbia
The study was conducted at Columbia's Creative Machines Lab. The paper lists Yuhang Hu, Jiong Lin, Judah Allen Goldfeder, Philippe M. Wyder, Yifeng Cao, Steven Tian, Yunzhe Wang, Jingran Wang, Mengmeng Wang, Jie Zeng, Cameron Mehlman, Yingke Wang, Delin Zeng, Boyuan Chen, and Hod Lipson as contributors.
What Comes Next for Human-Like Robots
The team will now put its focus on perfecting the algorithm further. This step will involve more human interactions and could even evolve into multiple units that are capable of learning in real time and sharing that data with a centralized model.
Investing in Robotics Innovation
The robotics industry is a fast-paced sector that has experienced heavy growth over the last 5 years. The introduction of new technologies like LLMs and 3D printers has helped to drive innovation to new levels. For a comprehensive look at the broader market opportunities, read our guide on investing in Physical AI and humanoid robots in 2026.
Here’s one company that has been at the forefront of this revolution.
Teradyne ($36B)
Teradyne, Inc. (TER) is the parent company of Universal Robots (UR), the market leader in collaborative robots ("cobots"). While Teradyne does not build humanoid faces, it is currently the leading player in bringing the "watch-and-learn" AI described in the Columbia study to the factory floor.
Crucially, Teradyne has formed a strategic partnership with Nvidia (NVDA) to integrate the Isaac Manipulator platform. This allows Teradyne's robots to use AI cameras to "see" their environment and dynamically adjust their pathing, much like the Emo robot learns to adjust its lips, rather than relying on rigid, pre-written code.
2026 Performance & Valuation: Teradyne is widely considered a “blue chip” robotics stock. Its shares surged nearly 50% in 2025 and have continued to rally in early 2026, trading near the $230 range.
Conclusion
The introduction of realistic robotic faces makes perfect sense. LLMs are now capable of replicating human speech, and when combined with realistic facial expressions, these devices are going to provide a new level of training, learning, healthcare, and more. For now, the team will focus on ironing out imperfections and finding strategic partners and funding.
Learn about other cool robotics breakthroughs here.
References
1. Yuhang Hu et al., Learning realistic lip motions for humanoid face robots. Science Robotics 11, eadx3017 (2026). DOI:10.1126/scirobotics.adx3017