Comment: Why humanoid robots need specialised social interaction skills

The recent unveiling of 1X’s Neo Beta, a bipedal humanoid robot designed for home use, marks a significant milestone, says Samer Al Moubayed, co-founder & CEO of Furhat Robotics.

Robots should be enabled to engage in meaningful social interactions - AdobeStock

As the CEO of a company at the forefront of humanoid robotics, I’ve witnessed firsthand the rapid advancements transforming our field. For example, the recent unveiling of 1X’s Neo Beta - a bipedal humanoid robot designed for home use - marks a significant milestone. This development isn’t just about introducing a new robot to the market; it’s a symbol of how far we have come in integrating robots into everyday life.

Yet as we celebrate these achievements, we must also acknowledge a critical area that remains largely unexplored: enabling robots to engage in meaningful social interactions. The future of robots like Neo Beta lies not just in performing household chores, but in their ability to connect with people through natural, lifelike, fluid interactions.

The evolution of humanoid robots

Humanoid robots have advanced from science fiction to tangible reality, driven by breakthroughs in AI and robotics technology. Robots can now leverage advanced neural networks and machine learning, allowing them to learn motor behaviours from vision and to execute complex tasks such as picking up objects, navigating, and making basic decisions in dynamic environments. This approach, based on real-world demonstrations, allows robots to adapt to new tasks through fine-tuning, inching us closer to the goal of general-purpose robotic assistance.

Voice-controlled interfaces and natural language processing (NLP) have further enhanced robots’ ability to chain tasks and expand context, creating systems for long-horizon dynamic behaviours. Large language models (LLMs) like OpenAI’s GPT series have also revolutionised human-robot communication, making interactions more intuitive and user-friendly. Yet while these breakthroughs mark a major leap in task performance, robots still lack meaningful human interaction skills.

The limitation: lack of meaningful social interaction

To be truly useful in daily life, humanoid robots need to go beyond executing commands: they must understand, respond to, and engage with humans in ways that feel natural and emotionally resonant. This limitation becomes evident when robots perform tasks in silence, devoid of facial expressions or verbal acknowledgements, leading to interactions that feel mechanical and disconnected.

In human communication, non-verbal cues such as eye contact, gestures like nodding, and facial expressions play a pivotal role. Robots that lack these elements can inadvertently cause fear or frustration amongst users. For example, a robot that doesn’t make eye contact or that fails to respond to social cues may be perceived as inattentive or unresponsive, diminishing the user experience. 

Bridging the gap with advanced social capabilities

To address this challenge, we must equip humanoid robots with advanced social interaction skills that mirror the complexity of human communication. This involves not only processing verbal language but also interpreting and responding to non-verbal signals in real time. Achieving this, however, requires integrating multiple technical components.

Multi-Modal Sensory Input

Robots need advanced sensors to perceive visual, auditory and even tactile information. High-resolution cameras, depth sensors and microphone arrays allow robots to detect facial expressions, body language and speech nuances. For instance, eye-tracking technology can enable a robot to maintain eye contact, enhancing the sense of engagement.
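As a rough sketch of this kind of sensor fusion (plain Python, with the camera and microphone-array readings stubbed out as assumptions rather than any particular hardware API), the snippet below combines a detected face direction with an estimated sound direction to choose where the robot should look - the sort of low-level logic that underpins sustained eye contact.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FaceObservation:
    yaw_deg: float      # horizontal angle of the detected face, relative to the robot
    confidence: float   # detector confidence in [0, 1]

@dataclass
class SoundObservation:
    yaw_deg: float      # direction of arrival estimated by the microphone array
    confidence: float

def fuse_gaze_target(face: Optional[FaceObservation],
                     sound: Optional[SoundObservation]) -> Optional[float]:
    """Return the yaw angle (degrees) the robot should orient its gaze towards."""
    # Blend face and sound directions when both are present and roughly agree;
    # otherwise fall back to whichever single cue is trustworthy enough.
    if face and sound and abs(face.yaw_deg - sound.yaw_deg) < 15.0:
        total = face.confidence + sound.confidence
        return (face.yaw_deg * face.confidence + sound.yaw_deg * sound.confidence) / total
    if face and face.confidence > 0.5:
        return face.yaw_deg
    if sound and sound.confidence > 0.5:
        return sound.yaw_deg
    return None

# Hypothetical readings standing in for real camera / mic-array output.
print(fuse_gaze_target(FaceObservation(12.0, 0.9), SoundObservation(18.0, 0.6)))
```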

Emotion Recognition Algorithms

We need to implement machine learning models that can interpret human emotions from facial expressions and vocal tones, which will, in turn, enable robots to respond appropriately.
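A minimal illustration of this idea, assuming hypothetical per-modality classifiers rather than any specific trained model, is a simple late-fusion step that averages face-based and voice-based emotion scores and picks the most likely label:

```python
from typing import Dict

# Hypothetical per-modality scores; in practice these would come from trained
# face-expression and voice-prosody models rather than hard-coded dictionaries.
def face_emotion_scores(frame) -> Dict[str, float]:
    return {"happy": 0.7, "neutral": 0.2, "frustrated": 0.1}

def voice_emotion_scores(audio) -> Dict[str, float]:
    return {"happy": 0.4, "neutral": 0.4, "frustrated": 0.2}

def fuse_emotions(frame, audio, face_weight: float = 0.6) -> str:
    """Late fusion: weighted average of the two modalities, return the top label."""
    face, voice = face_emotion_scores(frame), voice_emotion_scores(audio)
    combined = {label: face_weight * face.get(label, 0.0)
                       + (1 - face_weight) * voice.get(label, 0.0)
                for label in set(face) | set(voice)}
    return max(combined, key=combined.get)

print(fuse_emotions(frame=None, audio=None))  # -> "happy"
```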

Contextual Understanding

Beyond immediate interactions, it is essential that robots understand the broader context, including cultural norms and individual user preferences. This requires sophisticated natural language understanding and knowledge representation systems.
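As a toy sketch of what consulting such context can mean in practice, the snippet below stores a user's formality preference and a simple norm flag (all names and fields are illustrative assumptions, not a real knowledge representation system) and uses them to pick a greeting:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class UserContext:
    """A minimal slice of the context a socially aware robot might track."""
    name: str
    preferred_formality: str = "casual"          # e.g. "casual" or "formal"
    norms: Dict[str, bool] = field(default_factory=dict)

def choose_greeting(ctx: UserContext) -> str:
    # Adjust wording (and, in a full system, non-verbal behaviour) to the stored context.
    if ctx.preferred_formality == "formal" or ctx.norms.get("avoid_first_names"):
        return "Good morning. How may I help you today?"
    return f"Hi {ctx.name}! Need a hand with anything?"

print(choose_greeting(UserContext("Alex")))
print(choose_greeting(UserContext("Dr. Chen", preferred_formality="formal")))
```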

Adaptive Behaviour Generation

Robots must also be able to generate responses that are not only contextually appropriate but that also adapt dynamically to the flow of conversation. This involves complex decision-making algorithms that consider timing, social norms and emotional impact.
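One hedged way to picture such decision-making is a scoring function over candidate behaviours that penalises interrupting the user and rewards warmer responses when the user seems upset; the structure and thresholds below are illustrative assumptions, not a production policy:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    utterance: str
    gesture: str
    interrupts_user: bool   # would this response talk over the user?
    warmth: float           # crude proxy for emotional impact, in [0, 1]

def select_behaviour(candidates: List[Candidate], user_is_speaking: bool,
                     user_seems_upset: bool) -> Candidate:
    """Score candidates against timing, social norms and emotional impact."""
    def score(c: Candidate) -> float:
        s = c.warmth if user_seems_upset else 0.5
        if user_is_speaking and c.interrupts_user:
            s -= 1.0        # heavy penalty for breaking turn-taking norms
        return s
    return max(candidates, key=score)

options = [
    Candidate("I've finished tidying up.", "nod", interrupts_user=True, warmth=0.3),
    Candidate("Take your time, I'll wait.", "smile", interrupts_user=False, warmth=0.9),
]
print(select_behaviour(options, user_is_speaking=True, user_seems_upset=True).utterance)
```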

Challenges with current AI models

While LLMs have advanced this field considerably, they are not a panacea for all aspects of human-robot interaction. They excel at generating contextually relevant text but lack real-time responsiveness and the ability to process non-verbal cues. They also struggle with multi-turn conversations that require memory of previous interactions and awareness of the conversational dynamics between (groups of) people. Moreover, LLMs do not inherently possess situational awareness or the ability to translate verbal commands into precise physical actions without additional layers of processing. 
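To make the idea of "additional layers of processing" concrete, here is a minimal sketch of a wrapper that keeps multi-turn memory and folds observed non-verbal cues into each request; call_llm is a placeholder for whatever chat-completion interface is actually used, not a specific vendor API:

```python
from typing import Dict, List

def call_llm(messages: List[Dict[str, str]]) -> str:
    # Placeholder for a real chat-completion call; assumed, not a specific vendor API.
    return "(model reply)"

class SocialDialogueLayer:
    """One possible additional layer: keep multi-turn memory and fold
    perceived non-verbal cues into each request to the language model."""
    def __init__(self) -> None:
        self.history: List[Dict[str, str]] = []

    def respond(self, user_text: str, observed_cues: str) -> str:
        # Non-verbal observations (gaze, tone, who is present) are expressed
        # as text so the model can condition on them alongside the words.
        self.history.append(
            {"role": "user", "content": f"{user_text}\n[observed cues: {observed_cues}]"})
        reply = call_llm(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply

layer = SocialDialogueLayer()
print(layer.respond("Could you help me later?", "user looks away, sounds tired"))
```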

The role of specialised training data and customised models

Developing these advanced social capabilities therefore goes far beyond simple language processing. It hinges on training models with specialised datasets that capture the intricacies of human behaviour in the physical world, in all of its complexity. This includes annotated examples of social cues, multi-party conversations, and non-verbal communication patterns. Collecting and curating such data is a significant challenge but it is nonetheless essential for creating robots that can interact naturally with humans. 
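Purely as an illustration of what such annotations might record, and not a real schema from any existing dataset, a single annotated moment in a multi-party interaction could look something like this:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SocialCueAnnotation:
    """Illustrative (made-up) schema for one annotated moment in a recorded interaction."""
    timestamp_s: float
    speaker_id: str
    utterance: str
    addressees: List[str]      # who the utterance was directed at (multi-party)
    gaze_target: str           # e.g. "robot", "person_B", "object_on_table"
    gesture: str               # e.g. "nod", "point", "none"
    vocal_affect: str          # coarse label such as "neutral", "amused", "annoyed"

example = SocialCueAnnotation(12.4, "person_A", "Could you pass me that?",
                              ["robot"], "robot", "point", "neutral")
print(example)
```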

Integrating verbal and non-verbal communication

Balancing verbal and non-verbal communication is one of the most complex challenges in humanoid robotics. Even a perfectly articulated verbal response can feel awkward if the robot’s gaze or gestures seem off. Synchronising speech with appropriate facial expressions and body language requires precise timing and coordination. Technologies such as real-time gesture recognition and motion planning algorithms are essential for this integration. 
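As a simplified sketch of that timing problem, assuming word-level onset times are available from a text-to-speech engine (the timings below are made up), gestures can be scheduled to start on the words they should accompany:

```python
from typing import Dict, List, Tuple

def schedule_gestures(word_timings: List[Tuple[str, float]],
                      gesture_cues: Dict[str, str]) -> List[Tuple[float, str]]:
    """Align gestures to the onset time of the words they should accompany.

    word_timings: (word, start_time_s) pairs, e.g. from a TTS engine's timing output.
    gesture_cues: maps a word to the gesture that should start on that word.
    """
    schedule = []
    for word, start in word_timings:
        if word in gesture_cues:
            schedule.append((start, gesture_cues[word]))
    return schedule

# Hypothetical timings for "Sure, I can do that", with a nod on "Sure"
# and raised eyebrows on "that".
timings = [("Sure", 0.00), ("I", 0.35), ("can", 0.50), ("do", 0.70), ("that", 0.85)]
print(schedule_gestures(timings, {"Sure": "nod", "that": "raise_eyebrows"}))
```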

Future outlook: embracing the next frontier in humanoid robots

The evolution of humanoid robots like 1X’s Neo Beta from task performers to trusted companions hinges on our ability to equip them with social interaction skills. We stand at the cusp of a new era in which humanoid robots become integral partners in our daily lives. The journey ahead is challenging but immensely rewarding and, by focusing on developing advanced social interaction skills, we can unlock the full potential of humanoid robots. As engineers and innovators, we have the responsibility and the opportunity to shape this future and, together, pave the way for robots that not only work alongside us but also understand and authentically resonate with the human experience.

Samer Al Moubayed, co-founder & CEO of Furhat Robotics