
Building ReachyArchi: A Voice-Driven Robotic AWS Solutions Architect

How we combined a Reachy humanoid robot with Amazon Bedrock Nova Sonic to create an AI-powered Solutions Architect for AWS Summits

Alexandre Agius

AWS Solutions Architect


The Problem

At AWS Summits, 10,000+ attendees compete for time with only 12 “Ask an Architect” booths. Wait times of 30+ minutes are common, and many attendees leave without getting architecture guidance.

What if we could scale the Solutions Architect experience using AI and robotics?

The Solution

ReachyArchi is an AI-powered robotic Solutions Architect that combines:

  • A Reachy Mini humanoid robot for physical presence, gestures, and audio I/O
  • Amazon Bedrock Nova Sonic for bidirectional speech-to-speech conversation
  • Amazon Bedrock AgentCore Runtime hosting the agent and its tools
  • A React companion app updated in real time via AWS AppSync

The result: instant, personalized AWS architecture consultations with an engaging robotic presence.

Architecture Overview

Why Single BidiAgent?

After evaluating multiple agentic patterns, we chose Single BidiAgent:

| Pattern | Verdict | Rationale |
| --- | --- | --- |
| Single BidiAgent | Chosen | Best voice performance, no handoff latency |
| Graph | Rejected | Overkill for a mostly linear conversation flow |
| Swarm | Rejected | No parallel independent agents needed |
| Hierarchy | Rejected | A single agent handles all phases efficiently |

The bidirectional streaming model is essential for voice interactions: it allows continuous audio input and output while the agent reasons and calls tools concurrently.
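The concurrency this enables can be sketched with plain asyncio. Everything below is a stand-in for illustration; the real BidiAgent manages the Nova Sonic stream internally:

```python
import asyncio

async def pump_microphone(audio_in: asyncio.Queue) -> None:
    # Stand-in for microphone capture: stream chunks toward the agent
    for chunk in (b"bonjour", b"reachy"):
        await audio_in.put(chunk)
    await audio_in.put(None)  # end of utterance

async def agent_loop(audio_in: asyncio.Queue, audio_out: asyncio.Queue) -> None:
    # Stand-in for the agent: consume audio while emitting responses
    while (chunk := await audio_in.get()) is not None:
        await audio_out.put(b"ack:" + chunk)
    await audio_out.put(None)  # end of response

async def main() -> list[bytes]:
    audio_in: asyncio.Queue = asyncio.Queue()
    audio_out: asyncio.Queue = asyncio.Queue()
    results: list[bytes] = []

    async def drain() -> None:
        # Playback side: runs concurrently with capture and reasoning
        while (out := await audio_out.get()) is not None:
            results.append(out)

    await asyncio.gather(pump_microphone(audio_in),
                         agent_loop(audio_in, audio_out),
                         drain())
    return results

print(asyncio.run(main()))  # → [b'ack:bonjour', b'ack:reachy']
```

The key property is that no coroutine blocks the others, which is what lets the robot listen, think, and move at the same time.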

The Integration Challenge

Tools run on Amazon Bedrock AgentCore Runtime, but the Reachy SDK runs locally on the robot. The solution: WebSocket Command Events.

┌─────────────────────┐     WebSocket      ┌─────────────────────┐
│  AgentCore (AWS)    │◄──────────────────►│  Reachy Mini        │
│  - BidiAgent        │   robot_command    │  - SDK Control      │
│  - Robot Tools      │   {action, params} │  - Audio I/O        │
│  - Arch Tools       │                    │  - Motor Execution  │
└─────────────────────┘                    └─────────────────────┘

Tools send JSON command events; the client translates them to SDK calls.
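A minimal sketch of the client-side dispatcher, with hypothetical handlers standing in for the actual Reachy SDK calls:

```python
import json
from typing import Callable

# Registry mapping command actions to local handlers.
# In production each handler wraps a Reachy SDK call.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "animation": lambda p: f"play animation {p['name']}",
    "look": lambda p: f"look {p['direction']}",
}

def handle_event(raw: str) -> str:
    """Decode one WebSocket message and dispatch it to the matching handler."""
    event = json.loads(raw)
    if event.get("type") != "robot_command":
        return "ignored"
    action = event["action"]
    if action not in HANDLERS:
        return f"unknown action: {action}"
    return HANDLERS[action](event.get("params", {}))

msg = '{"type": "robot_command", "action": "animation", "params": {"name": "nod_yes"}}'
print(handle_event(msg))  # → play animation nod_yes
```

Unknown actions degrade gracefully instead of crashing the client, which matters when cloud and robot code are deployed independently.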

Implementation Deep Dive

Robot Tools: Cloud-to-Hardware Bridge

Each tool is an async function that sends commands via WebSocket - never importing the Reachy SDK directly:

```python
@tool
async def nod_yes() -> str:
    """Nod to show agreement or understanding."""
    await _send({
        "type": "robot_command",
        "action": "animation",
        "params": {"name": "nod_yes"}
    })
    return "Nodded yes"
```
    return "Nodded yes"

This pattern cleanly separates cloud reasoning from local hardware control.

System Prompt Engineering

The system prompt ensures ReachyArchi is expressive, not just a voice assistant:

```
CRITICAL: TOOL CALLING
ALWAYS call robot tools - NEVER just say their names. Every response needs movement!
- WRONG: Saying "wave_hello" or "I'm nodding"
- CORRECT: Actually invoking wave_hello() as a function call

MOVEMENT RULES - EVERY RESPONSE!
Call at least one robot tool per response to feel alive.
```

Barge-In Handling

Voice UX requires handling interruptions gracefully:

```python
model = BidiNovaSonicModel(
    provider_config={
        "turn_detection": {
            "endpointingSensitivity": "HIGH"  # Fast barge-in for booth demos
        }
    }
)
```

When a user speaks mid-response, the client clears the audio buffer immediately and processes new input.

6-Phase Conversation Flow

ReachyArchi follows a state machine for structured interactions:

```
IDLE → GREETING → INCEPTION → DESIGN → ITERATION → DELIVERY → FAREWELL
                     ↑           │
                     └───────────┘ (needs_more_info)
```

  • GREETING: Wave hello, introduce self in French
  • INCEPTION: Ask 1-2 targeted questions (tilt head with look_curious())
  • DESIGN: Generate PNG + JSON diagrams in parallel
  • ITERATION: Refine based on feedback
  • DELIVERY: Generate QR code for companion app
  • FAREWELL: Wave goodbye, reset session
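The phases above can be encoded as a small transition table with a single backward edge. The phase names come from the diagram; the guard mechanism is a sketch:

```python
# Forward edge for each phase; FAREWELL loops back to IDLE on session reset
TRANSITIONS = {
    "IDLE": "GREETING",
    "GREETING": "INCEPTION",
    "INCEPTION": "DESIGN",
    "DESIGN": "ITERATION",
    "ITERATION": "DELIVERY",
    "DELIVERY": "FAREWELL",
    "FAREWELL": "IDLE",
}

def advance(phase: str, needs_more_info: bool = False) -> str:
    """Move to the next phase; loop DESIGN back to INCEPTION when guarded."""
    if phase == "DESIGN" and needs_more_info:
        return "INCEPTION"
    return TRANSITIONS[phase]

print(advance("DESIGN"))                        # → ITERATION
print(advance("DESIGN", needs_more_info=True))  # → INCEPTION
```

A table-driven machine keeps the conversation flow auditable: the prompt describes the phases, and the code enforces them.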

Demo in Action

Watch ReachyArchi in action: Demo Video on YouTube

A 90-second interaction showcases the full experience:

[User] "Bonjour Reachy!"
[Reachy] *waves and responds in French*

[User] "I want to build a mobile app with a REST API and a database"
[Reachy] "What type of workload? Serverless or containers?"

[User] "Serverless, high traffic"
[Reachy] "I recommend [API Gateway](https://aws.amazon.com/api-gateway/), [Lambda](https://aws.amazon.com/lambda/), and [DynamoDB](https://aws.amazon.com/dynamodb/). Generate the diagram?"

[User] "Oui!"
[Reachy] *generates architecture diagram - React frontend updates live*

[User] "Reachy, tu connais Werner Vogels?"
[Reachy] *dances* "Everything fails, all the time!"

The companion app updates in real-time via AWS AppSync as diagrams are generated.

Key Takeaways

  1. WebSocket command events: Cleanly separate cloud AI from local hardware. Tools send JSON, clients execute - no SDK imports in cloud code.

  2. Explicit tool invocation prompts: LLMs may “describe” tool calls instead of executing them. Be explicit: “ALWAYS call tools, NEVER just say their names.”

  3. HIGH barge-in sensitivity: Essential for natural booth conversations. Users will interrupt - handle it gracefully.

Try It Yourself

The project is open source! Check out the code and try it yourself:

Tech stack:

See the Strands Agents documentation to build your own voice-driven agent.

What’s Next

  • Multi-language support (French/English/German)
  • Human SA escalation notification system
  • Load testing for AWS Summit booth capacity

Target Metrics:

| Metric | Target |
| --- | --- |
| Interactions per summit | 500+ |
| Wait time reduction | 30 min → <2 min |
| Customer satisfaction | 4.0+/5.0 |

Have questions? Connect with me on LinkedIn or check out more posts on agiusalexandre.com.

Alexandre Agius

AWS Solutions Architect

Passionate about AI & Security. Building scalable cloud solutions and helping organizations leverage AWS services to innovate faster. Specialized in Generative AI, serverless architectures, and security best practices.