
Building ReachyArchi: A Voice-Driven Robotic AWS Solutions Architect

How we combined a Reachy humanoid robot with Amazon Bedrock Nova Sonic to create an AI-powered Solutions Architect for AWS Summits

Alexandre Agius


AWS Solutions Architect

5 min read

The Problem

At AWS Summits, 10,000+ attendees compete for time with only 12 "Ask an Architect" booths. Wait times of 30+ minutes are common, and many attendees leave without getting architecture guidance.

What if we could scale the Solutions Architect experience using AI and robotics?

The Solution

ReachyArchi is an AI-powered robotic Solutions Architect that combines a Reachy Mini humanoid robot with a voice agent built on Strands Agents, powered by Amazon Bedrock Nova Sonic, and hosted on Amazon Bedrock AgentCore Runtime.

The result: instant, personalized AWS architecture consultations with an engaging robotic presence.

Architecture Overview

Why Single BidiAgent?

After evaluating multiple agentic patterns, we chose Single BidiAgent:

| Pattern | Verdict | Rationale |
|---|---|---|
| Single BidiAgent | Chosen | Best voice performance, no handoff latency |
| Graph | Rejected | Overkill for mostly linear conversation flow |
| Swarm | Rejected | No parallel independent agents needed |
| Hierarchy | Rejected | Single agent handles all phases efficiently |

The bidirectional streaming model is essential for voice interactions: it allows continuous audio input and output while the agent reasons and calls tools concurrently.
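As a toy illustration of why full duplex matters, the sketch below runs uplink, model, and downlink as concurrent asyncio tasks, with plain queues standing in for the Nova Sonic bidirectional stream (none of these names come from the actual API):

```python
import asyncio

async def uplink(mic_chunks, in_q):
    """Stream microphone chunks toward the model without waiting for replies."""
    for chunk in mic_chunks:
        await in_q.put(chunk)
    await in_q.put(None)  # end-of-turn marker

async def model(in_q, out_q):
    """Stand-in for the bidi model: emits output while input is still streaming."""
    while (chunk := await in_q.get()) is not None:
        await out_q.put(b"tts:" + chunk)  # respond as audio arrives
    await out_q.put(None)

async def downlink(out_q, speaker):
    """Play model audio as soon as each chunk is produced."""
    while (chunk := await out_q.get()) is not None:
        speaker.append(chunk)

async def full_duplex(mic_chunks):
    in_q, out_q, speaker = asyncio.Queue(), asyncio.Queue(), []
    await asyncio.gather(
        uplink(mic_chunks, in_q),
        model(in_q, out_q),
        downlink(out_q, speaker),
    )
    return speaker
```

Both directions stay active at once, which is exactly what a request/response model cannot give you for voice.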

The Integration Challenge

Tools run on Amazon Bedrock AgentCore Runtime, but the Reachy SDK runs locally on the robot. The solution: WebSocket Command Events.

┌─────────────────────┐     WebSocket      ┌─────────────────────┐
│  AgentCore (AWS)    │◄──────────────────►│  Reachy Mini        │
│  - BidiAgent        │   robot_command    │  - SDK Control      │
│  - Robot Tools      │   {action, params} │  - Audio I/O        │
│  - Arch Tools       │                    │  - Motor Execution  │
└─────────────────────┘                    └─────────────────────┘

Tools send JSON command events; the client translates them to SDK calls.
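On the robot side, a minimal sketch of that translation step might look like the following; the handler names and dispatch table are illustrative, not the project's actual code — only the `{type, action, params}` event shape comes from the post:

```python
import json

def play_animation(params, log):
    # Hypothetical wrapper around a Reachy SDK animation call.
    log.append(f"animation:{params['name']}")

# Dispatch table: command action -> local SDK wrapper.
HANDLERS = {"animation": play_animation}

def handle_message(raw: str, log: list) -> bool:
    """Translate one WebSocket message into a local SDK call, if it is a command."""
    event = json.loads(raw)
    if event.get("type") != "robot_command":
        return False  # ignore unrelated events (transcripts, status updates, ...)
    handler = HANDLERS.get(event["action"])
    if handler is None:
        return False  # unknown action: drop rather than crash the robot loop
    handler(event.get("params", {}), log)
    return True
```

Unknown actions are dropped rather than raised, so a newer cloud deployment cannot crash an older robot client.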

Implementation Deep Dive

Robot Tools: Cloud-to-Hardware Bridge

Each tool is an async function that sends commands via WebSocket - never importing the Reachy SDK directly:

@tool
async def nod_yes() -> str:
    """Nod to show agreement or understanding."""
    await _send({
        "type": "robot_command",
        "action": "animation",
        "params": {"name": "nod_yes"}
    })
    return "Nodded yes"

This pattern cleanly separates cloud reasoning from local hardware control.

System Prompt Engineering

The system prompt ensures ReachyArchi is expressive, not just a voice assistant:

CRITICAL: TOOL CALLING
ALWAYS call robot tools - NEVER just say their names. Every response needs movement!
- WRONG: Saying "wave_hello" or "I'm nodding"
- CORRECT: Actually invoke wave_hello() as a function call

MOVEMENT RULES - EVERY RESPONSE!
Call at least one robot tool per response to feel alive.

Barge-In Handling

Voice UX requires handling interruptions gracefully:

model = BidiNovaSonicModel(
    provider_config={
        "turn_detection": {
            "endpointingSensitivity": "HIGH"  # Fast barge-in for booth demos
        }
    }
)

When a user speaks mid-response, the client clears the audio buffer immediately and processes new input.
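The buffer-clearing logic can be sketched independently of any audio library; `PlaybackBuffer` below is a hypothetical stand-in for the client's playback queue, not code from the project:

```python
from collections import deque

class PlaybackBuffer:
    """Queued model audio, flushed instantly when the user barges in."""

    def __init__(self):
        self._chunks = deque()
        self.interrupted = False

    def feed(self, chunk: bytes) -> None:
        # Late chunks from an interrupted response are dropped, not played.
        if not self.interrupted:
            self._chunks.append(chunk)

    def barge_in(self) -> None:
        """User started speaking: discard everything still queued."""
        self._chunks.clear()
        self.interrupted = True

    def new_turn(self) -> None:
        """The model starts its next response; accept audio again."""
        self.interrupted = False

    def pending(self) -> int:
        return len(self._chunks)
```

The `interrupted` flag matters: streaming responses keep producing audio for a moment after the interruption, and those stragglers must not leak into playback.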

6-Phase Conversation Flow

ReachyArchi follows a state machine for structured interactions:

IDLE → GREETING → INCEPTION → DESIGN → ITERATION → DELIVERY → FAREWELL
                     ↑           │
                     └───────────┘ (needs_more_info)
  • GREETING: Wave hello, introduce self in French
  • INCEPTION: Ask 1-2 targeted questions (tilt head with look_curious())
  • DESIGN: Generate PNG + JSON diagrams in parallel
  • ITERATION: Refine based on feedback
  • DELIVERY: Generate QR code for companion app
  • FAREWELL: Wave goodbye, reset session
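The happy path plus the needs_more_info loop fits in a small table-driven state machine; `Phase` and `advance` below are illustrative names, not the project's actual API:

```python
from enum import Enum, auto

class Phase(Enum):
    IDLE = auto()
    GREETING = auto()
    INCEPTION = auto()
    DESIGN = auto()
    ITERATION = auto()
    DELIVERY = auto()
    FAREWELL = auto()

# Linear happy path; FAREWELL wraps back to IDLE for the next visitor.
NEXT = {
    Phase.IDLE: Phase.GREETING,
    Phase.GREETING: Phase.INCEPTION,
    Phase.INCEPTION: Phase.DESIGN,
    Phase.DESIGN: Phase.ITERATION,
    Phase.ITERATION: Phase.DELIVERY,
    Phase.DELIVERY: Phase.FAREWELL,
    Phase.FAREWELL: Phase.IDLE,
}

def advance(phase: Phase, needs_more_info: bool = False) -> Phase:
    """Return the next phase, looping DESIGN back to INCEPTION when needed."""
    if phase is Phase.DESIGN and needs_more_info:
        return Phase.INCEPTION  # ask another targeted question first
    return NEXT[phase]
```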

Demo in Action

Watch ReachyArchi in action: Demo Video on YouTube

A 90-second interaction showcases the full experience:

[User] "Bonjour Reachy!"
[Reachy] *waves and responds in French*

[User] "I want to build a mobile app with a REST API and a database"
[Reachy] "What type of workload? Serverless or containers?"

[User] "Serverless, high traffic"
[Reachy] "I recommend [API Gateway](https://aws.amazon.com/api-gateway/), [Lambda](https://aws.amazon.com/lambda/), and [DynamoDB](https://aws.amazon.com/dynamodb/). Generate the diagram?"

[User] "Oui!"
[Reachy] *generates architecture diagram - React frontend updates live*

[User] "Reachy, tu connais Werner Vogels?"
[Reachy] *dances* "Everything fails, all the time!"

The companion app updates in real-time via AWS AppSync as diagrams are generated.

Key Takeaways

  1. WebSocket command events: Cleanly separate cloud AI from local hardware. Tools send JSON, clients execute - no SDK imports in cloud code.

  2. Explicit tool invocation prompts: LLMs may "describe" tool calls instead of executing them. Be explicit: "ALWAYS call tools, NEVER just say their names."

  3. HIGH barge-in sensitivity: Essential for natural booth conversations. Users will interrupt - handle it gracefully.

Try It Yourself

The project is open source! Clone the repo and try it yourself, then check out the Strands Agents documentation to build your own voice-driven agent.


Try it now

  1. Build a BidiAgent voice app: Start with the Strands Agents quickstart to create your first bidirectional streaming agent with Amazon Bedrock Nova Sonic.

  2. Clone and run ReachyArchi: Fork the ReachyArchi GitHub repo and follow the setup guide to run it locally with a simulated robot (no hardware required).

  3. Explore Bedrock voice models: Test Nova Sonic's voice capabilities using the Bedrock Converse API examples, experimenting with turn detection settings and tool calling patterns.


Have questions? Connect with me on LinkedIn or check out more posts on agiusalexandre.com.

