A native macOS app for recording multi-agent AI conversations for YouTube. Paul hosts live discussions with his AI assistants (Sunday, Friday, and others) where all participants can hear and respond to each other in real time.

Overview

The Mac app is the hub. It owns the entire audio pipeline and coordinates the conversation through MQTT. There is no separate channel server - the agents participate purely through MQTT topics they're already subscribed to.

                       MQTT Broker (192.168.0.9:1883)
                                  |
               +------------------+------------------+
               |                  |                  |
          Mac App             Sunday             Friday
       (hub + UI)          (Claude Code)       (Claude Code)
            |
    +--------+-------------+
    |        |             |
   Mic      STT           TTS
 Capture (ElevenLabs)  (ElevenLabs)
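The topic layout and message envelope implied by the diagram can be sketched as follows. The topic names (`agents/<name>/inbox`, `conversation`) and field names are assumptions for illustration; the document specifies the broker address and the payload fields, not the exact schema.

```python
import json

# Broker address from the diagram above.
BROKER_HOST, BROKER_PORT = "192.168.0.9", 1883

def inbox_topic(agent: str) -> str:
    """Hypothetical per-agent inbox topic, used e.g. for session-start context."""
    return f"agents/{agent}/inbox"

# Hypothetical shared topic every participant subscribes to.
CONVERSATION_TOPIC = "conversation"

def envelope(speaker: str, text: str, msg_type: str, session_id: str) -> str:
    """JSON payload carrying the fields named later in 'Paul Speaks' step 4:
    speaker, text, type, and session_id."""
    return json.dumps({
        "speaker": speaker,
        "text": text,
        "type": msg_type,
        "session_id": session_id,
    })
```

A real client (e.g. any MQTT library on the Mac or agent side) would publish `envelope(...)` to `CONVERSATION_TOPIC` and subscribe each agent to its own inbox plus the conversation topic.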

Core Flow

1. Session Start

When Paul hits "Start Session" in the Mac app:

  1. App publishes a context message to each participating agent's MQTT inbox with session rules, participant list, episode topic, and turn-taking instructions
  2. App begins capturing mic audio and streaming to ElevenLabs STT
  3. UI transitions from idle to active, showing participant indicators
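The context message from step 1 might look like the sketch below. The document lists what it contains (session rules, participant list, episode topic, turn-taking instructions); the concrete field names and rule wording here are assumptions.

```python
import json

def session_context(session_id: str, participants: list[str], episode_topic: str) -> str:
    """Context message published to each participating agent's MQTT inbox
    at session start. Schema is illustrative, not the app's actual format."""
    return json.dumps({
        "type": "session_start",           # assumed type tag
        "session_id": session_id,
        "participants": participants,       # e.g. ["paul", "sunday", "friday"]
        "episode_topic": episode_topic,
        "turn_taking": [                    # hypothetical rule text
            "Respond when addressed by name or when the floor is open.",
            "Keep replies short; this is a spoken conversation.",
        ],
    })
```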

2. Paul Speaks

  1. Mac app captures mic audio via AVFoundation (AVAudioEngine)
  2. Raw PCM streamed to ElevenLabs Scribe Realtime STT over WebSocket
  3. Partial transcripts displayed live in the UI
  4. On a committed (final) transcript, the app publishes to the MQTT conversation topic with speaker, text, type, and session_id
  5. UI state: paul_speaking while STT active, idle when done
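Steps 4 and 5 can be sketched as a small state holder: partial transcripts keep the UI in paul_speaking, and the committed transcript triggers the MQTT publish and the return to idle. The class, topic name, and publish callback are assumptions; the payload fields match step 4.

```python
import json

class SttSession:
    """Minimal sketch of the STT-driven flow above. Publishing is injected as a
    callable so the sketch stays self-contained; a real app would hand the
    payload to its MQTT client."""

    def __init__(self, session_id, publish):
        self.session_id = session_id
        self.publish = publish      # callable(topic, payload)
        self.state = "idle"

    def on_partial(self, text):
        # Partial transcripts are shown live in the UI (step 3).
        self.state = "paul_speaking"

    def on_committed(self, text):
        # Committed transcript: publish with speaker, text, type, session_id (step 4).
        self.publish("conversation", json.dumps({
            "speaker": "paul",
            "text": text,
            "type": "transcript",
            "session_id": self.session_id,
        }))
        self.state = "idle"         # back to idle when done (step 5)
```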

3. Agents Respond

  1. Both agents receive Paul's transcript via MQTT subscription
  2. Agents decide whether to respond based on natural language cues (name addressing, open floor, etc.)
  3. Agent publishes reply to the conversation topic
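In practice the agents are LLMs reading natural-language cues, but the two cues step 2 names (name addressing and an open floor) can be approximated by a simple heuristic. This function and its signature are hypothetical:

```python
def should_respond(text: str, my_name: str, participants: list[str]) -> bool:
    """Rough sketch of step 2's turn-taking decision: respond if addressed by
    name, or if nobody else was named (the floor is open)."""
    lowered = text.lower()
    if my_name.lower() in lowered:
        return True    # addressed directly
    addressed_other = any(
        p.lower() in lowered
        for p in participants
        if p.lower() != my_name.lower()
    )
    return not addressed_other   # open floor: no one named, anyone may reply
```

A real agent would feed the transcript and session rules to its model rather than string-match, which is why the context message at session start carries turn-taking instructions.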