
Make your AI agent talk back

2026-04-13

Voice interaction is a conversation. If you can talk to your agent but it can only type back, something is missing. dictare closes the loop: your agent speaks to you through text-to-speech, locally, with the voice you choose.

TTS engines

dictare ships with support for multiple TTS engines. All run locally.

Kokoro

Neural TTS with natural-sounding voices. This is the default recommendation for quality. It runs as an isolated subprocess worker, so it doesn't interfere with the main process.

[tts]
engine = "kokoro"
voice = "af_heart"

Multiple voices are available: American English, British English, and other languages. Latency is low enough for interactive use.
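Switching to a different accent is just a config change. A minimal sketch, assuming the British English voice ID below from Kokoro's published voice list (check your installed version for the exact names available):

[tts]
engine = "kokoro"
voice = "bf_emma"  # British English voice; ID assumed from Kokoro's voice list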

Piper

Lightweight and fast. Piper is built for low-latency scenarios where you want instant feedback. The voice quality is good (not quite Kokoro-level), but the response time is excellent.

[tts]
engine = "piper"
voice = "en_US-lessac-medium"

Piper supports dozens of voices across many languages. Voices are downloaded on first use.
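Other languages work the same way. A sketch, assuming the German voice name below follows Piper's usual locale-name-quality naming (confirm it against the Piper voice catalog before relying on it):

[tts]
engine = "piper"
voice = "de_DE-thorsten-medium"  # assumed from Piper's voice catalog; fetched on first use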

espeak

The veteran. espeak has been around forever, sounds robotic, and is available on every Linux distro. But it's instant: zero startup time, zero model loading. Good for quick status announcements where natural speech isn't important.

[tts]
engine = "espeak"

macOS say

On macOS, dictare can use the built-in speech synthesis. Zero dependencies, zero setup. Quality depends on the system voice you've configured.

[tts]
engine = "say"
voice = "Samantha"

OuteTTS

Neural TTS with voice cloning capabilities. If you want the agent to sound like a specific person (with their consent), OuteTTS can do that from a short audio sample.

[tts]
engine = "outetts"

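A minimal sketch of what a cloning setup might look like, assuming dictare exposes the reference recording as a config key; the speaker_sample key and path below are hypothetical, so check the dictare docs for the actual name:

[tts]
engine = "outetts"
# hypothetical key: path to a short sample from a consenting speaker
speaker_sample = "voices/me.wav"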
Agent announcements

When you switch agents with your voice ("agent claude"), dictare announces the switch via TTS. You hear "Claude" spoken aloud, confirming the switch happened. This is subtle but important — it keeps you in flow without checking the terminal.

Other events can trigger TTS feedback too: recording start/stop, errors, transcription confirmations. Configure which events get voice feedback in the sounds config.
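As a sketch of per-event control, modeled on the agent_announce block shown in the configuration section below; the other section names here are assumptions, not confirmed keys:

[audio.sounds.agent_announce]
enabled = true

# hypothetical event sections, following the same pattern
[audio.sounds.record_start]
enabled = true

[audio.sounds.error]
enabled = false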

The dictare speak command

Beyond agent feedback, dictare exposes TTS as a standalone command:

# Speak a string
echo "Build complete, no errors" | dictare speak

# Speak with a specific voice
echo "Tests passed" | dictare speak --voice af_heart

This is a building block. Combine it with anything.
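For example, a long-running command can announce its own result using nothing beyond the commands already shown (the make target is a placeholder):

# Announce the result of a long-running command
make build && echo "Build succeeded" | dictare speak || echo "Build failed" | dictare speak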

The pipe pattern

The real power of dictare speak comes from Unix pipes:

# Voice-powered LLM conversation
dictare transcribe --auto-submit | llm | dictare speak

This creates a full voice loop: you speak, it gets transcribed, the LLM processes it, and the response is spoken back. No typing, no reading. Pure voice.

# Voice-controlled shell
dictare transcribe --auto-submit | sh 2>&1 | dictare speak

Speak shell commands, hear the output. Dangerous? Yes. Fun? Absolutely.

# Voice notes to file with spoken confirmation
dictare transcribe --auto-submit | tee notes.md | dictare speak

Dictate notes, hear them read back for verification, and save to a file simultaneously.

Choosing an engine

Quick guide:

Need                   Engine
Best quality           Kokoro
Lowest latency         espeak or say
Good balance           Piper
Voice cloning          OuteTTS
Zero setup on macOS    say
Zero setup on Linux    espeak

Start with Kokoro if your machine can handle it (any modern hardware can). Drop down to Piper or espeak if you want faster responses at the cost of voice quality.

Configuration

Full TTS config example:

[tts]
engine = "kokoro"
voice = "af_heart"
speed = 1.0

[audio.sounds.agent_announce]
enabled = true
volume = 0.3

The agent talking back changes the dynamic completely. It stops feeling like you're dictating into a text box and starts feeling like a real collaboration. Try it.