
Make your AI agent talk back

2026-04-13

Voice interaction is a conversation. If you can talk to your agent but it can only type back, something is missing. dictare closes the loop: your agent speaks to you through text-to-speech, locally, with the voice you choose.

TTS engines

dictare ships with support for multiple TTS engines. All run locally.

Kokoro

Neural TTS with natural-sounding voices. This is the default recommendation for quality. It runs as an isolated subprocess worker, so it doesn't interfere with the main process.

[tts]
engine = "kokoro"
voice = "af_heart"

Multiple voices are available: American English, British English, and other languages. Latency is low enough for interactive use.
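Switching to a different accent is just a config change. A minimal sketch, assuming the British English voice ID below from Kokoro's published voice list (check your installed version for the exact names available):

[tts]
engine = "kokoro"
voice = "bf_emma"  # British English voice; ID assumed from Kokoro's voice list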

Piper

Lightweight and fast. Piper is built for low-latency scenarios where you want instant feedback. The voice quality is good (not quite Kokoro-level), but the response time is excellent.

[tts]
engine = "piper"
voice = "en_US-lessac-medium"

Piper supports dozens of voices across many languages. Voices are downloaded on first use.
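Other languages work the same way. A sketch, assuming the German voice name below follows Piper's usual locale-name-quality naming (confirm it against the Piper voice catalog before relying on it):

[tts]
engine = "piper"
voice = "de_DE-thorsten-medium"  # assumed from Piper's voice catalog; fetched on first use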

espeak

The veteran. espeak has been around forever, sounds robotic, and is available on every Linux distro. But it's instant: zero startup time, zero model loading. Good for quick status announcements where natural speech isn't important.

[tts]
engine = "espeak"

macOS say

On macOS, dictare can use the built-in speech synthesis. Zero dependencies, zero setup. Quality depends on the system voice you've configured.

[tts]
engine = "say"
voice = "Samantha"

OuteTTS

Neural TTS with voice cloning capabilities. If you want the agent to sound like a specific person (with their consent), OuteTTS can do that from a short audio sample.

[tts]
engine = "outetts"

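A minimal sketch of what a cloning setup might look like, assuming dictare exposes the reference recording as a config key; the speaker_sample key and path below are hypothetical, so check the dictare docs for the actual name:

[tts]
engine = "outetts"
# hypothetical key: path to a short sample from a consenting speaker
speaker_sample = "voices/me.wav"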
Agent announcements

When you switch agents with your voice ("agent claude"), dictare announces the switch via TTS. You hear "Claude" spoken aloud, confirming the switch happened. This is subtle but important — it keeps you in flow without checking the terminal.

Other events can trigger TTS feedback too: recording start/stop, errors, transcription confirmations. Configure which events get voice feedback in the sounds config.
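As a sketch of per-event control, modeled on the agent_announce block shown in the configuration section below; the other section names here are assumptions, not confirmed keys:

[audio.sounds.agent_announce]
enabled = true

# hypothetical event sections, following the same pattern
[audio.sounds.record_start]
enabled = true

[audio.sounds.error]
enabled = false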

The dictare speak command

Beyond agent feedback, dictare exposes TTS as a standalone command:

# Speak a string
echo "Build complete, no errors" | dictare speak

# Speak with a specific voice
echo "Tests passed" | dictare speak --voice af_heart

This is a building block. Combine it with anything.
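For example, a long-running command can announce its own result using nothing beyond the commands already shown (the make target is a placeholder):

# Announce the result of a long-running command
make build && echo "Build succeeded" | dictare speak || echo "Build failed" | dictare speak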

The pipe pattern

The real power of dictare speak comes from Unix pipes:

# Voice-powered LLM conversation
dictare transcribe --auto-submit | llm | dictare speak

This creates a full voice loop: you speak, it gets transcribed, the LLM processes it, and the response is spoken back. No typing, no reading. Pure voice.

# Voice-controlled shell
dictare transcribe --auto-submit | sh 2>&1 | dictare speak

Speak shell commands, hear the output. Dangerous? Yes. Fun? Absolutely.

# Voice notes to file with spoken confirmation
dictare transcribe --auto-submit | tee notes.md | dictare speak

Dictate notes, hear them read back for verification, and save to a file simultaneously.

Choosing an engine

Quick guide:

Need                   Engine
Best quality           Kokoro
Lowest latency         espeak or say
Good balance           Piper
Voice cloning          OuteTTS
Zero setup on macOS    say
Zero setup on Linux    espeak

Start with Kokoro if your machine can handle it (any modern hardware can). Drop down to Piper or espeak if you want faster responses at the cost of voice quality.

Configuration

Full TTS config example:

[tts]
engine = "kokoro"
voice = "af_heart"
speed = 1.0

[audio.sounds.agent_announce]
enabled = true
volume = 0.3

The agent talking back changes the dynamic completely. It stops feeling like you're dictating into a text box and starts feeling like a real collaboration. Try it.