dictare's default behavior works out of the box. But the pipeline — the chain of steps between your voice and the agent — is fully customizable. You control what triggers submission, how muting works, how agent switching responds, and what confidence threshold a transcription must meet before it is acted on.
All of it lives in `config.toml`.
## Pipeline architecture
Every transcription flows through two stages:
**Filters** inspect the text and decide what to do with it. They can transform it, consume it (preventing further processing), or let it pass through unchanged.

**Executors** act on the filtered text. They deliver it to the agent, toggle mute state, switch agents, or perform other actions.
The default pipeline has three filters and three executors:
```
Voice → [MuteFilter] → [AgentFilter] → [SubmitFilter] → [MuteExecutor | AgentSwitchExecutor | InputExecutor]
```
Each filter checks for specific trigger phrases. If it matches, it consumes the text and signals the corresponding executor. If it doesn't match, the text passes to the next filter.
## Submit triggers
The SubmitFilter decides when to submit text to the agent. By default, double-tapping the hotkey submits. But you can also submit with your voice.
```toml
[pipeline.submit_filter.triggers]
"*" = [
  ["ok|okay", "send|submit"],
]
```
Now saying "OK send" at the end of your sentence submits it immediately, without the double-tap. The trigger phrase gets stripped from the text before delivery.
You can also set a confidence threshold — if the STT engine isn't confident enough in the transcription, it won't submit:
```toml
[pipeline.submit_filter]
confidence_threshold = 0.7
```
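The effect of the threshold can be sketched as follows (hypothetical helper, not dictare's API):

```python
def should_submit(matched_trigger: bool, confidence: float,
                  threshold: float = 0.7) -> bool:
    # A voice submit only fires when the STT engine's confidence for
    # the utterance meets the configured threshold.
    return matched_trigger and confidence >= threshold
```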
## Mute control
The MuteFilter lets you pause and resume voice capture without touching the keyboard.
```toml
[pipeline.mute_filter.mute_triggers]
"*" = [["ok|okay", "mute|stop"]]

[pipeline.mute_filter.listen_triggers]
"*" = [["ok|okay", "listen"]]
```
Say "OK mute" and dictare stops processing your voice. Background conversations, phone calls, side discussions — none of it reaches the agent. Say "OK listen" to resume.
The mute state is reflected in the status bar and tray icon, so you always know whether dictare is listening.
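A rough sketch of this mute state machine (class and method names hypothetical):

```python
class MuteGate:
    # Hypothetical model of the MuteFilter: while muted, every
    # utterance is swallowed except the listen trigger.
    def __init__(self) -> None:
        self.muted = False

    def allows(self, text: str) -> bool:
        """Return True if the text should reach the rest of the pipeline."""
        lowered = text.lower()
        if not self.muted:
            if "ok mute" in lowered or "ok stop" in lowered:
                self.muted = True
                return False  # the trigger itself is consumed
            return True
        if "ok listen" in lowered:
            self.muted = False
        return False  # swallowed: muted, or the listen trigger itself
```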
## Agent switching
The AgentFilter handles voice-based agent switching.
```toml
[pipeline.agent_filter]
triggers = ["agent", "switch to"]
```
When the filter hears "agent claude" or "switch to codex", it extracts the agent name and signals the AgentSwitchExecutor. The executor handles the actual switch, including TTS feedback announcing the new agent.
## Putting it together
Here's a complete pipeline configuration:
```toml
[pipeline.mute_filter.mute_triggers]
"*" = [["ok|okay", "mute|stop"]]

[pipeline.mute_filter.listen_triggers]
"*" = [["ok|okay", "listen"]]

[pipeline.agent_filter]
triggers = ["agent", "switch to", "talk to"]

[pipeline.submit_filter]
confidence_threshold = 0.6

[pipeline.submit_filter.triggers]
"*" = [
  ["ok|okay", "send|submit"],
  ["do", "it"],
]
```
With this config:
- "OK mute" or "OK stop" — pauses listening
- "OK listen" — resumes listening
- "Agent claude", "switch to codex", "talk to aider" — switches agents
- "Create a REST endpoint for user profiles, OK send" — transcribes and submits immediately
- "do it" — submits whatever was just transcribed
## Custom filters
The pipeline is loaded by `PipelineLoader`, which uses dependency injection. If the built-in filters don't cover your needs, you can write your own. A filter is any class that implements the `Filter` protocol:
```python
from typing import Protocol

class Filter(Protocol):
    def process(self, text: str, context: dict) -> FilterResult:
        ...
```
Return `FilterResult.PASS` to let the text continue down the pipeline, or `FilterResult.CONSUME` to stop processing and trigger an action.
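For instance, a hypothetical custom filter that consumes filler-only utterances ("um", "uh") so they never reach the agent might look like this; `FilterResult` is redefined here only to keep the sketch self-contained:

```python
from enum import Enum, auto

class FilterResult(Enum):  # assumed shape of dictare's result type
    PASS = auto()
    CONSUME = auto()

class FillerFilter:
    """Hypothetical custom filter: drop utterances that are pure filler."""
    FILLERS = {"um", "uh", "er", "hmm"}

    def process(self, text: str, context: dict) -> FilterResult:
        words = [w.strip(",.!?") for w in text.lower().split()]
        if words and all(w in self.FILLERS for w in words):
            return FilterResult.CONSUME  # nothing useful; drop it
        return FilterResult.PASS
```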
## Tips
- Trigger phrases are case-insensitive and use fuzzy matching to account for STT variations.
- Keep triggers short and distinct. "OK send" works better than "please submit this to the agent" because shorter phrases have less room for transcription errors.
- Test your triggers by running `dictare transcribe` in a terminal and speaking them. You'll see exactly what the STT engine produces, so you can tune your trigger phrases to match.
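The fuzzy matching mentioned in the tips above can be approximated with the standard library; this is a stand-in for illustration, not dictare's actual algorithm:

```python
import difflib

def fuzzy_match(heard: str, trigger_word: str, cutoff: float = 0.7) -> bool:
    # Tolerate small STT variations ("okey" for "okay") by comparing
    # similarity ratios instead of exact strings.
    ratio = difflib.SequenceMatcher(
        None, heard.lower(), trigger_word.lower()
    ).ratio()
    return ratio >= cutoff
```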
The pipeline is where dictare becomes yours. Spend ten minutes with `config.toml` and you'll have a voice workflow that fits exactly how you think.
