Voice & TTS
Pawz supports text-to-speech so agents can speak their responses aloud.
Setup
Go to Settings → Voice to configure TTS.
Providers
Google Cloud TTS
No API key needed; uses the free web endpoint.
- Chirp 3 HD voices: Puck, Charon, Kore, Fenrir, Leda, Orus, Zephyr, Aoede, Callirhoe, Autonoe
- Neural2 voices: en-US-Neural2-A through F
- Journey voices: en-US-Journey-D, en-US-Journey-F, en-US-Journey-O
OpenAI TTS
Requires an OpenAI API key.
Voices: alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer
ElevenLabs
Requires an ELEVENLABS_API_KEY.
Voices: Sarah, Charlie, George, Callum, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill
Models:
| Model | Best for |
|---|---|
| eleven_multilingual_v2 | Multi-language, highest quality |
| eleven_turbo_v2_5 | Low latency, English-focused |
| eleven_monolingual_v1 | English only, legacy |
Voice settings:
- Stability (0–1, default 0.5) — higher = more consistent
- Similarity boost (0–1, default 0.75) — higher = closer to reference voice
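As a sketch, these parameters map onto a request to the ElevenLabs text-to-speech endpoint. The URL and field names (model_id, voice_settings.stability, similarity_boost) follow the public ElevenLabs API; the helper name and defaults below are illustrative, not Pawz internals:

```javascript
// Hypothetical helper: builds a request for the ElevenLabs TTS endpoint.
// Defaults mirror the Stability / Similarity boost settings listed above.
function buildElevenLabsRequest(text, voiceId, apiKey, opts = {}) {
  const {
    stability = 0.5,
    similarityBoost = 0.75,
    modelId = 'eleven_multilingual_v2',
  } = opts;
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    method: 'POST',
    headers: { 'xi-api-key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text,
      model_id: modelId,
      voice_settings: { stability, similarity_boost: similarityBoost },
    }),
  };
}
```

The returned object can be passed straight to `fetch(req.url, req)`; the response body is the synthesized audio.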
Settings
| Setting | Default | Description |
|---|---|---|
| Provider | — | Google / OpenAI / ElevenLabs |
| Voice | — | Voice name from the selected provider |
| Speed | 1.0 | Playback speed multiplier |
| Language | — | Language code (13 supported) |
| Auto-speak | Off | Automatically speak every response |
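The defaults above could be represented as a settings object along these lines (shape and key names are illustrative, not Pawz's actual config schema):

```javascript
// Illustrative voice-settings shape mirroring the defaults in the table.
const defaultVoiceSettings = {
  provider: null,   // 'google' | 'openai' | 'elevenlabs'; no default
  voice: null,      // voice name from the selected provider
  speed: 1.0,       // playback speed multiplier
  language: null,   // language code (13 supported)
  autoSpeak: false, // speak every response automatically
};
```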
Speech-to-text (STT)
Pawz uses OpenAI Whisper for speech-to-text transcription:
| Backend | Setup | Latency | Cost |
|---|---|---|---|
| Whisper API | OpenAI API key (from Models settings) | ~1–2s | $0.006/min |
| Whisper Local | Install whisper binary | ~3–5s | Free |
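A sketch of how the two backends might be dispatched. The hosted endpoint follows the standard OpenAI transcription API, and the CLI shape follows the open-source whisper tool; the dispatcher itself is hypothetical:

```javascript
// Hypothetical dispatcher for the two Whisper backends in the table.
function sttRequest(backend, audioPath, apiKey) {
  if (backend === 'api') {
    // OpenAI-hosted Whisper: multipart upload to the transcription endpoint.
    return {
      url: 'https://api.openai.com/v1/audio/transcriptions',
      headers: { Authorization: `Bearer ${apiKey}` },
      fields: { model: 'whisper-1', file: audioPath },
    };
  }
  // Local backend: shell out to the installed `whisper` binary.
  return { command: ['whisper', audioPath, '--output_format', 'txt'] };
}
```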
Audio capture settings
The microphone input uses these Web Audio constraints:
| Setting | Value |
|---|---|
| Echo cancellation | Enabled |
| Noise suppression | Enabled |
| Sample rate | 16 kHz |
| Format | audio/webm;codecs=opus (preferred) |
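In Web Audio terms, the table above corresponds to a `getUserMedia` constraints object like the following (a sketch of the standard browser API; the exact object Pawz passes is not shown here):

```javascript
// Capture constraints matching the table above.
const captureConstraints = {
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    sampleRate: 16000, // 16 kHz
  },
};

// Preferred container/codec for MediaRecorder. In a browser you would
// check support first: MediaRecorder.isTypeSupported(preferredMimeType)
const preferredMimeType = 'audio/webm;codecs=opus';

// In a browser: navigator.mediaDevices.getUserMedia(captureConstraints)
```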
Voice activity detection (VAD)
Talk Mode includes built-in voice activity detection to avoid sending silence to the transcription API:
| Parameter | Value | Description |
|---|---|---|
| Recording window | 8 seconds | Records in 8-second chunks |
| Minimum audio size | 8 KB | Chunks under 8 KB are treated as silence and skipped |
| Inter-cycle delay | 500 ms | Brief pause between recording cycles after errors |
| Empty transcript | Skipped | If Whisper returns blank text, the cycle restarts |
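The silence gate in the table reduces to a size check on each recorded chunk, plus a check on the returned transcript (thresholds from the table; function names are illustrative):

```javascript
// A chunk smaller than 8 KB is treated as silence and skipped.
const MIN_AUDIO_BYTES = 8 * 1024;

function shouldTranscribe(chunkSizeBytes) {
  return chunkSizeBytes >= MIN_AUDIO_BYTES;
}

// A blank/whitespace transcript also ends the cycle without sending anything.
function shouldSendToAgent(transcript) {
  return transcript.trim().length > 0;
}
```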
Talk mode
Click the microphone icon in the chat header to enter talk mode. Your speech is transcribed and sent to the agent, and the response is spoken back. Requires either:
- Whisper API skill (OpenAI API key)
- Whisper Local skill (install the whisper binary)
How talk mode works
- Listen — microphone captures audio in 8-second windows
- Transcribe — audio is sent to Whisper STT → text
- Process — transcribed text is sent to your agent as a chat message
- Speak — agent’s response is synthesized via your configured TTS provider
- Repeat — next recording cycle starts after playback finishes
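The cycle above can be sketched as one async function with the four stages injected (all names below are placeholders, not Pawz internals):

```javascript
// One talk-mode cycle: listen → transcribe → process → speak.
// The stages are passed in so the flow is easy to test in isolation.
async function talkCycle({ listen, transcribe, process, speak }) {
  const audio = await listen();         // 8-second recording window
  const text = await transcribe(audio); // Whisper STT
  if (!text.trim()) return null;        // blank transcript: restart the cycle
  const reply = await process(text);    // send to the agent as a chat message
  await speak(reply);                   // synthesize via the TTS provider
  return reply;                         // caller loops to the next cycle
}
```

The caller runs `talkCycle` in a loop, starting the next iteration after playback finishes.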
Voice command mode vs dictation mode
| Mode | Behavior | Use case |
|---|---|---|
| Voice command (default) | Each utterance is sent as a standalone chat message | Giving instructions, asking questions |
| Dictation | Utterances are accumulated into a text buffer | Composing long-form content, emails |
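The difference between the two modes can be sketched as a small state update per utterance (hypothetical names; command mode dispatches immediately, dictation appends to a buffer):

```javascript
// Illustrative handler showing the two modes' behavior.
function handleUtterance(state, utterance) {
  if (state.mode === 'command') {
    // Each utterance becomes a standalone chat message.
    return { ...state, outbox: [...state.outbox, utterance] };
  }
  // Dictation: accumulate utterances into a text buffer.
  const buffer = state.buffer ? state.buffer + ' ' + utterance : utterance;
  return { ...state, buffer };
}
```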

