# Model Pricing & Budget
Pawz tracks token usage and costs across all AI providers, enforces daily budgets, and can automatically route tasks to cheaper models when appropriate.

## Per-model pricing

Pawz uses built-in pricing data to estimate costs in real time. Prices are per 1 million tokens.

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Provider |
|---|---|---|---|
| Claude 3 Haiku | $0.25 | $1.25 | Anthropic |
| Claude Haiku 4 | $1.00 | $5.00 | Anthropic |
| Claude Sonnet 4.x | $3.00 | $15.00 | Anthropic |
| Claude Opus 4.x | $15.00 | $75.00 | Anthropic |
| Gemini Flash | $0.15 | $0.60 | Google |
| Gemini Pro | $1.25 | $10.00 | Google |
| GPT-4.1 | $2.50 | $10.00 | OpenAI |
| GPT-4.1-mini / nano | $0.15 | $0.60 | OpenAI |
| GPT-4o | $2.50 | $10.00 | OpenAI |
| GPT-4o-mini | $0.15 | $0.60 | OpenAI |
| o3 / o1 | $10.00 | $40.00 | OpenAI |
| o3-mini / o4-mini | $1.10 | $4.40 | OpenAI |
| DeepSeek Chat | $0.27 | $1.10 | DeepSeek |
| DeepSeek Reasoner | $0.55 | $2.19 | DeepSeek |
## Cost comparison

To put these numbers in perspective, here's the approximate cost for a 10,000-token conversation (5K input + 5K output):

| Model | Estimated cost |
|---|---|
| Gemini Flash | $0.004 |
| GPT-4o-mini | $0.004 |
| GPT-4.1-mini | $0.004 |
| DeepSeek Chat | $0.007 |
| Claude 3 Haiku | $0.008 |
| o3-mini / o4-mini | $0.028 |
| Claude Haiku 4 | $0.030 |
| Gemini Pro | $0.056 |
| GPT-4o | $0.063 |
| GPT-4.1 | $0.063 |
| Claude Sonnet 4.x | $0.090 |
| o3 / o1 | $0.250 |
| Claude Opus 4.x | $0.450 |
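The estimates above follow directly from the pricing table. A minimal sketch of the arithmetic (illustrative code, not Pawz's internals):

```python
# Derive a per-conversation estimate from per-1M-token prices.
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Estimated cost in USD; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 5K input + 5K output on Gemini Flash ($0.15 in / $0.60 out):
print(round(estimate_cost(5000, 5000, 0.15, 0.60), 3))    # → 0.004
# The same conversation on Claude Opus 4.x ($15 in / $75 out):
print(round(estimate_cost(5000, 5000, 15.00, 75.00), 3))  # → 0.45
```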
## Cache token accounting

When providers support prompt caching (e.g., Anthropic), Pawz applies reduced rates for cached tokens:

| Token type | Cost multiplier |
|---|---|
| Normal tokens | 100% (full price) |
| Cache reads | 10% of normal cost |
| Cache creation | 25% of normal cost |
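Applying those multipliers to input-token accounting looks roughly like this (a sketch, not Pawz's actual code):

```python
# Bill cache reads at 10% and cache creation at 25% of the normal input rate.
def effective_input_cost(normal: int, cache_reads: int, cache_creation: int,
                         input_price: float) -> float:
    """Input cost in USD with cache discounts; input_price is USD per 1M tokens."""
    billable = normal + 0.10 * cache_reads + 0.25 * cache_creation
    return billable * input_price / 1_000_000

# 10K cache-read tokens on Claude Sonnet 4.x ($3/1M) cost the same as 1K normal tokens:
print(effective_input_cost(0, 10_000, 0, 3.00))  # → 0.003
```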
## Daily budget

Pawz can enforce a daily spending limit to prevent runaway costs.

### Configuration
| Setting | Default | Description |
|---|---|---|
| `daily_budget_usd` | $10.00 | Maximum daily spend across all models |

Set `daily_budget_usd` to `0` to disable budget enforcement entirely.
Configure the budget in Settings → Agent Defaults or directly in the engine config:
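For example, the config entry might look like this (the exact file format and location depend on your setup; `daily_budget_usd` is the setting from the table above):

```json
{
  "daily_budget_usd": 10.00
}
```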
### Budget enforcement

The budget is checked before each API call. Pawz provides progressive warnings as spending increases:

| Threshold | Action |
|---|---|
| 50% of budget | Warning notification |
| 75% of budget | Elevated warning |
| 90% of budget | Urgent warning |
| 100% of budget | Hard block — API calls are rejected |
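The threshold ladder above can be sketched as a simple pre-call check (names are illustrative, not Pawz's actual API):

```python
# Map current spend to the enforcement action from the table above.
def budget_action(spent_usd: float, budget_usd: float) -> str:
    if budget_usd <= 0:          # a budget of 0 disables enforcement entirely
        return "ok"
    ratio = spent_usd / budget_usd
    if ratio >= 1.00:
        return "block"           # hard block: the API call is rejected
    if ratio >= 0.90:
        return "urgent-warning"
    if ratio >= 0.75:
        return "elevated-warning"
    if ratio >= 0.50:
        return "warning"
    return "ok"

print(budget_action(9.20, 10.00))  # → urgent-warning
```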
### DailyTokenTracker

Pawz maintains per-model cost tracking through the `DailyTokenTracker`:
- Tracks input and output tokens separately per model
- Calculates costs using the pricing table above
- Applies cache token discounts automatically
- Resets daily at midnight
Usage is also reported back in real time (responses emit `complete` events with usage stats).
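A hypothetical sketch in the spirit of the tracker described above (field and method names are assumptions, not Pawz's real implementation):

```python
from collections import defaultdict
from datetime import date

class DailyTokenTracker:
    """Tracks input/output tokens per model and prices them daily."""

    def __init__(self, pricing: dict):
        self.pricing = pricing   # model -> ($/1M input, $/1M output)
        self.day = date.today()
        self.usage = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, model: str, input_tokens: int, output_tokens: int) -> None:
        if date.today() != self.day:   # reset at midnight
            self.day = date.today()
            self.usage = defaultdict(lambda: {"input": 0, "output": 0})
        self.usage[model]["input"] += input_tokens
        self.usage[model]["output"] += output_tokens

    def total_cost(self) -> float:
        """Total spend in USD, priced from the per-1M-token table."""
        return sum(
            (u["input"] * self.pricing[m][0] + u["output"] * self.pricing[m][1]) / 1_000_000
            for m, u in self.usage.items()
        )

tracker = DailyTokenTracker({"gemini-flash": (0.15, 0.60)})
tracker.record("gemini-flash", 5000, 5000)
print(round(tracker.total_cost(), 5))  # → 0.00375
```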
## Auto-tier model selection

The `auto_tier` feature automatically routes tasks to cheaper or more expensive models based on complexity.
### Task complexity classification

Pawz analyzes each message to determine if it's simple or complex:

| Classification | Routing | Example tasks |
|---|---|---|
| Simple | Routes to `cheap_model` | "What time is it?", "Convert 5kg to lbs", simple Q&A |
| Complex | Routes to `default_model` | Multi-step reasoning, code generation, research |
When `auto_tier` is enabled in your model routing config, simple messages skip the expensive default model entirely.
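The routing decision can be sketched like this (the simplicity heuristic here is a toy assumption; Pawz's real classifier may differ):

```python
# Route simple messages to the cheap model when auto_tier is on.
def pick_model(message: str, cheap_model: str, default_model: str,
               auto_tier: bool) -> str:
    def is_simple(text: str) -> bool:
        # toy heuristic: short, single-sentence, no code fences
        return len(text) < 120 and "```" not in text and text.count(".") <= 1
    if auto_tier and is_simple(message):
        return cheap_model
    return default_model

print(pick_model("Convert 5kg to lbs", "gpt-4o-mini", "claude-sonnet-4", True))
# → gpt-4o-mini
```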
### Model routing configuration

| Field | Description |
|---|---|
| `boss_model` | Powerful model for the orchestrator/boss agent |
| `worker_model` | Default model for sub-agents (cheaper/faster) |
| `cheap_model` | Budget model for simple tasks when `auto_tier` is on |
| `auto_tier` | Enable automatic model selection by task complexity |
| `specialty_models` | Per-specialty overrides (e.g., coder, researcher) |
| `agent_models` | Per-agent overrides (highest priority) |
Model selection follows this priority order:

1. Per-agent override (`agent_models`)
2. Per-specialty override (`specialty_models`)
3. `cheap_model` (if `auto_tier` is enabled and the task is simple)
4. `default_model`
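That resolution order can be expressed as a short lookup chain (illustrative names and config shape, not Pawz's actual API):

```python
# Resolve a model by walking the override priority: agent > specialty > cheap > default.
def resolve_model(agent: str, specialty: str, simple_task: bool, cfg: dict) -> str:
    if agent in cfg.get("agent_models", {}):          # 1. per-agent override
        return cfg["agent_models"][agent]
    if specialty in cfg.get("specialty_models", {}):  # 2. per-specialty override
        return cfg["specialty_models"][specialty]
    if cfg.get("auto_tier") and simple_task:          # 3. cheap model for simple tasks
        return cfg["cheap_model"]
    return cfg["default_model"]                       # 4. fallback

cfg = {
    "agent_models": {},
    "specialty_models": {"coder": "claude-sonnet-4"},
    "auto_tier": True,
    "cheap_model": "gpt-4o-mini",
    "default_model": "gpt-4.1",
}
print(resolve_model("helper", "coder", True, cfg))  # → claude-sonnet-4
```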
## Tips for cost optimization

:::tip Cost optimization strategies

- **Use `auto_tier`**: Enable automatic model selection so simple queries use cheap models. This alone can cut costs 50%+.
- **Set a daily budget**: Even a generous budget prevents accidental runaway costs from automated tasks or cron jobs.
- **Use Ollama for development**: Local models are free. Use them for testing agent configurations before switching to paid providers.
- **Match model to task**: Don't use Claude Opus for simple questions. Create chat modes (see Foundry) for different tiers.
- **Enable session compaction**: Long sessions consume more tokens per turn. Use `/compact` or let auto-compaction manage context size.
- **Leverage prompt caching**: Anthropic's cache reads cost only 10% of normal input tokens. Consistent system prompts and skill instructions benefit most.
- **Use `worker_model` for orchestration**: In multi-agent projects, the boss agent should use a capable model, but workers can use cheaper alternatives.
- **Monitor `complete` events**: Each response includes token usage stats (input/output/total tokens and model name) so you can track spending in real time.

:::

