
# Model Pricing & Budget

Pawz tracks token usage and costs across all AI providers, enforces daily budgets, and can automatically route tasks to cheaper models when appropriate.

## Per-model pricing

Pawz uses built-in pricing data to estimate costs in real time. Prices are per 1 million tokens.
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Provider |
| --- | --- | --- | --- |
| Claude 3 Haiku | $0.25 | $1.25 | Anthropic |
| Claude Haiku 4 | $1.00 | $5.00 | Anthropic |
| Claude Sonnet 4.x | $3.00 | $15.00 | Anthropic |
| Claude Opus 4.x | $15.00 | $75.00 | Anthropic |
| Gemini Flash | $0.15 | $0.60 | Google |
| Gemini Pro | $1.25 | $10.00 | Google |
| GPT-4.1 | $2.50 | $10.00 | OpenAI |
| GPT-4.1-mini / nano | $0.15 | $0.60 | OpenAI |
| GPT-4o | $2.50 | $10.00 | OpenAI |
| GPT-4o-mini | $0.15 | $0.60 | OpenAI |
| o3 / o1 | $10.00 | $40.00 | OpenAI |
| o3-mini / o4-mini | $1.10 | $4.40 | OpenAI |
| DeepSeek Chat | $0.27 | $1.10 | DeepSeek |
| DeepSeek Reasoner | $0.55 | $2.19 | DeepSeek |
:::info
Prices are based on each provider’s published rates. Local models (Ollama) have zero cost. For providers not listed (custom, OpenRouter, etc.), costs are estimated from the closest matching model name.
:::
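As a rough illustration of how the table translates into per-call costs, the sketch below computes an estimate from input/output token counts. The `PRICING` dict and `estimate_cost` function are hypothetical names for this example, not Pawz's actual API.

```python
# Illustrative cost estimator based on the pricing table above.
# PRICING and estimate_cost are example names, not Pawz internals.
PRICING = {
    # model: (input $/1M tokens, output $/1M tokens)
    "gemini-flash": (0.15, 0.60),
    "claude-sonnet-4.x": (3.00, 15.00),
    "claude-opus-4.x": (15.00, 75.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one API call."""
    input_rate, output_rate = PRICING[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 10,000-token conversation (5K input + 5K output) on Claude Opus 4.x:
print(estimate_cost("claude-opus-4.x", 5000, 5000))  # 0.45
```

The same arithmetic underlies the cost comparison table in the next section.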

## Cost comparison

To put these numbers in perspective, here’s the approximate cost for a 10,000-token conversation (5K input + 5K output):
| Model | Estimated cost |
| --- | --- |
| Gemini Flash | $0.004 |
| GPT-4o-mini | $0.004 |
| GPT-4.1-mini | $0.004 |
| DeepSeek Chat | $0.007 |
| Claude 3 Haiku | $0.008 |
| o3-mini / o4-mini | $0.028 |
| Claude Haiku 4 | $0.030 |
| Gemini Pro | $0.056 |
| GPT-4o | $0.063 |
| GPT-4.1 | $0.063 |
| Claude Sonnet 4.x | $0.090 |
| o3 / o1 | $0.250 |
| Claude Opus 4.x | $0.450 |

## Cache token accounting

When providers support prompt caching (e.g., Anthropic), Pawz applies reduced rates for cached tokens:
| Token type | Cost multiplier |
| --- | --- |
| Normal tokens | 100% (full price) |
| Cache reads | 10% of normal cost |
| Cache creation | 25% of normal cost |
:::tip
Anthropic’s prompt caching can dramatically reduce costs for repetitive tasks. Long system prompts and skill instructions benefit the most from caching since they remain constant across turns.
:::
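The multipliers above can be applied like this. A minimal sketch, assuming the discounts apply to the model's input rate; `input_cost` and the constant names are illustrative, not Pawz's actual accounting code.

```python
# Hypothetical sketch of cache-aware input-token accounting.
# Multipliers follow the table above: cache reads 10%, cache creation 25%.
CACHE_READ_MULT = 0.10
CACHE_CREATE_MULT = 0.25

def input_cost(rate_per_1m: float, normal: int, cache_read: int, cache_create: int) -> float:
    """Cost in USD for one call's input tokens, with cache discounts applied."""
    billable = normal + cache_read * CACHE_READ_MULT + cache_create * CACHE_CREATE_MULT
    return billable * rate_per_1m / 1_000_000

# Claude Sonnet 4.x ($3.00/1M input): 1K fresh tokens plus a 10K cached system prompt.
# The cached 10K bills like 1K normal tokens, so the call costs the same as 2K normal tokens.
print(round(input_cost(3.00, 1_000, 10_000, 0), 6))  # 0.006
```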

## Daily budget

Pawz can enforce a daily spending limit to prevent runaway costs.

### Configuration

| Setting | Default | Description |
| --- | --- | --- |
| `daily_budget_usd` | $10.00 | Maximum daily spend across all models |

Set to `0` to disable budget enforcement entirely. Configure the budget in Settings → Agent Defaults or directly in the engine config:
```json
{
  "daily_budget_usd": 10.0
}
```

### Budget enforcement

The budget is checked before each API call. Pawz provides progressive warnings as spending increases:
| Threshold | Action |
| --- | --- |
| 50% of budget | Warning notification |
| 75% of budget | Elevated warning |
| 90% of budget | Urgent warning |
| 100% of budget | Hard block — API calls are rejected |
:::warning
When the daily budget is reached, all AI API calls are blocked until the next day (midnight UTC). Ongoing conversations will stop receiving responses. Set a budget that accommodates your expected daily usage.
:::
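The progressive-warning logic can be pictured as a simple threshold ladder checked before each call. This is an illustrative sketch mirroring the table above; `budget_status` is a hypothetical function, not Pawz's internal gate.

```python
# Illustrative budget gate mirroring the threshold table above.
def budget_status(spent_usd: float, budget_usd: float) -> str:
    if budget_usd <= 0:           # a budget of 0 disables enforcement entirely
        return "ok"
    ratio = spent_usd / budget_usd
    if ratio >= 1.00:
        return "blocked"          # hard block: API calls rejected until midnight UTC
    if ratio >= 0.90:
        return "urgent-warning"
    if ratio >= 0.75:
        return "elevated-warning"
    if ratio >= 0.50:
        return "warning"
    return "ok"

print(budget_status(9.20, 10.0))   # urgent-warning
print(budget_status(10.0, 10.0))   # blocked
print(budget_status(4.0, 0))       # ok (enforcement disabled)
```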

## DailyTokenTracker

Pawz maintains per-model cost tracking through the `DailyTokenTracker`:

- Tracks input and output tokens separately per model
- Calculates costs using the pricing table above
- Applies cache token discounts automatically
- Resets daily at midnight

The tracker enables the budget enforcement system and powers the cost display in the chat interface (shown in `complete` events with usage stats).
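To make the bullets above concrete, here is a minimal sketch of what such a tracker might look like. The real class is internal to Pawz and its interface may differ; this version omits cache discounts for brevity.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sketch of a per-model daily token tracker (not Pawz's real class).
class DailyTokenTracker:
    def __init__(self, pricing: dict):
        self.pricing = pricing                    # model -> ($/1M input, $/1M output)
        self.day = date.today()
        self.usage = defaultdict(lambda: [0, 0])  # model -> [input_tokens, output_tokens]

    def record(self, model: str, input_tokens: int, output_tokens: int) -> None:
        self._maybe_reset()
        self.usage[model][0] += input_tokens
        self.usage[model][1] += output_tokens

    def total_cost(self) -> float:
        """Total spend so far today, in USD, across all models."""
        return sum(
            (i * self.pricing[m][0] + o * self.pricing[m][1]) / 1_000_000
            for m, (i, o) in self.usage.items()
        )

    def _maybe_reset(self) -> None:
        if date.today() != self.day:              # new day: counters start from zero
            self.day = date.today()
            self.usage.clear()

tracker = DailyTokenTracker({"gpt-4o-mini": (0.15, 0.60)})
tracker.record("gpt-4o-mini", 4000, 1000)
print(round(tracker.total_cost(), 6))  # 0.0012
```

A running total like this is what the budget gate compares against the configured `daily_budget_usd`.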

## Auto-tier model selection

The `auto_tier` feature automatically routes tasks to cheaper or more expensive models based on complexity.

### Task complexity classification

Pawz analyzes each message to determine if it’s simple or complex:
| Classification | Routing | Example tasks |
| --- | --- | --- |
| Simple | Routes to `cheap_model` | “What time is it?”, “Convert 5kg to lbs”, simple Q&A |
| Complex | Routes to `default_model` | Multi-step reasoning, code generation, research |

The classification uses keyword signals and heuristics to decide. When `auto_tier` is enabled in your model routing config, simple messages skip the expensive default model entirely.
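A keyword-and-heuristic classifier in this spirit could look like the toy sketch below. The signal list and thresholds here are invented for illustration; Pawz's actual signals are not documented in this section.

```python
# Toy complexity classifier in the spirit described above (signals are invented).
COMPLEX_SIGNALS = ("implement", "refactor", "analyze", "debug",
                   "research", "step by step", "write a")

def classify(message: str) -> str:
    """Return 'simple' or 'complex' for routing purposes."""
    text = message.lower()
    if any(signal in text for signal in COMPLEX_SIGNALS):
        return "complex"
    if len(text.split()) > 40:    # long prompts usually need the default model
        return "complex"
    return "simple"

print(classify("What time is it?"))                    # simple
print(classify("Refactor this module and add tests"))  # complex
```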

### Model routing configuration

```json
{
  "model_routing": {
    "boss_model": "claude-opus-4-6",
    "worker_model": "claude-sonnet-4-6",
    "cheap_model": "claude-3-haiku",
    "auto_tier": true,
    "specialty_models": {
      "coder": "gemini-2.5-pro"
    },
    "agent_models": {
      "agent-uuid-here": "gpt-4o"
    }
  }
}
```
| Field | Description |
| --- | --- |
| `boss_model` | Powerful model for the orchestrator/boss agent |
| `worker_model` | Default model for sub-agents (cheaper/faster) |
| `cheap_model` | Budget model for simple tasks when `auto_tier` is on |
| `auto_tier` | Enable automatic model selection by task complexity |
| `specialty_models` | Per-specialty overrides (e.g., coder, researcher) |
| `agent_models` | Per-agent overrides (highest priority) |
Resolution order (highest priority first):

1. Per-agent override (`agent_models`)
2. Per-specialty override (`specialty_models`)
3. `cheap_model` (if `auto_tier` enabled and task is simple)
4. `default_model`
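The four-step resolution order can be sketched as a fall-through lookup. This is an illustrative reconstruction, not Pawz's actual resolver; it uses `default_model` as the final fallback, matching the resolution list above.

```python
# Sketch of the resolution order above; field names follow the routing config.
def resolve_model(routing: dict, agent_id: str, specialty, task_is_simple: bool) -> str:
    # 1. Per-agent override (highest priority)
    if agent_id in routing.get("agent_models", {}):
        return routing["agent_models"][agent_id]
    # 2. Per-specialty override
    if specialty and specialty in routing.get("specialty_models", {}):
        return routing["specialty_models"][specialty]
    # 3. Cheap model for simple tasks when auto_tier is on
    if routing.get("auto_tier") and task_is_simple:
        return routing["cheap_model"]
    # 4. Fall back to the default model
    return routing["default_model"]

routing = {
    "default_model": "claude-sonnet-4-6",
    "cheap_model": "claude-3-haiku",
    "auto_tier": True,
    "specialty_models": {"coder": "gemini-2.5-pro"},
    "agent_models": {"agent-123": "gpt-4o"},
}
print(resolve_model(routing, "agent-123", "coder", True))  # gpt-4o (agent override wins)
print(resolve_model(routing, "agent-xyz", None, True))     # claude-3-haiku
print(resolve_model(routing, "agent-xyz", None, False))    # claude-sonnet-4-6
```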

## Tips for cost optimization

:::tip Cost optimization strategies

1. **Use `auto_tier`**: Enable automatic model selection so simple queries use cheap models. This alone can cut costs 50%+.
2. **Set a daily budget**: Even a generous budget prevents accidental runaway costs from automated tasks or cron jobs.
3. **Use Ollama for development**: Local models are free. Use them for testing agent configurations before switching to paid providers.
4. **Match model to task**: Don’t use Claude Opus for simple questions. Create chat modes (see Foundry) for different tiers.
5. **Enable session compaction**: Long sessions consume more tokens per turn. Use `/compact` or let auto-compaction manage context size.
6. **Leverage prompt caching**: Anthropic’s cache reads cost only 10% of normal input tokens. Consistent system prompts and skill instructions benefit most.
7. **Use `worker_model` for sub-agents**: In multi-agent projects, the boss agent should use a capable model, but workers can use cheaper alternatives.
8. **Monitor the `complete` events**: Each response includes token usage stats (input/output/total tokens and model name) so you can track spending in real time.

:::