# Model Pricing & Budget
Pawz tracks token usage and costs across all AI providers, enforces daily budgets, and can automatically route tasks to cheaper models when appropriate.

## Per-model pricing

Pawz uses built-in pricing data to estimate costs in real time. Prices are per 1 million tokens.

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Provider |
|---|---|---|---|
| Claude 3 Haiku | $0.25 | $1.25 | Anthropic |
| Claude Haiku 4 | $1.00 | $5.00 | Anthropic |
| Claude Sonnet 4.x | $3.00 | $15.00 | Anthropic |
| Claude Opus 4.x | $15.00 | $75.00 | Anthropic |
| Gemini Flash | $0.15 | $0.60 | Google |
| Gemini Pro | $1.25 | $10.00 | Google |
| GPT-4.1 | $2.50 | $10.00 | OpenAI |
| GPT-4.1-mini / nano | $0.15 | $0.60 | OpenAI |
| GPT-4o | $2.50 | $10.00 | OpenAI |
| GPT-4o-mini | $0.15 | $0.60 | OpenAI |
| o3 / o1 | $10.00 | $40.00 | OpenAI |
| o3-mini / o4-mini | $1.10 | $4.40 | OpenAI |
| DeepSeek Chat | $0.27 | $1.10 | DeepSeek |
| DeepSeek Reasoner | $0.55 | $2.19 | DeepSeek |
## Cost comparison

To put these numbers in perspective, here's the approximate cost for a 10,000-token conversation (5K input + 5K output):

| Model | Estimated cost |
|---|---|
| Gemini Flash | $0.004 |
| GPT-4o-mini | $0.004 |
| GPT-4.1-mini | $0.004 |
| DeepSeek Chat | $0.007 |
| Claude 3 Haiku | $0.008 |
| o3-mini / o4-mini | $0.028 |
| Claude Haiku 4 | $0.030 |
| Gemini Pro | $0.056 |
| GPT-4o | $0.063 |
| GPT-4.1 | $0.063 |
| Claude Sonnet 4.x | $0.090 |
| o3 / o1 | $0.250 |
| Claude Opus 4.x | $0.450 |
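The estimates above follow directly from the pricing table. A minimal sketch of the arithmetic (illustrative code, not Pawz's internals):

```python
# Derive a per-conversation estimate from per-1M-token prices.
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Estimated cost in USD; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 5K input + 5K output on Gemini Flash ($0.15 in / $0.60 out):
print(round(estimate_cost(5000, 5000, 0.15, 0.60), 3))    # → 0.004
# The same conversation on Claude Opus 4.x ($15 in / $75 out):
print(round(estimate_cost(5000, 5000, 15.00, 75.00), 3))  # → 0.45
```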
## Cache token accounting

When providers support prompt caching (e.g., Anthropic), Pawz applies reduced rates for cached tokens:

| Token type | Cost multiplier |
|---|---|
| Normal tokens | 100% (full price) |
| Cache reads | 10% of normal cost |
| Cache creation | 25% of normal cost |
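Applying those multipliers to input-token accounting looks roughly like this (a sketch, not Pawz's actual code):

```python
# Bill cache reads at 10% and cache creation at 25% of the normal input rate.
def effective_input_cost(normal: int, cache_reads: int, cache_creation: int,
                         input_price: float) -> float:
    """Input cost in USD with cache discounts; input_price is USD per 1M tokens."""
    billable = normal + 0.10 * cache_reads + 0.25 * cache_creation
    return billable * input_price / 1_000_000

# 10K cache-read tokens on Claude Sonnet 4.x ($3/1M) cost the same as 1K normal tokens:
print(effective_input_cost(0, 10_000, 0, 3.00))  # → 0.003
```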
## Daily budget

Pawz can enforce a daily spending limit to prevent runaway costs.

### Configuration
| Setting | Default | Description |
|---|---|---|
| `daily_budget_usd` | $10.00 | Maximum daily spend across all models |

Set `daily_budget_usd` to `0` to disable budget enforcement entirely.
Configure the budget in Settings → Agent Defaults or directly in the engine config:
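For example, the config entry might look like this (the exact file format and location depend on your setup; `daily_budget_usd` is the setting from the table above):

```json
{
  "daily_budget_usd": 10.00
}
```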
### Budget enforcement

The budget is checked before each API call. Pawz provides progressive warnings as spending increases:

| Threshold | Action |
|---|---|
| 50% of budget | Warning notification |
| 75% of budget | Elevated warning |
| 90% of budget | Urgent warning |
| 100% of budget | Hard block — API calls are rejected |
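The threshold ladder above can be sketched as a simple pre-call check (names are illustrative, not Pawz's actual API):

```python
# Map current spend to the enforcement action from the table above.
def budget_action(spent_usd: float, budget_usd: float) -> str:
    if budget_usd <= 0:          # a budget of 0 disables enforcement entirely
        return "ok"
    ratio = spent_usd / budget_usd
    if ratio >= 1.00:
        return "block"           # hard block: the API call is rejected
    if ratio >= 0.90:
        return "urgent-warning"
    if ratio >= 0.75:
        return "elevated-warning"
    if ratio >= 0.50:
        return "warning"
    return "ok"

print(budget_action(9.20, 10.00))  # → urgent-warning
```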
### DailyTokenTracker

Pawz maintains per-model cost tracking through the `DailyTokenTracker`:
- Tracks input and output tokens separately per model
- Calculates costs using the pricing table above
- Applies cache token discounts automatically
- Resets daily at midnight
Usage is also reported back in real time (responses emit `complete` events with usage stats).
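A hypothetical sketch in the spirit of the tracker described above (field and method names are assumptions, not Pawz's real implementation):

```python
from collections import defaultdict
from datetime import date

class DailyTokenTracker:
    """Tracks input/output tokens per model and prices them daily."""

    def __init__(self, pricing: dict):
        self.pricing = pricing   # model -> ($/1M input, $/1M output)
        self.day = date.today()
        self.usage = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, model: str, input_tokens: int, output_tokens: int) -> None:
        if date.today() != self.day:   # reset at midnight
            self.day = date.today()
            self.usage = defaultdict(lambda: {"input": 0, "output": 0})
        self.usage[model]["input"] += input_tokens
        self.usage[model]["output"] += output_tokens

    def total_cost(self) -> float:
        """Total spend in USD, priced from the per-1M-token table."""
        return sum(
            (u["input"] * self.pricing[m][0] + u["output"] * self.pricing[m][1]) / 1_000_000
            for m, u in self.usage.items()
        )

tracker = DailyTokenTracker({"gemini-flash": (0.15, 0.60)})
tracker.record("gemini-flash", 5000, 5000)
print(round(tracker.total_cost(), 5))  # → 0.00375
```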
## Auto-tier model selection

The `auto_tier` feature automatically routes tasks to cheaper or more expensive models based on complexity.
### Task complexity classification

Pawz analyzes each message to determine if it's simple or complex:

| Classification | Routing | Example tasks |
|---|---|---|
| Simple | Routes to `cheap_model` | "What time is it?", "Convert 5kg to lbs", simple Q&A |
| Complex | Routes to `default_model` | Multi-step reasoning, code generation, research |
When `auto_tier` is enabled in your model routing config, simple messages skip the expensive default model entirely.
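The routing decision can be sketched like this (the simplicity heuristic here is a toy assumption; Pawz's real classifier may differ):

```python
# Route simple messages to the cheap model when auto_tier is on.
def pick_model(message: str, cheap_model: str, default_model: str,
               auto_tier: bool) -> str:
    def is_simple(text: str) -> bool:
        # toy heuristic: short, single-sentence, no code fences
        return len(text) < 120 and "```" not in text and text.count(".") <= 1
    if auto_tier and is_simple(message):
        return cheap_model
    return default_model

print(pick_model("Convert 5kg to lbs", "gpt-4o-mini", "claude-sonnet-4", True))
# → gpt-4o-mini
```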
### Model routing configuration

| Field | Description |
|---|---|
| `boss_model` | Powerful model for the orchestrator/boss agent |
| `worker_model` | Default model for sub-agents (cheaper/faster) |
| `cheap_model` | Budget model for simple tasks when `auto_tier` is on |
| `auto_tier` | Enable automatic model selection by task complexity |
| `specialty_models` | Per-specialty overrides (e.g., coder, researcher) |
| `agent_models` | Per-agent overrides (highest priority) |
Model selection follows this priority order:

1. Per-agent override (`agent_models`)
2. Per-specialty override (`specialty_models`)
3. `cheap_model` (if `auto_tier` is enabled and the task is simple)
4. `default_model`
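That resolution order can be expressed as a short lookup chain (illustrative names and config shape, not Pawz's actual API):

```python
# Resolve a model by walking the override priority: agent > specialty > cheap > default.
def resolve_model(agent: str, specialty: str, simple_task: bool, cfg: dict) -> str:
    if agent in cfg.get("agent_models", {}):          # 1. per-agent override
        return cfg["agent_models"][agent]
    if specialty in cfg.get("specialty_models", {}):  # 2. per-specialty override
        return cfg["specialty_models"][specialty]
    if cfg.get("auto_tier") and simple_task:          # 3. cheap model for simple tasks
        return cfg["cheap_model"]
    return cfg["default_model"]                       # 4. fallback

cfg = {
    "agent_models": {},
    "specialty_models": {"coder": "claude-sonnet-4"},
    "auto_tier": True,
    "cheap_model": "gpt-4o-mini",
    "default_model": "gpt-4.1",
}
print(resolve_model("helper", "coder", True, cfg))  # → claude-sonnet-4
```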
## Tips for cost optimization

:::tip Cost optimization strategies

- **Use `auto_tier`**: Enable automatic model selection so simple queries use cheap models. This alone can cut costs 50%+.
- **Set a daily budget**: Even a generous budget prevents accidental runaway costs from automated tasks or cron jobs.
- **Use Ollama for development**: Local models are free. Use them for testing agent configurations before switching to paid providers.
- **Match model to task**: Don't use Claude Opus for simple questions. Create chat modes (see Foundry) for different tiers.
- **Enable session compaction**: Long sessions consume more tokens per turn. Use `/compact` or let auto-compaction manage context size.
- **Leverage prompt caching**: Anthropic's cache reads cost only 10% of normal input tokens. Consistent system prompts and skill instructions benefit most.
- **Use `worker_model` for orchestration**: In multi-agent projects, the boss agent should use a capable model, but workers can use cheaper alternatives.
- **Monitor `complete` events**: Each response includes token usage stats (input/output/total tokens and model name) so you can track spending in real time.

:::

