OpenAI API Pricing Explained: Complete Guide for 2026
Deep dive into OpenAI's API pricing in 2026 — GPT-5.5, GPT-5 mini, o4-mini. Standard rates, cached input savings, Batch API discounts, and how to actually optimize your bill.
If you're shipping production features on top of OpenAI's API in 2026, the headline price-per-token is no longer the number that matters. Between prompt caching, Batch API discounts, and the divergent pricing across GPT-5.5, GPT-5 mini, and o4-mini, the real cost of a workload can vary by 5×–10× depending on how you architect it.
This guide breaks down OpenAI's full 2026 pricing surface and shows where the real savings live.
OpenAI's 2026 lineup at a glance
| Model | Input ($/1M) | Output ($/1M) | Context | Best for |
|---|---|---|---|---|
| GPT-5.5 | $5.00 | $20.00 | 256K | Frontier reasoning, multimodal |
| GPT-5 mini | $0.20 | $0.80 | 200K | High-volume production, vision |
| o4-mini | $0.90 | $3.60 | 200K | STEM reasoning, coding, math |
Compared to the GPT-4 era (where GPT-4 launched at $30/$60), the 2026 lineup is dramatically cheaper for the same capability tier — driven by competition from Anthropic, Google, and DeepSeek.
The pricing surface that matters
OpenAI bills you across at least four dimensions for any production workload. Most pricing pages only show two.
1. Standard input / output
These are the rates everyone knows. You pay per million tokens at the headline rate.
For GPT-5.5 with a 1,000-token system prompt and 500-token response, that's about $0.015 per call. Easy math.
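That per-call arithmetic can be sketched as a tiny helper (rates are the GPT-5.5 figures from the table above; function and variable names are illustrative):

```python
def call_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Cost of one API call, with rates expressed in $ per 1M tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# The GPT-5.5 example from the text: 1,000 input + 500 output tokens.
# 1,000 × $5/1M = $0.005 input; 500 × $20/1M = $0.010 output.
cost = call_cost(1_000, 500, in_rate=5.00, out_rate=20.00)  # 0.015
```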
2. Cached input — the savings nobody talks about
Since late 2024, OpenAI has supported automatic prompt caching: any prompt prefix that you've already sent within the last 5–10 minutes is billed at a steep discount. On the 2026 lineup, cached input is 75% off the standard input rate.
For GPT-5.5: standard input is $5.00/1M, cached input is $1.25/1M — a 75% discount on cached portions.
This matters enormously for:
- RAG applications where you re-send the same retrieved context across many calls
- Chat threads where the conversation history grows but the system prompt stays
- Function-calling agents that re-send a long tool spec
Real numbers: a chat product with a 2,000-token system prompt + tool spec, called 10× per session, cuts the prefix portion of the input bill by roughly two-thirds with caching enabled. The prefix is billed at the full rate once, then at the cached rate for the remaining nine calls.
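The session-level math above can be sketched like this (a simplification: real cache behavior depends on exact prefix matching and the retention window):

```python
def prefix_input_cost(prefix_tokens, calls, rate, cached_rate):
    """Input cost of a stable prompt prefix over one session: billed at the
    full rate on the first call, then at the cached rate thereafter."""
    per_million = prefix_tokens / 1_000_000
    return per_million * (rate + (calls - 1) * cached_rate)

# 2,000-token system prompt + tool spec, 10 calls, GPT-5.5 rates.
with_cache = prefix_input_cost(2_000, 10, rate=5.00, cached_rate=1.25)
no_cache = prefix_input_cost(2_000, 10, rate=5.00, cached_rate=5.00)
savings = 1 - with_cache / no_cache  # 0.675, i.e. about two-thirds
```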
3. Batch API — half off, with a 24-hour catch
The OpenAI Batch API processes requests asynchronously within 24 hours and bills at 50% of the standard rate on all dimensions.
For GPT-5.5: $2.50/1M input, $10/1M output — half off, full stop.
When does this make sense?
- Overnight content generation — write 10,000 product descriptions during your sleep cycle
- Periodic data enrichment — categorize a CSV of customer support tickets weekly
- Embedding refreshes for large knowledge bases
- Synthetic data generation for fine-tuning datasets
When it doesn't:
- Anything user-facing in real time
- Anything where you need streaming
- Workloads where the 50% saving doesn't justify the operational complexity of an async pipeline
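For the batchable cases above, jobs are submitted as a JSONL file where each line is one request object; a minimal sketch of building that file (the model name and prompts are placeholders):

```python
import json

# Each line of a Batch API input file is one self-contained request.
requests = [
    {
        "custom_id": f"product-{i}",   # your own correlation ID
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-mini",     # placeholder model name
            "messages": [
                {"role": "system", "content": "Write a product description."},
                {"role": "user", "content": f"Product #{i} spec sheet..."},
            ],
            "max_tokens": 200,
        },
    }
    for i in range(3)
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

You then upload the file via the Files API with purpose `batch` and create a batch job with a 24-hour completion window; results come back as a matching output file keyed by `custom_id`.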
4. Fine-tuned model rates
If you fine-tune a base model, the fine-tuned version is billed at separate (typically higher) rates plus a one-time training fee. For GPT-5 mini fine-tuning, expect roughly 2× the base rate at inference time, plus training charges per token of training data.
The real cost of a real workload
Let's price a customer-support chatbot built on GPT-5 mini:
- Volume: 10,000 conversations / day, average 5 messages each
- Average input: 1,500 tokens (system prompt 1,200 + user message 300)
- Average output: 200 tokens
- System prompt is identical across conversations
Naive calculation (what most calculators show you)
- Input: 10,000 × 5 × 1,500 / 1M × $0.20 = $15.00 / day
- Output: 10,000 × 5 × 200 / 1M × $0.80 = $8.00 / day
- Total: $23.00 / day = $690 / month
With caching
About 80% of input tokens are the cached system prompt:
- Cached input: 10,000 × 5 × 1,200 / 1M × $0.05 = $3.00 / day
- Uncached input: 10,000 × 5 × 300 / 1M × $0.20 = $3.00 / day
- Output: $8.00 / day
- Total: $14.00 / day = $420 / month
Caching alone saves ~40% of the bill — about $270 / month on this workload.
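Both scenarios are easy to sanity-check in a few lines (GPT-5 mini rates from the table; cached rate assumed at 75% off):

```python
CALLS_PER_DAY = 10_000 * 5      # 10k conversations × 5 messages each
IN_RATE, OUT_RATE = 0.20, 0.80  # $ per 1M tokens (GPT-5 mini)
CACHED_RATE = 0.05              # 75% off the standard input rate

def daily_cost(cached_tokens, fresh_tokens, out_tokens):
    """Daily spend given per-call token counts in each billing bucket."""
    cost = lambda n, rate: CALLS_PER_DAY * n * rate / 1_000_000
    return (cost(cached_tokens, CACHED_RATE)
            + cost(fresh_tokens, IN_RATE)
            + cost(out_tokens, OUT_RATE))

naive = daily_cost(0, 1_500, 200)         # $23.00 / day
with_cache = daily_cost(1_200, 300, 200)  # $14.00 / day
```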
With caching + selective batching
If half of the conversations are summary generation that runs nightly (batchable):
- Realtime half (with caching): ~$7.00 / day
- Batch half (50% off the standard rate; assuming cache discounts don't apply to batch jobs): ~$5.75 / day
- Total: ~$12.75 / day ≈ $385 / month
- Combined savings: ~$305 / month, roughly 44% vs the naive number.
How to actually optimize your OpenAI bill
In rough order of impact:
1. Always enable prompt caching
Caching is automatic on supported models — there's no flag to flip. But you must structure your prompts so that the cacheable prefix is stable: put your system prompt and tool definitions first, dynamic content last. If your system prompt changes per user, you lose the cache.
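One way to enforce that stable prefix in code (a sketch; the constant names and tool spec are illustrative):

```python
# Stable across every call — this is the cacheable prefix.
STATIC_SYSTEM_PROMPT = "You are a support assistant for AcmeCo. ..."

def build_messages(history, user_message, user_context):
    """Keep the system prompt byte-identical across calls; anything
    per-user or per-turn goes after it, never inside it."""
    return (
        [{"role": "system", "content": STATIC_SYSTEM_PROMPT}]  # cached
        + history                                              # grows per turn
        + [{"role": "user",
            "content": f"{user_message}\n\nContext: {user_context}"}]
    )
```

The failure mode to avoid: interpolating a timestamp or user name into the system prompt. One changed byte at the top of the prompt invalidates the whole cached prefix.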
2. Match the model to the task
A common mistake is using GPT-5.5 for everything. Reality:
- Chatbots, classification, simple Q&A → GPT-5 mini (25× cheaper)
- Code generation, hard reasoning → GPT-5.5 or o4-mini (the latter often beats GPT-5.5 on STEM at 1/5 the cost)
- Image/document understanding → GPT-5 mini handles 80% of vision tasks at a small fraction of flagship cost
3. Use Batch API for anything non-realtime
If the user isn't actively waiting, batch it. Note that prompt-cache discounts generally don't apply to batch jobs, so price them at the flat 50%-off rate.
4. Cap output tokens explicitly
On every model in the table, output tokens cost 4× input tokens. Use max_tokens (or your SDK's equivalent cap) to bound responses. Many "expensive" calls end up that way because the model wandered into a 2,000-token answer when 200 was sufficient.
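At the worked example's volume (50,000 calls/day), the gap is easy to quantify; a back-of-envelope sketch using the GPT-5 mini output rate:

```python
CALLS_PER_DAY = 50_000
OUT_RATE = 0.80  # $ per 1M output tokens (GPT-5 mini)

def daily_output_cost(tokens_per_call):
    """Daily output spend if every call emits tokens_per_call tokens."""
    return CALLS_PER_DAY * tokens_per_call * OUT_RATE / 1_000_000

wandering = daily_output_cost(2_000)  # $80.00 / day uncapped
capped = daily_output_cost(200)       # $8.00 / day with a tight bound
```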
5. Optimize your prompt
Every 100 tokens of unnecessary system prompt × thousands of calls/day = real money. Audit your prompts the way you'd audit SQL queries.
6. Consider OpenAI alternatives for cheap workloads
DeepSeek V4 ($0.30 input / $1.20 output) and Gemini 3.0 Flash ($0.25 / $2.00) are 60–80% cheaper than GPT-5 mini for many classification and extraction tasks. Test them on a sample before committing.
When OpenAI is worth the premium
Despite cheaper alternatives, OpenAI's advantages remain:
- Tooling ecosystem — Most agent frameworks default to OpenAI; switching costs are real
- Reliability and uptime — Mature infrastructure
- Function calling reliability — GPT-5.5's tool use is still best-in-class
- Vision quality — Top tier
- o4-mini reasoning — Competitive with Claude Opus 4.7 on STEM benchmarks at 1/15 the cost
Bottom line
OpenAI's 2026 pricing is friendly to volume buyers if you actually use the discount mechanisms. Naive usage leaves 40–60% on the table.
If you're spending more than $500/month on OpenAI and haven't measured your cache hit rate or considered batching, you're almost certainly overpaying.
Use the calculator on the homepage to model your specific workload — including caching slider and Batch API toggle — and see exactly where you'd land.
Pricing in this article reflects OpenAI's published rates as of May 2026. Always verify with OpenAI's official pricing page before committing budget.