9 min read · AI Cost Calc Team

OpenAI API Pricing Explained: Complete Guide for 2026

Deep dive into OpenAI's API pricing in 2026 — GPT-5.5, GPT-5 mini, o4-mini. Standard rates, cached input savings, Batch API discounts, and how to actually optimize your bill.

openai · pricing · gpt-5 · cost-optimization

If you're shipping production features on top of OpenAI's API in 2026, the headline price-per-token is no longer the number that matters. Between prompt caching, Batch API discounts, and the divergent pricing across GPT-5.5, GPT-5 mini, and o4-mini, the real cost of a workload can vary by 5×–10× depending on how you architect it.

This guide breaks down OpenAI's full 2026 pricing surface and shows where the real savings live.

OpenAI's 2026 lineup at a glance

Model | Input ($/1M) | Output ($/1M) | Context | Best for
GPT-5.5 | $5.00 | $20.00 | 256K | Frontier reasoning, multimodal
GPT-5 mini | $0.20 | $0.80 | 200K | High-volume production, vision
o4-mini | $0.90 | $3.60 | 200K | STEM reasoning, coding, math

Compared to the GPT-4 era (where GPT-4 launched at $30/$60), the 2026 lineup is dramatically cheaper for the same capability tier — driven by competition from Anthropic, Google, and DeepSeek.

The pricing surface that matters

OpenAI bills you across at least four dimensions for any production workload. Most pricing pages only show two.

1. Standard input / output

These are the rates everyone knows. You pay per million tokens at the headline rate.

For GPT-5.5 with a 1,000-token system prompt and 500-token response, that's about $0.015 per call. Easy math.
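In code, the arithmetic is just as easy. A minimal sketch, using the rates from the table above (verify against the live pricing page before budgeting):

```python
# Per-call cost at standard GPT-5.5 rates ($5/1M input, $20/1M output).
INPUT_RATE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 20.00 / 1_000_000  # dollars per output token

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Standard (uncached, non-batch) cost of one call, in dollars."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

print(round(call_cost(1_000, 500), 6))  # → 0.015
```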

2. Cached input — the savings nobody talks about

Since late 2024, OpenAI has supported automatic prompt caching: any prompt prefix that you've already sent within the last 5–10 minutes is billed at a discounted cached-input rate. On most current models the cached portion is 75% off.

For GPT-5.5: standard input is $5.00/1M, cached input is $1.25/1M — a 75% discount on cached portions.

This matters enormously for:

  • RAG applications where you re-send the same retrieved context across many calls
  • Chat threads where the conversation history grows but the system prompt stays
  • Function-calling agents that re-send a long tool spec

Real numbers: a chat product with a 2,000-token system prompt + tool spec, called 10× per session, saves roughly two-thirds of the input bill with caching enabled: the prefix is billed at full rate once, then at the 75%-off cached rate for the remaining nine calls.
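That session math as a sketch, assuming a 75%-off cached rate and a cache that stays warm between calls:

```python
# Input-side cost of one 10-call session with a stable 2,000-token prefix.
STANDARD = 5.00 / 1_000_000  # GPT-5.5 input, $/token
CACHED = 1.25 / 1_000_000    # GPT-5.5 cached input, $/token
PREFIX, CALLS = 2_000, 10

no_cache = CALLS * PREFIX * STANDARD                            # every call at full rate
with_cache = PREFIX * STANDARD + (CALLS - 1) * PREFIX * CACHED  # first call full, rest cached
print(round(1 - with_cache / no_cache, 3))  # fraction of the input bill saved → 0.675
```

The longer the session, the closer the savings creep toward the full 75%.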

3. Batch API — half off, with a 24-hour catch

The OpenAI Batch API processes requests asynchronously within 24 hours and bills at 50% of the standard rate on all dimensions.

For GPT-5.5: $2.50/1M input, $10/1M output — half off, full stop.

When does this make sense?

  • Overnight content generation — write 10,000 product descriptions during your sleep cycle
  • Periodic data enrichment — categorize a CSV of customer support tickets weekly
  • Embedding refreshes for large knowledge bases
  • Synthetic data generation for fine-tuning datasets

When it doesn't:

  • Anything user-facing in real time
  • Anything where you need streaming
  • Workloads where the 50% saving doesn't justify the operational complexity of an async pipeline
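Mechanically, you submit a batch as a JSONL file where each line is one self-contained request. A sketch of building that file for an overnight product-description job (the model name and catalog here are hypothetical; the request-line shape follows OpenAI's documented batch input format):

```python
import json

def batch_line(custom_id: str, model: str, prompt: str, max_tokens: int = 300) -> str:
    """One line of a Batch API input file: a single chat completion request."""
    return json.dumps({
        "custom_id": custom_id,  # your key for matching results later
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
    })

products = ["ergonomic desk lamp", "standing desk mat"]  # hypothetical catalog
with open("batch_input.jsonl", "w") as f:
    for i, p in enumerate(products):
        f.write(batch_line(f"desc-{i}", "gpt-5-mini",
                           f"Write a 50-word product description for: {p}") + "\n")
```

You then upload the file with purpose "batch", create the job with a 24-hour completion window, and collect results from the output file, matching lines back by custom_id.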

4. Fine-tuned model rates

If you fine-tune a base model, the fine-tuned version is billed at separate (typically higher) rates plus a one-time training fee. For GPT-5 mini fine-tuning, expect roughly 2× the base rate at inference time, plus training charges per token of training data.

The real cost of a real workload

Let's price a customer-support chatbot built on GPT-5 mini:

  • Volume: 10,000 conversations / day, average 5 messages each
  • Average input: 1,500 tokens (system prompt 1,200 + user message 300)
  • Average output: 200 tokens
  • System prompt is identical across conversations

Naive calculation (what most calculators show you)

  • Input: 10,000 × 5 × 1,500 / 1M × $0.20 = $15.00 / day
  • Output: 10,000 × 5 × 200 / 1M × $0.80 = $8.00 / day
  • Total: $23.00 / day = $690 / month

With caching

About 80% of input tokens are the cached system prompt:

  • Cached input: 10,000 × 5 × 1,200 / 1M × $0.05 = $3.00 / day
  • Uncached input: 10,000 × 5 × 300 / 1M × $0.20 = $3.00 / day
  • Output: $8.00 / day
  • Total: $14.00 / day = $420 / month

Caching alone saves ~40% of the bill — about $270 / month on this workload.
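The naive and cached numbers above, reproduced as a sketch you can adapt to your own traffic:

```python
# Daily cost of the support-bot workload at GPT-5 mini rates ($/1M tokens).
IN_RATE, OUT_RATE, CACHED_RATE = 0.20, 0.80, 0.05
CALLS = 10_000 * 5                 # conversations/day × messages per conversation
SYS, USER, OUT = 1_200, 300, 200   # tokens per call

naive = (CALLS * (SYS + USER) * IN_RATE + CALLS * OUT * OUT_RATE) / 1e6
cached = (CALLS * SYS * CACHED_RATE + CALLS * USER * IN_RATE + CALLS * OUT * OUT_RATE) / 1e6
print(round(naive, 2), round(cached, 2))  # → 23.0 14.0 (dollars/day)
```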

With caching + selective batching

If half of the volume is nightly summary generation (batchable), and the batch discount applies on top of the cached rates:

  • Realtime half (with caching): ~$7.00 / day
  • Batched half (50% off the cached cost): ~$3.50 / day
  • Total: ~$10.50 / day ≈ $315 / month

Combined savings: ~$375 / month, roughly 54% off the naive number.

How to actually optimize your OpenAI bill

In rough order of impact:

1. Always enable prompt caching

Caching is automatic on supported models — there's no flag to flip. But you must structure your prompts so that the cacheable prefix is stable: put your system prompt and tool definitions first, dynamic content last. If your system prompt changes per user, you lose the cache.
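A sketch of the ordering rule, with hypothetical prompt and tool contents:

```python
# Cache-friendly message layout: stable prefix first, dynamic tail last.
SYSTEM_PROMPT = "You are a support assistant for Acme Inc."  # hypothetical, stable
TOOLS = [  # tool definitions belong in the stable prefix too
    {"type": "function",
     "function": {"name": "lookup_order",  # hypothetical tool
                  "parameters": {"type": "object",
                                 "properties": {"order_id": {"type": "string"}}}}},
]

def build_messages(history: list, user_message: str) -> list:
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # identical every call → cacheable
        *history,                                      # grows, but remains a shared prefix
        {"role": "user", "content": user_message},     # dynamic tail, never cached
    ]
```

Corollary: per-user data (name, plan, locale) belongs in the dynamic tail, not interpolated into the system prompt; otherwise every user gets a unique prefix and your cache hit rate is zero.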

2. Match the model to the task

A common mistake is using GPT-5.5 for everything. Reality:

  • Chatbots, classification, simple Q&A → GPT-5 mini (25× cheaper)
  • Code generation, hard reasoning → GPT-5.5 or o4-mini (the latter often beats GPT-5.5 on STEM at 1/5 the cost)
  • Image/document understanding → GPT-5 mini handles 80% of vision tasks at a small fraction of the flagship price

3. Use Batch API for anything non-realtime

If the user isn't actively waiting, batch it. The 50% discount stacks with caching where supported.

4. Cap output tokens explicitly

Output is 4× more expensive than input. Use max_tokens to bound responses. Many "expensive" calls end up that way because the model wandered into a 2,000-token answer when 200 was sufficient.
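The worst-case exposure is easy to bound. A quick sketch at GPT-5.5 output rates:

```python
OUT_RATE = 20.00 / 1_000_000  # GPT-5.5 output, $/token

rambling = 2_000 * OUT_RATE   # uncapped 2,000-token answer
capped = 200 * OUT_RATE       # with an explicit 200-token cap
print(round(rambling, 3), round(capped, 4))  # → 0.04 0.004 (dollars/call)
```

At 50,000 calls a day, that's the difference between $2,000/day and $200/day of output spend.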

5. Optimize your prompt

Every 100 tokens of unnecessary system prompt × thousands of calls/day = real money. Audit your prompts the way you'd audit SQL queries.

6. Consider OpenAI alternatives for cheap workloads

DeepSeek V4 ($0.30 input / $1.20 output) and Gemini 3.0 Flash ($0.25 / $2.00) are 60–80% cheaper than GPT-5 mini for many classification and extraction tasks. Test them on a sample before committing.

When OpenAI is worth the premium

Despite cheaper alternatives, OpenAI's advantages remain:

  • Tooling ecosystem — Most agent frameworks default to OpenAI; switching costs are real
  • Reliability and uptime — Mature infrastructure
  • Function calling reliability — GPT-5.5's tool use is still best-in-class
  • Vision quality — Top tier
  • o4-mini reasoning — Competitive with Claude Opus 4.7 on STEM benchmarks at 1/15 the cost

Bottom line

OpenAI's 2026 pricing is friendly to volume buyers if you actually use the discount mechanisms. Naive usage leaves 40–60% on the table.

If you're spending more than $500/month on OpenAI and haven't measured your cache hit rate or considered batching, you're almost certainly overpaying.

Use the calculator on the homepage to model your specific workload — including caching slider and Batch API toggle — and see exactly where you'd land.


Pricing in this article reflects OpenAI's published rates as of May 2026. Always verify with OpenAI's official pricing page before committing budget.

Try the calculator

Plug in your workload and see exactly what the numbers in this article mean for your bill.
