How to Save 60% on AI API Costs with Prompt Batching
By Promptster Team · 2026-04-10
If you are making AI API calls in a loop -- one request per user query, one request per document, one request per row -- you are probably overpaying. Prompt batching is one of the most effective ways to cut your AI costs, and most teams are not taking advantage of it.
We have seen teams reduce their monthly AI API spend by 40-60% by restructuring how they send requests. The savings come from three places: batch pricing discounts, reduced overhead per request, and smarter prompt design that eliminates redundant calls. Here is how to implement each one.
What Is Prompt Batching?
Prompt batching means combining multiple pieces of work into fewer API calls. Instead of sending 100 individual requests, you structure your prompts so that a single request handles multiple items at once.
There are two forms of batching:
Application-level batching -- You redesign your prompts to process multiple items in a single call. For example, instead of classifying 50 support tickets one at a time, you send all 50 in one prompt and ask the model to return classifications for each.
Provider-level batch APIs -- Some providers offer dedicated batch endpoints that process requests asynchronously at a discount. You submit a batch of requests, and the provider returns results within a time window (usually 24 hours) at reduced pricing.
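As a concrete sketch of the provider-level flow, OpenAI's Batch API expects a JSONL file where each line is one independent request. The helper below only builds that payload locally (model name is a placeholder); the commented-out calls show where the upload and batch creation would happen.

```python
import json

def build_batch_file(prompts, model="gpt-4o-mini"):
    """Build the JSONL payload for OpenAI's Batch API.

    Each line is an independent request; `custom_id` lets you match
    results back to inputs when the batch completes.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return "\n".join(lines)

# With the `openai` package and an API key, you would then upload the
# file and create the batch with a 24-hour completion window:
#   file = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")
#   batch = client.batches.create(
#       input_file_id=file.id,
#       endpoint="/v1/chat/completions",
#       completion_window="24h",
#   )
```

Anthropic's Message Batches API follows the same submit-and-poll pattern with its own request shape.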
Which Providers Offer Batch Pricing?
Not all providers offer formal batch APIs, but the ones that do provide significant discounts:
| Provider | Batch API Available | Discount | Turnaround |
|---|---|---|---|
| OpenAI | Yes | 50% off input/output tokens | Up to 24 hours |
| Anthropic | Yes (Message Batches) | 50% off | Up to 24 hours |
| Google AI | Partial (batch via Vertex AI; context caching on the Gemini API reduces input costs) | Varies | Up to 24 hours |
| DeepSeek | No formal batch API | -- | -- |
| Together AI | No formal batch API | -- | -- |
| Fireworks AI | No formal batch API | -- | -- |
Even for providers without batch APIs, application-level batching still saves money by reducing the number of calls and the overhead tokens (system prompts, instructions) that get repeated with every request.
Application-Level Batching in Practice
Here is a concrete example. Say you are extracting structured data from customer reviews.
Before: One Request Per Review
Prompt (sent 100 times):

```
Extract the sentiment (positive/negative/neutral) and key topics
from this review: [single review text]
```

Cost: 100 API calls x ~150 input tokens each = ~15,000 input tokens
Plus system prompt repeated 100x = ~10,000 additional tokens
Total input: ~25,000 tokens
After: Batched Into Groups of 10
Prompt (sent 10 times):

```
Extract the sentiment (positive/negative/neutral) and key topics
from each of the following 10 reviews. Return results as a JSON array.
Review 1: [text]
Review 2: [text]
...
Review 10: [text]
```

Cost: 10 API calls x ~1,200 input tokens each = ~12,000 input tokens
Plus system prompt repeated 10x = ~1,000 additional tokens
Total input: ~13,000 tokens
That is a 48% reduction in input tokens just from eliminating redundant system prompts and instructions. The output token savings are smaller but still meaningful -- the model returns a structured array instead of 100 individual responses with repeated formatting.
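The pattern above can be sketched as two small helpers: one that writes the instructions once per batch instead of once per review, and one that validates the model actually returned one result per input. The function names and instruction wording are illustrative, not from any library.

```python
import json

INSTRUCTIONS = (
    "Extract the sentiment (positive/negative/neutral) and key topics "
    "from each of the following {n} reviews. Return results as a JSON "
    "array with one object per review, in order."
)

def build_batched_prompt(reviews):
    # Instructions appear once per batch, not once per item -- this is
    # where the redundant-token savings come from.
    numbered = "\n".join(
        f"Review {i + 1}: {text}" for i, text in enumerate(reviews)
    )
    return INSTRUCTIONS.format(n=len(reviews)) + "\n" + numbered

def parse_batched_response(raw, expected):
    # Validate the count before trusting the output; if it is off,
    # retry the batch or fall back to single-item calls.
    results = json.loads(raw)
    if len(results) != expected:
        raise ValueError(f"expected {expected} results, got {len(results)}")
    return results
```

The count check matters in practice: models occasionally skip or merge items in long lists, and silent misalignment between inputs and outputs is worse than a retry.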
Choosing the Right Batch Size
Bigger batches are not always better. There is a sweet spot:
| Batch Size | Pros | Cons |
|---|---|---|
| 1 (no batching) | Simple, easy error handling | Maximum cost, redundant tokens |
| 5-10 | Good cost savings, reliable output | Slightly more complex parsing |
| 20-50 | Maximum token savings | Higher risk of truncation, harder to retry on failure |
| 100+ | Marginal additional token savings | Diminishing returns; models lose accuracy on items at the end of long lists |
We recommend batches of 5-15 items for most use cases. Above 20, accuracy starts to degrade as models pay less attention to items later in the list.
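Splitting a workload into fixed-size batches is a one-liner worth getting right; a simple sketch:

```python
def chunked(items, size=10):
    """Split a list of work items into batches of at most `size`.

    The last batch may be smaller; keeping batches uniform-ish also
    makes retries cheap -- a failed batch only re-sends `size` items.
    """
    return [items[i:i + size] for i in range(0, len(items), size)]
```

A failed or truncated response then costs you one re-sent batch, not the whole workload.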
Beyond Batching: Complementary Cost Strategies
Batching works best when combined with other optimization techniques.
Prompt Caching
If you send the same system prompt or context with every request, look into prompt caching. Anthropic and OpenAI both offer prompt caching that reduces costs on repeated prefixes. For applications where the system prompt is much larger than the user input, caching can cut input costs by 80-90% on cached portions.
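As a sketch of how this looks with Anthropic's Messages API, a `cache_control` marker on the system prompt asks the provider to cache the prefix up to that block. The helper below only constructs the request payload; the model name and function name are placeholders.

```python
def cached_request(system_prompt, user_input, model="claude-placeholder"):
    """Build an Anthropic Messages API payload that caches the system prompt.

    The ephemeral cache_control marker tells the provider to cache
    everything up to and including this block, so repeated requests pay
    the reduced cached-input rate on the large, unchanging prefix.
    """
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_input}],
    }
```

OpenAI's prompt caching, by contrast, is automatic on repeated prefixes; either way, the design lesson is the same: put the large static content first and the per-request content last.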
Model Routing
Not every task needs your most expensive model. A classification task that GPT-4o handles at $0.0025/1K input tokens might be handled equally well by a model at one-tenth the cost.
This is where testing across providers pays off. Run your batch workload through Promptster with a sample of your actual data. You might find that a cheaper model produces identical results for your specific task, even if it scores lower on general benchmarks.
Deduplication
Before sending a batch, check for duplicate or near-duplicate inputs. If 100 users ask roughly the same question, you only need to process the unique variants and cache the results. Simple string hashing catches exact duplicates; embedding-based similarity catches near-duplicates.
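The exact-duplicate case can be sketched with stdlib hashing; the mapping lets you fan one computed result back out to every duplicate input. (Near-duplicate detection would swap the hash for an embedding-similarity check.)

```python
import hashlib

def dedupe(inputs):
    """Collapse exact duplicates (after light normalization) before batching.

    Returns the unique inputs plus a mapping from each original index to
    its position in the unique list, so results can be fanned back out.
    """
    seen = {}      # content hash -> index in `unique`
    unique = []
    mapping = []
    for text in inputs:
        key = hashlib.sha256(text.strip().lower().encode()).hexdigest()
        if key not in seen:
            seen[key] = len(unique)
            unique.append(text)
        mapping.append(seen[key])
    return unique, mapping
```

After processing `unique`, result `i` for the original workload is simply `results[mapping[i]]`.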
Response Length Control
Set max_tokens appropriately for your task. If you only need a one-word classification, do not leave max_tokens at 4,000. Every output token costs money, and models will use available space if you let them.
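One lightweight way to enforce this is a per-task output budget instead of a single global default; the task names and budget values below are hypothetical, chosen for illustration.

```python
# Hypothetical per-task output budgets: a one-word classification never
# needs a 4,000-token ceiling.
OUTPUT_BUDGETS = {
    "classification": 10,
    "extraction": 300,
    "summarization": 500,
}

def request_params(task, prompt):
    """Build request kwargs with max_tokens sized to the task."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": OUTPUT_BUDGETS.get(task, 1000),
    }
```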
Calculating Your Potential Savings
Here is a framework for estimating what batching could save you. Use your actual numbers:
| Metric | Before Batching | After Batching (batch of 10) |
|---|---|---|
| Monthly API calls | 30,000 | 3,000 |
| Avg input tokens per call | 250 | 1,800 |
| Prompt input tokens (calls x avg) | 7,500,000 | 5,400,000 |
| System prompt overhead | 3,000,000 (repeated) | 300,000 (10x fewer) |
| Total input tokens | 10,500,000 | 5,700,000 |
| Savings | -- | ~46% |
Add a provider batch API discount on top of that and you are looking at 60%+ total savings.
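The table's arithmetic can be packaged as a small calculator to plug your own numbers into (function name is illustrative):

```python
def input_token_savings(before_calls, before_avg, before_overhead,
                        after_calls, after_avg, after_overhead):
    """Fractional input-token savings from batching.

    Each total is (calls x avg tokens per call) plus system-prompt
    overhead, matching the rows in the table above.
    """
    before = before_calls * before_avg + before_overhead
    after = after_calls * after_avg + after_overhead
    return 1 - after / before

# Numbers from the table: 10,500,000 tokens down to 5,700,000, ~46%.
saving = input_token_savings(30_000, 250, 3_000_000,
                             3_000, 1_800, 300_000)

# A 50% batch API discount then applies to the already-reduced spend:
# 1 - (5,700,000 * 0.5) / 10,500,000, roughly 73% total.
```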
Finding the Cheapest Provider for Your Batch Workload
The cheapest provider depends on your specific task, not just the published per-token rates. A model with lower token pricing but worse accuracy might require more retries, wiping out the savings.
The most reliable way to find your optimal provider is to test with real data. Take a sample batch from your workload, run it through multiple providers in Promptster, and compare both cost and quality. Promptster shows real-time cost calculations based on actual token usage, so you can see exactly what each provider charges for your specific prompts.
For a deeper dive into finding the cheapest model for different task types, check out our guide on finding the cheapest AI model for your use case.
Start Optimizing
Pick one high-volume API call in your application, batch it, and measure the difference. The savings compound quickly -- a 50% reduction on your most expensive endpoint often translates to a 20-30% reduction in your total AI spend.
Test your batched prompts across providers in Promptster to make sure quality holds up at your chosen batch size. The goal is to spend less without getting worse results, and the only way to verify that is to measure both.