Unlimited Rate Limits

SaveGate removes the TPM (Tokens Per Minute) and RPM (Requests Per Minute) restrictions that plague direct provider APIs.

No More Rate Limit Errors

Scale your applications without worrying about hitting rate limits. SaveGate provides maximum available throughput for all models.

Why SaveGate Has No Limits

SaveGate uses enterprise-grade infrastructure with:
  • Load balancing across multiple accounts
  • Automatic failover
  • Distributed request handling
  • Optimized routing
Direct provider limits (for reference):

OpenAI Free Tier:
  • GPT-4: 40K TPM, 500 RPM
  • GPT-3.5: 90K TPM, 3,500 RPM

Anthropic Free Tier:
  • Claude: 50K TPM, 50 RPM

SaveGate:
  • All models: Unlimited TPM/RPM ✨

How SaveGate achieves this:
  1. Pooled Accounts: Shared infrastructure spreads load
  2. Smart Routing: Requests distributed optimally
  3. Enterprise Agreements: Higher base limits
  4. Automatic Scaling: Dynamic resource allocation

Fair Usage

While we don’t impose hard limits, we ask for responsible usage:

Reasonable Requests

Make requests at a reasonable pace for your use case. No need to throttle, but avoid intentional abuse.
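
SaveGate doesn't require throttling, but a small cap on concurrent in-flight requests is an easy way to keep your own costs and resources predictable. A minimal sketch using asyncio.Semaphore (the limit of 20 is an arbitrary example, not a SaveGate requirement):
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="your-savegate-api-key",
    base_url="https://api.savegate.ai/v1"
)

# Cap concurrent in-flight requests; 20 is an arbitrary example value
semaphore = asyncio.Semaphore(20)

async def polite_request(message):
    async with semaphore:  # waits if 20 requests are already in flight
        response = await client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": message}]
        )
        return response.choices[0].message.content

# Usage: await polite_request(...) from async code, e.g. with asyncio.gather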

No DDoS

Don’t use SaveGate for DDoS attacks or similar malicious activities. This violates our terms of service.

Production Use

SaveGate is built for production. Feel free to scale without worry.

Monitor Usage

Track your usage in the dashboard to understand patterns and optimize costs.
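
The dashboard is the primary place to track usage. If you also want per-request numbers in your own logs, the token counts reported on each response are enough for a rough local tally (a sketch assuming the standard OpenAI-compatible usage field is present on responses):
from openai import OpenAI

client = OpenAI(
    api_key="your-savegate-api-key",
    base_url="https://api.savegate.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# Token counts reported on the response object (OpenAI-compatible format)
usage = response.usage
print(f"prompt={usage.prompt_tokens} completion={usage.completion_tokens} total={usage.total_tokens}")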

Best Practices

Even without rate limits, follow these best practices:

1. Implement Retry Logic

Always implement exponential backoff for transient errors:
import time
from openai import OpenAI

client = OpenAI(
    api_key="your-savegate-api-key",
    base_url="https://api.savegate.ai/v1"
)

def chat_with_retry(message, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": message}]
            )
            return response.choices[0].message.content
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)

2. Use Async/Concurrent Requests

Process multiple requests efficiently:
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="your-savegate-api-key",
    base_url="https://api.savegate.ai/v1"
)

async def process_multiple_messages(messages):
    tasks = []
    for msg in messages:
        task = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": msg}]
        )
        tasks.append(task)

    responses = await asyncio.gather(*tasks)
    return [r.choices[0].message.content for r in responses]

# Usage
messages = ["Message 1", "Message 2", "Message 3"]
results = asyncio.run(process_multiple_messages(messages))

3. Batch When Possible

For compatible use cases, batch multiple items in a single request:
# Instead of multiple requests:
for item in items:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Process: {item}"}]
    )

# Batch in one request:
batch_content = "\n".join([f"{i}. {item}" for i, item in enumerate(items, 1)])  # number items starting at 1
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": f"Process these items:\n{batch_content}"}]
)

4. Monitor Your Usage

Keep track of your API usage:
  1. Check Dashboard: View real-time usage statistics in your SaveGate Dashboard
  2. Set Alerts: Configure alerts for unusual usage patterns or budget thresholds
  3. Analyze Patterns: Review usage trends to optimize your application

Streaming Responses

Streaming is especially valuable without rate limits:
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Performance Metrics

SaveGate delivers excellent performance:

Response Time

150ms average to first token. Faster than most direct API calls due to optimized routing.

Throughput

10M+ requests/day. Proven scale with enterprise customers.

Uptime

99.9% SLA. Automatic failover ensures reliability.

Latency

Under 50ms p99. Consistent performance even at scale.
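
If you want to verify time to first token for your own workload rather than take the numbers above on faith, a streaming request makes it easy to measure. A minimal sketch (the result depends on your model, prompt, and network):
import time
from openai import OpenAI

client = OpenAI(
    api_key="your-savegate-api-key",
    base_url="https://api.savegate.ai/v1"
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

for chunk in stream:
    # Some chunks carry no content (e.g. role-only deltas); skip them
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"Time to first token: {time.perf_counter() - start:.3f}s")
        break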

Handling Errors

Even without rate limits, handle errors gracefully:
from openai import OpenAI, APIError, APIConnectionError, RateLimitError

client = OpenAI(
    api_key="your-savegate-api-key",
    base_url="https://api.savegate.ai/v1"
)

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    # Should be rare with SaveGate
    print(f"Rate limit hit (unusual): {e}")
except APIConnectionError as e:
    # Network error
    print(f"Connection error: {e}")
except APIError as e:
    # Other API errors
    print(f"API error: {e}")

Enterprise Features

For enterprise customers, we offer additional features:
  • Dedicated Capacity: Reserved throughput for your applications
  • Custom Limits: Set your own internal rate limits
  • Priority Routing: Guaranteed low-latency access
  • SLA Guarantees: Contractual uptime commitments

Contact Sales

Learn about enterprise features and custom configurations

Migration from Rate-Limited APIs

If you’re migrating from rate-limited APIs, you can safely remove rate limiting and throttling code:
# Before (with rate limiting)
rate_limiter = RateLimiter(max_requests=10, time_window=60)

for item in items:
    rate_limiter.wait_if_needed()  # Not needed with SaveGate!
    response = client.chat.completions.create(...)

# After (with SaveGate)
for item in items:
    response = client.chat.completions.create(...)
Complex queuing systems can be simplified:
# Before (with queuing)
request_queue = Queue(maxsize=100)
# Complex queue management...

# After (with SaveGate): run inside an async function with AsyncOpenAI
# Just make requests directly!
responses = await asyncio.gather(*[
    client.chat.completions.create(...) for item in items
])

Questions?

Need Help?

Contact our support team if you have questions about rate limits or scaling