Optimizing Nano Banana 2 API concurrency performance: 5 practical tips for bandwidth, memory, and Base64 image transmission

"How much concurrency should I set?"—this is the question developers ask most often when using the Nano Banana 2 API for batch image generation. The answer doesn't lie in platform limits, but in how much Base64 image data your bandwidth and memory can handle.

Core Value: After reading this article, you'll understand the primary bottlenecks of concurrent Nano Banana 2 API calls, learn how to calculate the optimal concurrency based on your server's capacity, and gain 5 proven performance optimization tips.

The Core Issue with Nano Banana 2 API Concurrency: The Bottleneck is Your Pipeline, Not the Platform

Many developers' first reaction is, "How much concurrency can the platform support?" In practice, the APIYI platform does not cap concurrency. The default quota is 1,000 RPM (requests per minute) per user, and it can be raised on request.

The real bottleneck is this: The Gemini image generation API uses Base64 encoding to transmit image data. This means every image upload and download is a massive JSON string rather than an efficient binary stream. This puts immense pressure on your bandwidth and memory.

Why Base64 is the Core Concurrency Bottleneck

The official Gemini API (including the gemini-3.1-flash-image-preview used by Nano Banana 2) only supports Base64 encoding for image transmission. Base64 encoding inflates binary data by approximately 33%, which means:

| Resolution | Original Image Size | After Base64 Encoding | Single API Response Size |
| --- | --- | --- | --- |
| 512px (0.5K) | ~400 KB | ~530 KB | ~600 KB – 1 MB |
| 1K (Default) | ~1.5 MB | ~2 MB | ~2 MB |
| 2K | ~4 MB | ~5.3 MB | ~5-8 MB |
| 4K | ~15 MB | ~20 MB | ~20 MB |

A 4K image API response is 20 MB. If you initiate 10 concurrent 4K requests simultaneously, you'll have 200 MB of response data flowing through your network and memory.
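
The 33% figure follows from how Base64 works: every 3 bytes of binary data become 4 ASCII characters. Here is a minimal sketch to check that arithmetic; the function name `base64_size` is illustrative and not part of any API.

```python
import base64
import math

def base64_size(raw_bytes: int) -> int:
    """Length in characters of the padded Base64 encoding of raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)

# Check the formula against the standard library on a small payload.
payload = bytes(1000)
assert base64_size(len(payload)) == len(base64.b64encode(payload))

# A ~15 MB 4K image grows to roughly 20 MB once encoded.
raw_4k = 15 * 1024 * 1024
encoded_4k = base64_size(raw_4k)
print(f"{raw_4k / 2**20:.1f} MB -> {encoded_4k / 2**20:.1f} MB "
      f"(+{(encoded_4k / raw_4k - 1) * 100:.0f}%)")
```

The JSON envelope around the image adds a little more on top, which is why the response sizes in the table are slightly larger than the encoded sizes.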

Nano Banana 2 API Parameter Quick Reference

| Parameter | Value |
| --- | --- |
| Model ID | gemini-3.1-flash-image-preview |
| Input Context | 131,072 tokens |
| Output Limit | 32,768 tokens |
| Supported Resolutions | 512px / 1K / 2K / 4K |
| Supported Aspect Ratios | 14 types (1:1, 3:2, 4:3, 16:9, 9:16, 21:9, etc.) |
| Max Reference Images | 14 (10 objects + 4 characters) |
| Generation Speed | 3-5 seconds/image |
| APIYI RPM | 1000/user (quota can be increased) |
| APIYI Concurrency Limit | Unlimited |

🎯 Technical Advice: The APIYI (apiyi.com) platform places no limits on Nano Banana 2 concurrency, and the RPM supports 1,000 requests per user. The bottleneck lies in your local environment—your bandwidth and memory determine how much concurrency you can actually handle.


Calculating Nano Banana 2 API Concurrency: Choosing the Best Plan for Your Environment

You shouldn't just guess your concurrency limit; it needs to be calculated based on your actual environment. There are three key metrics: bandwidth, memory, and target resolution.

Step 1: Confirm Your Bandwidth

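Bandwidth determines how much response data you can download per second. Dividing it by the size of one response gives the number of full responses your link can carry each second, which serves as a rough ceiling on useful concurrency:

Max Concurrency (Bandwidth) = Available Bandwidth (MB/s) ÷ Single Response Size (MB)

| Network Environment | Available Bandwidth | 1K Concurrency Limit | 2K Concurrency Limit | 4K Concurrency Limit |
| --- | --- | --- | --- | --- |
| Home Broadband (100Mbps) | ~12 MB/s | 6 | 2 | 0-1 |
| Enterprise Network (500Mbps) | ~60 MB/s | 30 | 10 | 3 |
| Cloud Server (1Gbps) | ~120 MB/s | 60 | 20 | 6 |
| High-Performance Server (10Gbps) | ~1200 MB/s | 600 | 200 | 60 |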
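
The bandwidth formula can be sketched in a few lines. The function names are illustrative. The per-resolution response sizes (2, 6, and 20 MB) are assumptions chosen to match the table, with 6 MB taken from the middle of the 5-8 MB range quoted for 2K.

```python
def mbps_to_mb_per_s(link_mbps: float) -> float:
    """Convert a link speed in megabits/s to megabytes/s (8 bits per byte)."""
    return link_mbps / 8

def bandwidth_limit(available_mb_per_s: float, response_mb: float) -> int:
    """Whole responses per second the link can carry: bandwidth / response size."""
    return int(available_mb_per_s // response_mb)

# Assumed response sizes in MB for each resolution.
RESPONSE_MB = {"1K": 2, "2K": 6, "4K": 20}

for name, mb_per_s in [("Home 100 Mbps", 12), ("Cloud 1 Gbps", 120)]:
    limits = {res: bandwidth_limit(mb_per_s, size) for res, size in RESPONSE_MB.items()}
    print(name, limits)
```

A 100 Mbps link is 12.5 MB/s on paper. The table rounds this down to about 12 MB/s, which leaves a little headroom for protocol overhead and other traffic.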

Step 2: Confirm Your Available Memory

Each concurrent request must hold the full Base64 response data in memory until decoding and disk writing are complete. The memory formula is:

Required Memory = Concurrency × Single Response Size × 2.5 (Decoding Buffer Factor)

We multiply by 2.5 because during Base64 decoding, both the original string and the decoded binary data exist in memory simultaneously, plus the overhead of JSON parsing.

| Available Memory | 1K Concurrency Limit | 2K Concurrency Limit | 4K Concurrency Limit |
| --- | --- | --- | --- |
| 2 GB | 400 | 100 | 40 |
| 4 GB | 800 | 200 | 80 |
| 8 GB | 1600 | 400 | 160 |

Step 3: Take the Lower of the Two

Recommended Concurrency = min(Bandwidth Limit, Memory Limit)

In practice, for most scenarios, bandwidth is the true bottleneck, not memory.
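
Steps 2 and 3 can be combined into one small calculation. This is a sketch only: the function names are illustrative, 2 GB is treated as 2,000 MB to match the memory table, and the bandwidth limit of 6 is the 4K value for a 1 Gbps cloud server from Step 1.

```python
def memory_limit(available_mb: float, response_mb: float, buffer_factor: float = 2.5) -> int:
    """Concurrent requests that fit in memory: available / (response size x buffer factor)."""
    return int(available_mb // (response_mb * buffer_factor))

def recommended_concurrency(bandwidth_limit: int, mem_limit: int) -> int:
    """Step 3: the tighter of the two limits wins."""
    return min(bandwidth_limit, mem_limit)

# Example: 4K images (~20 MB per response) on a 1 Gbps server with 2 GB of free memory.
mem = memory_limit(available_mb=2000, response_mb=20)
print("memory limit:", mem)
print("recommended:", recommended_concurrency(bandwidth_limit=6, mem_limit=mem))
```

In this example memory would allow 40 concurrent requests while bandwidth allows only 6, which illustrates why bandwidth is usually the limiting factor.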

Recommended Concurrency for Actual Scenarios

| Scenario | Recommended Resolution | Recommended Concurrency | Expected Throughput |
| --- | --- | --- | --- |
| Personal Dev/Testing | 1K | 3-5 | ~1 image/sec |
| Small Team Batch Generation | 1K | 10-20 | ~4 images/sec |
| Enterprise Production | 1K-2K | 20-50 | ~10 images/sec |
| High-Throughput Image Service | 1K | 50-100 | ~20 images/sec |
| Need 4K HD Images | 4K | 3-5 | ~1 image/sec |

💡 Practical Advice: If you're unsure about how much concurrency to set, start with 5, then gradually increase to 10 or 20 while monitoring response times and error rates. If response times rise significantly or you start seeing timeouts, you're approaching your bottleneck. When testing on the APIYI (apiyi.com) platform, don't worry about platform-side limits; just focus on your local performance.
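
The ramp-up procedure can be automated along these lines. Treat this as a sketch under stated assumptions: `run_batch` is a hypothetical callable you would write yourself (it runs a small test batch at a given concurrency and returns average latency and error rate), and the thresholds of 1.5x latency growth and a 2% error rate are illustrative defaults, not platform recommendations.

```python
from typing import Callable

def find_concurrency(
    run_batch: Callable[[int], tuple[float, float]],
    levels: tuple[int, ...] = (5, 10, 20, 50),
    max_latency_growth: float = 1.5,
    max_error_rate: float = 0.02,
) -> int:
    """Step through concurrency levels and return the last one that stayed healthy.

    run_batch(concurrency) must return (average_latency_seconds, error_rate).
    The first level is used as the latency baseline.
    """
    baseline_latency, _ = run_batch(levels[0])
    best = levels[0]
    for level in levels[1:]:
        latency, error_rate = run_batch(level)
        if latency > baseline_latency * max_latency_growth or error_rate > max_error_rate:
            break  # approaching the bottleneck: keep the previous level
        best = level
    return best

def fake_run_batch(concurrency: int) -> tuple[float, float]:
    # Simulated measurements: latency climbs sharply past 20 concurrent requests.
    latency = 4.0 if concurrency <= 20 else 9.0
    return latency, 0.0

print(find_concurrency(fake_run_batch))
```

In real use, `run_batch` would wrap your batch function and compute the two numbers from its results.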

Nano Banana 2 API Quick Start: 3 Steps to Integration

Step 1: Install Dependencies

pip install openai Pillow

Step 2: Minimal Invocation Example

import openai
import base64
from pathlib import Path

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apiyi.com/v1"  # APIYI unified interface
)

response = client.chat.completions.create(
    model="gemini-3.1-flash-image-preview",
    messages=[
        {
            "role": "user",
            "content": "Generate a cute cat wearing sunglasses on a beach"
        }
    ]
)

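# Extract Base64 image data and save.
# NOTE: this loop assumes message.content is a list of parts exposing .image.data;
# the exact response shape depends on the gateway, so inspect a real response first.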
for part in response.choices[0].message.content:
    if hasattr(part, "image") and part.image:
        img_bytes = base64.b64decode(part.image.data)
        Path("output.png").write_bytes(img_bytes)
        print("Image saved: output.png")

Full code for concurrent batch generation:

import openai
import base64
import time
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apiyi.com/v1"  # APIYI unified interface
)

# Configuration parameters
MAX_CONCURRENCY = 10       # Maximum concurrency, adjust based on your bandwidth
OUTPUT_DIR = Path("output")
OUTPUT_DIR.mkdir(exist_ok=True)

def generate_single_image(prompt: str, index: int) -> dict:
    """Generate a single image and save it immediately to free up memory"""
    start = time.time()
    try:
        response = client.chat.completions.create(
            model="gemini-3.1-flash-image-preview",
            messages=[{"role": "user", "content": prompt}]
        )

        for part in response.choices[0].message.content:
            if hasattr(part, "image") and part.image:
                # Decode and save immediately to prevent Base64 strings from occupying memory
                img_bytes = base64.b64decode(part.image.data)
                filepath = OUTPUT_DIR / f"image_{index:04d}.png"
                filepath.write_bytes(img_bytes)

                elapsed = time.time() - start
                size_mb = len(img_bytes) / (1024 * 1024)
                return {
                    "index": index,
                    "success": True,
                    "time": elapsed,
                    "size_mb": size_mb,
                    "path": str(filepath)
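                }

        # The response contained no image part: report a failure instead of
        # falling through and returning None
        return {
            "index": index,
            "success": False,
            "error": "no image in response",
            "time": time.time() - start
        }
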
    except Exception as e:
        return {
            "index": index,
            "success": False,
            "error": str(e),
            "time": time.time() - start
        }

def batch_generate(prompts: list[str]):
    """Use a thread pool for concurrent image generation"""
    results = []
    total = len(prompts)
    completed = 0

    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as executor:
        futures = {
            executor.submit(generate_single_image, p, i): i
            for i, p in enumerate(prompts)
        }

        for future in futures:
            result = future.result()
            completed += 1
            status = "OK" if result["success"] else "FAIL"
            print(f"[{completed}/{total}] {status} - {result['time']:.1f}s")
            results.append(result)

    # Statistics
    success = [r for r in results if r["success"]]
    print(f"\nFinished: {len(success)}/{total} successful")
    if success:
        avg_time = sum(r["time"] for r in success) / len(success)
        total_size = sum(r["size_mb"] for r in success)
        print(f"Average time: {avg_time:.1f}s | Total size: {total_size:.1f} MB")

# Usage example
prompts = [
    "A futuristic city at sunset",
    "A cozy coffee shop interior",
    "An underwater coral reef scene",
    "A mountain landscape with aurora",
    "A cute robot playing guitar",
]

batch_generate(prompts)

Step 3: Uploading a Reference Image (Image-to-Image)

Image-to-image scenarios require uploading a reference image, also using Base64 encoding:

import base64

# Read local image and convert to Base64
with open("reference.png", "rb") as f:
    img_base64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gemini-3.1-flash-image-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Convert this photo into a watercolor painting style"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{img_base64}"
                    }
                }
            ]
        }
    ]
)

Note: When uploading a reference image, the total request size must not exceed 20 MB. If the reference image is large, we recommend compressing it to below 1K resolution first.
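
A quick pre-flight check can catch oversized uploads before they are sent. This is a sketch under assumptions: the 20 MB limit is interpreted here as 20 × 1024 × 1024 bytes, the 1 KB allowance for the JSON envelope is a guess, and both function names are illustrative.

```python
import math

MAX_REQUEST_BYTES = 20 * 1024 * 1024  # assumed interpretation of the 20 MB limit

def encoded_request_size(image_bytes: int, prompt: str, overhead_bytes: int = 1024) -> int:
    """Approximate request size: Base64 image + UTF-8 prompt + assumed JSON overhead."""
    return 4 * math.ceil(image_bytes / 3) + len(prompt.encode("utf-8")) + overhead_bytes

def fits_request_limit(image_bytes: int, prompt: str) -> bool:
    """True if the encoded request is expected to stay within the limit."""
    return encoded_request_size(image_bytes, prompt) <= MAX_REQUEST_BYTES

prompt = "Convert this photo into a watercolor painting style"
for size_mb in (10, 16):
    ok = fits_request_limit(size_mb * 1024 * 1024, prompt)
    print(f"{size_mb} MB reference image fits: {ok}")
```

Because of the Base64 overhead, a reference file larger than roughly 15 MB on disk will exceed the limit once encoded. The file size itself can be read with `Path("reference.png").stat().st_size`.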


5 Practical Tips for Nano Banana 2 API Concurrency Optimization

Tip 1: Choose Resolution on Demand, Avoid Default 4K

This is the simplest and most effective optimization. Many developers default to 4K requests, but 1K is sufficient for most scenarios:

| Use Case | Recommended Resolution | Single File Size | Concurrency Efficiency |
| --- | --- | --- | --- |
| Social Media | 1K | ~2 MB | High |
| E-commerce | 2K | ~6 MB | Medium |
| Print/Poster | 4K | ~20 MB | Low |
| Preview/Thumbnail | 512px | ~0.7 MB | Very High |

Switching from 4K to 1K increases concurrency capacity by about 10x under the same conditions.

Tip 2: Streaming Reception + Immediate Disk Writing

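Free each Base64 payload as early as possible. Note that the example below does not stream the HTTP body: the SDK call returns only after the full JSON response has arrived. What it does is decode and write to disk as soon as the response is available, then drop its references so the memory can be reclaimed before other requests finish: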

import base64
import gc
from pathlib import Path

def generate_and_save(prompt, filepath):
    """Generate image and save immediately, actively releasing memory"""
    response = client.chat.completions.create(
        model="gemini-3.1-flash-image-preview",
        messages=[{"role": "user", "content": prompt}]
    )

    for part in response.choices[0].message.content:
        if hasattr(part, "image") and part.image:
            # Decode immediately
            img_bytes = base64.b64decode(part.image.data)
            # Clear Base64 string reference immediately
            del part.image.data
            # Write to disk immediately
            Path(filepath).write_bytes(img_bytes)
            del img_bytes
            gc.collect()  # Manually trigger garbage collection
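
To lower the peak further, the Base64 text can be decoded in chunks so the decoded image never sits in memory as a whole. This is a minimal sketch that assumes the payload contains no whitespace or line breaks; `decode_base64_to_file` is an illustrative helper, not an SDK function.

```python
import base64
import os
import tempfile

def decode_base64_to_file(b64_text: str, filepath: str, chunk_chars: int = 4 * 64 * 1024) -> int:
    """Decode a Base64 string to disk chunk by chunk; returns the bytes written.

    chunk_chars must be a multiple of 4 so every slice is a complete Base64 unit.
    """
    if chunk_chars % 4:
        raise ValueError("chunk_chars must be a multiple of 4")
    written = 0
    with open(filepath, "wb") as f:
        for start in range(0, len(b64_text), chunk_chars):
            written += f.write(base64.b64decode(b64_text[start:start + chunk_chars]))
    return written

# Round-trip check with random data standing in for an image.
original = os.urandom(1_000_000)
encoded = base64.b64encode(original).decode("ascii")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image.bin")
    size = decode_base64_to_file(encoded, path)
    with open(path, "rb") as f:
        assert f.read() == original
print(f"wrote {size} bytes")
```

The Base64 string itself is still held in full, so this trims only the decoded copy: extra memory during decoding is about one chunk rather than the whole image.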

Tip 3: Token Bucket Rate Limiter to Control Concurrency

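Don't send all requests at once. A token bucket spreads request starts evenly over time: it limits the request rate, while the thread pool's max_workers setting caps how many requests are in flight at once: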

import threading
import time

class TokenBucket:
    """Token bucket rate limiter"""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # Refill rate per second
        self.capacity = capacity  # Bucket capacity
        self.tokens = capacity
        self.lock = threading.Lock()
        self.last_refill = time.monotonic()

    def acquire(self):
        while True:
            with self.lock:
                now = time.monotonic()
                elapsed = now - self.last_refill
                self.tokens = min(
                    self.capacity,
                    self.tokens + elapsed * self.rate
                )
                self.last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
            time.sleep(0.05)

# Usage: Max 10 requests per second, peak 20
limiter = TokenBucket(rate=10, capacity=20)

def rate_limited_generate(prompt, index):
    limiter.acquire()  # Wait for token
    return generate_single_image(prompt, index)
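
The bucket parameters can be derived from an RPM quota rather than picked by hand. This is a sketch with assumed defaults: the 80% headroom and the one-second burst window are illustrative choices, and `bucket_params` is a hypothetical helper.

```python
def bucket_params(rpm_quota: int, headroom: float = 0.8, burst_seconds: float = 1.0) -> tuple[float, int]:
    """Return (rate_per_second, capacity) for a token bucket that stays under an RPM quota."""
    rate = rpm_quota / 60 * headroom          # sustained requests per second
    capacity = max(1, int(rate * burst_seconds))  # largest burst allowed at once
    return rate, capacity

rate, capacity = bucket_params(1000)
print(f"rate={rate:.1f} req/s, capacity={capacity}")
```

With a 1,000 RPM quota this gives about 13 requests per second. The `rate=10, capacity=20` values used above are a more conservative choice within the same quota.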

Tip 4: Exponential Backoff for 429 Errors

When encountering rate limits (HTTP 429), use an exponential backoff strategy:

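import random
import time

import openai

def generate_with_retry(prompt, index, max_retries=5):
    """Retry mechanism with exponential backoff"""
    for attempt in range(max_retries):
        try:
            result = generate_single_image(prompt, index)
        except openai.RateLimitError:
            pass  # 429 raised directly: fall through to the backoff below
        else:
            # generate_single_image catches exceptions and reports them in "error",
            # so a 429 normally shows up there (the SDK's message includes the
            # status code); anything else is returned to the caller unchanged
            if result is None or result["success"] or "429" not in result.get("error", ""):
                return result
        delay = min(60, 2 ** attempt) + random.uniform(0, 0.5)
        print(f"Rate limited, retrying in {delay:.1f}s...")
        time.sleep(delay)
    return {"index": index, "success": False, "error": "max retries"}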
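
It helps to see what this schedule adds up to before choosing `max_retries`. The sketch below reproduces the delays without the random jitter; `backoff_delays` is an illustrative helper.

```python
def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Delay before each retry, without jitter: base * 2**attempt, capped at cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]

print(backoff_delays(5))       # [1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(backoff_delays(5)))  # 31.0
print(backoff_delays(8))       # the 60-second cap applies from the 7th retry
```

With the default of 5 retries, a request that keeps hitting the limit waits about 31 seconds in total, plus up to 2.5 seconds of jitter, before it is reported as failed.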

Tip 5: Use Batch API for Bulk Tasks to Save 50%

For bulk tasks that don't require real-time results, Nano Banana 2 supports the Batch API, cutting costs in half:

| Mode | 1K Image Unit Price | 4K Image Unit Price | Latency | Suitable Scenario |
| --- | --- | --- | --- | --- |
| Real-time API | $0.067 | $0.151 | 3-5s | Interactive apps |
| Batch API | $0.034 | $0.076 | Minutes-Hours | Bulk pre-generation |

💰 Cost Optimization: If your scenario allows for waiting, calling the Batch API via APIYI (apiyi.com) can save 50% on costs. This is especially suitable for bulk e-commerce product image generation, marketing material pre-production, etc.

Nano Banana 2 API: A Deep Dive into Resolution Costs and Token Consumption

Understanding token consumption is key to keeping your costs in check. Here’s the breakdown:

| Resolution | Output Token Consumption | Standard Price | Batch Price (50% off) | Cost per 100 Images (Standard / Batch) |
| --- | --- | --- | --- | --- |
| 512px | 747 tokens | $0.045 | $0.022 | $4.50 / $2.20 |
| 1K | 1,120 tokens | $0.067 | $0.034 | $6.70 / $3.40 |
| 2K | 1,680 tokens | $0.101 | $0.050 | $10.10 / $5.00 |
| 4K | 2,520 tokens | $0.151 | $0.076 | $15.10 / $7.60 |
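
For budgeting, the table can be turned into a small estimator. This is a sketch: the per-image prices are copied from the table above, while the `PRICES` dictionary and the `estimate_cost` function are illustrative names.

```python
# Per-image prices in USD, copied from the table above.
PRICES = {
    "512px": {"standard": 0.045, "batch": 0.022},
    "1K": {"standard": 0.067, "batch": 0.034},
    "2K": {"standard": 0.101, "batch": 0.050},
    "4K": {"standard": 0.151, "batch": 0.076},
}

def estimate_cost(resolution: str, images: int, mode: str = "standard") -> float:
    """Total cost in USD: number of images x unit price, rounded to cents."""
    return round(PRICES[resolution][mode] * images, 2)

for mode in ("standard", "batch"):
    print(f"1,000 images at 1K, {mode}: ${estimate_cost('1K', 1000, mode):.2f}")
```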

🚀 Quick Start: You can call Nano Banana 2 via the APIYI (apiyi.com) platform. We offer the same pricing as the official source, with no concurrency limits and support for 1,000 RPM per user. Sign up now to get your testing credits.


Nano Banana 2 vs. Previous Generations

| Comparison Item | Nano Banana | Nano Banana Pro | Nano Banana 2 |
| --- | --- | --- | --- |
| Model ID | gemini-2.5-flash (Image) | gemini-3-pro-image-preview | gemini-3.1-flash-image-preview |
| Max Resolution | 1024×1024 | 4K | 4K |
| 1K Unit Price | $0.039 | $0.134 | $0.067 |
| 4K Unit Price | N/A | $0.240 | $0.151 |
| Generation Speed | 2-4 seconds | 5-8 seconds | 3-5 seconds |
| Batch API | No | No | Yes (50% off) |
| Max Reference Images | 5 | 10 | 14 |
| Available on APIYI | | | |

Compared to the Pro version, Nano Banana 2 offers a 37% reduction in 4K pricing and a 40% boost in speed, all while adding support for the Batch API.

Nano Banana 2 API Concurrency Performance Monitoring

When running concurrent tasks, it's recommended to monitor the following metrics:

import threading
import time

import psutil

class PerformanceMonitor:
    """Concurrency performance monitor"""
    def __init__(self):
        self.start_time = time.time()
        self.request_count = 0
        self.total_bytes = 0
        self.errors = 0
        self.lock = threading.Lock()  # record() is called from worker threads

    def record(self, success: bool, size_bytes: int = 0):
        # Guard the counters so concurrent updates are not lost
        with self.lock:
            self.request_count += 1
            if success:
                self.total_bytes += size_bytes
            else:
                self.errors += 1

    def report(self):
        elapsed = time.time() - self.start_time
        mem = psutil.Process().memory_info().rss / (1024**2)

        print(f"--- Performance Report ---")
        print(f"Runtime: {elapsed:.1f}s")
        print(f"Completed requests: {self.request_count}")
        print(f"Success rate: {(self.request_count-self.errors)/max(1,self.request_count)*100:.1f}%")
        print(f"Throughput: {self.request_count/elapsed:.2f} req/s")
        print(f"Data volume: {self.total_bytes/(1024**2):.1f} MB")
        print(f"Bandwidth usage: {self.total_bytes/(1024**2)/elapsed:.1f} MB/s")
        print(f"Memory usage: {mem:.0f} MB")
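
The report above does not cover response times, which are the earliest sign of an approaching bottleneck. Below is a sketch of a nearest-rank percentile helper for per-request latencies; the function name and the sample values are illustrative.

```python
import math

def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Example latencies in seconds; one slow outlier among typical 3-4 s responses.
latencies = [3.1, 3.4, 3.3, 4.0, 3.6, 3.2, 9.8, 3.5, 3.7, 3.3]
print(f"p50={percentile(latencies, 50)}s  p95={percentile(latencies, 95)}s")
```

A p95 that rises while the median stays flat suggests some requests are queuing behind a saturated link, even if the average still looks acceptable.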

FAQ

Q1: Does the APIYI platform impose concurrency limits on Nano Banana 2?

The APIYI platform does not limit concurrency for Nano Banana 2. The default RPM (requests per minute) is 1,000 per user, and you can contact customer support if you need a higher quota. The actual concurrency bottleneck usually depends on your local bandwidth and memory. We recommend running tests on the APIYI (apiyi.com) platform to find the optimal concurrency for your specific environment.

Q2: Why does the Gemini image API only support Base64 transmission?

This is a current design choice for the Google Gemini API. Base64 encoding allows image data to be embedded directly into JSON responses without needing extra file storage or CDN distribution. The downside is that it increases data size by about 33%, which isn't ideal for bandwidth and memory. The developer community has provided feedback to Google requesting JPEG output and temporary download URLs, but these features haven't been implemented yet.

Q3: Is there a significant difference between 1K and 4K resolution?

It depends on your use case. For social media images, web displays, or app interfaces, 1K resolution is usually sufficient, and the difference is barely noticeable to the naked eye. 4K is primarily for printing, posters, high-definition wallpapers, or scenarios where you need to zoom in to see fine details. We suggest testing with 1K first and only switching to 4K if you confirm you need higher clarity. You can switch resolutions at any time via APIYI (apiyi.com).

Q4: What should I do if I encounter frequent 429 errors?

A 429 error means you've hit a rate limit. Solutions include: (1) reducing the concurrency; (2) using a token bucket rate limiter to distribute requests evenly; (3) implementing exponential backoff retries; or (4) switching to the Batch API for bulk tasks. If you encounter rate limiting on the APIYI platform, feel free to contact customer support to increase your RPM quota.

Q5: How can I estimate the total cost of batch generation?

Use the formula: Total Cost = Number of Images × Unit Price. For example, generating 1,000 1K images costs 1,000 × $0.067 = $67 in Standard mode and 1,000 × $0.034 = $34 in Batch mode. Pricing on APIYI (apiyi.com) is consistent with official rates and supports flexible top-ups, which suits pay-as-you-go use.


Summary: Finding the Best Concurrency Strategy for Your Nano Banana 2 API

The key to optimizing concurrency for the Nano Banana 2 API isn't about "how much the platform allows," but rather "how much your pipeline can handle." Keep these 3 key points in mind:

  1. Resolution is Everything: Scaling down from 4K to 1K can boost your concurrency by 10x and cut costs by 56%.
  2. Bandwidth is the Real Bottleneck: Base64 encoding makes every image 33% larger than its actual size, putting far more pressure on your bandwidth than on your CPU.
  3. Scale Up Gradually: Start with 5 concurrent requests, monitor your response times and error rates, and then slowly dial it up to the sweet spot.

We recommend using the APIYI (apiyi.com) platform to call the Nano Banana 2 API. It offers unlimited concurrency, 1000 RPM per user, and pricing that matches the official rates—letting you focus on optimizing your pipeline performance without worrying about platform-side limitations.

📝 Author: APIYI Team | The technical team at APIYI specializes in the AI image generation API field. Through apiyi.com, we provide developers with Nano Banana 2 API access featuring unlimited concurrency and flexible billing.
