Nano Banana Pro Multi-turn Dialogue Image Generation API Practice: 3 Steps to Build Contextual Image Generation

title: "Mastering Multi-turn Image Generation with Nano Banana Pro: A Developer's Guide"
description: "Learn how to implement multi-turn image generation with Nano Banana Pro. Master the contents array, thoughtSignature mechanism, and state management for seamless workflows."

Author's Note: This guide provides a deep dive into the field structure, contents array construction, thoughtSignature mechanism, and code implementation for the Nano Banana Pro (gemini-3-pro-image-preview) multi-turn image generation API.

Many developers encounter the same hurdle when first integrating Nano Banana Pro: on the gemini.google.com web interface, you can follow up with prompts like "change the background to sunset" or "add a cat," and the model remembers the previous image perfectly. However, when calling the official API, the model seems to have "amnesia." The reason is that the Gemini API is stateless; multi-turn context must be manually constructed by the caller. This article will clarify the underlying fields of the Nano Banana Pro multi-turn API, provide both Python SDK and REST implementations, and explain the critical thoughtSignature mechanism to help you build a smooth, web-like image generation experience in just three steps.

Core Value: After reading this, you'll master the correct construction of the contents array, be able to implement "edit based on the previous image" workflows in your own applications, and avoid the three common pitfalls: "image amnesia," "token waste," and "signature loss."

Nano Banana Pro Multi-turn Image Generation: Key Takeaways

Key Point	Description	Value
Stateless API	The `gemini-3-pro-image-preview` interface doesn't remember any history.	Multi-turn context must be actively maintained by the caller.
Contents Array	Alternating `user`/`model` roles; carry full history in each request.	Allows the model to "see" past conversations in one go.
Image Feedback	Previously generated images must be fed back into `contents` as `inline_data`.	Enables the model to perform continuous edits instead of regenerating.
thoughtSignature	Encrypted thought signature that preserves reasoning context across turns.	Ensures critical editing instructions aren't forgotten.
SDK Automation	The official Python SDK's `chat` object automatically manages history.	Saves 80% of code compared to direct REST migration.

The Essential Difference Between Multi-turn Generation and Web-based Agents

gemini.google.com is a Google-built Agent application. It maintains a complete "conversation state" on the frontend (including text, generated images, and thought signatures for each turn). Every time you input a new message, this Agent packages all history and sends it to the underlying model. This is why the web experience is so seamless—the Agent handles all the "memory" work.

When you call the generateContent API directly, you're getting a "naked" model invocation interface. Each HTTP request is an independent inference, and the model has no concept of your previous conversation. To replicate the multi-turn experience of the web version, you essentially need to implement an Agent yourself in your code—filling the contents with historical user messages, model responses (including images and signatures) according to the specification, and then initiating the request.

Nano Banana Pro Multi-turn Image Generation: Field Structure Explained

Core Specifications for the `contents` Array

The contents field is the standard way the Gemini API represents conversation history. It's a JSON array where each element represents a single turn in the conversation:

Field	Type	Description
`role`	string	`"user"` or `"model"`, must strictly alternate
`parts`	array	Content fragments for that turn, can mix text/images/signatures
`parts[].text`	string	Text content, such as instructions or dialogue
`parts[].inline_data.mime_type`	string	Image format, typically `"image/png"`
`parts[].inline_data.data`	string	Base64 encoded image data
`parts[].thought_signature`	string	Encrypted signature generated by the model (appears only in `model` role)

A complete two-turn conversation request body looks like this:

{
  "contents": [
    {"role": "user", "parts": [{"text": "Generate an image of a golden retriever running on the beach"}]},
    {"role": "model", "parts": [
      {"inline_data": {"mime_type": "image/png", "data": "<base64 of the first generated image>"}},
      {"thought_signature": "<encrypted signature>"}
    ]},
    {"role": "user", "parts": [{"text": "Change the scene to sunset"}]}
  ],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"],
    "imageConfig": {"aspectRatio": "16:9", "imageSize": "2K"}
  }
}

Two Ways to Pass Back Images

In the second turn, the model must be able to "see" the image generated in the first turn. Nano Banana Pro supports two ways to handle this:

# Method 1: inline_data with embedded base64 (best for small images, simple and direct)
{
    "inline_data": {
        "mime_type": "image/png",
        "data": base64.b64encode(image_bytes).decode()
    }
}

# Method 2: file_data referencing resources uploaded via Files API (best for large images or reuse)
{
    "file_data": {
        "mime_type": "image/png",
        "file_uri": "files/abc123xyz"
    }
}

Pro Tip: inline_data is the most common way for direct, one-off scenarios. The file_data reference mode is ideal for scenarios where you need to reuse the same large image across multiple turns, significantly reducing request body size and upload overhead.

Nano Banana Pro Multi-turn Image Generation: Quick Start

Minimalist Example (Automatic Management with Python SDK)

If you're using the official Python SDK, you only need 10 lines of code:

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
chat = client.chats.create(model="gemini-3-pro-image-preview")

# Turn 1: Generate initial image
r1 = chat.send_message("Generate an image of a golden retriever running on the beach")

# Turn 2: Edit based on the first image (the chat object automatically carries the history)
r2 = chat.send_message("Change the scene to sunset and add a flying seagull")

# Turn 3: Continue with further modifications
r3 = chat.send_message("Change the dog's color to dark brown")

The chat object maintains the full contents list internally (including the thoughtSignature for each turn), so you don't have to worry about field details. Every send_message call automatically packages and sends the history.

View Full Calling Example for OpenAI-Compatible Interfaces

If you are using an OpenAI-compatible platform like APIYI (apiyi.com) to call Nano Banana Pro, you can reuse the OpenAI SDK directly:

import openai
import base64

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://vip.apiyi.com/v1"
)

# Maintain a local messages list (equivalent to the contents concept)
messages = [
    {"role": "user", "content": "Generate an image of a golden retriever running on the beach"}
]

# Turn 1
response1 = client.chat.completions.create(
    model="gemini-3-pro-image-preview",
    messages=messages
)
img1_url = response1.choices[0].message.content  # Extract image URL or base64

# Add model response to history
messages.append({"role": "assistant", "content": img1_url})

# Turn 2: Add new instruction
messages.append({"role": "user", "content": "Change the scene to sunset"})
response2 = client.chat.completions.create(
    model="gemini-3-pro-image-preview",
    messages=messages
)

# Continue for Turn 3...
messages.append({"role": "assistant", "content": response2.choices[0].message.content})
messages.append({"role": "user", "content": "Add a flying seagull"})
response3 = client.chat.completions.create(
    model="gemini-3-pro-image-preview",
    messages=messages
)

Key Point: In OpenAI-compatible mode, the messages array is equivalent to the native contents array. The role field is changed from "model" to "assistant", and the platform layer handles the conversion automatically.

Recommendation: For multi-turn editing scenarios, we recommend using the SDK's chat object or maintaining a local messages list to avoid manual concatenation of contents every time. You can register for free credits at APIYI (apiyi.com) to get started with the SDK before considering REST optimizations.

Nano Banana Pro Multi-turn Image Generation via Manual REST

Pure REST Implementation Without SDK Dependencies

In certain scenarios (such as backend proxies, ComfyUI nodes, or low-code platforms), you might not be able to use the official SDK and will need to construct REST requests manually. Here is the complete curl implementation:

# Round 1: Generate an image from a text prompt
curl -X POST \
  "https://vip.apiyi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Generate a golden retriever running on the beach"}]}
    ],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"]
    }
  }'

# The response will contain: parts[0].inline_data.data (base64 image)
# and parts[0].thought_signature

For the second round of requests, you must include the entire model response from the first round (including the image and signature) back into the contents array:

curl -X POST \
  "https://vip.apiyi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Generate a golden retriever running on the beach"}]},
      {"role": "model", "parts": [
        {"inline_data": {"mime_type": "image/png", "data": "<base64 from round 1>"}},
        {"thought_signature": "<signature from round 1>"}
      ]},
      {"role": "user", "parts": [{"text": "Change the scene to sunset"}]}
    ],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"]
    }
  }'

Comparison of Three Invocation Methods

Method	History Management	Best For	Learning Curve
Official Python SDK (`chat` object)	Automatic	Backend services, Notebook experiments	⭐ Lowest
OpenAI Compatible API (messages array)	Semi-automatic	Migrating existing OpenAI projects	⭐⭐ Low
Native REST (contents array)	Fully manual	ComfyUI, low-code, cross-language	⭐⭐⭐ Medium

Data Note: The chart above illustrates the core differences between automatic Agent management and manual API management. You can compare the actual performance of both invocation methods directly on the APIYI (apiyi.com) platform.

Nano Banana Pro Multi-turn Image Generation: The thoughtSignature Mechanism

What is thoughtSignature?

thoughtSignature is the "encrypted thought signature" introduced in the Gemini 3 series. It's a compact encoding of the model's internal reasoning state. While it's not human-readable, the model uses it to quickly restore context in subsequent turns. Its key roles include:

Preserving Decision Details: For example, if the model "decides" on a light color palette in the first turn, it inherits this style in the second turn via the signature.
Improving Consistency: Keeps characters, scenes, and compositions stable across multi-turn edits.
Saving Tokens: Eliminates the need to repeatedly state "maintain the original style" in your prompt.

When must you include the signature?

Scenario	Signature Required?
Single-turn independent request (one-off image)	❌ No
Multi-turn editing (modifying based on previous image)	✅ Yes
Restoring history across sessions	✅ Yes (must be persisted manually)
Text-only conversation (no images)	✅ Yes, for reasoning continuity

Practical Guide: Code Pattern for Manual Signature Management

import requests
import base64
import json

API_BASE = "https://vip.apiyi.com/v1beta"
MODEL = "gemini-3-pro-image-preview"
HEADERS = {
    "x-goog-api-key": "YOUR_API_KEY",
    "Content-Type": "application/json"
}

class NanoBananaChat:
    """A minimalist chat client that manually maintains contents + signature"""
    def __init__(self):
        self.contents = []

    def send(self, text: str, attach_image_b64: str = None) -> dict:
        # Construct the user message for this turn
        user_parts = [{"text": text}]
        if attach_image_b64:
            user_parts.append({
                "inline_data": {"mime_type": "image/png", "data": attach_image_b64}
            })
        self.contents.append({"role": "user", "parts": user_parts})

        # Send the request
        resp = requests.post(
            f"{API_BASE}/models/{MODEL}:generateContent",
            headers=HEADERS,
            json={
                "contents": self.contents,
                "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
            }
        ).json()

        # Append the model response (including signature) back to contents as-is
        model_parts = resp["candidates"][0]["content"]["parts"]
        self.contents.append({"role": "model", "parts": model_parts})
        return model_parts

# Usage Example
chat = NanoBananaChat()
parts1 = chat.send("Generate a golden retriever running on the beach")
parts2 = chat.send("Change the scene to sunset")  # Automatically carries history and signature
parts3 = chat.send("Add a flying seagull")

Optimization Tip: When accessing via APIYI (apiyi.com), the platform layer will pass through the thought_signature field as-is. Developers only need to ensure they "append the entire model parts array back to contents"—no need to worry about the specific content of the signature itself.

Nano Banana Pro Multi-turn Image Generation: Practical Scenarios

Scenario 1: Progressive Brand Design

A common marketing requirement: based on a product concept image, gradually adjust the copy, color scheme, and layout. The advantage of the multi-turn image generation API is that you only need to describe the "incremental changes" each time, rather than describing the entire image from scratch:

chat = client.chats.create(model="gemini-3-pro-image-preview")

chat.send_message("Design a coffee brand poster with a dark blue gradient background, place the product image on the left")
chat.send_message("Change the headline copy to 'Awaken Your Morning'")
chat.send_message("Add a QR code placeholder in the bottom right corner")
chat.send_message("Make the overall style more modern and remove the decorative lace")

Scenario 2: Multi-turn Editing Based on Reference Images

Nano Banana Pro supports up to 14 reference images in a single request. Combined with multi-turn conversations, you can build powerful image fusion workflows:

# Upload a portrait + a clothing reference image
chat.send_message([
    "Dress the person in the first image with the clothing from the second image",
    {"inline_data": {"mime_type": "image/png", "data": person_b64}},
    {"inline_data": {"mime_type": "image/png", "data": outfit_b64}}
])

# Subsequent fine-tuning
chat.send_message("Change the neckline to a V-neck")
chat.send_message("Change the background to a simple gray")

Scenario 3: Restoring History Across Sessions

If a user closes the page and reopens it, wanting to continue the previous conversation, you need to persist the contents array to a database:

import json

# Save
with open(f"sessions/{user_id}.json", "w") as f:
    json.dump(chat.get_history(), f)

# Restore
with open(f"sessions/{user_id}.json") as f:
    history = json.load(f)
restored_chat = client.chats.create(
    model="gemini-3-pro-image-preview",
    history=history
)
restored_chat.send_message("Continuing from before, make the background a bit brighter")

Context Window Limits

Resource	Limit
Input Context	64K tokens
Output Context	32K tokens
Max Reference Images per Request	14
Recommended History Turns	No more than 8-10 turns
Max Single Image Resolution	2K (1K default)

Scenario Advice: When the conversation exceeds 8-10 turns, it's recommended to proactively "truncate" the earlier history or replace it with an LLM summary; otherwise, tokens will quickly approach the 64K limit. In production environments, be sure to include a token counter and make truncation decisions on the client side in advance.

FAQ

Q1: The API is stateless. How do I implement continuous conversation like the web version?

Since the API is stateless, your code must maintain a local contents array (or use the chat object in the SDK). You need to send the complete history—including user text, images generated by the model, and the thought_signature—back with every request so the model "remembers" the previous turns. The easiest way is to use the official Python SDK's client.chats.create(), which handles this management for you automatically.

Q2: What fields should I pass for images generated in the previous turn?

You need to include the image in the parts array of the "previous model turn" using the inline_data format (base64 encoded + mime_type). Also, make sure to include the thought_signature returned by the model. If you're using an OpenAI-compatible interface like APIYI (apiyi.com), the platform will automatically handle these field mappings, so you only need to maintain a standard messages list.

Q3: Is the thoughtSignature mandatory? What happens if I don’t pass it?

It's highly recommended to include it. If you don't, the model might "forget" key decisions from the previous turn (such as style, color palette, or composition) during multi-turn editing, causing it to act as if it's generating from scratch every time. Official documentation clearly states that the signature must be preserved in multi-turn scenarios. While the SDK handles this automatically, you'll need to manually append the full model parts back into the contents when using REST mode.

Q4: What should I do if the history gets too long? Will it error out if it exceeds 64K tokens?

Yes, input exceeding 64K tokens will be rejected. Common optimization strategies include:

Truncation: Keep only the most recent 4-6 turns of history.
Image Downsampling: Pass historical images at 1K resolution instead of 2K.
Summarization: Use an LLM to compress earlier turns into a short text description.
Segmented Sessions: Start a new session proactively when the conversation topic changes.

Q5: How can I quickly test the multi-turn image generation performance of Nano Banana Pro?

We recommend using an aggregation platform that supports Gemini models, such as APIYI (apiyi.com), for quick verification:

Register an account to get your API key and free credits.
Select the gemini-3-pro-image-preview model.
Use the Python SDK example code from this article to initiate 3-5 consecutive editing turns.
Compare the coherence of the output from each turn to see if it meets your business requirements.

Summary

Key takeaways for the Nano Banana Pro multi-turn conversation and image generation API:

Stateless Nature: The API doesn't store any history; the caller must maintain the contents array.
Role Alternation: User and model roles must strictly alternate, and each turn's parts can contain a mix of text, images, and signatures.
Image Feedback: Images generated in the previous turn must be passed back as inline_data, otherwise the model cannot "see" them.
Signature Mechanism: The thought_signature is crucial for multi-turn consistency and must be manually included in REST mode.
SDK Simplification: The official Python SDK's chat object automatically manages all the details mentioned above.

For developers looking to quickly implement a web-like experience, the best path is to use the official SDK's chat object or the OpenAI-compatible messages mode to avoid the complexity of manual REST construction.

We recommend accessing Nano Banana Pro's multi-turn image generation capabilities via APIYI (apiyi.com). The platform supports both native Gemini fields and OpenAI-compatible modes, offers free testing credits, and makes it easy to verify multi-turn editing effects while smoothly migrating existing projects.

📚 References

Gemini API Image Generation Official Documentation: The authoritative guide for multi-turn image generation.
- Link: ai.google.dev/gemini-api/docs/image-generation
- Description: Includes specifications for the contents field, along with complete examples for the Python SDK and REST API.
Gemini 3 Pro Image Preview Model Card: Details on model capabilities and limitations.
- Link: ai.google.dev/gemini-api/docs/models/gemini-3-pro-image-preview
- Description: Key parameters such as context window, resolution, and the number of allowed reference images.
Google AI Developers Forum – Multi-turn Nano Banana: Real-world community examples.
- Link: discuss.ai.google.dev/t/multi-turn-nano-banana-example
- Description: Best practices for multi-turn conversations discussed by active developers.
Vertex AI Gemini 3 Pro Image Documentation: Reference for enterprise-grade deployment.
- Link: docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-pro-image
- Description: Covers advanced usage, including thought_signature and file_data references.
APIYI Nano Banana Pro Integration Guide: A quick-start guide for developers in China.
- Link: help.apiyi.com
- Description: Includes examples for both OpenAI-compatible interfaces and native Gemini interfaces.

Author: APIYI Technical Team
Technical Discussion: Feel free to share any real-world challenges you've encountered with multi-turn image generation in the comments. For more Nano Banana Pro configuration tips, visit the APIYI documentation center at docs.apiyi.com.

Nano Banana Pro Multi-turn Dialogue Image Generation API Practice: 3 Steps to Build Contextual Image Generation

title: "Mastering Multi-turn Image Generation with Nano Banana Pro: A Developer's Guide"
description: "Learn how to implement multi-turn image generation with Nano Banana Pro. Master the contents array, thoughtSignature mechanism, and state management for seamless workflows."

Nano Banana Pro Multi-turn Image Generation: Key Takeaways

The Essential Difference Between Multi-turn Generation and Web-based Agents

Nano Banana Pro Multi-turn Image Generation: Field Structure Explained

Core Specifications for the `contents` Array

Two Ways to Pass Back Images

Nano Banana Pro Multi-turn Image Generation: Quick Start

Minimalist Example (Automatic Management with Python SDK)

Nano Banana Pro Multi-turn Image Generation via Manual REST

Pure REST Implementation Without SDK Dependencies

Comparison of Three Invocation Methods

Nano Banana Pro Multi-turn Image Generation: The thoughtSignature Mechanism

What is thoughtSignature?

When must you include the signature?

Practical Guide: Code Pattern for Manual Signature Management

Nano Banana Pro Multi-turn Image Generation: Practical Scenarios

Scenario 1: Progressive Brand Design

Scenario 2: Multi-turn Editing Based on Reference Images

Scenario 3: Restoring History Across Sessions

Context Window Limits

FAQ

Summary

📚 References

Complete Guide to the Deprecation of the temperature Parameter in Claude Opus 4.7: Fixes for 2 Common 400 Errors

GPT-image-2 vs GPT-image-1.5: A Comprehensive Analysis of 8 Major Upgrades: What Has OpenAI Improved in Its Next-Generation Image Model?

Identify 4 low-cost application scenarios for the first generation of Nano Banana: the practical value of gemini-2.5-flash-image beyond Pro and the second generation

Nano Banana 2 Lite Released: A Complete Analysis of the Lightweight Image Model with 4-Second Image Generation and Reduced Costs

Master the Latest Pricing for Nano Banana 2: Pay-Per-Use at $0.045 or Volume-Based Discounts as Low as 30% Off Official Rates, A Complete Breakdown of Both Billing Plans

Fixing the Nano Banana Pro original image return issue: 5 major diagnostic reasons + 8 practical repair solutions

title: "Mastering Multi-turn Image Generation with Nano Banana Pro: A Developer's Guide" description: "Learn how to implement multi-turn image generation with Nano Banana Pro. Master the contents array, thoughtSignature mechanism, and state management for seamless workflows."

Nano Banana Pro Multi-turn Image Generation: Key Takeaways

The Essential Difference Between Multi-turn Generation and Web-based Agents

Nano Banana Pro Multi-turn Image Generation: Field Structure Explained

Core Specifications for the contents Array

Two Ways to Pass Back Images

Nano Banana Pro Multi-turn Image Generation: Quick Start

Minimalist Example (Automatic Management with Python SDK)

Nano Banana Pro Multi-turn Image Generation via Manual REST

Pure REST Implementation Without SDK Dependencies

Comparison of Three Invocation Methods

Nano Banana Pro Multi-turn Image Generation: The thoughtSignature Mechanism

What is thoughtSignature?

When must you include the signature?

Practical Guide: Code Pattern for Manual Signature Management

Nano Banana Pro Multi-turn Image Generation: Practical Scenarios

Scenario 1: Progressive Brand Design

Scenario 2: Multi-turn Editing Based on Reference Images

Scenario 3: Restoring History Across Sessions

Context Window Limits

FAQ

Summary

📚 References

Similar Posts

title: "Mastering Multi-turn Image Generation with Nano Banana Pro: A Developer's Guide"
description: "Learn how to implement multi-turn image generation with Nano Banana Pro. Master the contents array, thoughtSignature mechanism, and state management for seamless workflows."

Core Specifications for the `contents` Array