|

5-Step Complete Tutorial for Accessing Nano Banana Pro Image API Using OpenClaw

Many developers want to enable OpenClaw AI Agents to automatically generate images. This article will show you how to create a custom Skill for OpenClaw in 5 steps to connect the Nano Banana Pro image generation API, achieving a complete workflow from chat commands to automated image output.

Core Value: By the end of this article, you'll master how to create custom Skills for OpenClaw and learn how to connect to Nano Banana Pro via the APIYI platform for text-to-image, image editing, and batch processing, giving your AI Agent professional-grade image generation capabilities.

openclaw-nano-banana-pro-image-api-tutorial-en 图示


Overview of Nano Banana Pro Image API Core Capabilities

Before we dive into the integration, let's look at what Nano Banana Pro can bring to your OpenClaw Agent.

Nano Banana Pro (officially Gemini 3 Pro Image Preview) is a professional-grade AI image generation model from Google DeepMind. Built on Gemini 3 Pro, it's widely regarded as "the best model for generating images with correctly rendered and clearly readable text."

Nano Banana Pro Technical Parameters

Capability Parameter Description
Text Rendering Error rate < 10% Industry-leading accuracy for single-line text
Output Resolution 1K / 2K / 4K Native high resolution, no post-processing needed
Reference Images Up to 14 images Supports multi-image synthesis and character consistency
Character Consistency Up to 5 characters Face consistency across different scenes
Thinking Mode Thinking Process Visualization of complex composition reasoning
Search Grounding Google Search Visual generation based on real-time data

🎯 Access Advice: The official Nano Banana Pro requires a Google Cloud account with an overseas credit card, which can be a high barrier. We recommend using the APIYI (apiyi.com) platform. It offers flat pricing at $0.05/image (about 80% off the official 4K price), supports Alipay/WeChat Pay, and doesn't require an overseas card.


Complete Architecture for Integrating OpenClaw Skill with Nano Banana Pro

Understanding the overall architecture will make the rest of the configuration a breeze.

openclaw-nano-banana-pro-image-api-tutorial-en 图示

Why Choose Integration via APIYI

Comparison Google Official Direct Via APIYI (apiyi.com)
4K Unit Price $0.24/image $0.05/image (80% off)
Payment Method Overseas Credit Card Alipay/WeChat Pay/USDT
API Format Google Native SDK Gemini Native Format Compatible
Network Requirements Requires VPN/Proxy Direct Connection from Mainland China
Concurrency Limits Subject to Quota Limits Unlimited Concurrency, RPM 2000+
Enterprise Discounts None Up to 20% Bonus on Top-ups

Step 1: Create the OpenClaw Skill Directory Structure

# Create the Skill directory structure
mkdir -p ~/.openclaw/skills/nano-banana-pro/scripts
cd ~/.openclaw/skills/nano-banana-pro
touch SKILL.md scripts/generate.py scripts/edit.py

Step 2: Writing the SKILL.md Definition File

SKILL.md is the core file OpenClaw uses to recognize and invoke your Skill. It's made up of YAML front matter and Markdown instructions.

---
name: nano-banana-pro
description: Generate and edit images using Nano Banana Pro (Gemini 3 Pro Image) via APIYI platform. Supports text-to-image, image editing, multi-image composition, and batch processing.
version: 1.0.0
metadata:
  openclaw:
    requires:
      env:
        - APIYI_API_KEY
      bins:
        - python3
    primaryEnv: APIYI_API_KEY
emoji: "🎨"
---

# Nano Banana Pro Image Generation Skill


## Features
- **Text-to-Image**: Generate high-quality images (1K/2K/4K) based on text descriptions.
- **Image Editing**: Modify, composite, and transform the style of existing images.
- **Batch Processing**: Supports batch generation and batch editing.

How to Use

Generate Image

exec python3 scripts/generate.py --prompt "description" --aspect-ratio "16:9" --resolution "2K"

Edit Image

exec python3 scripts/edit.py --instruction "editing instruction" --image-url "image URL"

Important Notes

  • Uses Gemini native API format: /v1beta/models/{model}:generateContent
  • 4K images take about 60 seconds; we recommend setting exec timeout=120.
  • For best results, we recommend using English prompts.

💡 Dev Tip: Once primaryEnv: APIYI_API_KEY is declared, OpenClaw automatically injects the API key into the Skill's runtime environment. Your script can read it directly via environment variables—no hardcoding required!


Step 3: Writing the text-to-image script generate.py

This is the core script. It handles parameters passed from OpenClaw and calls the Nano Banana Pro API to generate images.

Quick curl Test (Gemini Native Format)

Start with a quick curl command to verify the API connection. It's the fastest way to test things out:

curl -s -X POST \
  "https://api.apiyi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "Authorization: Bearer sk-YOUR_APIYI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"}
      ]
    }],
    "generationConfig": {
      "responseModalities": ["IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }' | python3 -c "
import sys, json, base64
data = json.load(sys.stdin)
img_data = data['candidates'][0]['content']['parts'][0]['inlineData']['data']
sys.stdout.buffer.write(base64.b64decode(img_data))
" > gemini-native-image.png

Once successful, a gemini-native-image.png file will be generated in your current directory.

Python Script (Gemini Native Format)

#!/usr/bin/env python3
"""Nano Banana Pro Text-to-Image Script - OpenClaw Skill (Gemini Native Format)"""
import os, json, base64, argparse, requests
from datetime import datetime

API_KEY = os.environ.get("APIYI_API_KEY", "")
API_BASE = "https://api.apiyi.com/v1beta/models"  # APIYI Gemini native interface

def generate_image(prompt, aspect_ratio="16:9", resolution="2K"):
    url = f"{API_BASE}/gemini-3-pro-image-preview:generateContent"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["IMAGE"],
            "imageConfig": {
                "aspectRatio": aspect_ratio,
                "imageSize": resolution
            }
        }
    }
    response = requests.post(url, headers=headers, json=data, timeout=120)
    response.raise_for_status()
    result = response.json()

    # Gemini native format: Extract inlineData from candidates
    parts = result["candidates"][0]["content"]["parts"]
    for part in parts:
        if "inlineData" in part:
            img_bytes = base64.b64decode(part["inlineData"]["data"])
            filename = f"nano_banana_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
            with open(filename, "wb") as f:
                f.write(img_bytes)
            print(f"Image generated: {filename} ({len(img_bytes)/1024:.1f} KB)")
            return filename
    print("Failed to extract image data")
    return None

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--aspect-ratio", default="16:9")
    parser.add_argument("--resolution", default="2K", help="Resolution: 1K/2K/4K")
    args = parser.parse_args()
    generate_image(args.prompt, args.aspect_ratio, args.resolution)

🚀 Quick Start: The code above uses the Gemini native format interface from APIYI (apiyi.com). The endpoint is /v1beta/models/{model}:generateContent, and the response structure is identical to the official Google API. Just create an API key in the APIYI console at api.apiyi.com to get started.


Step 4: Writing the Image Editing Script edit.py

Nano Banana Pro doesn't just generate images; it also supports editing existing ones.

#!/usr/bin/env python3
"""Nano Banana Pro Image Editing Script - OpenClaw Skill (Gemini Native Format)"""
import os, json, base64, argparse, requests
from datetime import datetime

API_KEY = os.environ.get("APIYI_API_KEY", "")
API_BASE = "https://api.apiyi.com/v1beta/models"

def edit_image(instruction, image_url, extra_images=None):
    url = f"{API_BASE}/gemini-3-pro-image-preview:generateContent"
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

    # Build multimodal parts in Gemini native format
    parts = [{"text": instruction}]
    # Convert image URL to base64 inline data
    img_resp = requests.get(image_url, timeout=30)
    parts.append({"inlineData": {"mimeType": "image/png",
                                  "data": base64.b64encode(img_resp.content).decode()}})
    if extra_images:  # Support multi-image synthesis (up to 14 reference images)
        for img_url in extra_images[:13]:
            r = requests.get(img_url, timeout=30)
            parts.append({"inlineData": {"mimeType": "image/png",
                                          "data": base64.b64encode(r.content).decode()}})

    data = {
        "contents": [{"parts": parts}],
        "generationConfig": {"responseModalities": ["IMAGE"]}
    }
    response = requests.post(url, headers=headers, json=data, timeout=120)
    response.raise_for_status()
    result_parts = response.json()["candidates"][0]["content"]["parts"]
    for part in result_parts:
        if "inlineData" in part:
            filename = f"edited_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
            with open(filename, "wb") as f:
                f.write(base64.b64decode(part["inlineData"]["data"]))
            print(f"Editing complete: {filename}")
            return filename
    return None

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--instruction", required=True)
    parser.add_argument("--image-url", required=True)
    parser.add_argument("--extra-images", nargs="*")
    args = parser.parse_args()
    edit_image(args.instruction, args.image_url, args.extra_images)

Image Editing Capabilities Summary

Editing Type Prompt Example Processing Time
Background Replacement "Remove background and add sunset" ~10s
Style Transfer "Convert to anime style" ~15s
Element Addition "Add sunglasses to the person" ~10s
Multi-image Synthesis "Combine these logos and add…" ~20s
Text Addition "Add title text: Hello World" ~10s
Color Adjustment "Make the image warmer, increase contrast" ~8s

Step 5: Configuring OpenClaw and Testing

Configuring the API Key

Add the Skill configuration to your OpenClaw config file at ~/.openclaw/openclaw.json:

{
  "skills": {
    "entries": {
      "nano-banana-pro": {
        "enabled": true,
        "apiKey": "sk-your-APIYI-key"
      }
    }
  }
}

openclaw-nano-banana-pro-image-api-tutorial-en 图示

Testing the Skill

Send a test prompt in your messaging platform to verify the functionality:

Generate a modern minimalist e-commerce product display image, white background, product centered.

Verify that the Skill is correctly loaded:

openclaw skills list | grep nano-banana
openclaw skills test nano-banana-pro

💰 Cost Optimization: Nano Banana Pro is priced at a flat $0.05 per image through APIYI (apiyi.com), regardless of resolution. If your project averages over 5,000 images a day, you can snag a top-up bonus of up to 20%, cutting your cost per image to just $0.04.


Nano Banana Pro API Cost vs. Performance Comparison

Choosing the right integration method is crucial for long-term use.

openclaw-nano-banana-pro-image-api-tutorial-en 图示

The SVG chart above illustrates the detailed pricing comparison and performance specs. Key data: Accessing Nano Banana Pro via APIYI brings the cost per 4K image down to just $0.05 (an 80% discount compared to the official $0.24 price). With top-up bonuses for high-volume users, this can drop as low as $0.04 per image. The platform handles 100,000+ images daily with no concurrency limits.


FAQ

Q1: What dependencies are required to integrate OpenClaw with Nano Banana Pro?

You only need Python 3 and the requests library. We use the Gemini native format interface (/v1beta/models/{model}:generateContent) provided by APIYI (apiyi.com) for model invocation, so there's no need to install Google's google-generativeai SDK. Just run pip install requests. You can even test it directly with a curl command without any dependencies at all.

Q2: What should I do if OpenClaw reports a timeout when generating 4K images?

Generating 4K images with Nano Banana Pro takes about 60 seconds. In the usage instructions within SKILL.md, we recommend using exec timeout=120 or exec timeout=180 when calling the script in OpenClaw to prevent the task from being interrupted by the default timeout.

Q3: Can I use other image generation models instead of Nano Banana Pro?

Absolutely. The APIYI (apiyi.com) platform supports multiple image generation models simultaneously. You just need to modify the model parameter in generate.py to switch. The API format remains identical, so no other code changes are necessary.

Q4: How can I ensure the security of my OpenClaw Skill?

There are three key measures: First, inject your API key via environment variables—don't hardcode it into your scripts. Second, only use Skills you've created yourself; avoid installing image generation skills from unknown sources on ClawHub (over 820 malicious skills have already been identified). Third, regularly check your OpenClaw logs to ensure there are no unauthorized calls.

Q5: What editing operations are supported, and how is the quality?

Nano Banana Pro supports background replacement, style transfer, adding/removing elements, multi-image composition (up to 14 reference images), and text rendering. Text rendering is its standout feature, with a single-line text error rate below 10%, which far outperforms DALL-E 3 and Midjourney. By using the APIYI platform for editing functions, it costs only $0.05 per call and typically finishes in about 10 seconds.


Summary

By following the 5 steps in this tutorial, you've mastered the complete process of creating a custom Skill for OpenClaw to access the Nano Banana Pro image API:

  1. Create the Skill Directory — A standard structure with SKILL.md + scripts/.
  2. Write SKILL.md — Declare dependencies, environment variables, and usage instructions.
  3. Write the Text-to-Image Script — Call Nano Banana Pro via the APIYI Gemini native format interface.
  4. Write the Editing Script — Support multi-image composition, background replacement, and style transfer.
  5. Configure and Test — Set your key in openclaw.json and verify the functionality.

We recommend using APIYI (apiyi.com) to access Nano Banana Pro. You'll enjoy flat pricing of $0.05 per image (which is 80% off the official 4K price) and a unified interface for Claude, GPT-4o, DeepSeek, and more. You can also try out the image generation for free at imagen.apiyi.com.


This article was written by the APIYI technical team, focusing on Large Language Model API integration practices and Agent development tutorials. For more technical guides, please visit the APIYI Help Center: help.apiyi.com

Similar Posts