
How to Set the Gemini 3.1 Pro Thinking Level: 3-Level Control and a Complete Deep Think Mini Configuration Tutorial

Gemini 3.1 Pro Preview adds a medium thinking level, one of the biggest differences from the previous Gemini 3 Pro. You can now precisely control the model's reasoning depth across three levels (low, medium, and high), and the high mode activates Deep Think Mini capabilities.

Core Value: By the end of this post, you'll master the full configuration of the thinkingLevel parameter and learn how to strike the perfect balance between quality, speed, and cost.



Gemini 3.1 Pro Thinking Level Support Matrix

Let's look at the big picture: different Gemini models support different thinking levels.

| Thinking Level | Gemini 3.1 Pro | Gemini 3 Pro | Gemini 3 Flash | Description |
|---|---|---|---|---|
| minimal | ❌ Not supported | ❌ Not supported | ✅ Supported | Near-zero thinking; only Flash supports this. |
| low | ✅ Supported | ✅ Supported | ✅ Supported | Fast response, lowest cost. |
| medium | ✅ New in 3.1 | ❌ Not supported | ✅ Supported | Balanced reasoning; the core upgrade in 3.1 Pro. |
| high | ✅ Supported (default) | ✅ Supported (default) | ✅ Supported (default) | Deepest reasoning; activates Deep Think Mini. |

Key Changes: Thinking Level Upgrades from 3 Pro to 3.1 Pro

| Comparison | Gemini 3 Pro | Gemini 3.1 Pro |
|---|---|---|
| Available levels | low, high (2 levels) | low, medium, high (3 levels) |
| Default level | high | high |
| Meaning of high mode | Deep reasoning | Deep Think Mini (more powerful) |
| Can thinking be disabled? | No | No |

Core Insight: Gemini 3 Pro's high reasoning depth is roughly equivalent to Gemini 3.1 Pro's medium. Meanwhile, 3.1 Pro's high mode is the brand-new Deep Think Mini, which offers reasoning depth far exceeding the previous generation.

🎯 Migration Tip: If you've been using Gemini 3 Pro's high mode, we recommend starting with medium when you switch to 3.1 Pro to maintain similar quality and cost. Only bump it up to high when you really need that deep reasoning power. APIYI (apiyi.com) supports all Gemini models and thinking levels.


Gemini 3.1 Pro Thinking Level API Setup

Calling via APIYI (OpenAI Compatible Format)

import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apiyi.com/v1"  # APIYI Unified Interface
)

# LOW Mode: Fast response
response_low = client.chat.completions.create(
    model="gemini-3.1-pro-preview",
    messages=[{"role": "user", "content": "Translate this English text to Chinese: Hello World"}],
    extra_body={
        "thinking": {"type": "enabled", "budget_tokens": 1024}
    }
)

# MEDIUM Mode: Balanced reasoning (New!)
response_med = client.chat.completions.create(
    model="gemini-3.1-pro-preview",
    messages=[{"role": "user", "content": "Review this code for potential memory leak risks"}],
    extra_body={
        "thinking": {"type": "enabled", "budget_tokens": 8192}
    }
)

# HIGH Mode: Deep Think Mini
response_high = client.chat.completions.create(
    model="gemini-3.1-pro-preview",
    messages=[{"role": "user", "content": "Prove: For all positive integers n, n^3-n is divisible by 6"}],
    extra_body={
        "thinking": {"type": "enabled", "budget_tokens": 32768}
    }
)

Calling via Native Google SDK

from google import genai
from google.genai import types

client = genai.Client()

# Using the thinkingLevel parameter
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Your prompt",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level="MEDIUM"  # "LOW" / "MEDIUM" / "HIGH"
        )
    ),
)

# Check thinking token consumption
print(f"Thinking tokens: {response.usage_metadata.thoughts_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")

REST API Call

POST https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-pro-preview:generateContent

{
  "contents": [{"parts": [{"text": "Your prompt"}]}],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingLevel": "MEDIUM"
    }
  }
}

⚠️ Important Reminder: thinkingLevel and thinkingBudget cannot be used at the same time, otherwise you'll get a 400 error. We recommend using thinkingLevel for Gemini 3+ models, while Gemini 2.5 models use thinkingBudget.
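To guard against mixing the two parameters up, here's a small hypothetical helper (not part of any SDK) that builds the right thinkingConfig block for each model family. It assumes model IDs start with `gemini-3` or `gemini-2.5`, which matches the IDs used in this post:

```python
def thinking_config(model: str, level: str = "medium", budget: int = 8192) -> dict:
    """Build the thinkingConfig block for generationConfig, choosing the
    parameter that matches the model family (never both, which would 400)."""
    if model.startswith("gemini-3"):        # Gemini 3 / 3.1: enum levels
        return {"thinkingLevel": level.upper()}
    if model.startswith("gemini-2.5"):      # Gemini 2.5: numeric token budget
        return {"thinkingBudget": budget}
    raise ValueError(f"Unknown model family: {model}")

print(thinking_config("gemini-3.1-pro-preview", "medium"))  # {'thinkingLevel': 'MEDIUM'}
print(thinking_config("gemini-2.5-pro"))                    # {'thinkingBudget': 8192}
```

Centralizing the choice in one place means a model upgrade only requires changing the model string, not every request site.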


Detailed Comparison of Gemini 3.1 Pro's 3 Thinking Levels

LOW: Fastest and Cheapest

| Dimension | Details |
|---|---|
| Reasoning depth | Minimal thinking tokens, but still outperforms non-thinking models |
| Response speed | Seconds (fastest) |
| Cost | Lowest (fewer thinking tokens → fewer billed tokens → lower cost) |
| Best for | Autocomplete, classification, structured data extraction, simple translation, summarization |
| Not suitable for | Complex reasoning, mathematical proofs, multi-step debugging |

MEDIUM: The Balanced Choice (New)

| Dimension | Details |
|---|---|
| Reasoning depth | Moderate thinking tokens, roughly equivalent to the old 3.0 Pro's "high" level |
| Response speed | Moderate latency |
| Cost | Medium |
| Best for | Code reviews, document analysis, daily coding, standard API calls, Q&A |
| Not suitable for | IMO-level math, extremely complex multi-step reasoning |

HIGH: Deep Think Mini (Default)

| Dimension | Details |
|---|---|
| Reasoning depth | Maximized reasoning; activates full Deep Think Mini capabilities |
| Response speed | May take several minutes (around 8 minutes for IMO problems) |
| Cost | Highest (large volume of thinking tokens billed at output rates) |
| Best for | Complex debugging, algorithm design, mathematical proofs, research tasks, Agent workflows |
| Special ability | Thought signatures maintain reasoning continuity across API calls |



Gemini 3.1 Pro Thinking Token Billing Rules

Understanding how billing works is key to choosing the right thinking level for your needs.

Core Billing Principles

| Billing Item | Description |
|---|---|
| Are thinking tokens billed? | Yes, at the same price as output tokens. |
| Output token price | $12.00 / 1M tokens (includes thinking tokens). |
| Billing basis | The full internal reasoning chain, not just the summary. |
| Thinking summary | The API returns only a thinking summary, but you're billed for all thinking tokens generated. |

Official explanation from Google:

"Thinking models generate full thoughts to improve the quality of the final response, and then output summaries to provide insight into the thought process. Pricing is based on the full thought tokens the model needs to generate to create a summary, despite only the summary being output from the API."

Cost Estimates for the Three Levels

| Level | Estimated Thinking Tokens | Cost per 1,000 Calls | Monthly Cost Trend |
|---|---|---|---|
| LOW | ~500-2K / call | $6-24 | Lowest |
| MEDIUM | ~2K-8K / call | $24-96 | Medium |
| HIGH | ~8K-32K+ / call | $96-384+ | Higher; more for complex tasks |

💰 Cost Optimization: Not every request needs HIGH. By setting 80% of daily tasks to LOW or MEDIUM and only using HIGH for the 20% of truly complex tasks, you can slash your API spend by 50-70%. You can easily configure this through the APIYI (apiyi.com) platform.
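The cost ranges above follow directly from the $12 / 1M output-token rate stated earlier. A quick back-of-the-envelope calculation reproduces them:

```python
OUTPUT_RATE = 12.00 / 1_000_000  # dollars per thinking/output token ($12 / 1M)

def thinking_cost(tokens_per_call: int, calls: int = 1000) -> float:
    """Dollar cost of thinking tokens for a batch of calls."""
    return tokens_per_call * calls * OUTPUT_RATE

print(f"LOW:    ${thinking_cost(500):.2f} - ${thinking_cost(2_000):.2f}")    # $6.00 - $24.00
print(f"MEDIUM: ${thinking_cost(2_000):.2f} - ${thinking_cost(8_000):.2f}")  # $24.00 - $96.00
print(f"HIGH:   ${thinking_cost(8_000):.2f} - ${thinking_cost(32_000):.2f}") # $96.00 - $384.00
```

Plugging in your own observed `thoughts_token_count` values gives a much tighter estimate than these generic ranges.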


Task Types and Gemini 3.1 Pro Thinking Level Matching Guide

Detailed Scenario Recommendations

| Task Type | Recommended Level | Reason | Expected Latency |
|---|---|---|---|
| Simple translation | LOW | No reasoning required | <5 seconds |
| Text classification | LOW | Pattern matching task | <5 seconds |
| Summary extraction | LOW | Information compression, not reasoning | <10 seconds |
| Auto-completion | LOW | Latency sensitive | <3 seconds |
| Code review | MEDIUM | Requires moderate analysis | 10-30 seconds |
| Document Q&A | MEDIUM | Understanding + answering | 10-30 seconds |
| Daily coding | MEDIUM | Standard code generation | 15-40 seconds |
| Bug analysis | MEDIUM | Medium-complexity reasoning | 20-40 seconds |
| Complex debugging | HIGH | Multi-step reasoning chain | 1-5 minutes |
| Math proof | HIGH | Deep Think Mini | 3-8 minutes |
| Algorithm design | HIGH | Deep reasoning | 2-5 minutes |
| Research analysis | HIGH | Multi-dimensional deep analysis | 2-5 minutes |
| Agent workflow | HIGH | Thought signatures maintain continuity | Depends on task |

Dynamic Level Selection: Best Practice Code

import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apiyi.com/v1"  # APIYI unified interface
)

# Automatically select thinking level based on task type
THINKING_CONFIG = {
    "simple":  {"type": "enabled", "budget_tokens": 1024},   # LOW
    "medium":  {"type": "enabled", "budget_tokens": 8192},   # MEDIUM
    "complex": {"type": "enabled", "budget_tokens": 32768},  # HIGH
}

def smart_think(prompt, complexity="medium"):
    """Automatically set thinking level based on task complexity"""
    return client.chat.completions.create(
        model="gemini-3.1-pro-preview",
        messages=[{"role": "user", "content": prompt}],
        extra_body={"thinking": THINKING_CONFIG[complexity]}
    )

# Simple translation → LOW
resp1 = smart_think("Translate: Good morning", "simple")

# Code review → MEDIUM
resp2 = smart_think("Review the security of this code: ...", "medium")

# Math proof → HIGH (Deep Think Mini)
resp3 = smart_think("Prove a specific case of the Riemann Hypothesis", "complex")

Gemini 3.1 Pro vs 3 Pro: Evolution of Thinking Levels



Where Deep Think Mini Truly Shines

The Deep Think Mini, activated via the HIGH mode in Gemini 3.1 Pro, is the absolute standout feature of this upgrade.

What is Deep Think Mini?

Deep Think Mini isn't a standalone model. Instead, it's a special reasoning mode within Gemini 3.1 Pro that's triggered when you set the thinking level to HIGH. Google describes it as a "mini version of Gemini Deep Think"—where Deep Think is Google's heavy-duty reasoning specialist (boasting an ARC-AGI-2 score of 84.6%).

Deep Think Mini Performance Benchmarks

| Test Item | Deep Think Mini (3.1 Pro HIGH) | Gemini 3 Pro HIGH | Improvement |
|---|---|---|---|
| ARC-AGI-2 | 77.1% | 31.1% | +148% |
| IMO math problems | Solved in ~8 minutes | Failed to solve | From "impossible" to "possible" |
| Complex planning tasks | Benchmarks up 40-60% (vs. Gemini 2.5 Pro) | — | Significant improvement |

Thought Signatures

Deep Think Mini introduces a unique technology called thought signatures. These are encrypted, tamper-proof representations of intermediate reasoning states.

In Agent workflows, a model's reasoning often spans multiple API calls. Thought signatures allow the reasoning context from a previous call to be passed seamlessly to the next, maintaining reasoning continuity. This is a game-changer for multi-step Agent tasks.
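The hand-off can be sketched as follows. Note that the `thought_signature` field name and the response shape here are illustrative stand-ins for the opaque signature the API returns, not the exact schema; consult the official thinking documentation for the real wire format:

```python
def carry_signature(prev_response: dict, next_user_msg: str) -> list:
    """Copy any opaque thought signature from the previous turn into the
    next request's content list so reasoning state can resume unmodified."""
    contents = []
    for part in prev_response.get("parts", []):
        if "thought_signature" in part:
            # Signatures are tamper-proof blobs: pass them back verbatim
            contents.append({"thought_signature": part["thought_signature"]})
        elif "text" in part:
            contents.append({"text": part["text"]})
    contents.append({"text": next_user_msg})
    return contents

# Stubbed previous model response (illustrative only)
prev = {"parts": [{"text": "Plan drafted."},
                  {"thought_signature": "opaque-sig-abc123"}]}
print(carry_signature(prev, "Now execute step 2."))
```

The key design point is that the signature is never inspected or edited by your code; it is simply echoed back so the model can validate and resume its own reasoning state.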

When is Deep Think Mini Worth It?

| Worth using HIGH (Deep Think Mini) | Not worth using HIGH |
|---|---|
| Competition-level math reasoning | Simple arithmetic |
| Complex cross-file bug debugging | Syntax error fixes |
| Algorithm design and optimization | CRUD code generation |
| Academic paper methodology analysis | Article summarization |
| Multi-step Agent long tasks | Single-turn Q&A |
| Deep security vulnerability analysis | Format conversion |

💡 Pro Tip: Deep Think Mini's power comes at a price—both latency and costs are high. I'd recommend using HIGH only for tasks that truly require "deep thinking"; MEDIUM is plenty for daily tasks. You can flexibly switch between them at the request level via APIYI (apiyi.com).


thinkingLevel vs thinkingBudget: Don't Mix Them Up

Google uses two different parameters to control reasoning, depending on the model series:

| Parameter | Applicable Models | Value Type | Description |
|---|---|---|---|
| thinkingLevel | Gemini 3+ (3 Flash, 3 Pro, 3.1 Pro) | Enum: MINIMAL / LOW / MEDIUM / HIGH | Recommended for the Gemini 3 series |
| thinkingBudget | Gemini 2.5 (Pro, Flash, Flash Lite) | Integer: 0-32768 | Applies to the 2.5 series |

⚠️ You can't use both parameters at the same time! Sending both will return a 400 error.

| Scenario | Correct Approach | Incorrect Approach |
|---|---|---|
| Calling Gemini 3.1 Pro | thinkingLevel: "MEDIUM" | thinkingBudget: 8192 |
| Calling Gemini 2.5 Pro | thinkingBudget: 8192 | thinkingLevel: "MEDIUM" |
| Passing both parameters | — | 400 error ❌ |

🎯 Quick Tip: Gemini 3 series → thinkingLevel (string levels), Gemini 2.5 series → thinkingBudget (numeric token count). APIYI (apiyi.com) supports both parameter formats.


FAQ

Q1: What is the default level if I don’t set thinkingLevel?

The default is HIGH. This means if you don't actively set it, every call will use the full reasoning power of Deep Think Mini, consuming the maximum amount of thinking tokens. We recommend setting an appropriate level based on your actual task needs to save on costs. You can flexibly control this at the request level via APIYI (apiyi.com).

Q2: How are thinking tokens billed? Are they expensive?

Thinking tokens are billed at the same price as output tokens ($12.00 / 1M tokens). In HIGH mode, a complex request might consume over 30,000 thinking tokens, costing about $0.36. Meanwhile, the same request in LOW mode might only consume 1,000 thinking tokens, costing about $0.012. That's a potential 30x difference in cost.

Q3: Is 3.1 Pro’s MEDIUM the same as 3.0 Pro’s HIGH?

They're essentially equivalent. Google describes 3.1 Pro's MEDIUM as providing "balanced thinking suitable for most tasks," which aligns with the positioning of 3.0 Pro's HIGH. If you're migrating from 3.0 Pro to 3.1 Pro, changing HIGH to MEDIUM will help you maintain similar quality and costs. You can use APIYI (apiyi.com) to call both versions simultaneously for comparison.

Q4: Can I turn off the thinking feature?

You cannot completely disable thinking in Gemini 3.1 Pro. The lowest setting is LOW, which still performs basic reasoning. If you need a response with absolutely no thinking involved, consider using the MINIMAL mode of Gemini 3 Flash.
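For reference, such a request would look like the fragment below. The model ID `gemini-3-flash` is an assumption based on the support matrix at the top of this post, so check the official model page for the exact identifier:

```json
{
  "contents": [{"parts": [{"text": "Classify this ticket as bug/feature/question: ..."}]}],
  "generationConfig": {
    "thinkingConfig": { "thinkingLevel": "MINIMAL" }
  }
}
```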


Common Misconceptions About Gemini 3.1 Pro Thinking Levels

| Misconception | The Reality |
|---|---|
| "HIGH has the best quality, so I should use it all the time." | For simple tasks, HIGH's quality is nearly identical to MEDIUM's, but it costs 5-10x more. |
| "Reasoning capability at the LOW level is terrible." | LOW still outperforms models that don't think at all; it just uses fewer thinking tokens. |
| "MEDIUM is a new feature and might be unstable." | MEDIUM's reasoning depth is roughly equivalent to the old 3.0 Pro's HIGH and has been thoroughly validated. |
| "Thinking tokens aren't billed." | They are! They're billed at the same rate as output tokens ($12/MTok). |
| "I can turn off thinking in 3.1 Pro." | You can't. The lowest setting is LOW, which still performs basic reasoning. |
| "I can use thinkingLevel and thinkingBudget together." | Nope! Using both at once triggers a 400 error. |
| "Higher levels just add a bit of latency." | HIGH mode can take several minutes before it even starts responding; it's not a slight delay. |

Summary: Gemini 3.1 Pro Thinking Level Quick Reference

| Level | In a Nutshell | Best For | Relative Cost |
|---|---|---|---|
| LOW | Fastest & cheapest | Translation, classification, summarization, completion | 1x |
| MEDIUM | The balanced choice (new) | Coding, code reviews, analysis, Q&A | 2-3x |
| HIGH | Deep Think Mini | Math, debugging, research, Agents | 5-10x+ |

Core Recommendations:

  1. Use MEDIUM for daily development — It offers great quality at a reasonable cost and is equivalent to the old version's HIGH.
  2. Use LOW for simple tasks — You'll save over 70% on thinking token costs.
  3. Use HIGH for deep reasoning — Its "Deep Think Mini" capabilities are unique, but keep an eye on the cost.
  4. HIGH is the default — If you don't set a level, it defaults to the most expensive mode, so remember to adjust it manually.

We recommend dynamically switching thinking levels based on your task type via the APIYI (apiyi.com) platform to achieve the perfect balance between quality and cost.


References

  1. Google AI Documentation: Gemini Thinking Configuration Guide

    • Link: ai.google.dev/gemini-api/docs/thinking
    • Description: Full documentation for the thinkingLevel parameter.
  2. Google AI Documentation: Gemini 3.1 Pro Model Page

    • Link: ai.google.dev/gemini-api/docs/models/gemini-3.1-pro-preview
    • Description: Thinking level support matrix and key considerations.
  3. Gemini API Pricing Page: Thinking Token Billing

    • Link: ai.google.dev/gemini-api/docs/pricing
    • Description: Explains how thinking tokens are billed at the same rate as output tokens.
  4. VentureBeat: Deep Think Mini Deep Dive

    • Link: venturebeat.com/technology/google-gemini-3-1-pro-first-impressions
    • Description: Real-world test data showing an IMO problem solved in 8 minutes.
  5. Google Official Blog: Gemini 3.1 Pro Launch Announcement

    • Link: blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro
    • Description: Official introduction to the three-tier thinking system and Deep Think Mini.

📝 Author: APIYI Team | For technical discussions, visit APIYI at apiyi.com
📅 Updated: February 20, 2026
🏷️ Keywords: Gemini 3.1 Pro thinking levels, thinkingLevel, Deep Think Mini, LOW MEDIUM HIGH, API calls, reasoning control
