
Understanding the 5 Core Concepts of the LiteLLM Unified Gateway: A Must-Read AI Agent Infrastructure Guide for Beginners

Ever run into this headache? You're building a project that uses OpenAI's GPT, Anthropic's Claude, and Google's Gemini, but every model has a different SDK, a different API format, and even different error-handling logic. One model swap means rewriting half your codebase.

That’s exactly what LiteLLM solves. Simply put, LiteLLM is the "universal translator" for large language models: you learn a single way to call them (the OpenAI format), and it translates your requests into the specific API formats of over 100 model providers.

Core Value: By the end of this article, you'll understand what LiteLLM is, why AI Agent frameworks are all using it, and how to get started in under 5 minutes.


What is LiteLLM: 5 Core Concepts

Before you dive in, let’s break down the 5 core concepts of LiteLLM in plain English. Once you grasp these, everything else will fall into place.

| Core Concept | Simple Explanation | Problem Solved |
| --- | --- | --- |
| Unified Interface | Call all models the same way | No need to learn a new SDK for every model |
| Provider | Model vendors like OpenAI, Anthropic, etc. | Manages connections for different vendors |
| Fallback | Automatically switch to model B if model A fails | Ensures service continuity |
| Virtual Key | Issue "sub-accounts" to team members | Controls usage and budgets |
| Proxy | A standalone API proxy service | Allows any language or tool to connect |

What pain points does LiteLLM solve?

Imagine a world without LiteLLM:

Calling OpenAI:

from openai import OpenAI
client = OpenAI(api_key="sk-xxx")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

Calling Anthropic:

import anthropic
client = anthropic.Anthropic(api_key="sk-ant-xxx")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # Anthropic requires this
    messages=[{"role": "user", "content": "Hello"}]
)

Calling Google Gemini:

import google.generativeai as genai
genai.configure(api_key="AIza-xxx")
model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Hello")

See the issue? Three models, three SDKs, three different ways to write code. If your project needs to support model switching, your code ends up littered with if provider == "openai"... elif provider == "anthropic"... conditional statements.

With LiteLLM:

import litellm

# Call OpenAI
response = litellm.completion(model="gpt-4o", messages=[{"role": "user", "content": "Hello"}])

# Call Anthropic — same syntax
response = litellm.completion(model="anthropic/claude-sonnet-4-6", messages=[{"role": "user", "content": "Hello"}])

# Call Gemini — still the same syntax
response = litellm.completion(model="gemini/gemini-2.0-flash", messages=[{"role": "user", "content": "Hello"}])

One litellm.completion() call; only the model parameter changes. LiteLLM handles format conversion, parameter adaptation, and response standardization behind the scenes.

🎯 Pro Tip: The unified interface philosophy of LiteLLM is similar to APIYI (apiyi.com)—both provide a single interface to call multiple models. The difference is that LiteLLM is an open-source, self-hosted solution, while APIYI is a managed service that requires no deployment. Choose the one that best fits your team's technical capabilities.

Understanding the Two Usage Modes of LiteLLM

LiteLLM offers two distinct usage modes, each tailored to different scenarios. Understanding the differences between these two is key to choosing the right approach for your project.


Mode 1: Python SDK (Lightweight)

Simply import the litellm package into your Python code and use it just like a standard function call.

Best for:

  • Individual developers
  • Pure Python projects
  • Rapid prototyping
  • Scenarios where team management features aren't needed

Installation:

pip install litellm

Basic Usage:

import litellm
import os

# Set API keys (via environment variables)
os.environ["OPENAI_API_KEY"] = "sk-your-key"
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-your-key"

# Invoke any model
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain what an API gateway is"}]
)

print(response.choices[0].message.content)

Mode 2: Proxy Server (Enterprise Gateway)

Runs as an independent server, exposing an OpenAI-compatible HTTP interface. Any programming language or tool capable of sending HTTP requests can use it.

Best for:

  • Team collaboration
  • Multi-language projects (Java, Go, Node.js, etc.)
  • Cost tracking and budget management
  • Assigning virtual keys to different teams
  • Integrating with AI Agent frameworks

Installation and Startup:

# Installation
pip install 'litellm[proxy]'

# Start using a config file
litellm --config config.yaml --port 4000

# Or use Docker
docker run -p 4000:4000 \
  -e OPENAI_API_KEY=sk-xxx \
  ghcr.io/berriai/litellm:main-latest

Once started, any application can call it just as if it were calling OpenAI:

from openai import OpenAI

# Point base_url to your LiteLLM Proxy
client = OpenAI(
    api_key="sk-your-virtual-key",
    base_url="http://localhost:4000/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

Comparison: LiteLLM SDK vs. Proxy Mode

| Feature | Python SDK | Proxy Server |
| --- | --- | --- |
| Installation | pip install litellm | pip install 'litellm[proxy]' or Docker |
| Invocation | Python function call | HTTP API (any language) |
| Configuration | Set in code | config.yaml file |
| Virtual Key Management | Not supported | Supported, with budget limits |
| Web Dashboard | None | Included, for visual management |
| Team Management | Not supported | Supported (users/teams/budgets) |
| Cost Tracking | Basic (code-level) | Full (database persistence) |
| Deployment Complexity | None | Requires server maintenance |
| Target Audience | Individual developers | Teams/Enterprises |

💡 Recommendation: If you're an individual developer building a prototype, the SDK mode can be up and running in 5 minutes. If you're working in a team or production environment, the Proxy mode is a better fit. Of course, if you'd rather avoid the hassle of deploying and maintaining your own server, you can also use a managed unified interface service like APIYI (apiyi.com) for an out-of-the-box experience.

LiteLLM Quick Start Guide

Here are the complete steps to get started with LiteLLM from scratch.

Getting Started with LiteLLM SDK Mode

Step 1: Installation

pip install litellm

Step 2: Set environment variables

# macOS / Linux
export OPENAI_API_KEY="sk-your-key"
export ANTHROPIC_API_KEY="sk-ant-your-key"

# Windows
set OPENAI_API_KEY=sk-your-key

Step 3: Write your code

import litellm

# Basic invocation
response = litellm.completion(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a technical assistant"},
        {"role": "user", "content": "What is an LLM gateway?"}
    ],
    temperature=0.7
)

print(response.choices[0].message.content)
print(f"Token usage: {response.usage.total_tokens}")
print(f"Estimated cost: ${response._hidden_params.get('response_cost', 'N/A')}")

Full example (with fallback and streaming):
import litellm
import os

os.environ["OPENAI_API_KEY"] = "sk-your-key"
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-your-key"

# Invocation with Fallback: Automatically switch to Claude if GPT-4o fails
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain RESTful API"}],
    fallbacks=["anthropic/claude-sonnet-4-6"],
    num_retries=2
)

# Streaming output
stream = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem about programming"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Getting Started with LiteLLM Proxy Mode

Step 1: Create a config.yaml configuration file

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: gemini-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY

litellm_settings:
  drop_params: true
  num_retries: 3

general_settings:
  master_key: sk-my-master-key

Step 2: Start the Proxy

litellm --config config.yaml --port 4000

Step 3: Call using the standard OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="sk-my-master-key",
    base_url="http://localhost:4000/v1"
)

# Call GPT-4o (via LiteLLM Proxy)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

You can also call it directly using cURL:

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-my-master-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

🚀 Quick Start: LiteLLM Proxy requires you to manage your own server and API keys. If you prefer a unified interface without the need for deployment, check out APIYI (apiyi.com). It supports OpenAI-compatible calls for 100+ models without requiring any infrastructure setup.

The Core Role of LiteLLM in AI Agents

This is a common question for newcomers: Why do almost all mainstream AI Agent frameworks support or even recommend using LiteLLM?

Why do AI Agents need LiteLLM?

When AI Agents perform tasks, they often need to:

  1. Invoke different models: Use cheap, small models for simple tasks and large models for complex reasoning.
  2. Automatic fallback: Automatically switch to a backup model if the primary model is rate-limited or down.
  3. Control costs: Track and limit token usage across multiple parallel agents.
  4. Team collaboration: Share API resource pools among different developers.

LiteLLM perfectly addresses these needs. It acts as a "scheduling center" between the Agent and the models.
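The first point above, routing tasks to model tiers, can be sketched with a small helper. This is an illustrative pattern, not a LiteLLM feature; the task names and model choices are assumptions you would adapt to your own agent.

```python
# Hypothetical tiering helper: route a task type to a model by complexity.
# Model names are illustrative; use whatever your gateway exposes.
def pick_model(task: str) -> str:
    tiers = {
        "classify": "gpt-4o-mini",   # cheap, small model for simple tasks
        "summarize": "gpt-4o-mini",
        "plan": "gpt-4o",            # larger model for complex reasoning
        "reason": "gpt-4o",
    }
    return tiers.get(task, "gpt-4o-mini")  # default to the cheap tier

# With LiteLLM, the chosen name drops straight into the same call:
# litellm.completion(model=pick_model("plan"), messages=[...])
```

Because every model shares one calling convention, the routing decision reduces to a string lookup instead of branching into different SDKs.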

Integration of LiteLLM with Mainstream AI Agent Frameworks

| Agent Framework | Integration Method | Typical Usage |
| --- | --- | --- |
| LangChain / LangGraph | Built-in SDK support | ChatLiteLLM as LLM backend |
| CrewAI | Proxy connection | Multi-Agent shared model resource pool |
| AutoGen (Microsoft) | Proxy connection | Access via OpenAI-compatible endpoint |
| Dify | Custom Provider | Configured as an OpenAI-compatible endpoint |
| Open WebUI | Proxy connection | Backend API endpoint |
| Aider | Proxy connection | Model layer for code generation agents |
| Continue.dev | Proxy connection | Backend for AI coding assistant in IDE |

Typical Architecture of LiteLLM in Multi-Agent Systems

In a multi-agent system, the LiteLLM Proxy typically works like this:

  1. Planning Agent → Calls Claude Opus (strong reasoning model)
  2. Execution Agent → Calls GPT-4o (balanced performance)
  3. Validation Agent → Calls GPT-4o-mini (fast and low cost)
  4. Summary Agent → Calls Gemini Flash (large context window)

All agents call through the same LiteLLM Proxy endpoint, and the Proxy automatically routes to the correct backend model. Administrators can use the dashboard to centrally view token usage and costs for all agents.
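The architecture above can be sketched as a role-to-model mapping over one shared endpoint. The role names, model names, and localhost URL are assumptions drawn from the example, not fixed LiteLLM identifiers.

```python
# Sketch: every agent role maps to a model name exposed by the same
# shared LiteLLM Proxy; the proxy routes each name to its backend.
PROXY_URL = "http://localhost:4000/v1"  # single endpoint all agents share

AGENT_MODELS = {
    "planner": "claude-opus",      # strong reasoning
    "executor": "gpt-4o",          # balanced performance
    "validator": "gpt-4o-mini",    # fast and low cost
    "summarizer": "gemini-flash",  # large context window
}

def agent_request(role: str, prompt: str) -> dict:
    """Build an OpenAI-compatible request body for the shared proxy."""
    return {
        "model": AGENT_MODELS[role],
        "messages": [{"role": "user", "content": prompt}],
    }

# Each body would be POSTed to f"{PROXY_URL}/chat/completions";
# the proxy handles routing, retries, and per-key cost accounting.
```

The payoff is that adding or swapping an agent's model is a one-line change to the mapping, with no code changes in the agents themselves.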


🎯 Technical Advice: In production multi-agent systems, LiteLLM Proxy needs to be paired with PostgreSQL and Redis to fully utilize cost tracking and caching features. If your team is small or you prefer not to manage extra infrastructure, APIYI (apiyi.com) provides similar unified interface capabilities with built-in cost tracking and usage statistics, without the need to deploy additional databases.

Deep Dive into Advanced LiteLLM Features

Once you've mastered the basics, these three advanced features are essential for production environments.

Advanced Feature 1: Model Fallback

When your primary model hits rate limits, timeouts, or errors, LiteLLM automatically switches to a backup model, ensuring your service stays up and running.

Configuring Fallback in the SDK:

response = litellm.completion(
    model="gpt-4o",
    messages=messages,
    fallbacks=["anthropic/claude-sonnet-4-6", "gemini/gemini-2.0-flash"],
    num_retries=2
)

Execution logic: Try GPT-4o first → if it fails, try Claude Sonnet → if that fails, try Gemini Flash.

Configuring Fallback in the Proxy (config.yaml):

litellm_settings:
  fallbacks:
    - gpt-4o: [claude-sonnet, gemini-flash]
    - claude-sonnet: [gpt-4o, gemini-flash]

Advanced Feature 2: Load Balancing

You can configure multiple backend deployments for the same model name, and LiteLLM will automatically distribute requests across them.

model_list:
  # Two different backends for the same model name
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_KEY_1

  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-deployment
      api_key: os.environ/AZURE_KEY_1
      api_base: https://my-azure.openai.azure.com

router_settings:
  routing_strategy: least-busy  # Prioritize the least busy backend
  # Other strategies: simple-shuffle, latency-based-routing

When calling the model, just specify model="gpt-4o", and LiteLLM will automatically balance traffic between your OpenAI direct connection and your Azure deployment.
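To build intuition for what the router does, here is a toy version of the simplest strategy, random selection among the deployments that share a model name. This is illustrative only; LiteLLM's actual router also tracks load, latency, and failures.

```python
# Toy sketch of "simple-shuffle" routing: pick one deployment at random
# from the entries that share a model_name. Deployment entries mirror
# the config.yaml example above.
import random

deployments = [
    {"model": "openai/gpt-4o", "api_base": None},
    {"model": "azure/gpt-4o-deployment",
     "api_base": "https://my-azure.openai.azure.com"},
]

def pick_deployment(rng: random.Random = random) -> dict:
    """Return one backend for this request; traffic spreads evenly over time."""
    return rng.choice(deployments)
```

A strategy like least-busy replaces the random choice with a count of in-flight requests per backend; the caller-facing behavior is unchanged.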

Advanced Feature 3: Cost Tracking and Virtual Keys

Generating Virtual Keys (Proxy Mode):

curl http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-master-key" \
  -H "Content-Type: application/json" \
  -d '{
    "max_budget": 50.0,
    "budget_duration": "monthly",
    "models": ["gpt-4o", "claude-sonnet"],
    "metadata": {"user": "developer-01"}
  }'

This generates a virtual key with a $50 monthly budget, restricted to calling only GPT-4o and Claude Sonnet.
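Once the /key/generate response returns the new key, the developer uses it exactly like an OpenAI API key pointed at the proxy. The sketch below builds such a request with the standard library; the key value and localhost URL are placeholders.

```python
# Sketch: call the proxy with a generated virtual key. The request is
# built but not sent, so the shape is visible without a running proxy.
import json
import urllib.request

VIRTUAL_KEY = "sk-generated-virtual-key"  # from the /key/generate response

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "http://localhost:4000/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {VIRTUAL_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# req = build_chat_request("gpt-4o", "Hello")
# urllib.request.urlopen(req)  # the proxy enforces the key's budget and model list
```

Requests made with this key count against its $50 monthly budget, and a call to a model outside its allowed list is rejected by the proxy.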

Cost Tracking:

LiteLLM includes built-in pricing tables for various models, automatically calculating costs for every model invocation. You can view these in the Proxy dashboard:

  • Total spend by model
  • Detailed spend by user/team
  • Spending trends over time
  • Token usage statistics

💰 Cost Optimization: LiteLLM's cost tracking helps you identify which model invocations are the most expensive. Combined with the pricing advantages of APIYI (apiyi.com), you can often secure better rates for the same model invocations, further reducing your AI application's operational costs.

Overview of 100+ Model Providers Supported by LiteLLM

LiteLLM supports a massive number of providers. Here are the most commonly used categories:

| Category | Provider | Model Prefix | Representative Models |
| --- | --- | --- | --- |
| Commercial LLMs | OpenAI | openai/ | GPT-4o, GPT-4o-mini, o3 |
| Commercial LLMs | Anthropic | anthropic/ | Claude Opus 4, Sonnet 4, Haiku |
| Commercial LLMs | Google | gemini/ | Gemini 2.0 Flash, Gemini 2.5 Pro |
| Cloud Platforms | Azure OpenAI | azure/ | GPT series deployed on Azure |
| Cloud Platforms | AWS Bedrock | bedrock/ | Claude/Llama hosted on Bedrock |
| Cloud Platforms | Google Vertex AI | vertex_ai/ | Gemini hosted on Vertex |
| Inference Acceleration | Groq | groq/ | Llama 3.1 70B (ultra-fast inference) |
| Inference Acceleration | Together AI | together_ai/ | Various open-source models |
| Inference Acceleration | Fireworks AI | fireworks_ai/ | High-performance inference |
| Local Deployment | Ollama | ollama/ | Locally running Llama/Mistral |
| Local Deployment | vLLM | openai/ (custom base) | Self-hosted inference engine |
| Chinese Models | DeepSeek | deepseek/ | DeepSeek Chat/Coder |
| Search-Enhanced | Perplexity | perplexity/ | Sonar Pro |
| Aggregation Platforms | OpenRouter | openrouter/ | Various models |

🎯 Selection Advice: The right model depends on your specific use case. If you're unsure which one to use, you can quickly test different models on the APIYI (apiyi.com) platform, which also supports OpenAI-compatible API calls for most of the models listed above.


LiteLLM FAQ

Q1: What’s the difference between LiteLLM and using the OpenAI SDK directly?

The OpenAI SDK is limited to calling OpenAI models. LiteLLM keeps the same OpenAI-style calling format but translates it for 100+ model providers like Anthropic, Google, and Azure. If your project only uses OpenAI models, the OpenAI SDK is perfectly fine. However, if you need multi-model support, failover, or cost management, LiteLLM is the better choice.

Q2: Is LiteLLM free?

The core functionality of LiteLLM is completely open-source and free (MIT License). Keep in mind: while LiteLLM itself is free, the model APIs you call through it are not. You'll need to obtain your own API keys from official sources like OpenAI or Anthropic and pay for your model usage. If you don't want to manage multiple API keys separately, you can use a unified interface platform like APIYI (apiyi.com) to simplify key management.

Q3: What server configuration does the LiteLLM Proxy require?

The LiteLLM Proxy itself is very lightweight and can run on a server with 1 vCPU and 1GB of RAM. However, if you need full features (like cost tracking or virtual key management), you'll also need a PostgreSQL database and Redis. For production environments, we recommend at least 2 vCPUs, 4GB of RAM, plus PostgreSQL and Redis.

Q4: What’s the difference between LiteLLM and OpenRouter?

The biggest difference is that LiteLLM is an open-source self-hosted solution, while OpenRouter is a managed service.

  • LiteLLM: Free, self-hosted, you manage your own API keys, and you have full control over data flow.
  • OpenRouter: Ready to use out of the box, but it adds a markup to API call prices, and your data passes through a third party.

If you prioritize data privacy or have your own API keys, choose LiteLLM. If you want a zero-deployment, quick-start solution, consider a managed service like APIYI (apiyi.com).

Q5: Does LiteLLM support streaming?

Yes, it does. Whether you're using the SDK or the Proxy mode, LiteLLM fully supports SSE streaming. Streaming responses from all providers are normalized into the OpenAI chunk format, ensuring a consistent streaming experience.

# Streaming example
stream = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Q6: Should a beginner choose the SDK mode or the Proxy mode?

If you're a Python developer just starting out, the SDK mode is the easiest way to go—pip install litellm and you're up and running in a few lines of code. Once you need team collaboration, multi-language support, or production deployment, you can migrate to the Proxy mode. Since the core invocation method is the same for both, the migration cost is very low.

Q7: Where should I put the LiteLLM config.yaml file?

There's no fixed location. Just specify the path using the --config parameter when starting the Proxy:

litellm --config /path/to/your/config.yaml

We generally recommend keeping it in your project root or a dedicated configuration directory. If you're deploying with Docker, you can mount it into the container using a volume.
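A minimal sketch of the Docker volume mount, assuming config.yaml sits in the current directory and reusing the image from the installation example above; the in-container path /app/config.yaml is an arbitrary choice, not a fixed convention.

```shell
# Mount the local config read-only and point the proxy at it.
docker run -p 4000:4000 \
  -e OPENAI_API_KEY=sk-xxx \
  -v "$(pwd)/config.yaml:/app/config.yaml:ro" \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
```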

LiteLLM Quick Decision Guide

Choose the best solution based on your specific needs:

Your Situation Recommended Solution Reason
Individual developer, Python project LiteLLM SDK Zero deployment, 5-minute setup
Team development, need budget control LiteLLM Proxy Virtual keys + cost tracking
Don't want to manage infrastructure APIYI (apiyi.com) Managed service, ready to use
Multi-agent system LiteLLM Proxy Unified routing + load balancing
Using only OpenAI models OpenAI SDK No extra layer needed
High priority on data privacy LiteLLM Self-hosted Data doesn't pass through third parties

Summary

LiteLLM is an incredibly practical piece of infrastructure for AI application development. Its core value can be summed up in one sentence: Use a single set of OpenAI-formatted code to call APIs from 100+ model providers.

For those just getting started, here are the key takeaways:

  1. LiteLLM is a "translator": It helps translate requests in a unified format into the specific API formats required by different model providers.
  2. Two modes: SDK (a lightweight Python package) and Proxy (a standalone gateway server).
  3. Core value: Unified interface + Fallback + Load balancing + Cost tracking.
  4. Standard for Agent frameworks: Almost all major frameworks like LangChain, CrewAI, and AutoGen support LiteLLM.
  5. Completely open-source and free: Released under the MIT License, it costs nothing to self-host.

If you find the operational overhead of self-hosting a LiteLLM Proxy too high, you can also use managed unified interface services like APIYI (apiyi.com). These allow you to call all mainstream models with a single API key, saving you the burden of deployment and maintenance.


Author: APIYI Technical Team
Technical Support: Visit APIYI at apiyi.com for more tutorials and technical support on model invocation.
Last Updated: April 2026
Applicable Version: LiteLLM v1.x+


References:

  1. LiteLLM Official Documentation: docs.litellm.ai
  2. LiteLLM GitHub Repository: github.com/BerriAI/litellm
  3. LiteLLM Official Website: litellm.ai
  4. BerriAI Official Website: berri.ai
