Complete tutorial for configuring LiteLLM with a third-party API proxy service: 5 steps to connect to APIYI

How can you make LiteLLM orchestrate multiple Large Language Models like OpenAI, Claude, Gemini, and DeepSeek simultaneously without getting blocked by overseas account, network, or payment issues? The answer is to connect LiteLLM to an OpenAI-compatible API proxy service. In this article, we’ll use LiteLLM + APIYI (apiyi.com) as an example to walk you through the configuration step-by-step.

Core Value: After reading this, you’ll master the 3 mainstream ways to configure LiteLLM with an API proxy service (SDK, Proxy YAML, and Environment Variables) and be able to complete your APIYI integration in under 5 minutes.


Key Points for Configuring LiteLLM with an API Proxy Service

LiteLLM is an open-source LLM gateway/SDK designed to invoke 100+ Large Language Models using the OpenAI-compatible format. It natively supports any "OpenAI-compatible" endpoint; you simply need to point api_base to the proxy service and replace the api_key with the one issued by the proxy. APIYI (apiyi.com) is a standard OpenAI-compatible proxy, making the two a perfect match.

Key Point | Description | Value
OpenAI Compatibility | LiteLLM routes to the OpenAI client via the openai/ prefix | Connect to any proxy with one line of config
Three Config Methods | SDK inline / Proxy YAML / Environment Variables | Adapts to scripts, production, and CLI scenarios
Unified Model Naming | openai/<provider-model> or custom model_name | Upper-level code doesn't need to know about underlying changes
Error Troubleshooting | base_url must end with /v1 | A missing /v1 is the most common cause of 404 errors
Fallback & Load Balancing | YAML mode supports multi-channel and failover | Maximizes availability in production environments

Detailed Breakdown of LiteLLM Proxy Configuration

According to the official LiteLLM documentation, as long as you add the openai/ prefix to the model name and specify the api_base, LiteLLM will use the OpenAI client to access your endpoint. This means that whether the proxy is connected to GPT-5, Claude Opus 4.6, Gemini 3 Pro, or DeepSeek, to LiteLLM it's just "an OpenAI endpoint."

The base_url for APIYI (apiyi.com) is https://api.apiyi.com/v1, which follows standard /v1/chat/completions, /v1/embeddings, and /v1/images/generations specifications, making it perfectly compatible with LiteLLM without any patches.


Getting Started with LiteLLM and Third-Party API Proxy Services

Preparation

Before we dive in, make sure you have the following ready:

  1. APIYI API Key: Register at apiyi.com and create a new key in the dashboard (we recommend naming it litellm-prod).
  2. base_url: https://api.apiyi.com/v1 (Note: the /v1 suffix is mandatory).
  3. Python Environment: Python 3.9+.
  4. Install Dependencies: pip install litellm.

Minimal Example: SDK Inline Configuration

The fastest way to get started is by passing the api_key and api_base directly in your code:

import litellm

response = litellm.completion(
    model="openai/gpt-5",                              # Key: openai/ prefix
    api_key="YOUR_APIYI_KEY",
    api_base="https://api.apiyi.com/v1",               # APIYI API proxy service address
    messages=[
        {"role": "user", "content": "Introduce LiteLLM in one sentence."}
    ],
)

print(response.choices[0].message.content)

💡 Tip: Once you've obtained test credits from the APIYI (apiyi.com) dashboard, you can swap gpt-5 for other model names such as claude-opus-4-6 or gemini-3-pro without changing any other code. That portability is the main benefit of the OpenAI-compatible protocol.

Full runnable example (including error handling and streaming):

import os
import litellm
from litellm import completion

# Prefer exporting these in your shell; setdefault only fills in values that are not already set
os.environ.setdefault("OPENAI_API_KEY", "YOUR_APIYI_KEY")
os.environ.setdefault("OPENAI_API_BASE", "https://api.apiyi.com/v1")

litellm.set_verbose = False  # Set to True for debugging

def chat_with_apiyi(model: str, prompt: str, stream: bool = False):
    """Call any OpenAI-compatible model via LiteLLM + APIYI"""
    try:
        response = completion(
            model=f"openai/{model}",
            messages=[{"role": "user", "content": prompt}],
            stream=stream,
            temperature=0.7,
            max_tokens=1024,
        )
        if stream:
            for chunk in response:
                delta = chunk.choices[0].delta.content or ""
                print(delta, end="", flush=True)
            print()
        else:
            return response.choices[0].message.content
    except Exception as e:
        print(f"Call failed: {e}")
        return None

if __name__ == "__main__":
    # Non-streaming
    print(chat_with_apiyi("gpt-5", "Explain what an LLM gateway is."))
    # Streaming
    chat_with_apiyi("claude-opus-4-6", "Introduce the advantages of LiteLLM in 100 words.", stream=True)

Proxy YAML Configuration: Recommended for Production

If you're running LiteLLM as a standalone service (on port 4000 for team use), we recommend the YAML approach. Create a litellm_config.yaml file:

model_list:
  - model_name: gpt-5                       # Model name exposed to clients
    litellm_params:
      model: openai/gpt-5                   # openai/ prefix routes to OpenAI client
      api_base: https://api.apiyi.com/v1    # APIYI proxy address
      api_key: os.environ/APIYI_KEY         # Reference environment variable

  - model_name: claude-opus-4-6
    litellm_params:
      model: openai/claude-opus-4-6
      api_base: https://api.apiyi.com/v1
      api_key: os.environ/APIYI_KEY

  - model_name: gemini-3-pro
    litellm_params:
      model: openai/gemini-3-pro
      api_base: https://api.apiyi.com/v1
      api_key: os.environ/APIYI_KEY

litellm_settings:
  drop_params: true                          # Automatically drop unsupported parameters
  num_retries: 2                             # Retry count per call

router_settings:
  fallbacks:
    - gpt-5: ["claude-opus-4-6", "gemini-3-pro"]

Start the Proxy:

export APIYI_KEY=sk-xxxxxxxxxxxxxxxx
litellm --config ./litellm_config.yaml --port 4000

Now, any OpenAI SDK can call it via http://localhost:4000:

from openai import OpenAI

client = OpenAI(
    api_key="any-string",                    # LiteLLM Proxy doesn't validate content (unless master_key is set)
    base_url="http://localhost:4000",
)

resp = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello via LiteLLM Proxy"}]
)
print(resp.choices[0].message.content)

🎯 Production Tip: We suggest setting a master_key on the LiteLLM Proxy and routing all underlying models through APIYI (apiyi.com). This way, your application layer only sees "semantic model names" like gpt-5 or claude-opus-4-6, while the underlying channels, billing, and rate limiting are handled by the APIYI + LiteLLM stack and stay transparent to your application.

Environment Variable Mode: Best for CLI and Scripts

For one-off scripts and command-line tools, the simplest way is to use environment variables. LiteLLM automatically detects OPENAI_API_KEY and OPENAI_API_BASE:

export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxx
export OPENAI_API_BASE=https://api.apiyi.com/v1

From then on, all calls with the openai/ prefix will go through APIYI:

import litellm
print(litellm.completion(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "ping"}]
).choices[0].message.content)
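
Because this mode relies on auto-detection, a small preflight check can catch a missing or malformed variable before the first call. The sketch below is only an illustration: check_openai_env is a hypothetical helper name, not part of LiteLLM, and it inspects nothing beyond the two variables mentioned above.

```python
import os

# The two variables this mode relies on, as named in the text above.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_API_BASE")


def check_openai_env(env=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not a LiteLLM API. `env` defaults to os.environ
    and can be any mapping, which keeps the check easy to test.
    """
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    base = env.get("OPENAI_API_BASE", "")
    if base and not base.rstrip("/").endswith("/v1"):
        problems.append("OPENAI_API_BASE should end with /v1")
    return problems
```

Calling check_openai_env() at the top of a script and exiting when the list is non-empty gives a clearer message than a failed request would.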

Comparing the Three Ways to Configure LiteLLM with Third-Party Proxies

Different scenarios call for different configuration methods. The table below provides clear recommendations.


Dimension | SDK Inline | Proxy YAML | Environment Variables
Ease of Use | ⭐ Easiest | ⭐⭐⭐ Moderate | ⭐ Easiest
Use Case | Scripts, Notebooks | Team sharing, Production | CLI tools, CI
Independent Process | No | Yes (Port 4000) | No
Multi-model Management | Manual | Centralized YAML | Single set of creds
Fallback Support | Manual try/except | ✅ Built-in | ❌ None
Key Security | Risk of hardcoding | ✅ Env reference | ✅ Env-based
Recommendation | Prototyping | Production | Personal scripts

💡 Recommendation: For personal development, environment variables are fine. For teams and production, we strongly recommend the Proxy YAML mode, as it manages "model routing + fallback + rate limiting + analytics" in a single file. Regardless of the method, the underlying channel to APIYI (apiyi.com) remains the same, so you only ever need to maintain one API key.

Practical Guide: LiteLLM + APIYI Multi-Model Routing

The real power of LiteLLM Proxy mode lies in using a single YAML file to map "semantic model names" to "actual channels." Below is a production-ready, minimal routing configuration.


# litellm_config.yaml - Production routing example
model_list:
  # Primary inference models
  - model_name: smart
    litellm_params:
      model: openai/gpt-5
      api_base: https://api.apiyi.com/v1
      api_key: os.environ/APIYI_KEY
      timeout: 60

  - model_name: smart
    litellm_params:
      model: openai/claude-opus-4-6
      api_base: https://api.apiyi.com/v1
      api_key: os.environ/APIYI_KEY
      timeout: 60

  # Low-cost, fast models
  - model_name: fast
    litellm_params:
      model: openai/gpt-5-mini
      api_base: https://api.apiyi.com/v1
      api_key: os.environ/APIYI_KEY

  # Vision/Multimodal
  - model_name: vision
    litellm_params:
      model: openai/gemini-3-pro
      api_base: https://api.apiyi.com/v1
      api_key: os.environ/APIYI_KEY

  # Embedding
  - model_name: embed
    litellm_params:
      model: openai/text-embedding-3-large
      api_base: https://api.apiyi.com/v1
      api_key: os.environ/APIYI_KEY

litellm_settings:
  drop_params: true
  num_retries: 2
  request_timeout: 60

router_settings:
  routing_strategy: simple-shuffle           # Random pick among deployments sharing a model_name
  fallbacks:
    - smart: ["fast"]                        # Fallback to 'fast' if 'smart' fails

general_settings:
  master_key: sk-litellm-master-xxxx         # Clients must provide this key

Your application layer only sees the four semantic names: smart, fast, vision, and embed. Requests for smart are spread across GPT-5 and Claude Opus 4.6, because both are registered under that name. When one of them fails or hits a rate limit, LiteLLM can retry on the other, and if the whole smart group fails, it falls back to fast. All underlying traffic is routed through the APIYI (apiyi.com) API proxy service for unified billing and monitoring, which isolates your application from the channel layer.
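
To make the order of attempts concrete, here is a conceptual sketch of "shuffle within a group, then fall back to another group". It is a simplified model under our own assumptions, not LiteLLM's actual Router code: it ignores retries, cooldowns, and timeouts, and the names route, DEPLOYMENTS, and FALLBACKS are hypothetical. The call argument stands in for the real upstream request.

```python
import random

# Semantic name -> upstream models, mirroring the YAML above (illustrative data).
DEPLOYMENTS = {
    "smart": ["openai/gpt-5", "openai/claude-opus-4-6"],
    "fast": ["openai/gpt-5-mini"],
}
FALLBACKS = {"smart": ["fast"]}


def route(group, call, rng=random):
    """Try the deployments of `group` in shuffled order, then its fallback groups.

    `call(upstream)` is a hypothetical callable standing in for the real
    request; it should raise RuntimeError on failure.
    Returns a tuple (upstream_used, result).
    """
    errors = []
    for name in [group, *FALLBACKS.get(group, [])]:
        candidates = list(DEPLOYMENTS[name])
        rng.shuffle(candidates)  # spread traffic across same-name deployments
        for upstream in candidates:
            try:
                return upstream, call(upstream)
            except RuntimeError as exc:
                errors.append(f"{upstream}: {exc}")
    raise RuntimeError("all deployments failed: " + "; ".join(errors))
```

With a call function that fails for both smart deployments, route("smart", call) ends up returning the result from openai/gpt-5-mini, which is the fallback path described above.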

Common Questions: Configuring Third-Party API Proxy Services in LiteLLM

Q1: Why do I get a 404 Not Found error even after setting the base_url?

The most common cause is that the api_base is missing /v1 at the end. LiteLLM uses the OpenAI client internally, which automatically appends /chat/completions. Therefore, your api_base must be https://api.apiyi.com/v1, not https://api.apiyi.com. Also, don't write it as https://api.apiyi.com/v1/chat/completions, or the path will be duplicated.
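
If the api_base comes from user input or a config file, it can be normalized before it reaches LiteLLM. The helper below is a minimal sketch: normalize_api_base is a hypothetical name, not a LiteLLM function, and it assumes the proxy's OpenAI-compatible root is /v1, as it is for the APIYI address used in this article.

```python
def normalize_api_base(url: str) -> str:
    """Return `url` trimmed to the form <root>/v1.

    Hypothetical helper (not a LiteLLM API). It strips surrounding
    whitespace and trailing slashes, removes an accidental
    /chat/completions suffix, and appends /v1 if it is missing.
    """
    base = url.strip().rstrip("/")
    suffix = "/chat/completions"
    if base.endswith(suffix):
        base = base[: -len(suffix)]
    if not base.endswith("/v1"):
        base += "/v1"
    return base
```

All three variants mentioned in this answer (the bare host, the correct /v1 form, and the full /v1/chat/completions path) come out as https://api.apiyi.com/v1.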

Q2: Why must I add the openai/ prefix to the model name?

LiteLLM maintains an internal provider routing table. The openai/ prefix tells LiteLLM, "Please use the OpenAI client to access this endpoint." If you omit the prefix, LiteLLM might try to match its built-in provider (e.g., claude-opus-4-6 might be identified as a native Anthropic API), leading to protocol mismatches. Always add the openai/ prefix when connecting to an API proxy service.
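
One way to avoid forgetting the prefix is to build the call arguments in a single place. The sketch below uses two hypothetical helper names of our own (as_openai_compatible and proxy_kwargs); only the model, api_key, and api_base parameters come from the SDK example earlier in this article.

```python
def as_openai_compatible(model: str) -> str:
    """Return `model` with the openai/ prefix, adding it only if missing."""
    return model if model.startswith("openai/") else f"openai/{model}"


def proxy_kwargs(model: str, api_key: str,
                 api_base: str = "https://api.apiyi.com/v1") -> dict:
    """Bundle the three settings a proxy call needs into keyword arguments."""
    return {
        "model": as_openai_compatible(model),
        "api_key": api_key,
        "api_base": api_base,
    }
```

The result can be unpacked into a call such as litellm.completion(**proxy_kwargs("claude-opus-4-6", "YOUR_APIYI_KEY"), messages=[...]).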

Q3: Can a single APIYI key call multiple models?

Yes. By default, a single key from APIYI (apiyi.com) works with all models available on the platform, including GPT-5, Claude Opus 4.6, Gemini 3 Pro, DeepSeek, Qwen, and more. This is the main difference from the official APIs: you only need to maintain one key and one base_url to mount dozens of models in your LiteLLM YAML.
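
Since every entry shares the same api_base and key reference, the model_list can be generated rather than typed by hand. The following is a minimal sketch: render_model_list is a hypothetical helper, and it assumes plain model names that need no YAML quoting.

```python
API_BASE = "https://api.apiyi.com/v1"
KEY_REF = "os.environ/APIYI_KEY"  # same env reference used in the YAML examples


def render_model_list(models):
    """Render a model_list YAML section with one openai/-prefixed entry per model.

    Hypothetical helper; assumes model names contain no characters
    that would require quoting in YAML.
    """
    lines = ["model_list:"]
    for name in models:
        lines += [
            f"  - model_name: {name}",
            "    litellm_params:",
            f"      model: openai/{name}",
            f"      api_base: {API_BASE}",
            f"      api_key: {KEY_REF}",
        ]
    return "\n".join(lines) + "\n"


print(render_model_list(["gpt-5", "claude-opus-4-6", "gemini-3-pro"]))
```

The printed text has the same shape as the model_list section shown in the Proxy YAML example and can be pasted into litellm_config.yaml after review.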

Q4: How do I verify the proxy link is working after starting LiteLLM Proxy?

The fastest way is to use curl to hit the LiteLLM Proxy directly:

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-litellm-master-xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart",
    "messages": [{"role": "user", "content": "ping"}]
  }'

A 200 OK response with JSON confirms the entire chain (Application → LiteLLM Proxy → APIYI) is working. If it fails, check the LiteLLM console logs first, then try hitting APIYI directly with the same base_url + key to isolate the issue.
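
The same check can be scripted in Python with only the standard library, which is handy in CI where curl output is awkward to parse. This is a sketch under the same assumptions as the curl command (proxy on localhost:4000, a master key, a model named smart); build_chat_request and send are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(proxy_url, master_key, model, prompt):
    """Build (but do not send) the same POST that the curl command issues."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=proxy_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return (status_code, parsed_json_body)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))
```

A typical use is status, data = send(build_chat_request("http://localhost:4000", "sk-litellm-master-xxxx", "smart", "ping")). Non-2xx responses raise urllib.error.HTTPError, so a failure in the chain surfaces as an exception.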

Q5: Do I need extra configuration for streaming in a proxy scenario?

No. APIYI (apiyi.com) fully supports SSE streaming responses, and LiteLLM passes them through by default. Just add stream=True to your completion() call, or set stream=True when using the OpenAI SDK to call the Proxy, and you'll receive token-by-token output.
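
For readers curious about what travels over the wire, OpenAI-style streaming sends server-sent events: each event is a line starting with data: followed by a JSON chunk, and the stream ends with data: [DONE]. The SDKs parse this for you; the sketch below only illustrates the format, and iter_sse_deltas is a hypothetical helper name.

```python
import json


def iter_sse_deltas(lines):
    """Yield text deltas from an iterable of OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded pieces reconstructs the full reply, which is what the streaming loop in the full example earlier prints piece by piece.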

Q6: Can I connect Embedding and image generation as well?

Yes. APIYI (apiyi.com) supports /v1/embeddings, /v1/images/generations, and /v1/audio/transcriptions all through the same base_url and key. Simply add the corresponding models to your model_list in the LiteLLM YAML (e.g., text-embedding-3-large, gpt-image-1, whisper-1). They are registered the same way as chat models; see the embed entry in the production routing example above.

Summary

Configuring a third-party API proxy service with LiteLLM really boils down to just three things:

  1. Protocol Alignment: Add the openai/ prefix to your model name to tell LiteLLM to use the OpenAI client protocol.
  2. Endpoint Alignment: Set api_base to the proxy service's root path plus /v1 (e.g., https://api.apiyi.com/v1).
  3. Credential Alignment: Pass the API key provided by the proxy service via api_key or an environment variable.

Once you've nailed these three steps, you can seamlessly layer all of LiteLLM's powerful features—like multi-model routing, fallbacks, rate limiting, billing, and logging—on top of a stable API proxy service.

🚀 Pro Tip: If you're building a unified LLM gateway for your team, we recommend a three-tier architecture: "Application → LiteLLM Proxy → APIYI (apiyi.com)". LiteLLM handles the routing and fallbacks, while APIYI takes care of the underlying model access, stability, and pay-as-you-go billing. You only need to manage one YAML file and one API key. Sign up at apiyi.com to get testing credits and make your first model invocation in under 5 minutes.


Author: APIYI Team — Dedicated to providing developers with stable access to mainstream AI Large Language Models. Visit apiyi.com to learn more.

References

  1. LiteLLM Official Documentation – OpenAI Compatible Endpoints
    • Link: docs.litellm.ai/docs/providers/openai_compatible
    • Description: Official examples for SDK and Proxy YAML configurations.

  2. LiteLLM Proxy Configuration Overview
    • Link: docs.litellm.ai/docs/proxy/configs
    • Description: Complete fields for model_list, router_settings, and fallbacks.

  3. LiteLLM GitHub Repository
    • Link: github.com/BerriAI/litellm
    • Description: Source code, issues, and the latest releases.

  4. daily_stock_analysis – LLM_CONFIG_GUIDE
    • Link: github.com/ZhuLinsen/daily_stock_analysis/blob/main/docs/LLM_CONFIG_GUIDE.md
    • Description: Practical guide on three configuration modes and multi-channel setups.

  5. APIYI Official Documentation
    • Link: apiyi.com
    • Description: Supported model list, base_url settings, and API key management.
