LLM Integration
Configure AI providers for your agent.
Supported Providers
| Provider | Models | Via |
|---|---|---|
| Workers AI | Llama 4 Scout, GPT-OSS 120B | Direct |
| OpenAI | GPT-4o, o1 | AI Gateway |
| Anthropic | Claude Opus 4.5, Claude Sonnet 4.5 | AI Gateway |
| Google | Gemini 2.5 Pro, Gemini 2.5 Flash | AI Gateway |
Configuration
Via Settings
- Navigate to Settings
- Configure the LLM section:
  - Provider
  - Model
  - Temperature
  - Max tokens
  - System prompt
Settings Schema
```json
{
  "llm_provider": "workers-ai",
  "llm_model": "@cf/meta/llama-3.1-8b-instruct",
  "llm_temperature": 0.7,
  "llm_max_tokens": 2048,
  "llm_system_prompt": "You are a helpful assistant."
}
```

Workers AI Models
Available Models
| Model | Context | Best For |
|---|---|---|
| llama-4-scout | 131K | General chat, tool calling |
| gpt-oss-120b | 128K | Complex reasoning, instruction following |
No API Key Required
Workers AI is built-in and requires no API key. Models are accessed directly through Cloudflare's infrastructure.
GPT-OSS Format
GPT-OSS models use an instruction-based format instead of chat messages:
```json
{
  "llm_provider": "workers-ai",
  "llm_model": "@cf/openai/gpt-oss-120b"
}
```

The model accepts `instructions` (system prompt) and `input` (user query) fields, optimized for single-turn tasks.
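If your agent holds chat-style message history, it has to be mapped onto this single-turn shape. A minimal sketch of that mapping, assuming a conventional `{role, content}` message shape (the `instructions` and `input` field names come from the format above; everything else here is illustrative):

```typescript
// Sketch: map a chat-style exchange onto GPT-OSS's instruction format.
// The `instructions`/`input` field names come from the docs; the message
// shape and function name are assumptions for illustration.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GptOssRequest {
  instructions: string; // system prompt
  input: string;        // user query
}

function toInstructionFormat(messages: ChatMessage[]): GptOssRequest {
  const system = messages.find((m) => m.role === "system");
  // Single-turn: take the most recent user message as the input.
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  return {
    instructions: system?.content ?? "You are a helpful assistant.",
    input: lastUser?.content ?? "",
  };
}

const req = toInstructionFormat([
  { role: "system", content: "Answer briefly." },
  { role: "user", content: "What is Workers AI?" },
]);
// req.instructions === "Answer briefly."; req.input === "What is Workers AI?"
```

Note that earlier turns are dropped in this sketch, which matches the single-turn orientation of the format.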
OpenAI Integration
Configuration
```json
{
  "llm_provider": "openai",
  "llm_model": "gpt-4o"
}
```

API Key

Store in agent settings or secrets:

```bash
npx wrangler secret put OPENAI_API_KEY
```

Anthropic Integration
Configuration
```json
{
  "llm_provider": "anthropic",
  "llm_model": "claude-sonnet-4-5-20250514"
}
```

API Key

Store in agent settings or secrets:

```bash
npx wrangler secret put ANTHROPIC_API_KEY
```

Google Integration
Configuration
```json
{
  "llm_provider": "google",
  "llm_model": "gemini-2.5-flash"
}
```

API Key

Store in agent settings or secrets:

```bash
npx wrangler secret put GOOGLE_AI_API_KEY
```

Parameters
Temperature
Controls randomness (0-1):
- `0` - Deterministic, focused
- `0.7` - Balanced (default)
- `1` - Creative, varied
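Conceptually, temperature divides the model's logits before they are turned into probabilities, so low values sharpen the distribution toward the top token and high values flatten it. A self-contained illustration (the logit values are made up; this is the standard softmax-with-temperature calculation, not this platform's internals):

```typescript
// Illustration only: temperature rescales token probabilities.
// T < 1 sharpens the distribution (more deterministic); T near 1 keeps it
// balanced. The logit values below are arbitrary examples.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const t = Math.max(temperature, 1e-6); // avoid division by zero at T = 0
  const scaled = logits.map((l) => l / t);
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max)); // subtract max for numerical stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.5];
const cold = softmaxWithTemperature(logits, 0.1); // near-argmax: top token dominates
const warm = softmaxWithTemperature(logits, 1.0); // balanced
// cold[0] > warm[0]: lowering temperature concentrates probability on the top token
```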
Max Tokens
Maximum response length:
- Varies by model
- Higher = longer responses
- Consider context limits
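One way to reason about "consider context limits": estimate the prompt's token count and cap the response budget so prompt plus response fit the model's context window. A sketch using the common rough heuristic of ~4 characters per English token (the heuristic and function names are assumptions, not platform behavior):

```typescript
// Rough heuristic (assumption: ~4 characters per token for English text).
// Useful for sanity-checking llm_max_tokens against a model's context window.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Clamp the requested response budget to what the context window can still hold.
function maxResponseTokens(promptText: string, contextWindow: number, requested: number): number {
  const promptTokens = estimateTokens(promptText);
  const available = contextWindow - promptTokens;
  return Math.max(0, Math.min(requested, available));
}

const budget = maxResponseTokens("x".repeat(8000), 4096, 2048);
// 8000 chars ≈ 2000 tokens of prompt, leaving 2096 of a 4096-token window,
// so the requested 2048 fits unchanged.
```

Real tokenizers vary by model, so treat this only as a guardrail, not an exact count.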
System Prompt
Instructions for the AI:
```
You are a customer support agent for Acme Corp.
- Answer product questions
- Help with issues
- Be polite and professional
```

Best Practices
1. Choose the Right Model
| Use Case | Recommended |
|---|---|
| General chat | Llama 4 Scout (Workers AI) |
| Complex reasoning | Claude Opus 4.5 / GPT-4o / GPT-OSS 120B |
| Fast responses | Gemini 2.5 Flash |
| Code generation | GPT-4o |
| Cost-effective | Workers AI models (Llama 4, GPT-OSS) |
2. Write Clear Prompts
Be specific in system prompts:
- Define the role
- Set boundaries
- Specify format
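If prompts are assembled programmatically, those three ingredients can be kept as structured data and joined into the final `llm_system_prompt` string. A minimal sketch (the `PromptSpec` shape and function name are illustrative, not part of the platform API):

```typescript
// Sketch: assemble role, boundaries, and format guidance into a system prompt.
// Names here are illustrative only.
interface PromptSpec {
  role: string;         // who the agent is
  boundaries: string[]; // what it should and shouldn't do
  format?: string;      // how answers should be shaped
}

function buildSystemPrompt(spec: PromptSpec): string {
  const lines = [spec.role, ...spec.boundaries.map((b) => `- ${b}`)];
  if (spec.format) lines.push(`Respond in ${spec.format}.`);
  return lines.join("\n");
}

const prompt = buildSystemPrompt({
  role: "You are a customer support agent for Acme Corp.",
  boundaries: ["Answer product questions", "Be polite and professional"],
  format: "short paragraphs",
});
// Produces a prompt like the Acme Corp example above, with an explicit format line.
```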
3. Monitor Usage
Track token consumption and costs (see Token Usage Tracking below).
4. Test Before Production
Verify responses with test conversations.
Token Usage Tracking
Monitor AI usage with detailed token counts per conversation, model, and time period.
Dashboard
Navigate to Token Usage in the settings to view:
- Total tokens - Cumulative usage
- By model - Breakdown per model
- By period - Daily, weekly, monthly trends
- By conversation - Per-thread usage
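An estimated cost figure like the one reported in the usage data can be derived from per-model token counts and per-million-token rates. A sketch of that arithmetic (the rates below are placeholders, not real pricing, and the function is illustrative):

```typescript
// Sketch: derive an estimated cost from token counts and per-model rates.
// Rates are placeholder USD-per-million-token values, NOT real pricing.
const ratesPerMillion: Record<string, number> = {
  "@cf/meta/llama-4-scout": 0.5,
  "gpt-4o": 10,
};

function estimateCost(tokensByModel: Record<string, number>): number {
  let total = 0;
  for (const [model, tokens] of Object.entries(tokensByModel)) {
    const rate = ratesPerMillion[model] ?? 0; // unknown models contribute 0
    total += (tokens / 1_000_000) * rate;
  }
  return Math.round(total * 100) / 100; // round to cents
}

const cost = estimateCost({ "@cf/meta/llama-4-scout": 900_000, "gpt-4o": 250_000 });
// 0.9M * $0.5/M + 0.25M * $10/M = $0.45 + $2.50 = $2.95
```

Blended rates like this also make the cost advantage of routing simple traffic to Workers AI models concrete.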
Usage Data
```json
{
  "period": "2024-12",
  "totalTokens": 1250000,
  "promptTokens": 800000,
  "completionTokens": 450000,
  "models": {
    "@cf/meta/llama-4-scout": 900000,
    "gpt-4o": 250000,
    "claude-3-sonnet": 100000
  },
  "cost": {
    "estimated": 12.50,
    "currency": "USD"
  }
}
```

Per-Message Tracking
Each chat message records token usage:
```json
{
  "id": "msg_abc123",
  "role": "assistant",
  "content": "...",
  "tokenUsage": {
    "promptTokens": 1250,
    "completionTokens": 340,
    "totalTokens": 1590
  }
}
```

Optimizing Token Usage
- Use Selective Tool Loading - Reduces system prompt by ~77%
- Enable Code Mode (Selective) - Compact tool hints instead of full schemas
- Trim conversation history - Automatic context management
- Choose efficient models - Smaller models for simple tasks
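The history-trimming idea can be sketched as dropping the oldest non-system turns until the estimated token count fits a budget. This is a simplified illustration under the ~4-characters-per-token heuristic, not the platform's actual context-management implementation:

```typescript
// Sketch of "trim conversation history": drop the oldest non-system messages
// until the estimated token count fits the budget. The heuristic and message
// shape are assumptions, not the platform's internals.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

const tokensOf = (m: Message) => Math.ceil(m.content.length / 4);

function trimHistory(messages: Message[], budgetTokens: number): Message[] {
  const system = messages.filter((m) => m.role === "system"); // always kept
  const rest = messages.filter((m) => m.role !== "system");
  let cost = [...system, ...rest].reduce((n, m) => n + tokensOf(m), 0);
  while (rest.length > 0 && cost > budgetTokens) {
    cost -= tokensOf(rest.shift()!); // drop the oldest turn first
  }
  return [...system, ...rest];
}

const trimmed = trimHistory(
  [
    { role: "system", content: "Be brief." },   // kept regardless of budget
    { role: "user", content: "a".repeat(400) }, // ~100 tokens, oldest: dropped
    { role: "user", content: "b".repeat(40) },  // ~10 tokens, kept
  ],
  20
);
// trimmed keeps the system message plus only the newest user message
```

Keeping the system prompt pinned while recycling old turns preserves the agent's instructions even in long conversations.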
