LLM Integration

Configure AI providers for your agent.

Supported Providers

Provider     Models                               Via
Workers AI   Llama 4 Scout, GPT-OSS 120B          Direct
OpenAI       GPT-4o, o1                           AI Gateway
Anthropic    Claude Opus 4.5, Claude Sonnet 4.5   AI Gateway
Google       Gemini 2.5 Pro, Gemini 2.5 Flash     AI Gateway

Configuration

Via Settings

  1. Navigate to Settings
  2. Configure the LLM section:
    • Provider
    • Model
    • Temperature
    • Max tokens
    • System prompt

Settings Schema

json
{
  "llm_provider": "workers-ai",
  "llm_model": "@cf/meta/llama-3.1-8b-instruct",
  "llm_temperature": 0.7,
  "llm_max_tokens": 2048,
  "llm_system_prompt": "You are a helpful assistant."
}
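The schema above can be modeled as a typed settings object. A minimal sketch, assuming a hypothetical `LlmSettings` interface and `resolveSettings` helper (not part of the product API), that merges partial user settings over the defaults and clamps temperature to the documented 0-1 range:

```typescript
// Hypothetical shape of the settings schema shown above.
interface LlmSettings {
  llm_provider: "workers-ai" | "openai" | "anthropic" | "google";
  llm_model: string;
  llm_temperature: number;
  llm_max_tokens: number;
  llm_system_prompt: string;
}

const DEFAULTS: LlmSettings = {
  llm_provider: "workers-ai",
  llm_model: "@cf/meta/llama-3.1-8b-instruct",
  llm_temperature: 0.7,
  llm_max_tokens: 2048,
  llm_system_prompt: "You are a helpful assistant.",
};

// Merge partial user settings over the defaults, clamping temperature to 0-1.
function resolveSettings(partial: Partial<LlmSettings>): LlmSettings {
  const merged = { ...DEFAULTS, ...partial };
  merged.llm_temperature = Math.min(1, Math.max(0, merged.llm_temperature));
  return merged;
}
```

Validating on write keeps out-of-range values (for example, a temperature of 2) from reaching the provider API.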

Workers AI Models

Available Models

Model           Context   Best For
llama-4-scout   131K      General chat, tool calling
gpt-oss-120b    128K      Complex reasoning, instruction following

No API Key Required

Workers AI is built-in and requires no API key. Models are accessed directly through Cloudflare's infrastructure.

GPT-OSS Format

GPT-OSS models use an instruction-based format instead of chat messages:

json
{
  "llm_provider": "workers-ai",
  "llm_model": "@cf/openai/gpt-oss-120b"
}

The model accepts an instructions field (the system prompt) and an input field (the user query); this format is optimized for single-turn tasks.
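Because the two Workers AI model families take different request shapes, a dispatcher can build the right payload from the same settings. A sketch, assuming a hypothetical `buildPayload` helper (the field names follow the chat-messages and instructions/input formats described above):

```typescript
type Payload =
  | { messages: { role: "system" | "user"; content: string }[] } // chat format
  | { instructions: string; input: string };                     // GPT-OSS format

// Build the request body for a Workers AI model from the system prompt
// and user query, choosing the format by model ID.
function buildPayload(model: string, system: string, query: string): Payload {
  if (model.startsWith("@cf/openai/gpt-oss")) {
    // GPT-OSS: instruction-based, single-turn.
    return { instructions: system, input: query };
  }
  // Default: chat-message format.
  return {
    messages: [
      { role: "system", content: system },
      { role: "user", content: query },
    ],
  };
}
```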

OpenAI Integration

Configuration

json
{
  "llm_provider": "openai",
  "llm_model": "gpt-4o"
}

API Key

Store in agent settings or secrets:

bash
npx wrangler secret put OPENAI_API_KEY

Anthropic Integration

Configuration

json
{
  "llm_provider": "anthropic",
  "llm_model": "claude-sonnet-4-5-20250929"
}

API Key

Store in agent settings or secrets:

bash
npx wrangler secret put ANTHROPIC_API_KEY

Google Integration

Configuration

json
{
  "llm_provider": "google",
  "llm_model": "gemini-2.5-flash"
}

API Key

Store in agent settings or secrets:

bash
npx wrangler secret put GOOGLE_AI_API_KEY
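The three hosted providers authenticate differently when called directly over HTTPS. A sketch, assuming a hypothetical `authHeaders` helper; the header names are each provider's standard ones (Bearer token for OpenAI, x-api-key plus a dated anthropic-version for Anthropic, x-goog-api-key for Google):

```typescript
// Build auth headers for a direct HTTPS call to each provider's API.
function authHeaders(
  provider: "openai" | "anthropic" | "google",
  apiKey: string
): Record<string, string> {
  switch (provider) {
    case "openai":
      return { Authorization: `Bearer ${apiKey}` };
    case "anthropic":
      // Anthropic requires an API version header alongside the key.
      return { "x-api-key": apiKey, "anthropic-version": "2023-06-01" };
    case "google":
      return { "x-goog-api-key": apiKey };
  }
}
```

When requests are routed through AI Gateway, as in the provider table above, the gateway forwards these same provider credentials.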

Parameters

Temperature

Controls randomness (0-1):

  • 0 - Deterministic, focused
  • 0.7 - Balanced (default)
  • 1 - Creative, varied

Max Tokens

Maximum response length:

  • Varies by model
  • Higher = longer responses
  • Consider context limits

System Prompt

Instructions for the AI:

You are a customer support agent for Acme Corp.
- Answer product questions
- Help with issues
- Be polite and professional

Best Practices

1. Choose the Right Model

Use Case            Recommended
General chat        Llama 4 Scout (Workers AI)
Complex reasoning   Claude Opus 4.5 / GPT-4o / GPT-OSS 120B
Fast responses      Gemini 2.5 Flash
Code generation     GPT-4o
Cost-effective      Workers AI models (Llama 4, GPT-OSS)

2. Write Clear Prompts

Be specific in system prompts:

  • Define the role
  • Set boundaries
  • Specify format

3. Monitor Usage

Track token consumption and costs (see Token Usage Tracking below).

4. Test Before Production

Verify responses with test conversations.

Token Usage Tracking

Monitor AI usage with detailed token counts per conversation, model, and time period.

Dashboard

Navigate to Token Usage in the settings to view:

  • Total tokens - Cumulative usage
  • By model - Breakdown per model
  • By period - Daily, weekly, monthly trends
  • By conversation - Per-thread usage

Usage Data

json
{
  "period": "2024-12",
  "totalTokens": 1250000,
  "promptTokens": 800000,
  "completionTokens": 450000,
  "models": {
    "@cf/meta/llama-4-scout": 900000,
    "gpt-4o": 250000,
    "claude-sonnet-4-5": 100000
  },
  "cost": {
    "estimated": 12.50,
    "currency": "USD"
  }
}

Per-Message Tracking

Each chat message records token usage:

json
{
  "id": "msg_abc123",
  "role": "assistant",
  "content": "...",
  "tokenUsage": {
    "promptTokens": 1250,
    "completionTokens": 340,
    "totalTokens": 1590
  }
}
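Given per-message tokenUsage records like the one above, conversation totals can be aggregated with a small reducer. A sketch, assuming a hypothetical `sumUsage` helper that mirrors the fields shown:

```typescript
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

interface ChatMessage {
  id: string;
  role: string;
  tokenUsage?: TokenUsage; // user messages may carry no usage record
}

// Sum token usage across a conversation, skipping messages without a record.
function sumUsage(messages: ChatMessage[]): TokenUsage {
  return messages.reduce(
    (acc, m) =>
      m.tokenUsage
        ? {
            promptTokens: acc.promptTokens + m.tokenUsage.promptTokens,
            completionTokens: acc.completionTokens + m.tokenUsage.completionTokens,
            totalTokens: acc.totalTokens + m.tokenUsage.totalTokens,
          }
        : acc,
    { promptTokens: 0, completionTokens: 0, totalTokens: 0 }
  );
}
```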

Optimizing Token Usage

  1. Use Selective Tool Loading - Reduces system prompt by ~77%
  2. Enable Code Mode (Selective) - Compact tool hints instead of full schemas
  3. Trim conversation history - Automatic context management
  4. Choose efficient models - Smaller models for simple tasks
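Item 3 above (trimming conversation history) can be sketched as a budget-based cut: keep the most recent messages whose estimated token count fits a context budget. Both `estimateTokens` and `trimHistory` are hypothetical helpers; the product's automatic context management may use different heuristics:

```typescript
// Rough token estimate: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the newest messages that fit within `budget` tokens, preserving order.
function trimHistory(history: string[], budget: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i]);
    if (used + cost > budget) break; // oldest messages are dropped first
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```

Walking from newest to oldest guarantees the most recent turns survive, which is usually what matters for coherent replies.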