Chat Interface
The chat interface provides interactive AI conversations with streaming responses, tool calling, and intelligent context management.
Overview
Features:
- Real-time streaming responses with Server-Sent Events (SSE)
- Agentic tool calling with up to 8 iterations per message
- Conversation history with token-aware context trimming
- Multi-model support (Llama, GPT-4o, Claude, etc.)
- Automatic RAG context injection
- Loop detection for repeated tool calls
- Streaming state recovery
Usage
Basic Chat
Navigate to Chat in the sidebar:
- Select a model from the dropdown
- Toggle tool calling on/off
- Toggle streaming on/off
- Type your message and press Enter
- View AI response with real-time streaming
- Continue the conversation
Message Types
| Type | Description |
|---|---|
| user | Your messages |
| assistant | AI responses |
| system | System instructions |
| tool | Tool call results |
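In TypeScript, these message types can be modeled roughly as follows. This is an illustrative sketch: the field names mirror the JSON examples later in this page, not a guaranteed wire schema.

```typescript
// Illustrative shape of a chat message; field names follow the JSON
// examples in this document, not an official schema.
type Role = 'user' | 'assistant' | 'system' | 'tool'

interface ChatMessage {
  role: Role
  content: string
  toolCalls?: { id: string; name: string; arguments: Record<string, unknown> }[]
  toolResults?: { toolCallId: string; name: string; result: unknown }[]
}

const msg: ChatMessage = { role: 'user', content: 'Hello!' }
```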
Conversation Management
New Conversation
Start fresh:
- Click "New Conversation" in the sidebar
- Previous context is cleared
- System prompt is reset
Conversation History
View past conversations:
- Conversations appear in the left sidebar
- Click to switch between conversations
- Up to 50 conversations displayed
Delete Conversation
```bash
curl -X DELETE https://your-domain.com/api/agents/{id}/chat/conversations/{conversationId}
```

Clear All History
```bash
curl -X DELETE https://your-domain.com/api/agents/{id}/chat
```

Streaming Responses
The chat uses Server-Sent Events (SSE) for real-time streaming.
SSE Event Types
| Event | Description |
|---|---|
| message_start | Stream begins, includes userMessageId and model |
| thinking | Model is processing (iteration count) |
| tool_call | Tool being called (id, name, arguments) |
| tool_result | Tool execution result (success/error, result) |
| content_delta | Text chunk received (delta content) |
| message_complete | Stream finished (assistantMessageId, full toolCalls/Results) |
| error | Error occurred (message) |
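On the wire, a simple streamed reply might look like this (payloads abbreviated and illustrative; actual field sets may vary by event):

```
event: message_start
data: {"type":"message_start","userMessageId":"msg-1","model":"llama-4-scout"}

event: content_delta
data: {"type":"content_delta","delta":"Hello"}

event: content_delta
data: {"type":"content_delta","delta":", world!"}

event: message_complete
data: {"type":"message_complete","assistantMessageId":"msg-2"}
```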
Streaming Example
```typescript
const response = await fetch('/api/agents/{id}/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    content: 'Tell me a story',
    stream: true
  })
})

const reader = response.body!.getReader()
const decoder = new TextDecoder()

while (true) {
  const { done, value } = await reader.read()
  if (done) break
  // Note: a chunk may split an SSE line mid-way; buffer partial lines
  // across reads in production code.
  const lines = decoder.decode(value, { stream: true }).split('\n')
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue
    const data = JSON.parse(line.slice(6))
    switch (data.type) {
      case 'content_delta':
        process.stdout.write(data.delta)
        break
      case 'tool_call':
        console.log(`Calling tool: ${data.name}`)
        break
      case 'tool_result':
        console.log(`Tool result: ${JSON.stringify(data.result)}`)
        break
      case 'message_complete':
        console.log('Done!')
        break
    }
  }
}
```

Agentic Tool Calling
The AI can call multiple tools in a reasoning loop to accomplish tasks.
How It Works
```
User Message
     │
     ▼
┌──────────────────────────────┐
│  Agentic Loop (max 8 iter)   │
│                              │
│   ┌─────────────────────┐    │
│   │ 1. Send "thinking"  │    │
│   │ 2. Call LLM         │    │
│   │ 3. Parse tool calls │    │
│   └──────────┬──────────┘    │
│              │               │
│       Has tool calls?        │
│        /           \         │
│      Yes            No       │
│       │             │        │
│       ▼             ▼        │
│  Execute tools    Stream     │
│  Add results      response   │
│  Next iteration   Exit       │
└──────────────────────────────┘
```

Loop Detection
The system prevents infinite loops by:
- Tracking last 10 tool calls
- Limiting same tool with same arguments to 2 occurrences
- Maximum 8 tool call iterations per message
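The detection rule above can be sketched in TypeScript. The constants mirror the documented limits; the actual implementation may differ in its details.

```typescript
// Sketch of the documented loop-detection rule: within the last 10 tool
// calls, the same tool with the same arguments may appear at most twice.
const MAX_HISTORY = 10
const MAX_REPEATS = 2

type ToolCall = { name: string; arguments: Record<string, unknown> }

function isLoop(history: ToolCall[], next: ToolCall): boolean {
  const key = (c: ToolCall) => `${c.name}:${JSON.stringify(c.arguments)}`
  const recent = history.slice(-MAX_HISTORY)
  const repeats = recent.filter(c => key(c) === key(next)).length
  // A third identical call (same tool, same arguments) is treated as a loop
  return repeats >= MAX_REPEATS
}
```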
Tool Call Example
User: "Search for React tutorials and summarize the top result"
The AI will:
- Call `search_knowledge` with query "React tutorials"
- Receive search results
- Call `browse_url` to fetch the top result
- Return a summary combining the information
```json
{
  "role": "assistant",
  "content": "Here's a summary of the top React tutorial...",
  "toolCalls": [
    {
      "id": "tc_abc123",
      "name": "search_knowledge",
      "arguments": {"query": "React tutorials"}
    },
    {
      "id": "tc_def456",
      "name": "browse_url",
      "arguments": {"url": "https://example.com/react-tutorial"}
    }
  ],
  "toolResults": [
    {
      "toolCallId": "tc_abc123",
      "name": "search_knowledge",
      "result": [{"title": "React Tutorial", "url": "..."}]
    },
    {
      "toolCallId": "tc_def456",
      "name": "browse_url",
      "result": {"content": "..."}
    }
  ]
}
```

Context Management
Token-Aware Trimming
The system intelligently manages context:
- Estimates tokens (4 characters ≈ 1 token)
- Reserves ~20K tokens for: system prompt (8K), tools (5K), RAG (2K), response (2K)
- Removes older messages when approaching context limits
- Always includes system prompt
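The trimming strategy above can be sketched like this. The 4-characters-per-token estimate and the ~20K reserve come from the description above; the walk direction (newest-first) and exact accounting are assumptions about the implementation.

```typescript
// Sketch of token-aware trimming: estimate tokens at ~4 chars each,
// reserve a fixed budget for system prompt / tools / RAG / response,
// and drop the oldest messages until the rest fit.
const RESERVED_TOKENS = 20_000 // covers system prompt, tools, RAG, response

const estimateTokens = (text: string) => Math.ceil(text.length / 4)

function trimHistory(
  messages: { role: string; content: string }[],
  contextLimit: number
): { role: string; content: string }[] {
  const budget = contextLimit - RESERVED_TOKENS
  const kept: { role: string; content: string }[] = []
  let used = 0
  // Walk newest-to-oldest so the most recent turns survive
  for (const msg of [...messages].reverse()) {
    const cost = estimateTokens(msg.content)
    if (used + cost > budget) break
    kept.unshift(msg)
    used += cost
  }
  return kept
}
```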
Context Limits by Model
| Provider | Model | Context Limit |
|---|---|---|
| Workers AI | Llama 4 Scout | 128K tokens |
| OpenAI | GPT-4o | 128K tokens |
| Anthropic | Claude | 200K tokens |
Automatic RAG
When enabled, the system:
- Searches your knowledge base for relevant context
- Injects relevant chunks into the system prompt
- Provides grounded, accurate responses
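The injection step can be pictured like this: retrieved chunks are appended to the system prompt before the LLM call. The prompt wording and the `Chunk` shape here are illustrative assumptions, not the actual implementation.

```typescript
// Illustrative sketch of automatic RAG injection: knowledge-base search
// results are folded into the system prompt. Field names and prompt
// wording are assumptions.
interface Chunk { title: string; text: string }

function buildSystemPrompt(basePrompt: string, chunks: Chunk[]): string {
  if (chunks.length === 0) return basePrompt
  const context = chunks
    .map(c => `[${c.title}]\n${c.text}`)
    .join('\n\n')
  return `${basePrompt}\n\nRelevant context from the knowledge base:\n${context}`
}
```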
Configuration
Request Options
| Option | Type | Description |
|---|---|---|
| content | string | The message content |
| model | string | Model to use (optional) |
| conversationId | string | Continue specific conversation |
| enableTools | boolean | Enable tool calling (default: true) |
| autoRag | boolean | Enable automatic RAG (default: true) |
| stream | boolean | Enable streaming (default: false) |
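The request options above can be typed as follows (a sketch; defaults are noted as comments and applied server-side):

```typescript
// Sketch of the request body, mirroring the options table above.
interface ChatRequest {
  content: string          // the message content (required)
  model?: string           // override the agent's default model
  conversationId?: string  // continue a specific conversation
  enableTools?: boolean    // default: true
  autoRag?: boolean        // default: true
  stream?: boolean         // default: false
}

const body: ChatRequest = { content: 'Hello!', stream: true }
```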
Model Settings
| Setting | Description |
|---|---|
| llm_provider | AI provider (workers-ai, openai, anthropic) |
| llm_model | Specific model |
| llm_temperature | Creativity (0-1) |
| llm_max_tokens | Response length limit |
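For example, an agent configured for OpenAI might use settings along these lines (values are illustrative):

```json
{
  "llm_provider": "openai",
  "llm_model": "gpt-4o",
  "llm_temperature": 0.7,
  "llm_max_tokens": 1024
}
```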
System Prompt
Set agent behavior:
```json
{
  "llm_system_prompt": "You are a helpful assistant specialized in customer support. Always be polite and professional."
}
```

API
Send Message
```bash
curl -X POST https://your-domain.com/api/agents/{id}/chat \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Hello, how can you help me?",
    "conversationId": "conv-123",
    "model": "llama-4-scout",
    "enableTools": true,
    "stream": true
  }'
```

Get History
```bash
curl "https://your-domain.com/api/agents/{id}/chat?limit=50&conversationId=conv-123"
```

List Conversations
```bash
curl "https://your-domain.com/api/agents/{id}/chat/conversations"
```

Create Conversation
```bash
curl -X POST https://your-domain.com/api/agents/{id}/chat/conversations \
  -H "Content-Type: application/json" \
  -d '{"title": "New Chat"}'
```

Get Available Models
```bash
curl "https://your-domain.com/api/agents/{id}/chat/models"
```

Get Available Tools
```bash
curl "https://your-domain.com/api/agents/{id}/chat/tools"
```

Streaming State Recovery
The frontend persists streaming state to localStorage:
- Saves progress during streaming
- Recovers if tab crashes or closes
- Expires after 5 minutes
- Polls server every 2 seconds to detect completion
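The persistence half of this mechanism can be sketched as follows. The storage key, stored fields, and storage interface are assumptions for illustration, not the actual frontend schema; the 5-minute expiry mirrors the documented behavior.

```typescript
// Sketch of localStorage-based streaming recovery: persist partial output
// with a timestamp and discard entries older than five minutes.
// Key name and stored fields are illustrative assumptions.
const STREAM_KEY = 'chat-streaming-state'
const EXPIRY_MS = 5 * 60 * 1000

interface KVStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
  removeItem(key: string): void
}

interface StreamingState {
  conversationId: string
  partialContent: string
  savedAt: number
}

function saveStreamingState(storage: KVStore, state: StreamingState): void {
  storage.setItem(STREAM_KEY, JSON.stringify(state))
}

function recoverStreamingState(storage: KVStore, now = Date.now()): StreamingState | null {
  const raw = storage.getItem(STREAM_KEY)
  if (!raw) return null
  const state: StreamingState = JSON.parse(raw)
  // Stale entries (saved more than 5 minutes ago) are discarded
  if (now - state.savedAt > EXPIRY_MS) {
    storage.removeItem(STREAM_KEY)
    return null
  }
  return state
}
```

In the browser, `localStorage` satisfies this interface directly.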
Best Practices
1. Clear System Prompts
Write specific, clear instructions:
```
You are a technical support agent for Acme Software.
- Answer questions about our products
- Help troubleshoot issues
- Escalate complex cases to humans
- Never make up answers
```

2. Manage Context
Keep conversations focused:
- Start new conversations for new topics
- Clear history when switching contexts
- Use conversation IDs for threading
3. Enable Relevant Tools
Only enable needed tools:
- Reduces confusion
- Faster responses
- Lower costs
4. Use Streaming
Enable streaming for better UX:
- Real-time feedback
- Lower perceived latency
- Progressive rendering
5. Monitor Tool Calls
Watch for:
- Repeated tool calls (may indicate confusion)
- Failed tool calls
- Excessive iterations