
Chat Interface

The chat interface provides interactive AI conversations with streaming responses, tool calling, and intelligent context management.

Overview

Features:

  • Real-time streaming responses with Server-Sent Events (SSE)
  • Agentic tool calling with up to 8 iterations per message
  • Conversation history with token-aware context trimming
  • Multi-model support (Llama, GPT-4o, Claude, etc.)
  • Automatic RAG context injection
  • Loop detection for repeated tool calls
  • Streaming state recovery

Usage

Basic Chat

Navigate to Chat in the sidebar:

  1. Select a model from the dropdown
  2. Toggle tool calling on/off
  3. Toggle streaming on/off
  4. Type your message and press Enter
  5. View AI response with real-time streaming
  6. Continue the conversation

Message Types

Type       Description
user       Your messages
assistant  AI responses
system     System instructions
tool       Tool call results

Conversation Management

New Conversation

Start fresh:

  • Click "New Conversation" in the sidebar
  • Previous context is cleared
  • System prompt is reset

Conversation History

View past conversations:

  • Conversations appear in the left sidebar
  • Click to switch between conversations
  • Up to 50 conversations displayed

Delete Conversation

bash
curl -X DELETE https://your-domain.com/api/agents/{id}/chat/conversations/{conversationId}

Clear All History

bash
curl -X DELETE https://your-domain.com/api/agents/{id}/chat

Streaming Responses

The chat uses Server-Sent Events (SSE) for real-time streaming.

SSE Event Types

Event             Description
message_start     Stream begins; includes userMessageId and model
thinking          Model is processing (includes iteration count)
tool_call         Tool being called (id, name, arguments)
tool_result       Tool execution result (success/error, result)
content_delta     Text chunk received (delta content)
message_complete  Stream finished (assistantMessageId, full toolCalls/toolResults)
error             Error occurred (message)

Streaming Example

typescript
const response = await fetch('/api/agents/{id}/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    content: 'Tell me a story',
    stream: true
  })
})

const reader = response.body!.getReader()
const decoder = new TextDecoder()
let buffer = ''

while (true) {
  const { done, value } = await reader.read()
  if (done) break

  // Decode with stream: true so multi-byte characters split across chunks
  // are handled correctly, and buffer any partial trailing line, since an
  // SSE event can span chunk boundaries.
  buffer += decoder.decode(value, { stream: true })
  const lines = buffer.split('\n')
  buffer = lines.pop() ?? ''

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue
    const data = JSON.parse(line.slice(6))

    switch (data.type) {
      case 'content_delta':
        process.stdout.write(data.delta)
        break
      case 'tool_call':
        console.log(`Calling tool: ${data.name}`)
        break
      case 'tool_result':
        console.log(`Tool result: ${data.result}`)
        break
      case 'message_complete':
        console.log('Done!')
        break
    }
  }
}

Agentic Tool Calling

The AI can call multiple tools in a reasoning loop to accomplish tasks.

How It Works

User Message
      │
      ▼
┌─────────────────────────────┐
│  Agentic Loop (max 8 iter)  │
│  ┌─────────────────────┐    │
│  │ 1. Send "thinking"  │    │
│  │ 2. Call LLM         │    │
│  │ 3. Parse tool calls │    │
│  └──────────┬──────────┘    │
│             │               │
│      Has tool calls?        │
│        /        \           │
│      Yes         No         │
│       │           │         │
│       ▼           ▼         │
│  Execute tools   Stream     │
│  Add results     response   │
│  Next iteration   Exit      │
└─────────────────────────────┘
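
The loop above can be sketched in TypeScript. This is a minimal, synchronous illustration: `callLlm`, `executeTool`, and the message shapes are stand-ins for the actual implementation (which awaits and streams each call), not its real API.

```typescript
interface Turn {
  content: string
  toolCalls: { id: string; name: string; arguments: unknown }[]
}

type Message = { role: string; content: string; toolCallId?: string }

// One agentic pass: call the model, execute any requested tools, feed the
// results back, and stop when the model answers without tools or the
// iteration cap is hit.
function runAgenticLoop(
  callLlm: (messages: Message[]) => Turn,
  executeTool: (name: string, args: unknown) => unknown,
  messages: Message[],
  maxIterations = 8
): string {
  for (let i = 0; i < maxIterations; i++) {
    const turn = callLlm(messages)            // call the LLM
    if (turn.toolCalls.length === 0) {
      return turn.content                     // no tool calls: final response
    }
    for (const call of turn.toolCalls) {      // execute each requested tool
      const result = executeTool(call.name, call.arguments)
      messages.push({
        role: 'tool',
        toolCallId: call.id,
        content: JSON.stringify(result),      // add results, next iteration
      })
    }
  }
  return 'Stopped: maximum tool-call iterations reached.'
}
```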

Loop Detection

The system prevents infinite loops by:

  • Tracking last 10 tool calls
  • Limiting same tool with same arguments to 2 occurrences
  • Maximum 8 tool call iterations per message
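
The first two rules can be expressed as a small guard function. This is a sketch; the names and exact bookkeeping are illustrative, not the actual implementation.

```typescript
interface ToolCall {
  name: string
  arguments: Record<string, unknown>
}

const HISTORY_SIZE = 10 // how many recent calls to track
const MAX_REPEATS = 2   // same tool + same arguments allowed this many times

// Returns true when a proposed call would exceed the repeat limit
// within the last HISTORY_SIZE tool calls.
function isLoop(history: ToolCall[], next: ToolCall): boolean {
  const recent = history.slice(-HISTORY_SIZE)
  const key = next.name + JSON.stringify(next.arguments)
  const repeats = recent.filter(
    c => c.name + JSON.stringify(c.arguments) === key
  ).length
  return repeats >= MAX_REPEATS
}
```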

Tool Call Example

User: "Search for React tutorials and summarize the top result"

The AI will:

  1. Call search_knowledge with query "React tutorials"
  2. Receive search results
  3. Call browse_url to fetch the top result
  4. Return a summary combining the information
json
{
  "role": "assistant",
  "content": "Here's a summary of the top React tutorial...",
  "toolCalls": [
    {
      "id": "tc_abc123",
      "name": "search_knowledge",
      "arguments": {"query": "React tutorials"}
    },
    {
      "id": "tc_def456",
      "name": "browse_url",
      "arguments": {"url": "https://example.com/react-tutorial"}
    }
  ],
  "toolResults": [
    {
      "toolCallId": "tc_abc123",
      "name": "search_knowledge",
      "result": [{"title": "React Tutorial", "url": "..."}]
    },
    {
      "toolCallId": "tc_def456",
      "name": "browse_url",
      "result": {"content": "..."}
    }
  ]
}

Context Management

Token-Aware Trimming

The system intelligently manages context:

  • Estimates tokens (4 characters ≈ 1 token)
  • Reserves ~20K tokens for: system prompt (8K), tools (5K), RAG (2K), response (2K)
  • Removes older messages when approaching context limits
  • Always includes system prompt
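
A minimal sketch of this strategy, using the documented 4-characters-per-token heuristic and the ~20K reserved budget. The helper names are illustrative, not the actual implementation.

```typescript
interface Message { role: string; content: string }

const RESERVED_TOKENS = 20_000 // system prompt, tools, RAG, response headroom

// Rough heuristic from the docs: ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Always keep the system prompt; walk the rest newest-to-oldest,
// dropping older messages once the budget is exhausted.
function trimHistory(messages: Message[], contextLimit: number): Message[] {
  const budget = contextLimit - RESERVED_TOKENS
  const system = messages.filter(m => m.role === 'system')
  const rest = messages.filter(m => m.role !== 'system')

  let total = system.reduce((n, m) => n + estimateTokens(m.content), 0)
  const kept: Message[] = []
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content)
    if (total + cost > budget) break
    kept.unshift(rest[i])
    total += cost
  }
  return [...system, ...kept]
}
```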

Context Limits by Model

Provider    Model          Context Limit
Workers AI  Llama 4 Scout  128K tokens
OpenAI      GPT-4o         128K tokens
Anthropic   Claude         200K tokens


Automatic RAG

When enabled, the system:

  1. Searches your knowledge base for relevant context
  2. Injects relevant chunks into the system prompt
  3. Provides grounded, accurate responses
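
The injection step (2) might look like the following sketch. The prompt format, chunk shape, and function name are assumptions for illustration, not the actual implementation.

```typescript
interface Chunk { text: string; score: number }

// Inject the top-scoring retrieved chunks into the system prompt
// before the message history is sent to the model.
function injectRagContext(
  basePrompt: string,
  chunks: Chunk[],
  maxChunks = 5
): string {
  if (chunks.length === 0) return basePrompt
  const context = [...chunks]
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks)
    .map(c => `- ${c.text}`)
    .join('\n')
  return `${basePrompt}\n\nRelevant context from the knowledge base:\n${context}`
}
```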

Configuration

Request Options

Option          Type     Description
content         string   The message content
model           string   Model to use (optional)
conversationId  string   Continue a specific conversation
enableTools     boolean  Enable tool calling (default: true)
autoRag         boolean  Enable automatic RAG (default: true)
stream          boolean  Enable streaming (default: false)

Model Settings

Setting          Description
llm_provider     AI provider (workers-ai, openai, anthropic)
llm_model        Specific model
llm_temperature  Creativity (0-1)
llm_max_tokens   Response length limit

System Prompt

Set agent behavior:

json
{
  "llm_system_prompt": "You are a helpful assistant specialized in customer support. Always be polite and professional."
}

API

Send Message

bash
curl -X POST https://your-domain.com/api/agents/{id}/chat \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Hello, how can you help me?",
    "conversationId": "conv-123",
    "model": "llama-4-scout",
    "enableTools": true,
    "stream": true
  }'

Get History

bash
curl "https://your-domain.com/api/agents/{id}/chat?limit=50&conversationId=conv-123"

List Conversations

bash
curl "https://your-domain.com/api/agents/{id}/chat/conversations"

Create Conversation

bash
curl -X POST https://your-domain.com/api/agents/{id}/chat/conversations \
  -H "Content-Type: application/json" \
  -d '{"title": "New Chat"}'

Get Available Models

bash
curl "https://your-domain.com/api/agents/{id}/chat/models"

Get Available Tools

bash
curl "https://your-domain.com/api/agents/{id}/chat/tools"

Streaming State Recovery

The frontend persists streaming state to localStorage:

  • Saves progress during streaming
  • Recovers if tab crashes or closes
  • Expires after 5 minutes
  • Polls the server every 2 seconds to detect completion
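
A sketch of that persistence logic. In the browser this sits on localStorage; it is abstracted here behind a tiny store interface so the example is self-contained, and the key name and state shape are illustrative.

```typescript
// Stand-in for window.localStorage (same method shapes).
interface StateStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
  removeItem(key: string): void
}

const RECOVERY_KEY = 'chat-streaming-state' // illustrative key name
const EXPIRY_MS = 5 * 60 * 1000             // saved state expires after 5 minutes

interface StreamingState {
  conversationId: string
  partialContent: string
  savedAt: number
}

// Save progress during streaming, stamped with the current time.
function saveStreamingState(
  store: StateStore,
  state: Omit<StreamingState, 'savedAt'>
): void {
  store.setItem(RECOVERY_KEY, JSON.stringify({ ...state, savedAt: Date.now() }))
}

// Returns the saved state, or null if nothing was saved or it has expired.
function recoverStreamingState(
  store: StateStore,
  now = Date.now()
): StreamingState | null {
  const raw = store.getItem(RECOVERY_KEY)
  if (!raw) return null
  const state: StreamingState = JSON.parse(raw)
  if (now - state.savedAt > EXPIRY_MS) {
    store.removeItem(RECOVERY_KEY)
    return null
  }
  return state
}
```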

Best Practices

1. Clear System Prompts

Write specific, clear instructions:

You are a technical support agent for Acme Software.
- Answer questions about our products
- Help troubleshoot issues
- Escalate complex cases to humans
- Never make up answers

2. Manage Context

Keep conversations focused:

  • Start new conversations for new topics
  • Clear history when switching contexts
  • Use conversation IDs for threading

3. Enable Relevant Tools

Only enable needed tools:

  • Reduces confusion
  • Faster responses
  • Lower costs

4. Use Streaming

Enable streaming for better UX:

  • Real-time feedback
  • Lower perceived latency
  • Progressive rendering

5. Monitor Tool Calls

Watch for:

  • Repeated tool calls (may indicate confusion)
  • Failed tool calls
  • Excessive iterations

Released under the MIT License.