
Chat Interface

The chat interface provides interactive AI conversations with streaming responses, tool calling, and intelligent context management.

Overview

Features:

  • Real-time streaming responses with Server-Sent Events (SSE)
  • Agentic tool calling with up to 8 iterations per message
  • Conversation history with token-aware context trimming
  • Multi-model support (Llama, GPT-4o, Claude, etc.)
  • Automatic RAG context injection
  • Loop detection for repeated tool calls
  • Streaming state recovery

Usage

Basic Chat

Navigate to Chat in the sidebar:

  1. Select a model from the dropdown
  2. Toggle tool calling on/off
  3. Toggle streaming on/off
  4. Type your message and press Enter
  5. View AI response with real-time streaming
  6. Continue the conversation

Message Types

Type       Description
user       Your messages
assistant  AI responses
system     System instructions
tool       Tool call results

Conversation Management

New Conversation

Start fresh:

  • Click "New Conversation" in the sidebar
  • Previous context is cleared
  • System prompt is reset

Conversation History

View past conversations:

  • Conversations appear in the left sidebar
  • Click to switch between conversations
  • Up to 50 conversations displayed

Delete Conversation

bash
curl -X DELETE https://your-domain.com/api/agents/{id}/chat/conversations/{conversationId}

Clear All History

bash
curl -X DELETE https://your-domain.com/api/agents/{id}/chat

Streaming Responses

The chat uses Server-Sent Events (SSE) for real-time streaming.

SSE Event Types

Event             Description
message_start     Stream begins; includes userMessageId and model
thinking          Model is processing (includes iteration count)
tool_call         Tool being called (id, name, arguments)
tool_result       Tool execution result (success/error, result)
content_delta     Text chunk received (delta content)
message_complete  Stream finished (assistantMessageId, full toolCalls/toolResults)
error             Error occurred (message)

Streaming Example

typescript
const response = await fetch('/api/agents/{id}/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    content: 'Tell me a story',
    stream: true
  })
})

const reader = response.body!.getReader()
const decoder = new TextDecoder()
let buffer = ''

while (true) {
  const { done, value } = await reader.read()
  if (done) break

  // Decode with stream: true so multi-byte characters split across chunks
  // are handled correctly, and buffer any partial trailing line, since an
  // SSE event can span chunk boundaries.
  buffer += decoder.decode(value, { stream: true })
  const lines = buffer.split('\n')
  buffer = lines.pop() ?? ''

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue
    const data = JSON.parse(line.slice(6))

    switch (data.type) {
      case 'content_delta':
        process.stdout.write(data.delta)
        break
      case 'tool_call':
        console.log(`Calling tool: ${data.name}`)
        break
      case 'tool_result':
        console.log(`Tool result: ${data.result}`)
        break
      case 'message_complete':
        console.log('Done!')
        break
    }
  }
}

Agentic Tool Calling

The AI can call multiple tools in a reasoning loop to accomplish tasks.

How It Works

User Message
      │
      ▼
┌─────────────────────────────┐
│  Agentic Loop (max 8 iter)  │
│  ┌─────────────────────┐    │
│  │ 1. Send "thinking"  │    │
│  │ 2. Call LLM         │    │
│  │ 3. Parse tool calls │    │
│  └──────────┬──────────┘    │
│             │               │
│      Has tool calls?        │
│        /        \           │
│      Yes         No         │
│       │           │         │
│       ▼           ▼         │
│  Execute tools   Stream     │
│  Add results     response   │
│  Next iteration   Exit      │
└─────────────────────────────┘
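
The loop above can be sketched in TypeScript. This is a minimal, synchronous illustration: `callLlm`, `executeTool`, and the message shapes are stand-ins for the actual implementation (which awaits and streams each call), not its real API.

```typescript
interface Turn {
  content: string
  toolCalls: { id: string; name: string; arguments: unknown }[]
}

type Message = { role: string; content: string; toolCallId?: string }

// One agentic pass: call the model, execute any requested tools, feed the
// results back, and stop when the model answers without tools or the
// iteration cap is hit.
function runAgenticLoop(
  callLlm: (messages: Message[]) => Turn,
  executeTool: (name: string, args: unknown) => unknown,
  messages: Message[],
  maxIterations = 8
): string {
  for (let i = 0; i < maxIterations; i++) {
    const turn = callLlm(messages)            // call the LLM
    if (turn.toolCalls.length === 0) {
      return turn.content                     // no tool calls: final response
    }
    for (const call of turn.toolCalls) {      // execute each requested tool
      const result = executeTool(call.name, call.arguments)
      messages.push({
        role: 'tool',
        toolCallId: call.id,
        content: JSON.stringify(result),      // add results, next iteration
      })
    }
  }
  return 'Stopped: maximum tool-call iterations reached.'
}
```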

Loop Detection

The system prevents infinite loops by:

  • Tracking last 10 tool calls
  • Limiting same tool with same arguments to 2 occurrences
  • Maximum 8 tool call iterations per message
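
The first two rules can be expressed as a small guard function. This is a sketch; the names and exact bookkeeping are illustrative, not the actual implementation.

```typescript
interface ToolCall {
  name: string
  arguments: Record<string, unknown>
}

const HISTORY_SIZE = 10 // how many recent calls to track
const MAX_REPEATS = 2   // same tool + same arguments allowed this many times

// Returns true when a proposed call would exceed the repeat limit
// within the last HISTORY_SIZE tool calls.
function isLoop(history: ToolCall[], next: ToolCall): boolean {
  const recent = history.slice(-HISTORY_SIZE)
  const key = next.name + JSON.stringify(next.arguments)
  const repeats = recent.filter(
    c => c.name + JSON.stringify(c.arguments) === key
  ).length
  return repeats >= MAX_REPEATS
}
```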

Tool Call Example

User: "Search for React tutorials and summarize the top result"

The AI will:

  1. Call search_knowledge with query "React tutorials"
  2. Receive search results
  3. Call browse_url to fetch the top result
  4. Return a summary combining the information
json
{
  "role": "assistant",
  "content": "Here's a summary of the top React tutorial...",
  "toolCalls": [
    {
      "id": "tc_abc123",
      "name": "search_knowledge",
      "arguments": {"query": "React tutorials"}
    },
    {
      "id": "tc_def456",
      "name": "browse_url",
      "arguments": {"url": "https://example.com/react-tutorial"}
    }
  ],
  "toolResults": [
    {
      "toolCallId": "tc_abc123",
      "name": "search_knowledge",
      "result": [{"title": "React Tutorial", "url": "..."}]
    },
    {
      "toolCallId": "tc_def456",
      "name": "browse_url",
      "result": {"content": "..."}
    }
  ]
}

Context Management

Token-Aware Trimming

The system intelligently manages context:

  • Estimates tokens (4 characters ≈ 1 token)
  • Reserves ~20K tokens for: system prompt (8K), tools (5K), RAG (2K), response (2K)
  • Removes older messages when approaching context limits
  • Always includes system prompt
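
A minimal sketch of this strategy, using the documented 4-characters-per-token heuristic and the ~20K reserved budget. The helper names are illustrative, not the actual implementation.

```typescript
interface Message { role: string; content: string }

const RESERVED_TOKENS = 20_000 // system prompt, tools, RAG, response headroom

// Rough heuristic from the docs: ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Always keep the system prompt; walk the rest newest-to-oldest,
// dropping older messages once the budget is exhausted.
function trimHistory(messages: Message[], contextLimit: number): Message[] {
  const budget = contextLimit - RESERVED_TOKENS
  const system = messages.filter(m => m.role === 'system')
  const rest = messages.filter(m => m.role !== 'system')

  let total = system.reduce((n, m) => n + estimateTokens(m.content), 0)
  const kept: Message[] = []
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content)
    if (total + cost > budget) break
    kept.unshift(rest[i])
    total += cost
  }
  return [...system, ...kept]
}
```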

Context Limits by Model

Provider    Model          Context Limit
Workers AI  Llama 4 Scout  128K tokens
OpenAI      GPT-4o         128K tokens
Anthropic   Claude         200K tokens


Automatic RAG

When enabled, the system:

  1. Searches your knowledge base for relevant context
  2. Injects relevant chunks into the system prompt
  3. Provides grounded, accurate responses
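
The injection step (2) might look like the following sketch. The prompt format, chunk shape, and function name are assumptions for illustration, not the actual implementation.

```typescript
interface Chunk { text: string; score: number }

// Inject the top-scoring retrieved chunks into the system prompt
// before the message history is sent to the model.
function injectRagContext(
  basePrompt: string,
  chunks: Chunk[],
  maxChunks = 5
): string {
  if (chunks.length === 0) return basePrompt
  const context = [...chunks]
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks)
    .map(c => `- ${c.text}`)
    .join('\n')
  return `${basePrompt}\n\nRelevant context from the knowledge base:\n${context}`
}
```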

Configuration

Request Options

Option          Type     Description
content         string   The message content
model           string   Model to use (optional)
conversationId  string   Continue a specific conversation
enableTools     boolean  Enable tool calling (default: true)
autoRag         boolean  Enable automatic RAG (default: true)
stream          boolean  Enable streaming (default: false)

Model Settings

Setting          Description
llm_provider     AI provider (workers-ai, openai, anthropic)
llm_model        Specific model
llm_temperature  Creativity (0-1)
llm_max_tokens   Response length limit

System Prompt

Set agent behavior:

json
{
  "llm_system_prompt": "You are a helpful assistant specialized in customer support. Always be polite and professional."
}

API

Send Message

bash
curl -X POST https://your-domain.com/api/agents/{id}/chat \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Hello, how can you help me?",
    "conversationId": "conv-123",
    "model": "llama-4-scout",
    "enableTools": true,
    "stream": true
  }'

Get History

bash
curl "https://your-domain.com/api/agents/{id}/chat?limit=50&conversationId=conv-123"

List Conversations

bash
curl "https://your-domain.com/api/agents/{id}/chat/conversations"

Create Conversation

bash
curl -X POST https://your-domain.com/api/agents/{id}/chat/conversations \
  -H "Content-Type: application/json" \
  -d '{"title": "New Chat"}'

Get Available Models

bash
curl "https://your-domain.com/api/agents/{id}/chat/models"

Get Available Tools

bash
curl "https://your-domain.com/api/agents/{id}/chat/tools"

Streaming State Recovery

The frontend persists streaming state to localStorage:

  • Saves progress during streaming
  • Recovers if tab crashes or closes
  • Expires after 5 minutes
  • Polls the server every 2 seconds to detect completion
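
A sketch of that persistence logic. In the browser this sits on localStorage; it is abstracted here behind a tiny store interface so the example is self-contained, and the key name and state shape are illustrative.

```typescript
// Stand-in for window.localStorage (same method shapes).
interface StateStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
  removeItem(key: string): void
}

const RECOVERY_KEY = 'chat-streaming-state' // illustrative key name
const EXPIRY_MS = 5 * 60 * 1000             // saved state expires after 5 minutes

interface StreamingState {
  conversationId: string
  partialContent: string
  savedAt: number
}

// Save progress during streaming, stamped with the current time.
function saveStreamingState(
  store: StateStore,
  state: Omit<StreamingState, 'savedAt'>
): void {
  store.setItem(RECOVERY_KEY, JSON.stringify({ ...state, savedAt: Date.now() }))
}

// Returns the saved state, or null if nothing was saved or it has expired.
function recoverStreamingState(
  store: StateStore,
  now = Date.now()
): StreamingState | null {
  const raw = store.getItem(RECOVERY_KEY)
  if (!raw) return null
  const state: StreamingState = JSON.parse(raw)
  if (now - state.savedAt > EXPIRY_MS) {
    store.removeItem(RECOVERY_KEY)
    return null
  }
  return state
}
```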

Best Practices

1. Clear System Prompts

Write specific, clear instructions:

You are a technical support agent for Acme Software.
- Answer questions about our products
- Help troubleshoot issues
- Escalate complex cases to humans
- Never make up answers

2. Manage Context

Keep conversations focused:

  • Start new conversations for new topics
  • Clear history when switching contexts
  • Use conversation IDs for threading

3. Enable Relevant Tools

Only enable needed tools:

  • Reduces confusion
  • Faster responses
  • Lower costs

4. Use Streaming

Enable streaming for better UX:

  • Real-time feedback
  • Lower perceived latency
  • Progressive rendering

5. Monitor Tool Calls

Watch for:

  • Repeated tool calls (may indicate confusion)
  • Failed tool calls
  • Excessive iterations

Released under the MIT License.