
Sources API

Manage data sources and RAG search.

Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/agents/{id}/sources | List sources |
| POST | /api/agents/{id}/sources | Create source |
| POST | /api/agents/{id}/sources/crawl | Crawl a website |
| GET | /api/agents/{id}/sources/search | Search sources |
| GET | /api/agents/{id}/sources/{sourceId} | Get source |
| DELETE | /api/agents/{id}/sources/{sourceId} | Delete source |
| POST | /api/agents/{id}/sources/{sourceId}/reindex | Reindex source |
| POST | /api/agents/{id}/sources/{sourceId}/refresh | Refresh source |

List Sources

```bash
GET /api/agents/{id}/sources
```

Response

```json
{
  "sources": [
    {
      "id": "source-123",
      "type": "file",
      "name": "Product Manual",
      "status": "ready",
      "chunks": 150,
      "created_at": "2024-12-01T00:00:00Z"
    },
    {
      "id": "source-456",
      "type": "crawl",
      "name": "Help Center",
      "status": "ready",
      "chunks": 320,
      "config": {
        "url": "https://help.example.com",
        "maxPages": 50,
        "maxDepth": 3,
        "pageCount": 42,
        "lastCrawledAt": "2026-03-10T14:30:00Z"
      },
      "created_at": "2026-03-01T00:00:00Z"
    }
  ]
}
```

Create Source

File Upload

```bash
POST /api/agents/{id}/sources
Content-Type: multipart/form-data

-F "type=file"
-F "name=Product Manual"
-F "file=@/path/to/manual.pdf"
```

Response

```json
{
  "id": "source-123",
  "type": "file",
  "name": "Product Manual",
  "status": "processing",
  "created_at": "2024-12-15T10:00:00Z"
}
```

Crawl Website

```bash
POST /api/agents/{id}/sources/crawl
Content-Type: application/json
```

Starts a breadth-first (BFS) crawl of the target website using Cloudflare Browser Rendering (the /markdown and /links endpoints). The crawl runs in the background via ctx.waitUntil and does not block the response.
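The traversal described above can be sketched client-side. This is a minimal illustration, not the server implementation; `fetch_links` is a hypothetical stand-in for the Browser Rendering /links call that returns the links found on a page.

```python
from collections import deque

def bfs_crawl(start_url, fetch_links, max_pages=50, max_depth=3):
    """Breadth-first crawl bounded by page and depth limits.

    `fetch_links` takes a URL and returns the links discovered on that
    page (standing in for the Browser Rendering /links endpoint).
    """
    visited = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth from the start URL)
    pages = []
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        pages.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in fetch_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return pages
```

Because the queue is FIFO, pages closest to the start URL are always indexed first, so a tight `maxPages` budget is spent on the most central pages.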

Request Body

```json
{
  "url": "https://docs.example.com",
  "name": "Documentation Site",
  "maxPages": 30,
  "maxDepth": 3,
  "includePatterns": ["/docs/*", "/guides/*"],
  "excludePatterns": ["/blog/*", "/changelog/*"]
}
```

Parameters

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | Starting URL for the crawl |
| name | string | Yes | - | Display name for the source |
| maxPages | number | No | 50 | Maximum pages to crawl (up to 50) |
| maxDepth | number | No | 3 | Maximum link depth from the start URL (up to 5) |
| includePatterns | string[] | No | [] | Glob patterns; only crawl URLs matching at least one pattern |
| excludePatterns | string[] | No | [] | Glob patterns; skip URLs matching any pattern |
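The include/exclude semantics above can be expressed as a small filter. This is an illustrative sketch using Python's `fnmatch`-style globbing applied to the URL path; the server's exact glob dialect may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

def should_crawl(url, include_patterns=(), exclude_patterns=()):
    """Decide whether a URL passes the include/exclude glob filters."""
    path = urlparse(url).path
    # Exclusions always win: skip a URL matching any exclude pattern.
    if any(fnmatchcase(path, pat) for pat in exclude_patterns):
        return False
    # With a non-empty include list, the URL must match at least one pattern.
    if include_patterns:
        return any(fnmatchcase(path, pat) for pat in include_patterns)
    # An empty include list means "crawl everything not excluded".
    return True
```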

Response

```json
{
  "id": "source-789",
  "type": "crawl",
  "name": "Documentation Site",
  "status": "crawling",
  "config": {
    "url": "https://docs.example.com",
    "maxPages": 30,
    "maxDepth": 3,
    "includePatterns": ["/docs/*", "/guides/*"],
    "excludePatterns": ["/blog/*", "/changelog/*"]
  },
  "created_at": "2026-03-11T10:00:00Z"
}
```

The source status will be crawling while the BFS traversal is in progress. Once complete, pages are chunked and embedded, and the status transitions to ready.
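Since the crawl runs in the background, a client typically polls until the status leaves crawling. A minimal polling sketch, assuming a hypothetical `get_source` helper that wraps GET /api/agents/{id}/sources/{sourceId} and returns the decoded JSON body:

```python
import time

def wait_until_ready(get_source, source_id, poll_seconds=5.0, timeout=300.0):
    """Poll a source until it reaches a terminal status (ready or error)."""
    deadline = time.monotonic() + timeout
    while True:
        source = get_source(source_id)
        if source["status"] in ("ready", "error"):
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"source {source_id} still {source['status']!r}")
        time.sleep(poll_seconds)
```

The loop treats both ready and error as terminal, so a failed crawl surfaces immediately instead of polling until the timeout.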

Crawl Source Config Fields

When retrieving a crawl source, the config object contains these additional metadata fields:

| Field | Type | Description |
| --- | --- | --- |
| url | string | The starting URL |
| maxPages | number | Configured page limit |
| maxDepth | number | Configured depth limit |
| includePatterns | string[] | URL include patterns |
| excludePatterns | string[] | URL exclude patterns |
| pageCount | number | Total pages discovered and indexed |
| lastCrawledAt | string | ISO 8601 timestamp of the most recent crawl |
| pageHashes | object | Map of URL to SHA-256 content hash (used for delta detection) |
| changedPages | number | Pages that were new or modified on the last re-crawl |
| unchangedPages | number | Pages skipped (content hash matched the previous crawl) |

Reindex Source

```bash
POST /api/agents/{id}/sources/{sourceId}/reindex
```

Reprocesses the source content. For crawl sources, this triggers a full re-crawl with delta detection: the crawler revisits all pages but only re-indexes those whose SHA-256 content hash has changed since the last crawl. Unchanged pages are skipped.
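The delta-detection step can be illustrated with the stored pageHashes map. This sketch shows the hash comparison only; fetching, chunking, and embedding are outside its scope.

```python
import hashlib

def detect_changes(pages, previous_hashes):
    """Split crawled pages into changed vs unchanged via SHA-256 hashes.

    `pages` maps URL -> freshly crawled page content; `previous_hashes`
    is the pageHashes map (URL -> hex digest) saved by the last crawl.
    Returns the new hash map plus the changed/unchanged URL lists.
    """
    new_hashes, changed, unchanged = {}, [], []
    for url, content in pages.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        new_hashes[url] = digest
        if previous_hashes.get(url) == digest:
            unchanged.append(url)  # hash matched: skip re-indexing
        else:
            changed.append(url)    # new page, or content was modified
    return new_hashes, changed, unchanged
```

Only the URLs in `changed` need to be re-chunked and re-embedded, which is what keeps re-crawls of mostly static sites cheap.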

Response

```json
{
  "id": "source-789",
  "type": "crawl",
  "status": "crawling",
  "message": "Re-crawl started with delta detection"
}
```

After the re-crawl completes, the source config will include updated changedPages and unchangedPages counts:

```json
{
  "id": "source-789",
  "type": "crawl",
  "name": "Documentation Site",
  "status": "ready",
  "config": {
    "url": "https://docs.example.com",
    "maxPages": 30,
    "maxDepth": 3,
    "pageCount": 42,
    "lastCrawledAt": "2026-03-11T15:00:00Z",
    "changedPages": 3,
    "unchangedPages": 39
  }
}
```

Search Sources

```bash
GET /api/agents/{id}/sources/search?query=how+to+reset+password&limit=5
```

Query Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| query | string | Search query (required) |
| limit | number | Max results (default: 10) |
| source_id | string | Filter by source |
| min_score | number | Minimum similarity score |
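Query values such as multi-word search strings must be URL-encoded. A small helper sketch for assembling the search URL (the function name is illustrative, not part of the API):

```python
from urllib.parse import urlencode

def build_search_url(agent_id, query, limit=10, source_id=None, min_score=None):
    """Assemble the search endpoint path with properly encoded parameters."""
    params = {"query": query, "limit": limit}
    if source_id is not None:
        params["source_id"] = source_id
    if min_score is not None:
        params["min_score"] = min_score
    # urlencode handles spaces and reserved characters in the query string
    return f"/api/agents/{agent_id}/sources/search?{urlencode(params)}"
```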

Response

```json
{
  "results": [
    {
      "id": "chunk-123",
      "source_id": "source-456",
      "content": "To reset your password, go to Settings > Account > Password...",
      "score": 0.92,
      "metadata": {
        "source_name": "User Guide",
        "page": 15,
        "section": "Account Settings"
      }
    }
  ]
}
```

Get Source

```bash
GET /api/agents/{id}/sources/{sourceId}
```

Response

```json
{
  "id": "source-123",
  "type": "file",
  "name": "Product Manual",
  "file_path": "uploads/manual.pdf",
  "status": "ready",
  "chunks": 150,
  "config": {
    "chunk_size": 1000,
    "overlap": 200
  },
  "created_at": "2024-12-01T00:00:00Z",
  "updated_at": "2024-12-15T10:00:00Z"
}
```

Crawl Source Response

```json
{
  "id": "source-789",
  "type": "crawl",
  "name": "Documentation Site",
  "status": "ready",
  "chunks": 320,
  "config": {
    "url": "https://docs.example.com",
    "maxPages": 30,
    "maxDepth": 3,
    "includePatterns": ["/docs/*"],
    "excludePatterns": ["/blog/*"],
    "pageCount": 42,
    "lastCrawledAt": "2026-03-11T15:00:00Z",
    "changedPages": 3,
    "unchangedPages": 39
  },
  "created_at": "2026-03-01T00:00:00Z",
  "updated_at": "2026-03-11T15:00:00Z"
}
```

Delete Source

```bash
DELETE /api/agents/{id}/sources/{sourceId}
```

Response

```json
{
  "success": true
}
```

Refresh Source

```bash
POST /api/agents/{id}/sources/{sourceId}/refresh
```

Reprocesses the source content. For crawl sources, use the Reindex endpoint instead, which performs a re-crawl with delta detection.

Source Types

| Type | Description |
| --- | --- |
| file | Uploaded file |
| crawl | Crawled website (BFS traversal with delta detection) |
| database | Database query |
| api | API endpoint |

Source Status

| Status | Description |
| --- | --- |
| pending | Waiting to process |
| processing | Currently processing |
| crawling | Website crawl in progress (BFS traversal) |
| ready | Available for search |
| error | Processing failed |
| updating | Refreshing content |

Supported Formats

| Format | Extensions |
| --- | --- |
| PDF | .pdf |
| Text | .txt |
| Markdown | .md |
| Word | .docx |
| JSON | .json |
| CSV | .csv |
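Validating the extension client-side avoids a round trip that would end in a 415. A minimal pre-upload check based on the table above:

```python
from pathlib import Path

# Extensions accepted by the ingestion pipeline (see the table above)
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx", ".json", ".csv"}

def is_supported(filename):
    """Return True if the file's extension is in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```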

Chunking Config

```json
{
  "config": {
    "chunk_size": 1000,
    "overlap": 200,
    "method": "semantic"
  }
}
```
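To show what chunk_size and overlap mean, here is a fixed-size chunker where each chunk shares `overlap` characters with its predecessor. This is an illustrative sketch; the "semantic" method additionally aligns chunk boundaries with the content rather than cutting at fixed offsets.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of `chunk_size` characters, each overlapping
    the previous chunk by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # fresh characters contributed per chunk
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - overlap, 1), step)
    ]
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of indexing roughly `overlap / chunk_size` extra text.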

Errors

| Code | Description |
| --- | --- |
| 400 | Invalid source data |
| 404 | Source not found |
| 413 | File too large |
| 415 | Unsupported format |

Examples

Upload PDF

```bash
curl -X POST .../sources \
  -F "type=file" \
  -F "name=User Guide" \
  -F "file=@guide.pdf"
```

Search with Filter

```bash
curl ".../sources/search?query=installation&source_id=source-123&min_score=0.8"
```

Crawl Website

```bash
curl -X POST .../sources/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://help.example.com",
    "name": "Help Center",
    "maxPages": 30,
    "maxDepth": 3,
    "includePatterns": ["/docs/*"],
    "excludePatterns": ["/blog/*"]
  }'
```

Check Crawl Source Details

```bash
curl .../sources/source-789
```

The response includes the crawl metadata fields (pageCount, lastCrawledAt, changedPages, unchangedPages).

Reindex Crawl Source (Delta Re-crawl)

```bash
curl -X POST .../sources/source-789/reindex
```

Triggers a re-crawl that only re-indexes pages whose content has changed.