Generate an API key, then:
```bash
curl "https://api.tomsindex.com/v1/answer?q=how+to+paginate+pgvector&caller_model=claude-haiku-4-5" \
  -H "Authorization: Bearer srch_your_key"
```
Authentication: pass your API key as a Bearer token in the Authorization header. API keys are prefixed with srch_.
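For scripts that avoid third-party packages, the same authenticated request can be built with Python's standard library. This is a minimal sketch: the `build_get` helper and its client-side `srch_` prefix check are illustrative and not part of any official TomsIndex SDK.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.tomsindex.com"

def build_get(path, params, api_key):
    """Build (but do not send) an authenticated GET request (hypothetical helper)."""
    if not api_key.startswith("srch_"):
        raise ValueError("TomsIndex API keys are prefixed with srch_")
    # Drop unset parameters so the server falls back to its documented defaults.
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
    req = urllib.request.Request(f"{BASE_URL}{path}?{query}", method="GET")
    req.add_header("Authorization", f"Bearer {api_key}")
    return req

req = build_get(
    "/v1/answer",
    {"q": "how to paginate pgvector", "caller_model": "claude-haiku-4-5", "mode": None},
    "srch_your_key",
)
# Sending it is one more call:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```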
GET /v1/answer looks up (or generates) an AI-generated answer from the semantic cache. Each cached answer is tagged with the model that produced it and a quality tier from 1 to 4; callers set min_model_tier to demand a quality floor.
By default, this endpoint is lookup-only (free). Set mode=generate to search and generate an answer on cache miss — this costs 1 search credit and the result is cached for future free lookups.
| Parameter | Type | Required | Description |
|---|---|---|---|
| q | string | Yes | The question to look up |
| mode | string | No | lookup (default): cache-only, returns null on miss. generate: on cache miss, runs a web search, summarizes the top results, caches the answer, and returns it. Costs 1 search credit. Requires authentication. If auth or search quota is insufficient, silently falls back to lookup behavior. |
| caller_model | string | No | Your calling model (e.g. claude-haiku-4-5). Automatically sets min_model_tier to max(callerTier, 3), so you never get answers below your own tier or below tier 3. |
| min_model_tier | integer 1–4 | No | Quality floor. 4=frontier (Opus, GPT-5), 3=strong (Sonnet, GPT-4o), 2=fast/cheap (Haiku, GPT-4o-mini), 1=tiny/local. Default: 1 (no floor) unless caller_model is set. |
| min_similarity | float 0–1 | No | Minimum cosine similarity required for a cache hit (default 0.92). |
| rank | string | No | tier-then-similarity (default): among answers that clear min_similarity, return the one from the highest-tier model. similarity: closest match wins regardless of tier. |
| alternatives | boolean | No | Return up to 2 secondary candidates alongside the primary answer. |
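The filtering and ranking rules in the table above can be simulated locally to see which cached answer would win. This is a sketch of the documented behavior, not the server's code: whether a candidate exactly at min_similarity counts as a hit and how ties are broken are assumptions, and `Candidate`, `pick`, and `floor_for_caller` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    model_tier: int     # 1 = tiny/local ... 4 = frontier
    similarity: float   # cosine similarity to the incoming query

def floor_for_caller(caller_tier):
    # Documented rule: setting caller_model makes min_model_tier = max(callerTier, 3).
    return max(caller_tier, 3)

def pick(candidates, min_similarity=0.92, min_model_tier=1,
         rank="tier-then-similarity", alternatives=False):
    """Return (primary, secondary_candidates); primary is None on a cache miss."""
    # A candidate must clear both floors (using >= here is an assumption).
    eligible = [c for c in candidates
                if c.similarity >= min_similarity and c.model_tier >= min_model_tier]
    if rank == "tier-then-similarity":
        key = lambda c: (c.model_tier, c.similarity)  # best tier first, then closest
    else:
        key = lambda c: c.similarity                  # "similarity": closest wins
    ranked = sorted(eligible, key=key, reverse=True)
    if not ranked:
        return None, []
    # alternatives=true adds up to 2 secondary candidates.
    return ranked[0], (ranked[1:3] if alternatives else [])

cands = [
    Candidate("haiku answer", 2, 0.99),
    Candidate("opus answer", 4, 0.93),
    Candidate("sonnet answer", 3, 0.95),
    Candidate("off-topic opus answer", 4, 0.80),
]
best, _ = pick(cands)                        # highest tier among hits above 0.92
closest, _ = pick(cands, rank="similarity")  # closest match regardless of tier
```

With the default ranking the tier-4 answer wins even though the tier-2 answer is a closer match; switching to rank=similarity reverses that.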
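```bash
# Lookup only (free): returns answer: null on a cache miss
curl "https://api.tomsindex.com/v1/answer?q=how+to+paginate+pgvector+results" \
  -H "Authorization: Bearer srch_your_key"

# Generate on a cache miss (1 search credit)
curl "https://api.tomsindex.com/v1/answer?q=how+to+paginate+pgvector+results&mode=generate" \
  -H "Authorization: Bearer srch_your_key"
```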
```json
{
  "query": "how to paginate pgvector results",
  "cache_hit": true,
  "answer": {
    "text": "Use LIMIT and OFFSET with ORDER BY embedding <=> $1...",
    "model_used": "claude-opus-4-7",
    "model_tier": 4,
    "confidence": 0.97,
    "similarity": 0.984,
    "cached_at": "2026-04-12T08:13:22Z",
    "hit_count": 142,
    "sources": [{ "title": "pgvector docs", "url": "https://..." }]
  },
  "alternatives": [],
  "meta": { "mode": "lookup", "generated": false, "min_model_tier": 3, "best_similarity": 0.984, "took_ms": 38 }
}
```
```json
{
  "query": "...",
  "cache_hit": false,
  "answer": null,
  "alternatives": [],
  "meta": { "mode": "lookup", "generated": false, "best_similarity": 0.71, "took_ms": 41 }
}
```
```json
{
  "query": "how to paginate pgvector results",
  "cache_hit": false,
  "answer": {
    "text": "Use LIMIT and OFFSET with ORDER BY embedding <=> $1...",
    "model_used": "claude-opus-4-7",
    "model_tier": 4,
    "confidence": 0.85,
    "similarity": 1.0,
    "cached_at": "2026-05-09T20:30:00Z",
    "hit_count": 0,
    "sources": [{ "title": "pgvector docs", "url": "https://..." }]
  },
  "alternatives": [],
  "meta": { "mode": "generate", "generated": true, "best_similarity": 0.71, "took_ms": 3200 }
}
```
Cache lookups are free. mode=generate costs 1 search credit (only on cache miss). If you don't have enough search credits, the endpoint silently falls back to lookup-only behavior — no error, just answer: null.
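Because a failed generate request does not raise an error, a client has to read the body to tell the outcomes apart. A minimal sketch that classifies a parsed /v1/answer response using only the cache_hit, answer, and meta.generated fields from the examples above; the outcome labels are hypothetical, and treating a null answer after mode=generate as the silent fallback is an inference from the paragraph above, not a documented signal.

```python
def classify_answer(resp, requested_mode="lookup"):
    """Classify a parsed /v1/answer response body (hypothetical labels)."""
    if resp.get("answer") is None:
        # No error status on fallback: a null answer after mode=generate most likely
        # means auth or search credits were insufficient (assumption).
        return "fallback-miss" if requested_mode == "generate" else "miss"
    if resp.get("meta", {}).get("generated"):
        return "generated"  # produced on this call and cached; cost 1 search credit
    return "cache-hit"      # served from the cache for free
```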
GET /v1/search returns ranked web results for a query. Set near for "near me" queries; results include news, places, and shopping when the query calls for them.
| Parameter | Type | Required | Description |
|---|---|---|---|
| q | string | Yes | Search query |
| limit | integer | No | Max results 1–20 (default 10) |
| near | string | No | Free-text location (e.g. "San Francisco") |
| include_answer | boolean | No | Generate or look up an LLM answer summarising the top results. Authenticated callers only — generation costs LLM tokens. Cache lookup runs first; a hit returns the cached answer for free. Generated answers are written to answer_cache, so subsequent /v1/answer calls for the same question return them with no extra cost. |
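A small URL builder shows how the parameters combine. This is a sketch under assumptions: `search_url` is a hypothetical helper, and it clamps limit to the documented 1–20 range client-side rather than relying on whatever the server does with out-of-range values.

```python
import urllib.parse

def search_url(q, limit=10, near=None, include_answer=False):
    """Build a GET /v1/search URL (hypothetical helper)."""
    # Documented range is 1–20; clamp instead of relying on server-side handling.
    limit = max(1, min(20, int(limit)))
    params = {"q": q, "limit": limit}
    if near:
        params["near"] = near  # free-text location for "near me" queries
    if include_answer:
        params["include_answer"] = "true"  # only honored for authenticated callers
    return "https://api.tomsindex.com/v1/search?" + urllib.parse.urlencode(params)
```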
```bash
curl "https://api.tomsindex.com/v1/search?q=best+rust+web+frameworks&include_answer=true" \
  -H "Authorization: Bearer srch_your_key"
```
```json
{
  "results": [{
    "title": "Actix Web Framework",
    "url": "https://actix.rs",
    "snippet": "Actix Web is a powerful...",
    "score": 0.95,
    "result_id": "uuid"
  }],
  "answer": {
    "text": "Actix Web and Axum are the leading Rust web frameworks...",
    "model_used": "claude-opus-4-7",
    "model_tier": 4,
    "sources": [{ "title": "Actix Web Framework", "url": "https://actix.rs" }],
    "generated": true
  },
  "meta": { "intent": "informational", "took_ms": 143 }
}
```
The answer object only appears when include_answer=true. generated: true means the answer was just produced and written to the cache; generated: false means it was a cache hit.
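Since the answer object is optional, a consumer should treat it as possibly absent. A minimal sketch, using a hypothetical `summarize_search` helper, that pulls the ranked results and reports whether the answer was freshly generated or a cache hit:

```python
def summarize_search(resp):
    """Return (results, answer_text, origin) from a parsed /v1/search body."""
    results = [(r["title"], r["url"], r["score"]) for r in resp.get("results", [])]
    answer = resp.get("answer")  # present only when include_answer=true
    if answer is None:
        return results, None, None
    # generated: true means written to the cache on this call; false means a cache hit.
    origin = "generated" if answer.get("generated") else "cache-hit"
    return results, answer["text"], origin
```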
POST /v1/hint returns one actionable hint plus recommended follow-up questions. Complex queries are decomposed into reusable sub-plans, and each piece is cached individually so future users benefit from partial matches.
| Field | Type | Required | Description |
|---|---|---|---|
| q | string | Yes | The question or task |
| context | string | No | Code context (source snippets, file paths). Used by the LLM to give specific guidance but NOT embedded or cached — keeps hints reusable across callers. |
| session_id | string | No | Session ID. If session context was previously sent via /v1/session/context, the hint will use it to give specific guidance (e.g. referencing your files, recent errors). |
```bash
curl -X POST "https://api.tomsindex.com/v1/hint" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{ "q": "Plan to build a search engine API using Postgres FTS" }'
```
```json
{
  "hint": "Start with a small crawler → indexer → search API → ranking loop before adding billing. Cache and reuse existing plans when possible.",
  "recommended_follow_up": [
    { "label": "Get crawler plan", "q": "Plan to build a web crawler for a search API" },
    { "label": "Find risks", "q": "What are the biggest risks in building a search engine API?" },
    { "label": "Estimate cost", "q": "Estimate monthly cost for a Postgres FTS search API" }
  ],
  "session_id": "abc123"
}
```
Each request costs 1 hint credit. Cached hints are served from the answer cache; cache misses decompose into sub-plans and generate only the missing pieces.
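The follow-ups are ready-made q values, so a client can chain them back into /v1/hint. A minimal standard-library sketch; `hint_request` and `follow_up_queries` are hypothetical helpers, and the cap of two follow-ups is an arbitrary choice motivated by each request costing a hint credit.

```python
import json
import urllib.request

def hint_request(q, api_key, context=None, session_id=None):
    """Build (but do not send) a POST /v1/hint request."""
    body = {"q": q}
    if context is not None:
        body["context"] = context        # guides the LLM but is not embedded or cached
    if session_id is not None:
        body["session_id"] = session_id  # pulls in stored /v1/session/context data
    return urllib.request.Request(
        "https://api.tomsindex.com/v1/hint",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )

def follow_up_queries(hint_response, max_follow_ups=2):
    """Return follow-up q strings, capped because each hint call costs a credit."""
    return [f["q"] for f in hint_response.get("recommended_follow_up", [])][:max_follow_ups]
```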
POST /v1/session/context pre-warms session state so future /v1/hint calls are context-aware. Send your working directory, recent conversation, files in scope, and errors; the hint endpoint pulls this in automatically when the session_id matches.
| Field | Type | Required | Description |
|---|---|---|---|
| session_id | string | Yes | Session identifier (same one you pass to /v1/hint) |
| cwd | string | No | Working directory path |
| recent_messages | string[] | No | Last 3–5 user messages from the conversation |
| files_mentioned | string[] | No | File paths discussed or edited in the session |
| errors | string[] | No | Recent error messages or stack traces |
| stack | string | No | Tech stack (e.g. "node express postgres") |
```bash
curl -X POST "https://api.tomsindex.com/v1/session/context" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "sess_abc123",
    "cwd": "/Users/me/myproject",
    "files_mentioned": ["src/auth.js", "src/db.js"],
    "errors": ["TypeError: Cannot read property id of undefined"],
    "stack": "node express postgres"
  }'
```
```json
{ "ok": true, "session_id": "sess_abc123" }
```
Context is stored in memory for 1 hour. When /v1/hint is called with the same session_id, the stored context is injected into the LLM synthesis step — making hints reference your actual files and errors instead of giving generic advice. Context is not embedded or cached — it only affects the current session's hint quality.
Free. Session context updates are not billed.
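A client can assemble the context body from its own state and decide when to re-send it. A minimal sketch with hypothetical helper names; trimming to the last 5 messages follows the field description, and re-sending after the 1-hour TTL assumes that a new POST with the same session_id replaces the stored context, which this page does not state explicitly.

```python
import time

CONTEXT_TTL_SECONDS = 3600  # stored session context lives in memory for 1 hour

def session_context_payload(session_id, cwd=None, recent_messages=(),
                            files_mentioned=(), errors=(), stack=None, max_messages=5):
    """Build a POST /v1/session/context body, omitting empty optional fields."""
    payload = {"session_id": session_id}
    if cwd:
        payload["cwd"] = cwd
    if recent_messages:
        # The field expects only the last 3–5 user messages.
        payload["recent_messages"] = list(recent_messages)[-max_messages:]
    if files_mentioned:
        # De-duplicate while keeping the order files were mentioned in.
        payload["files_mentioned"] = list(dict.fromkeys(files_mentioned))
    if errors:
        payload["errors"] = list(errors)
    if stack:
        payload["stack"] = stack
    return payload

def context_is_stale(last_sent_at, now=None):
    """True once the last upload is at least 1 hour old (assumed to have expired)."""
    now = time.time() if now is None else now
    return now - last_sent_at >= CONTEXT_TTL_SECONDS
```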
POST /v1/extract crawls any URL and returns clean markdown, metadata, links, and media. It is powered by a headless browser, so it handles JavaScript-rendered pages, SPAs, and dynamic content.
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to extract content from |
| css_selector | string | No | Extract only content matching this CSS selector (e.g. "article", ".main-content") |
| headers | object | No | Custom HTTP headers sent to the target page |
| bypass_cache | boolean | No | Skip the crawl cache and re-fetch the page |
```bash
curl -X POST "https://api.tomsindex.com/v1/extract" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com", "css_selector": "article" }'
```
```json
{
  "url": "https://example.com",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "raw_markdown": "# Example Domain\n\n...",
  "metadata": {
    "title": "Example Domain",
    "description": "",
    "language": null,
    "statusCode": 200,
    "url": "https://example.com"
  },
  "links": [{ "href": "https://www.iana.org/domains/example", "text": "More information...", "type": "external" }],
  "media": [],
  "took_ms": 2134
}
```
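```python
import requests

def extract(url, css_selector=None):
    payload = {"url": url}
    if css_selector:
        payload["css_selector"] = css_selector  # only send a selector when one is given
    r = requests.post(
        "https://api.tomsindex.com/v1/extract",
        headers={"Authorization": "Bearer srch_..."},
        json=payload,
    )
    r.raise_for_status()  # surface 4xx/5xx errors instead of a KeyError below
    return r.json()["markdown"]
```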
```javascript
const res = await fetch("https://api.tomsindex.com/v1/extract", {
  method: "POST",
  headers: {
    "Authorization": "Bearer srch_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com" }),
});
if (!res.ok) throw new Error(`extract failed: HTTP ${res.status}`);
const { markdown, metadata } = await res.json();
```
Each extract call costs 1 search credit. Cached pages are served from the crawl cache at no extra cost unless bypass_cache: true.
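POST /v1/tools/web_search is an OpenAI-compatible web_search tool endpoint: drop it into LiteLLM, LangChain, OpenRouter, or raw OpenAI function calling. The request body takes a query and an optional limit; the tool definition to register with your model follows it.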
```json
{
  "query": "aws lambda pricing",
  "limit": 5
}
```
```json
{
  "type": "function",
  "function": {
    "name": "web_search",
    "description": "Search the web using TomsIndex",
    "parameters": {
      "type": "object",
      "properties": { "query": { "type": "string" } },
      "required": ["query"]
    }
  }
}
```
```python
import requests

def web_search(query, limit=5):
    r = requests.post(
        "https://api.tomsindex.com/v1/tools/web_search",
        headers={"Authorization": "Bearer srch_..."},
        json={"query": query, "limit": limit},
    )
    r.raise_for_status()  # raise on 4xx/5xx instead of failing on a missing "results" key
    return r.json()["results"]
```
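When the model calls the tool, the client runs the search and returns the results as a tool message. A minimal sketch of that dispatch step for OpenAI-style chat completions, where tool-call arguments arrive as a JSON string; `handle_tool_call` is a hypothetical helper, and `web_search` stands for any function shaped like the one above (a stub is used in place of a network call).

```python
import json

def handle_tool_call(tool_call, web_search):
    """Turn one OpenAI-style tool call into the tool message to append to the chat."""
    fn = tool_call["function"]
    if fn["name"] != "web_search":
        raise ValueError(f"unexpected tool: {fn['name']}")
    # Arguments are a JSON string matching the tool's parameter schema.
    args = json.loads(fn["arguments"])
    results = web_search(args["query"], limit=args.get("limit", 5))
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(results),
    }
```

Append the returned message to the conversation and call the model again so it can answer from the search results.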
TomsIndex ships as an MCP server exposing tomsindex_search, tomsindex_ask, and tomsindex_hint tools. Session context is automatically sent via the UserPromptSubmit hook — no manual setup needed.
```bash
npx tomsindex
```
```json
{
  "mcpServers": {
    "tomsindex": {
      "command": "npx",
      "args": ["tomsindex"],
      "env": { "TOMSINDEX_API_KEY": "srch_..." }
    }
  }
}
```
All errors return JSON: { "error": "<message>" }.
| Status | Meaning |
|---|---|
| 400 | Missing or malformed parameters |
| 401 | Missing or invalid API key |
| 429 | Rate limit hit — back off and honor Retry-After |
| 500 | Server error — retry with exponential backoff |
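The retry guidance in the table can be expressed as a small policy: give up on 400 and 401, honor Retry-After on 429, and use exponential backoff on 500. This is a sketch with hypothetical names; it uses the "full jitter" backoff variant, handles only the delay-in-seconds form of Retry-After (not the HTTP-date form), and takes an injected `send` callable in place of a real HTTP client.

```python
import random
import time

RETRYABLE = {429, 500}

def retry_delay(status, attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retrying after 0-based `attempt`, or None to give up."""
    if status not in RETRYABLE:
        return None                      # 400/401 will not succeed on retry
    if status == 429 and retry_after is not None:
        return float(retry_after)        # honor Retry-After (seconds form only)
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """`send()` is a hypothetical callable returning (status, headers, body)."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status < 400:
            return body
        delay = retry_delay(status, attempt, headers.get("Retry-After"))
        if delay is None or attempt == max_attempts - 1:
            raise RuntimeError(f"HTTP {status}: {body}")
        sleep(delay)
```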
| Endpoint | Free | Pro | Scale |
|---|---|---|---|
| GET /v1/answer | 1k/mo | 100k/mo | 1M/mo |
| GET /v1/search | 10k/mo | 500k/mo | 5M/mo |
| POST /v1/tools/web_search | 10k/mo | 500k/mo | 5M/mo |
| POST /v1/extract | 10/min | 60/min | 120/min |
| POST /v1/hint | 30/min | 30/min | 60/min |
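To stay under a per-minute limit such as POST /v1/extract at 10/min on Free, a client can throttle itself before the server returns 429. A minimal sliding-window sketch; the server's own windowing algorithm is not documented, so this is a conservative client-side assumption, and monthly quotas are out of scope. The injectable clock and sleep exist only to make it testable.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Client-side throttle allowing at most max_calls per `period` seconds."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of calls still inside the window, oldest first

    def reserve(self):
        """Record a call and return 0.0, or return how many seconds to wait first."""
        now = self.clock()
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()  # this call has left the window
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self, sleep=time.sleep):
        """Block until a call is allowed, then record it."""
        while (wait := self.reserve()) > 0:
            sleep(wait)

extract_limiter = SlidingWindowLimiter(max_calls=10)  # Free-tier /v1/extract limit
```

Call `extract_limiter.acquire()` before each extract request.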