API Documentation - TomsIndex

Getting Started

curl "https://api.tomsindex.com/v1/answer?q=how+to+paginate+pgvector&caller_model=claude-haiku-4-5" \
  -H "Authorization: Bearer srch_your_key"

Authentication

Bearer token in the Authorization header. API keys are prefixed with srch_.

HEADER Authorization: Bearer srch_your_api_key
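
As a Python counterpart to the curl call in Getting Started, here is a minimal sketch that uses only the standard library. The helper names `build_answer_request` and `send` are illustrative and not part of any official TomsIndex SDK. The request is built separately from sending so the URL and header can be inspected first.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.tomsindex.com"


def build_answer_request(q, api_key, **params):
    """Build (but do not send) a GET /v1/answer request with a Bearer token.

    Hypothetical helper name; extra keyword arguments become query parameters.
    """
    query = urllib.parse.urlencode({"q": q, **params})
    return urllib.request.Request(
        f"{BASE_URL}/v1/answer?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request, timeout=10):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


req = build_answer_request(
    "how to paginate pgvector",
    "srch_your_key",
    caller_model="claude-haiku-4-5",
)
print(req.full_url)
```

Calling `send(req)` performs the actual network request; replace `srch_your_key` with a real key first.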

Answer Cache

Look up (or generate) an AI-generated answer from the semantic cache. Each cached answer is tagged with the model that produced it and a quality tier 1–4; callers filter by min_model_tier to demand a quality floor.

By default, this endpoint is lookup-only (free). Set mode=generate to search and generate an answer on cache miss — this costs 1 search credit and the result is cached for future free lookups.

GET /v1/answer

Parameters

Parameter	Type	Required	Description
q	string	Yes	The question to look up
mode	string	No	`lookup` (default): cache-only, returns `null` on miss. `generate`: on cache miss, runs a web search, summarizes the top results, caches the answer, and returns it. Costs 1 search credit. Requires authentication. If auth or search quota is insufficient, silently falls back to lookup behavior.
caller_model	string	No	Your calling model (e.g. `claude-haiku-4-5`). Sets `min_model_tier` automatically to `max(callerTier, 3)`, so you never get answers from a model below your own tier or below tier 3.
min_model_tier	integer 1–4	No	Quality floor. 4=frontier (Opus, GPT-5), 3=strong (Sonnet, GPT-4o), 2=fast/cheap (Haiku, GPT-4o-mini), 1=tiny/local. Default: `1` (no floor) unless `caller_model` is set.
min_similarity	float 0–1	No	Cosine similarity required for a hit (default 0.92).
rank	string	No	`tier-then-similarity` (default): among answers that clear `min_similarity`, return the one from the highest-tier model. `similarity`: closest match wins regardless of tier.
alternatives	boolean	No	Return up to 2 secondary candidates alongside the primary answer.
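
The interaction between `caller_model`, `min_similarity`, and `rank` can be sketched as below. This is an illustration of the documented rules, not the service's actual code. The substring-based `tier_of` mapping, the function names, and the similarity tie-break inside a tier are assumptions. The tier numbers and the `max(callerTier, 3)` rule come from the table above.

```python
# Tier numbers follow the min_model_tier description. Matching model names
# by substring is an assumption made purely for illustration.
TIER_HINTS = [
    ("opus", 4), ("gpt-5", 4),
    ("gpt-4o-mini", 2), ("haiku", 2),   # checked before "gpt-4o"
    ("sonnet", 3), ("gpt-4o", 3),
]


def tier_of(model_name):
    """Map a model name to a tier 1-4; unknown names are treated as tier 1."""
    name = model_name.lower()
    for hint, tier in TIER_HINTS:
        if hint in name:
            return tier
    return 1


def effective_min_tier(caller_model=None):
    """Default floor is 1; with caller_model it becomes max(callerTier, 3)."""
    if caller_model is None:
        return 1
    return max(tier_of(caller_model), 3)


def pick_answer(candidates, min_tier=1, min_similarity=0.92,
                rank="tier-then-similarity"):
    """Choose the primary answer among cached candidates, or None on a miss."""
    eligible = [
        c for c in candidates
        if c["similarity"] >= min_similarity and c["model_tier"] >= min_tier
    ]
    if not eligible:
        return None
    if rank == "similarity":
        return max(eligible, key=lambda c: c["similarity"])
    # Highest tier wins; similarity breaks ties within a tier (assumed).
    return max(eligible, key=lambda c: (c["model_tier"], c["similarity"]))


candidates = [
    {"model_tier": 3, "similarity": 0.99},
    {"model_tier": 4, "similarity": 0.93},
    {"model_tier": 4, "similarity": 0.80},  # below the default 0.92 cutoff
]
print(pick_answer(candidates))
print(pick_answer(candidates, rank="similarity"))
```

With the default ranking the tier-4 candidate at 0.93 is chosen over the closer tier-3 match; with `rank="similarity"` the 0.99 match wins.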

Example (lookup)

curl "https://api.tomsindex.com/v1/answer?q=how+to+paginate+pgvector+results" \
  -H "Authorization: Bearer srch_your_key"

Example (generate on miss)

curl "https://api.tomsindex.com/v1/answer?q=how+to+paginate+pgvector+results&mode=generate" \
  -H "Authorization: Bearer srch_your_key"

Response (cache hit)

{
  "query": "how to paginate pgvector results",
  "cache_hit": true,
  "answer": {
    "text": "Use LIMIT and OFFSET with ORDER BY embedding <=> $1...",
    "model_used": "claude-opus-4-7",
    "model_tier": 4,
    "confidence": 0.97,
    "similarity": 0.984,
    "cached_at": "2026-04-12T08:13:22Z",
    "hit_count": 142,
    "sources": [{ "title": "pgvector docs", "url": "https://..." }]
  },
  "alternatives": [],
  "meta": { "mode": "lookup", "generated": false, "min_model_tier": 3, "best_similarity": 0.984, "took_ms": 38 }
}

Response (miss, lookup mode)

{
  "query": "...",
  "cache_hit": false,
  "answer": null,
  "alternatives": [],
  "meta": { "mode": "lookup", "generated": false, "best_similarity": 0.71, "took_ms": 41 }
}

Response (miss, generate mode)

{
  "query": "how to paginate pgvector results",
  "cache_hit": false,
  "answer": {
    "text": "Use LIMIT and OFFSET with ORDER BY embedding <=> $1...",
    "model_used": "claude-opus-4-7",
    "model_tier": 4,
    "confidence": 0.85,
    "similarity": 1.0,
    "cached_at": "2026-05-09T20:30:00Z",
    "hit_count": 0,
    "sources": [{ "title": "pgvector docs", "url": "https://..." }]
  },
  "alternatives": [],
  "meta": { "mode": "generate", "generated": true, "best_similarity": 0.71, "took_ms": 3200 }
}

Billing

Cache lookups are free. mode=generate costs 1 search credit (only on cache miss). If you don't have enough search credits, the endpoint silently falls back to lookup-only behavior — no error, just answer: null.
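
Because the fallback is silent, a client has to infer it from the response body. The sketch below labels a response using only the `cache_hit` and `answer` fields shown in the examples above. The function name and the label strings are made up for illustration, and treating a null answer in generate mode as a fallback is an inference from the billing note, not a documented signal.

```python
def classify_answer(response, requested_mode="lookup"):
    """Label a /v1/answer response body (already parsed from JSON)."""
    if response.get("cache_hit"):
        return "cache_hit"             # served from cache, free
    if response.get("answer") is not None:
        return "generated"             # produced on a miss, 1 search credit
    if requested_mode == "generate":
        # Asked for generation but got no answer: per the billing note this
        # is what the silent fallback to lookup-only behavior looks like.
        return "fell_back_to_lookup"
    return "miss"


miss = {"cache_hit": False, "answer": None,
        "meta": {"mode": "lookup", "generated": False}}
print(classify_answer(miss))
print(classify_answer(miss, requested_mode="generate"))
```

A caller that sees `fell_back_to_lookup` may want to check its authentication and remaining search credits before retrying.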

Web Search

Get ranked web results for a query. Pass a location for "near me" queries; results include news, places, and shopping when the query calls for them.

GET /v1/search

Parameters

Parameter	Type	Required	Description
q	string	Yes	Search query
limit	integer	No	Max results 1–20 (default 10)
near	string	No	Free-text location (e.g. "San Francisco")
include_answer	boolean	No	Generate or look up an LLM answer summarising the top results. Authenticated callers only — generation costs LLM tokens. Cache lookup runs first; a hit returns the cached answer for free. Generated answers are written to `answer_cache`, so subsequent `/v1/answer` calls for the same question return them with no extra cost.

Example

curl "https://api.tomsindex.com/v1/search?q=best+rust+web+frameworks&include_answer=true" \
  -H "Authorization: Bearer srch_your_key"

Response

{
  "results": [{
    "title": "Actix Web Framework",
    "url": "https://actix.rs",
    "snippet": "Actix Web is a powerful...",
    "score": 0.95,
    "result_id": "uuid"
  }],
  "answer": {
    "text": "Actix Web and Axum are the leading Rust web frameworks...",
    "model_used": "claude-opus-4-7",
    "model_tier": 4,
    "sources": [{ "title": "Actix Web Framework", "url": "https://actix.rs" }],
    "generated": true
  },
  "meta": { "intent": "informational", "took_ms": 143 }
}

The answer object only appears when include_answer=true. generated: true means the answer was just produced and written to the cache; generated: false means it was a cache hit.
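
A small Python sketch of the same call, assuming the parameters in the table above. `build_search_url` and `answer_origin` are hypothetical helper names. The first validates `limit` against the documented 1–20 range before building the URL, and the second reads the optional `answer` object defensively.

```python
import urllib.parse

SEARCH_URL = "https://api.tomsindex.com/v1/search"


def build_search_url(q, limit=10, near=None, include_answer=False):
    """Build the GET /v1/search URL, omitting optional parameters left unset."""
    if not 1 <= limit <= 20:
        raise ValueError("limit must be between 1 and 20")
    params = {"q": q, "limit": limit}
    if near:
        params["near"] = near
    if include_answer:
        params["include_answer"] = "true"
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def answer_origin(response):
    """Return None, 'generated', or 'cache' for a parsed search response."""
    answer = response.get("answer")
    if answer is None:
        return None  # include_answer was not set, so no answer object
    return "generated" if answer.get("generated") else "cache"


print(build_search_url("coffee", near="San Francisco"))
```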

Hint

Get one actionable hint plus recommended follow-up questions. Complex queries are decomposed into reusable sub-plans — each piece is cached individually so future users benefit from partial matches.

POST /v1/hint

Request body

Field	Type	Required	Description
q	string	Yes	The question or task
context	string	No	Code context (source snippets, file paths). Used by the LLM to give specific guidance but NOT embedded or cached — keeps hints reusable across callers.
session_id	string	No	Session ID. If session context was previously sent via `/v1/session/context`, the hint will use it to give specific guidance (e.g. referencing your files, recent errors).

Example

curl -X POST "https://api.tomsindex.com/v1/hint" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{ "q": "Plan to build a search engine API using Postgres FTS" }'

Response

{
  "hint": "Start with a small crawler → indexer → search API → ranking loop before adding billing. Cache and reuse existing plans when possible.",
  "recommended_follow_up": [
    { "label": "Get crawler plan", "q": "Plan to build a web crawler for a search API" },
    { "label": "Find risks", "q": "What are the biggest risks in building a search engine API?" },
    { "label": "Estimate cost", "q": "Estimate monthly cost for a Postgres FTS search API" }
  ],
  "session_id": "abc123"
}

Billing

Each request costs 1 hint credit. Cached hints are served from the answer cache; cache misses decompose into sub-plans and generate only the missing pieces.
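
A minimal Python sketch of the hint call, using only the standard library. `build_hint_request` and `follow_up_queries` are illustrative names, not part of an official client. Optional fields are left out of the body when unset.

```python
import json
import urllib.request

HINT_URL = "https://api.tomsindex.com/v1/hint"


def build_hint_request(q, api_key, context=None, session_id=None):
    """Build (but do not send) a POST /v1/hint request."""
    body = {"q": q}
    if context is not None:
        body["context"] = context
    if session_id is not None:
        body["session_id"] = session_id
    return urllib.request.Request(
        HINT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def follow_up_queries(hint_response):
    """Extract the ready-to-send follow-up questions from a hint response."""
    return [item["q"] for item in hint_response.get("recommended_follow_up", [])]


req = build_hint_request(
    "Plan to build a search engine API using Postgres FTS",
    "srch_your_key",
    session_id="sess_abc123",
)
print(json.loads(req.data))
```

Each follow-up sent back to `/v1/hint` is a separate request, so it costs another hint credit.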

Session Context

Pre-warm session state so future /v1/hint calls are context-aware. Send your working directory, recent conversation, files in scope, and errors — the hint endpoint pulls this automatically when session_id matches.

POST /v1/session/context

Request body

Field	Type	Required	Description
session_id	string	Yes	Session identifier (same one you pass to `/v1/hint`)
cwd	string	No	Working directory path
recent_messages	string[]	No	Last 3–5 user messages from the conversation
files_mentioned	string[]	No	File paths discussed or edited in the session
errors	string[]	No	Recent error messages or stack traces
stack	string	No	Tech stack (e.g. `"node express postgres"`)

Example

curl -X POST "https://api.tomsindex.com/v1/session/context" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "sess_abc123",
    "cwd": "/Users/me/myproject",
    "files_mentioned": ["src/auth.js", "src/db.js"],
    "errors": ["TypeError: Cannot read property id of undefined"],
    "stack": "node express postgres"
  }'

Response

{ "ok": true, "session_id": "sess_abc123" }

How it works

Context is stored in memory for 1 hour. When /v1/hint is called with the same session_id, the stored context is injected into the LLM synthesis step — making hints reference your actual files and errors instead of giving generic advice. Context is not embedded or cached — it only affects the current session's hint quality.
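
To make the one-hour lifetime concrete, here is a hypothetical sketch of an in-memory store with a time-to-live. It is not the service's implementation. The class name, the injectable clock, and the choice to restart the timer on every update are all assumptions.

```python
import time


class SessionContextStore:
    """Illustrative in-memory store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # session_id -> (stored_at, context)

    def put(self, session_id, context):
        """Store or overwrite context; the expiry timer restarts (assumed)."""
        self._entries[session_id] = (self._clock(), context)

    def get(self, session_id):
        """Return the stored context, or None if missing or expired."""
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        stored_at, context = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[session_id]  # drop expired entries lazily
            return None
        return context


# A fake clock makes the expiry visible without waiting an hour.
now = [0.0]
store = SessionContextStore(clock=lambda: now[0])
store.put("sess_abc123", {"stack": "node express postgres"})
now[0] = 3599.0
print(store.get("sess_abc123"))
now[0] = 3600.0
print(store.get("sess_abc123"))
```

Under this model, a client working in a session for longer than an hour would need to resend its context before the next hint call.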

Billing

Free. Session context updates are not billed.

Extract

Crawl any URL and get back clean markdown, metadata, links, and media. Powered by a headless browser — handles JavaScript-rendered pages, SPAs, and dynamic content.

POST /v1/extract

Request body

Field	Type	Required	Description
url	string	Yes	The URL to extract content from
css_selector	string	No	Extract only content matching this CSS selector (e.g. `"article"`, `".main-content"`)
headers	object	No	Custom HTTP headers sent to the target page

Example

curl -X POST "https://api.tomsindex.com/v1/extract" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "css_selector": "article"
  }'

Response

{
  "url": "https://example.com",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "raw_markdown": "# Example Domain\n\n...",
  "metadata": {
    "title": "Example Domain",
    "description": "",
    "language": null,
    "statusCode": 200,
    "url": "https://example.com"
  },
  "links": [{ "href": "https://www.iana.org/domains/example", "text": "More information...", "type": "external" }],
  "media": [],
  "took_ms": 2134
}

Python

import requests

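def extract(url, css_selector=None):
    """Return clean markdown for a URL, optionally limited to a CSS selector."""
    payload = {"url": url}
    if css_selector is not None:
        # Send css_selector only when set, rather than posting a JSON null.
        payload["css_selector"] = css_selector
    r = requests.post(
        "https://api.tomsindex.com/v1/extract",
        headers={"Authorization": "Bearer srch_..."},
        json=payload,
        timeout=30,
    )
    r.raise_for_status()  # surface 4xx/5xx instead of a KeyError on "markdown"
    return r.json()["markdown"]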

Node.js

const res = await fetch("https://api.tomsindex.com/v1/extract", {
  method: "POST",
  headers: {
    "Authorization": "Bearer srch_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com" }),
});
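// Fail loudly on 4xx/5xx instead of destructuring an error body.
if (!res.ok) throw new Error(`extract failed: HTTP ${res.status}`);
const { markdown, metadata } = await res.json();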

Billing

Each extract call costs 1 search credit. Cached pages are served from the crawl cache at no extra cost unless the request sets bypass_cache: true.

Tool Integration

OpenAI-compatible web_search tool endpoint — drop into LiteLLM, LangChain, OpenRouter, or raw OpenAI function calling.

POST /v1/tools/web_search

Request body

{
  "query": "aws lambda pricing",
  "limit": 5
}

OpenAI function definition

{
  "type": "function",
  "function": {
    "name": "web_search",
    "description": "Search the web using TomsIndex",
    "parameters": {
      "type": "object",
      "properties": { "query": { "type": "string" } },
      "required": ["query"]
    }
  }
}

Python

import requests

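def web_search(query, limit=5):
    """Call the OpenAI-compatible tool endpoint and return the result list."""
    r = requests.post(
        "https://api.tomsindex.com/v1/tools/web_search",
        headers={"Authorization": "Bearer srch_..."},
        json={"query": query, "limit": limit},
        timeout=30,
    )
    r.raise_for_status()  # raise on 4xx/5xx instead of a KeyError on "results"
    return r.json()["results"]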
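
To connect the function definition to the endpoint, the model's tool call has to be routed to your search function. In OpenAI function calling the arguments arrive as a JSON string, and the tool result is sent back as a string. The sketch below assumes that shape. `handle_tool_call` is a hypothetical name, and the search function is passed in so the routing can be tried without network access.

```python
import json


def handle_tool_call(name, arguments_json, search_fn):
    """Run a web_search tool call and return the result as a JSON string.

    search_fn is any callable taking the query string, for example the
    web_search function defined above.
    """
    if name != "web_search":
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)   # arguments arrive as a JSON string
    results = search_fn(args["query"])
    return json.dumps(results)          # tool messages carry string content


def fake_search(query):
    """Stand-in for the real endpoint, used only for this demonstration."""
    return [{"title": "Result for " + query, "url": "https://example.com"}]


print(handle_tool_call("web_search", '{"query": "aws lambda pricing"}', fake_search))
```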

MCP Server

TomsIndex ships as an MCP server exposing tomsindex_search, tomsindex_ask, and tomsindex_hint tools. Session context is automatically sent via the UserPromptSubmit hook — no manual setup needed.

Claude Code / Codex CLI

npx tomsindex

Manual config (Claude Desktop, Cursor, etc.)

{
  "mcpServers": {
    "tomsindex": {
      "command": "npx",
      "args": ["tomsindex"],
      "env": { "TOMSINDEX_API_KEY": "srch_..." }
    }
  }
}

Errors

All errors return JSON: { "error": "<message>" }.

Status	Meaning
400	Missing or malformed parameters
401	Missing or invalid API key
429	Rate limit hit — back off and honor `Retry-After`
500	Server error — retry with exponential backoff
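
The retry guidance in the table can be sketched as follows. The function names, the 0.5 s base delay, the 30 s cap, and the attempt limit are assumptions chosen for illustration, and `Retry-After` is assumed to be a number of seconds. The request itself is injected as a callable returning `(status, headers, body)`, so the logic does not depend on any particular HTTP client.

```python
import time


def retry_delay(status, attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retrying, or None if the status is not retryable."""
    backoff = min(cap, base * 2 ** attempt)   # exponential backoff, capped
    if status == 429:
        try:
            return float(retry_after)         # honor Retry-After when usable
        except (TypeError, ValueError):
            return backoff                    # header missing or not numeric
    if status == 500:
        return backoff
    return None                               # 2xx, 400, 401: do not retry


def call_with_retries(do_request, max_attempts=4, sleep=time.sleep):
    """Call do_request until it succeeds, fails permanently, or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = do_request()
        delay = retry_delay(status, attempt, headers.get("Retry-After"))
        if delay is None or attempt == max_attempts - 1:
            return status, body
        sleep(delay)


# Demonstration with canned responses instead of real HTTP calls.
responses = iter([
    (429, {"Retry-After": "2"}, None),
    (500, {}, None),
    (200, {}, {"ok": True}),
])
waits = []
print(call_with_retries(lambda: next(responses), sleep=waits.append))
print(waits)
```

In production code, adding random jitter to the backoff is a common refinement when many clients may retry at once.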

Rate Limits

Endpoint	Free	Pro	Scale
`GET /v1/answer`	1k/mo	100k/mo	1M/mo
`GET /v1/search`	10k/mo	500k/mo	5M/mo
`POST /v1/tools/web_search`	10k/mo	500k/mo	5M/mo
`POST /v1/extract`	10/min	60/min	120/min
`POST /v1/hint`	30/min	30/min	60/min
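
For the per-minute limits on `/v1/extract` and `/v1/hint`, a client-side throttle can avoid most 429 responses. The sketch below uses a sliding 60-second window. How the server actually counts its window is not documented here, so treat this as a conservative approximation. The class name and the injectable clock are illustrative.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Allow at most max_calls within any rolling period (default 60 s)."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic):
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._calls = deque()  # timestamps of calls inside the current window

    def try_acquire(self):
        """Record a call and return True if under the limit, else return False."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self._period:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            self._calls.append(now)
            return True
        return False


# Free-tier /v1/extract allows 10 calls per minute.
fake_now = [0.0]
limiter = MinuteRateLimiter(10, clock=lambda: fake_now[0])
allowed = [limiter.try_acquire() for _ in range(11)]
print(allowed.count(True))
```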