TomsIndex

API Documentation

Getting Started

Generate an API key, then:

curl "https://api.tomsindex.com/v1/answer?q=how+to+paginate+pgvector&caller_model=claude-haiku-4-5" \
  -H "Authorization: Bearer srch_your_key"

Authentication

Bearer token in the Authorization header. API keys are prefixed with srch_.

HEADER Authorization: Bearer srch_your_api_key

Answer Cache

Look up (or generate) an AI-generated answer from the semantic cache. Each cached answer is tagged with the model that produced it and a quality tier 1–4; callers filter by min_model_tier to demand a quality floor.

By default, this endpoint is lookup-only (free). Set mode=generate to search and generate an answer on cache miss — this costs 1 search credit and the result is cached for future free lookups.

GET /v1/answer

Parameters

Parameter | Type | Required | Description
q | string | Yes | The question to look up.
mode | string | No | lookup (default): cache-only, returns null on miss. generate: on cache miss, runs a web search, summarizes the top results, caches the answer, and returns it. Costs 1 search credit. Requires authentication. If authentication fails or search quota is insufficient, silently falls back to lookup behavior.
caller_model | string | No | Your calling model (e.g. claude-haiku-4-5). Sets min_model_tier automatically to max(callerTier, 3), so callers below tier 3 always get answers from a stronger model and no caller gets answers from below its own tier.
min_model_tier | integer 1–4 | No | Quality floor. 4 = frontier (Opus, GPT-5), 3 = strong (Sonnet, GPT-4o), 2 = fast/cheap (Haiku, GPT-4o-mini), 1 = tiny/local. Default: 1 (no floor) unless caller_model is set.
min_similarity | float 0–1 | No | Minimum cosine similarity required for a hit (default 0.92).
rank | string | No | tier-then-similarity (default): among answers that clear min_similarity, return the one from the highest-tier model. similarity: closest match wins regardless of tier.
alternatives | boolean | No | Return up to 2 secondary candidates alongside the primary answer.
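The way caller_model, min_similarity, and rank interact can be sketched in a few lines of Python. This is only an illustration of the selection rules described in the table, not the service's actual code: the Candidate type, both function names, the inclusive comparison against min_similarity, and the tie-break by similarity within a tier are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for one cached answer."""
    text: str
    model_tier: int     # 1-4, as in the min_model_tier description
    similarity: float   # cosine similarity to the query


def floor_for_caller(caller_tier: int) -> int:
    # Documented rule: min_model_tier = max(callerTier, 3).
    return max(caller_tier, 3)


def pick_answer(candidates, min_model_tier=1, min_similarity=0.92,
                rank="tier-then-similarity") -> Optional[Candidate]:
    # Keep only answers that clear both the similarity and the tier floor.
    eligible = [c for c in candidates
                if c.similarity >= min_similarity
                and c.model_tier >= min_model_tier]
    if not eligible:
        return None  # corresponds to "answer": null
    if rank == "similarity":
        return max(eligible, key=lambda c: c.similarity)
    # Default: highest tier wins; similarity breaks ties (assumed).
    return max(eligible, key=lambda c: (c.model_tier, c.similarity))


cached = [Candidate("a", 2, 0.99), Candidate("b", 4, 0.93), Candidate("c", 3, 0.95)]
print(pick_answer(cached).text)                     # b
print(pick_answer(cached, rank="similarity").text)  # a
# A tier-2 caller (e.g. Haiku) gets a floor of 3, which excludes "a".
print(pick_answer(cached, floor_for_caller(2), rank="similarity").text)  # c
```

The last call shows why a tier floor matters with rank=similarity: the closest match is dropped because it came from a weaker model than the floor allows.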

Example (lookup)

curl "https://api.tomsindex.com/v1/answer?q=how+to+paginate+pgvector+results" \
  -H "Authorization: Bearer srch_your_key"

Example (generate on miss)

curl "https://api.tomsindex.com/v1/answer?q=how+to+paginate+pgvector+results&mode=generate" \
  -H "Authorization: Bearer srch_your_key"
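For callers working in Python, the same request can be assembled with the standard library. The sketch below only builds the request object, using the Bearer header from the Authentication section; the function name build_answer_request and its keyword arguments are hypothetical conveniences, not part of any TomsIndex SDK.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.tomsindex.com/v1/answer"


def build_answer_request(q, api_key, mode="lookup", caller_model=None,
                         min_similarity=None):
    """Build (but do not send) a GET /v1/answer request."""
    params = {"q": q, "mode": mode}
    # Optional parameters are only added when the caller sets them.
    if caller_model is not None:
        params["caller_model"] = caller_model
    if min_similarity is not None:
        params["min_similarity"] = min_similarity
    # urlencode escapes the query string (spaces become "+").
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )


req = build_answer_request("how to paginate pgvector results",
                           "srch_your_key", mode="generate")
print(req.full_url)
```

To send it, pass the object to urllib.request.urlopen(req, timeout=10) and decode the body with json.load.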

Response (cache hit)

{
  "query": "how to paginate pgvector results",
  "cache_hit": true,
  "answer": {
    "text": "Use LIMIT and OFFSET with ORDER BY embedding <=> $1...",
    "model_used": "claude-opus-4-7",
    "model_tier": 4,
    "confidence": 0.97,
    "similarity": 0.984,
    "cached_at": "2026-04-12T08:13:22Z",
    "hit_count": 142,
    "sources": [{ "title": "pgvector docs", "url": "https://..." }]
  },
  "alternatives": [],
  "meta": { "mode": "lookup", "generated": false, "min_model_tier": 3, "best_similarity": 0.984, "took_ms": 38 }
}

Response (miss, lookup mode)

{
  "query": "...",
  "cache_hit": false,
  "answer": null,
  "alternatives": [],
  "meta": { "mode": "lookup", "generated": false, "best_similarity": 0.71, "took_ms": 41 }
}

Response (miss, generate mode)

{
  "query": "how to paginate pgvector results",
  "cache_hit": false,
  "answer": {
    "text": "Use LIMIT and OFFSET with ORDER BY embedding <=> $1...",
    "model_used": "claude-opus-4-7",
    "model_tier": 4,
    "confidence": 0.85,
    "similarity": 1.0,
    "cached_at": "2026-05-09T20:30:00Z",
    "hit_count": 0,
    "sources": [{ "title": "pgvector docs", "url": "https://..." }]
  },
  "alternatives": [],
  "meta": { "mode": "generate", "generated": true, "best_similarity": 0.71, "took_ms": 3200 }
}

Billing

Cache lookups are free. mode=generate costs 1 search credit (only on cache miss). If you don't have enough search credits, the endpoint silently falls back to lookup-only behavior — no error, just answer: null.
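Because the fallback is silent, a client that asks for mode=generate has to inspect the response body to learn what happened. A minimal sketch, assuming the response shapes shown above; the function name and the four labels are hypothetical, and "fallback" is an inference, since a null answer in generate mode could in principle have other causes.

```python
def classify_answer(response: dict, requested_mode: str = "lookup") -> str:
    """Label a /v1/answer body as 'hit', 'generated', 'fallback', or 'miss'."""
    answer = response.get("answer")
    if answer is not None:
        # meta.generated distinguishes a fresh (billed) answer from a cached one.
        generated = response.get("meta", {}).get("generated", False)
        return "generated" if generated else "hit"
    if requested_mode == "generate":
        # Asked to generate but got null: probably missing auth or credits.
        return "fallback"
    return "miss"


hit = {"cache_hit": True, "answer": {"text": "..."},
       "meta": {"mode": "lookup", "generated": False}}
miss = {"cache_hit": False, "answer": None,
        "meta": {"mode": "lookup", "generated": False}}
print(classify_answer(hit))               # hit
print(classify_answer(miss))              # miss
print(classify_answer(miss, "generate"))  # fallback
```

Logging the "fallback" case is one way to notice that search credits have run out, since no error status is returned.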

Hint

Get one actionable hint plus recommended follow-up questions. Complex queries are decomposed into reusable sub-plans — each piece is cached individually so future users benefit from partial matches.

POST /v1/hint

Request body

Field | Type | Required | Description
q | string | Yes | The question or task.
context | string | No | Code context (source snippets, file paths). Used by the LLM to give specific guidance but NOT embedded or cached, which keeps hints reusable across callers.
session_id | string | No | Session ID. If session context was previously sent via /v1/session/context, the hint will use it to give specific guidance (e.g. referencing your files, recent errors).

Example

curl -X POST "https://api.tomsindex.com/v1/hint" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{ "q": "Plan to build a search engine API using Postgres FTS" }'

Response

{
  "hint": "Start with a small crawler → indexer → search API → ranking loop before adding billing. Cache and reuse existing plans when possible.",
  "recommended_follow_up": [
    { "label": "Get crawler plan", "q": "Plan to build a web crawler for a search API" },
    { "label": "Find risks", "q": "What are the biggest risks in building a search engine API?" },
    { "label": "Estimate cost", "q": "Estimate monthly cost for a Postgres FTS search API" }
  ],
  "session_id": "abc123"
}
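A client typically builds the request body from whichever optional fields it has, then turns recommended_follow_up into the next round of questions. The helpers below are a sketch with hypothetical names; they assume only the request fields and response shape shown above.

```python
import json


def build_hint_body(q, context=None, session_id=None) -> str:
    """Serialize a POST /v1/hint body, omitting fields that were not set."""
    body = {"q": q}
    if context is not None:
        body["context"] = context
    if session_id is not None:
        body["session_id"] = session_id
    return json.dumps(body)


def follow_up_questions(hint_response: dict) -> list:
    """Return the q strings from recommended_follow_up, in order."""
    return [item["q"] for item in hint_response.get("recommended_follow_up", [])]


response = {
    "hint": "Start with a small crawler...",
    "recommended_follow_up": [
        {"label": "Get crawler plan", "q": "Plan to build a web crawler for a search API"},
        {"label": "Find risks", "q": "What are the biggest risks in building a search engine API?"},
    ],
    "session_id": "abc123",
}
for q in follow_up_questions(response):
    # Each of these would be sent as a new POST /v1/hint request.
    print(build_hint_body(q, session_id=response["session_id"]))
```

Reusing the returned session_id on follow-ups is a design choice in this sketch, made so that later hints can draw on any stored session context.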

Billing

Each request costs 1 hint credit. Cached hints are served from the answer cache; cache misses decompose into sub-plans and generate only the missing pieces.

Session Context

Pre-warm session state so future /v1/hint calls are context-aware. Send your working directory, recent conversation, files in scope, and errors — the hint endpoint pulls this automatically when session_id matches.

POST /v1/session/context

Request body

Field | Type | Required | Description
session_id | string | Yes | Session identifier (same one you pass to /v1/hint)
cwd | string | No | Working directory path
recent_messages | string[] | No | Last 3–5 user messages from the conversation
files_mentioned | string[] | No | File paths discussed or edited in the session
errors | string[] | No | Recent error messages or stack traces
stack | string | No | Tech stack (e.g. "node express postgres")

Example

curl -X POST "https://api.tomsindex.com/v1/session/context" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "sess_abc123",
    "cwd": "/Users/me/myproject",
    "files_mentioned": ["src/auth.js", "src/db.js"],
    "errors": ["TypeError: Cannot read property id of undefined"],
    "stack": "node express postgres"
  }'

Response

{ "ok": true, "session_id": "sess_abc123" }

How it works

Context is stored in memory for 1 hour. When /v1/hint is called with the same session_id, the stored context is injected into the LLM synthesis step — making hints reference your actual files and errors instead of giving generic advice. Context is not embedded or cached — it only affects the current session's hint quality.
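The behavior described here (a one-hour in-memory lifetime, with injection only when session_id matches) can be pictured with a small model. This is an illustrative sketch, not the service's implementation: the class, the prompt layout, and the assumption that each update replaces the previous context and restarts the timer are all hypothetical.

```python
import time

TTL_SECONDS = 3600  # documented lifetime: 1 hour


class SessionContextStore:
    """Illustrative in-memory store with per-session expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # session_id -> (stored_at, context dict)

    def put(self, session_id, context):
        # Assumption: an update replaces the old context and restarts the timer.
        self._entries[session_id] = (self._clock(), dict(context))

    def get(self, session_id):
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        stored_at, context = entry
        if self._clock() - stored_at >= TTL_SECONDS:
            del self._entries[session_id]  # expired, drop it
            return None
        return context


def build_hint_prompt(q, store, session_id=None):
    """Prepend stored context to the question only when the session matches."""
    context = store.get(session_id) if session_id else None
    if context is None:
        return f"Question: {q}"
    lines = [f"{key}: {value}" for key, value in sorted(context.items())]
    return "Session context:\n" + "\n".join(lines) + f"\n\nQuestion: {q}"


store = SessionContextStore()
store.put("sess_abc123", {"stack": "node express postgres"})
print(build_hint_prompt("Why is req.user undefined?", store, "sess_abc123"))
```

The practical consequence for clients is that context should be re-sent if a session runs longer than an hour, since an expired session silently yields generic hints.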

Billing

Free. Session context updates are not billed.

Extract

Crawl any URL and get back clean markdown, metadata, links, and media. Powered by a headless browser — handles JavaScript-rendered pages, SPAs, and dynamic content.

POST /v1/extract

Request body

Field | Type | Required | Description
url | string | Yes | The URL to extract content from
css_selector | string | No | Extract only content matching this CSS selector (e.g. "article", ".main-content")
headers | object | No | Custom HTTP headers sent to the target page

Example

curl -X POST "https://api.tomsindex.com/v1/extract" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "css_selector": "article"
  }'

Response

{
  "url": "https://example.com",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "raw_markdown": "# Example Domain\n\n...",
  "metadata": {
    "title": "Example Domain",
    "description": "",
    "language": null,
    "statusCode": 200,
    "url": "https://example.com"
  },
  "links": [{ "href": "https://www.iana.org/domains/example", "text": "More information...", "type": "external" }],
  "media": [],
  "took_ms": 2134
}

Python

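import requests

def extract(url, css_selector=None):
    """Return the cleaned markdown for a page via POST /v1/extract."""
    payload = {"url": url}
    if css_selector is not None:
        # Send the selector only when one was given, rather than a JSON null
        payload["css_selector"] = css_selector
    r = requests.post(
        "https://api.tomsindex.com/v1/extract",
        headers={"Authorization": "Bearer srch_..."},
        json=payload,
        timeout=30,  # headless-browser crawls can take a few seconds
    )
    r.raise_for_status()  # surface 4xx/5xx instead of a KeyError below
    return r.json()["markdown"]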

Node.js

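const res = await fetch("https://api.tomsindex.com/v1/extract", {
  method: "POST",
  headers: {
    "Authorization": "Bearer srch_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com" }),
});
if (!res.ok) {
  // Errors come back as JSON: { "error": "<message>" }
  const { error } = await res.json();
  throw new Error(`extract failed (${res.status}): ${error}`);
}
const { markdown, metadata } = await res.json();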

Billing

Each extract call costs 1 search credit. Cached pages are served from the crawl cache at no extra cost unless the request sets bypass_cache: true.

Tool Integration

An OpenAI-compatible web_search tool endpoint that drops into LiteLLM, LangChain, OpenRouter, or raw OpenAI function calling.

POST /v1/tools/web_search

Request body

{
  "query": "aws lambda pricing",
  "limit": 5
}

OpenAI function definition

{
  "type": "function",
  "function": {
    "name": "web_search",
    "description": "Search the web using TomsIndex",
    "parameters": {
      "type": "object",
      "properties": { "query": { "type": "string" } },
      "required": ["query"]
    }
  }
}

Python

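import requests

def web_search(query, limit=5):
    """Call the web_search tool endpoint and return its result list."""
    r = requests.post(
        "https://api.tomsindex.com/v1/tools/web_search",
        headers={"Authorization": "Bearer srch_..."},
        json={"query": query, "limit": limit},
        timeout=15,
    )
    r.raise_for_status()  # surface 4xx/5xx instead of a KeyError below
    return r.json()["results"]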
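When the model returns a tool call, the application still has to run the search and hand the results back. The sketch below shows one way to do that with the OpenAI chat-completions tool-call shape; handle_tool_call is a hypothetical helper, and search_fn stands for any callable with the signature of the web_search function above.

```python
import json


def handle_tool_call(tool_call: dict, search_fn) -> dict:
    """Turn one OpenAI-style tool call into a `tool` role message."""
    fn = tool_call["function"]
    if fn["name"] != "web_search":
        raise ValueError(f"unsupported tool: {fn['name']}")
    # The model supplies arguments as a JSON-encoded string.
    args = json.loads(fn["arguments"])
    results = search_fn(args["query"])
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # ties the result to the call
        "content": json.dumps(results),
    }


# Example with a stub in place of a real network call.
call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "web_search",
                 "arguments": "{\"query\": \"aws lambda pricing\"}"},
}
message = handle_tool_call(call, lambda q: [{"title": f"stub result for {q}"}])
print(message["tool_call_id"])  # call_1
```

The returned message is appended to the conversation before the next model request. Taking search_fn as a parameter keeps the dispatcher testable without network access.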

MCP Server

TomsIndex ships as an MCP server exposing tomsindex_search, tomsindex_ask, and tomsindex_hint tools. Session context is automatically sent via the UserPromptSubmit hook — no manual setup needed.

Claude Code / Codex CLI

npx tomsindex

Manual config (Claude Desktop, Cursor, etc.)

{
  "mcpServers": {
    "tomsindex": {
      "command": "npx",
      "args": ["tomsindex"],
      "env": { "TOMSINDEX_API_KEY": "srch_..." }
    }
  }
}

Errors

All errors return JSON: { "error": "<message>" }.

Status | Meaning
400 | Missing or malformed parameters
401 | Missing or invalid API key
429 | Rate limit hit; back off and honor Retry-After
500 | Server error; retry with exponential backoff
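The retry guidance in the status table can be folded into one small policy function. This is a sketch under stated assumptions: the base delay, the cap, and the attempt limit are arbitrary choices, `send` is a hypothetical callable returning (status, headers, body), and only the seconds form of Retry-After is parsed.

```python
import time

RETRYABLE = {429, 500}


def retry_delay(status, attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before the next try (attempt is 0-based), or None to stop."""
    if status not in RETRYABLE:
        return None  # 400 and 401 will not succeed on retry
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # Retry-After given in seconds
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    return min(cap, base * (2 ** attempt))  # exponential backoff, capped


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it succeeds, a non-retryable error occurs, or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status < 400:
            return body
        delay = retry_delay(status, attempt, headers.get("Retry-After"))
        if delay is None or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with status {status}: {body}")
        sleep(delay)


print(retry_delay(401, 0))       # None
print(retry_delay(429, 0, "7"))  # 7.0
print(retry_delay(500, 3))       # 4.0
```

Adding random jitter to the backoff delay is a common refinement when many clients may retry at once; it is left out here to keep the delays predictable.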

Rate Limits

Endpoint | Free | Pro | Scale
GET /v1/answer | 1k/mo | 100k/mo | 1M/mo
GET /v1/search | 10k/mo | 500k/mo | 5M/mo
POST /v1/tools/web_search | 10k/mo | 500k/mo | 5M/mo
POST /v1/extract | 10/min | 60/min | 120/min
POST /v1/hint | 30/min | 30/min | 60/min
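For the per-minute limits, throttling on the client side avoids most 429 responses. The limiter below is a sketch with hypothetical names; it uses a sliding 60-second window, which is an assumption, since the documentation does not say how the server counts requests. Monthly quotas are not covered.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds (client-side only)."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._calls = deque()  # timestamps of calls inside the window

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.limit:
            return 0.0
        return self.window - (now - self._calls[0])

    def record(self):
        """Note that a call is being made now."""
        self._calls.append(self._clock())


# Free-tier POST /v1/extract allows 10 requests per minute.
extract_limiter = SlidingWindowLimiter(limit=10)
time.sleep(extract_limiter.wait_time())  # no wait on the first call
extract_limiter.record()
```

Calling wait_time() and then record() before each request keeps a single-threaded client under the limit; a multi-threaded client would also need a lock around the pair.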