Use smaller models like GPT-mini with cached answers, web search, clean page extraction, and reusable coding hints.
npx tomsindex
Instantly adds search, cached answers, extraction, and coding hints to Claude Code, Codex CLI, and 20+ MCP clients. Free.
| | Haiku alone | TomsIndex Cache |
|---|---|---|
| Latency | 1,200ms | ~100ms |
| Cost per query | $0.003 | Free (cache hit) |
| Citations | None | 3 sources |
| Context | Model-only | Cached + cited |
| Improves over time | No | Yes |
Search 10M+ indexed pages. Hybrid BM25 + vector ranking. Drop it into any agent as a tool call.
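The page does not say how the BM25 and vector rankings are combined. One common way to merge two ranked lists is reciprocal rank fusion (RRF), sketched below purely as an illustration. The function name, the document IDs, and the constant `k=60` are assumptions, not TomsIndex internals.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a
    documented TomsIndex setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword (BM25) results
vector_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, so a page needs neither an exact keyword match nor a perfect embedding match to surface.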
Ask a question. If it's been answered before, you get it in <0.1s. If not, a frontier model answers it once and we cache it for everyone.
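The hit-or-miss flow described here is a read-through cache. A minimal sketch, assuming an in-memory dictionary and a hypothetical `answer_with_frontier_model` callable standing in for the expensive model call; the real service's keying and storage are not documented on this page.

```python
class AnswerCache:
    """Read-through cache: serve a stored answer, else compute once and store it."""

    def __init__(self, answer_with_frontier_model):
        # Hypothetical callable: question (str) -> answer (str).
        self._answer = answer_with_frontier_model
        self._store = {}

    @staticmethod
    def _key(question):
        # Normalise case and whitespace so trivially different phrasings share an entry.
        return " ".join(question.lower().split())

    def ask(self, question):
        """Return (answer, was_cache_hit)."""
        key = self._key(question)
        if key in self._store:
            return self._store[key], True
        answer = self._answer(question)  # cache miss: pay for the model once
        self._store[key] = answer
        return answer, False


calls = []


def fake_frontier_model(question):
    # Stand-in for a real model call; records how often it is invoked.
    calls.append(question)
    return f"answer to: {question}"


cache = AnswerCache(fake_frontier_model)
first = cache.ask("How to paginate pgvector?")
second = cache.ask("how to paginate  pgvector?")
print(first, second, len(calls))
```

The second call differs only in case and spacing, so it is served from the cache and the model is invoked once in total.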
Turn any URL into clean markdown with metadata, links, and media. Built for JavaScript-rendered pages and agent ingestion.
A stronger model can produce a concise coding hint once. The result is cached. Your small model retrieves that hint instead of rediscovering the same approach.
We test against HotPotQA and SWE-bench to measure when cached retrieval and hints help small models close the gap.
View benchmarks →
Use it through REST or MCP.
curl "https://tomsindex.com/v1/search?q=things%20to%20do%20in%20boston" \
  -H "Authorization: Bearer srch_your_key"
curl "https://tomsindex.com/v1/answer?q=how%20to%20paginate%20pgvector" \
  -H "Authorization: Bearer srch_your_key"
// claude_desktop_config.json
{
  "mcpServers": {
    "tomsindex": {
      "command": "npx",
      "args": ["tomsindex"],
      "env": { "TOMSINDEX_API_KEY": "srch_..." }
    }
  }
}
curl -X POST "https://tomsindex.com/v1/hint" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{"q": "add cursor pagination to pgvector search"}'
curl -X POST "https://tomsindex.com/v1/extract" \
  -H "Authorization: Bearer srch_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "css_selector": "article"}'
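The same REST calls can be prepared from Python with only the standard library. This is a sketch under assumptions: the helper names `build_get`, `build_post`, and `send` are made up for illustration, and `send` assumes every endpoint returns a JSON body, which this page only shows for search. The endpoints, headers, and payload fields are copied from the curl examples.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://tomsindex.com/v1"  # taken from the curl examples


def build_get(endpoint, query, api_key):
    """Build a GET request for /search or /answer with a percent-encoded q parameter."""
    qs = urllib.parse.urlencode({"q": query}, quote_via=urllib.parse.quote)
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}?{qs}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def build_post(endpoint, payload, api_key):
    """Build a JSON POST request for /hint or /extract."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request, timeout=10):
    """Send a prepared request and decode the body as JSON (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


search_req = build_get("search", "things to do in boston", "srch_your_key")
hint_req = build_post(
    "hint", {"q": "add cursor pagination to pgvector search"}, "srch_your_key"
)
print(search_req.full_url)
```

Building the request separately from sending it keeps the URL encoding and headers easy to inspect; calling `send(search_req)` with a real key would perform the actual request.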
{
  "results": [
    {
      "title": "Things to Do in Boston | Attractions, Tours …",
      "link": "https://www.meetboston.com/things-to-do/",
      "snippet": "Step into history on the Freedom Trail, grab a lobster roll …"
    }
  ],
  "meta": { "took_ms": 45, "total": 10 }
}
Every new question adds to the cache. The more people use it, the faster every model gets.
Full hybrid BM25 + vector search across 10M+ pages. One key, two tools — ask or search.
Drop-in tool for Claude Desktop and any MCP-compatible agent. One config, instant setup.
Works with Llama, Mistral, GPT, Claude — any model that can call an API.
Free tier. No credit card. API key in 30 seconds.