TomsIndex
The knowledge layer for AI agents

10x less spend.
Without dumbing them down.

Use smaller models like GPT-mini with cached answers, web search, clean page extraction, and reusable coding hints.

Try it

Install for free

npx tomsindex

Instantly adds search, cached answers, extraction, and coding hints to Claude Code, Codex CLI, and 20+ MCP clients. Free.

<0.1s
Cached answer response
10M+
Pages indexed
Free
On cache hit

See the difference.

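|                    | Haiku alone | TomsIndex Cache  |
|--------------------|-------------|------------------|
| Latency            | 1,200ms     | ~100ms           |
| Cost per query     | $0.003      | Free (cache hit) |
| Citations          | None        | 3 sources        |
| Context            | Model-only  | Cached + cited   |
| Improves over time | No          | Yes              |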
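To relate the per-query figures in the comparison to overall spend, here is a back-of-the-envelope sketch. It assumes a cache miss costs and takes as long as the "Haiku alone" column (the page does not state what a miss costs), and the hit rate is a hypothetical input rather than a measured value.

```python
# Blended cost and latency per query for a given cache hit rate.
# Assumption (not from the page): a cache miss costs and takes as long
# as the "Haiku alone" column; a hit is free and takes about 100 ms.

MISS_COST_USD = 0.003    # "Haiku alone" cost per query
MISS_LATENCY_MS = 1200   # "Haiku alone" latency
HIT_COST_USD = 0.0       # "Free (cache hit)"
HIT_LATENCY_MS = 100     # "~100ms"


def blended(hit_rate: float) -> tuple[float, float]:
    """Return (expected cost in USD, expected latency in ms) per query."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    cost = hit_rate * HIT_COST_USD + miss_rate * MISS_COST_USD
    latency = hit_rate * HIT_LATENCY_MS + miss_rate * MISS_LATENCY_MS
    return cost, latency


if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.9):
        cost, latency = blended(rate)
        print(f"hit rate {rate:.0%}: ${cost:.4f} per query, {latency:.0f} ms")
```

Under these assumptions, a 90% hit rate would bring the expected cost to about a tenth of the model-only figure.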

Four tools. One API key.

1

Web Search

Search 10M+ indexed pages. Hybrid BM25 + vector ranking. Drop it into any agent as a tool call.
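The page does not say how the BM25 and vector rankings are combined. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), one common way to merge a keyword ranking with an embedding ranking. The function name, the document IDs, and the constant `k = 60` are assumptions, not TomsIndex's actual method.

```python
# Reciprocal rank fusion (RRF): merge a keyword ranking and a vector ranking.
# Assumption: an illustrative fusion method, not the service's real ranking.
from collections import defaultdict


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs; higher fused score ranks first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword results
    vector_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical embedding results
    print(rrf_merge([bm25_ranking, vector_ranking]))
```

Documents that appear near the top of both lists accumulate the highest fused score, which is why rank-based fusion needs no calibration between BM25 scores and vector similarities.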

2

Cached Answers

Ask a question. If it's been answered before, you get it in <0.1s. If not, a frontier model answers it once and we cache it for everyone.
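The answer-once-then-cache flow described above resembles a cache-aside pattern. A minimal sketch follows, assuming an in-memory dictionary and simple text normalization. `AnswerCache`, `normalize`, and the `answer_fn` callback are hypothetical stand-ins, since the page does not describe how questions are matched or stored.

```python
# Cache-aside lookup: answer from cache when possible, else compute once and store.
# Assumptions: the in-memory dict, normalize(), and answer_fn are hypothetical;
# the real service's storage and question matching are not described.
from typing import Callable


def normalize(question: str) -> str:
    """Collapse case and whitespace so trivially different phrasings share a key."""
    return " ".join(question.lower().split())


class AnswerCache:
    def __init__(self, answer_fn: Callable[[str], str]) -> None:
        self._answer_fn = answer_fn       # stand-in for the frontier model call
        self._store: dict[str, str] = {}

    def ask(self, question: str) -> tuple[str, bool]:
        """Return (answer, was_cache_hit)."""
        key = normalize(question)
        if key in self._store:
            return self._store[key], True
        answer = self._answer_fn(question)  # slow path runs once per key
        self._store[key] = answer
        return answer, False


if __name__ == "__main__":
    cache = AnswerCache(lambda q: f"answer to: {q}")
    print(cache.ask("What is MCP?"))      # miss: calls the model stand-in
    print(cache.ask("  what is  mcp? "))  # hit: same normalized key
```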

3

Extract

Turn any URL into clean markdown with metadata, links, and media. Built for JavaScript-rendered pages and agent ingestion.
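To give a feel for the kind of output described above, here is a deliberately small sketch that turns static HTML into a title, a list of links, and markdown text using only Python's standard library. It is an illustration under stated assumptions: the `extract` function and its result keys are hypothetical, it handles only `title`, `h1`, `h2`, `p`, and `a` tags, and it does not render JavaScript.

```python
# Minimal HTML -> markdown-ish extraction with the standard library.
# Assumptions: extract() and its result keys are hypothetical; only a few
# tags are handled and no JavaScript is executed.
from html.parser import HTMLParser
from typing import Optional


class _Extractor(HTMLParser):
    PREFIX = {"h1": "# ", "h2": "## ", "p": ""}

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: list[str] = []
        self.blocks: list[str] = []
        self._open: Optional[str] = None  # tag whose text is being collected
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.PREFIX:
            self._open, self._buf = tag, []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if self._open is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._open:
            return
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if tag == "title":
            self.title = text
        elif text:
            self.blocks.append(self.PREFIX[tag] + text)
        self._open = None


def extract(html: str) -> dict:
    parser = _Extractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "links": parser.links,
        "markdown": "\n\n".join(parser.blocks),
    }


if __name__ == "__main__":
    sample = (
        "<html><head><title>Demo</title></head><body><h1>Hello</h1>"
        "<p>See the <a href='https://example.com/docs'>docs</a> now.</p>"
        "</body></html>"
    )
    print(extract(sample))
```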

4

Hints

A stronger model can produce a concise coding hint once. The result is cached. Your small model retrieves that hint instead of rediscovering the same approach.
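One way to picture the hint flow described above: a cached hint is looked up and prepended to the task before it reaches the small model. The sketch below assumes a plain dictionary as the hint store. `build_prompt`, the cache key, and the prompt wording are all hypothetical, not the product's actual format.

```python
# Reusing a cached coding hint in a small model's prompt.
# Assumptions: the dict-based hint store, its key, and the prompt layout are
# hypothetical; the page does not describe how hints are keyed or injected.
from typing import Optional


def build_prompt(task: str, hint: Optional[str]) -> str:
    """Prepend a cached hint, if one exists, to the task for the small model."""
    if hint is None:
        return f"Task:\n{task}"
    return f"Hint from a previous solution:\n{hint}\n\nTask:\n{task}"


if __name__ == "__main__":
    hint_cache = {
        "fix-flaky-test": "Freeze the clock instead of sleeping in the test.",
    }
    task = "Make test_retry_backoff stop failing intermittently."
    print(build_prompt(task, hint_cache.get("fix-flaky-test")))  # with hint
    print(build_prompt(task, hint_cache.get("unknown-key")))     # without hint
```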

/benchmarks

Proof is in the results.

We test against HotpotQA and SWE-bench to measure when cached retrieval and hints help small models close the gap.

View benchmarks →
Haiku alone: 42%
GPT-4o: 78%
Haiku + TomsIndex: 81%

One question. Instant answer.

Use it through REST or MCP.

Get API Key
JSON Response
{
  "results": [
    {
      "title": "Things to Do in Boston | Attractions, Tours …",
      "link": "https://www.meetboston.com/things-to-do/",
      "snippet": "Step into history on the Freedom Trail, grab a lobster roll …"
    },
    {
      "title": "THE 15 BEST Things to Do in Boston (2026)",
      "link": "https://www.tripadvisor.com/Attractions-g60745-…",
      "snippet": "Top attractions, museums, tours, and nightlife …"
    },
    // … 8 more
  ],
  "meta": { "took_ms": 45, "total": 10 }
}
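A response shaped like the sample above can be read with a few lines of Python. This is a sketch: the field names (`results`, `title`, `link`, `snippet`, `meta.took_ms`, `meta.total`) come from the sample, while `SearchResult`, `parse_response`, and the fallback defaults for missing fields are assumptions.

```python
# Parse a search response shaped like the sample JSON.
# Assumptions: SearchResult and parse_response are hypothetical helpers, and
# the defaults used for missing fields are illustrative choices.
import json
from dataclasses import dataclass


@dataclass
class SearchResult:
    title: str
    link: str
    snippet: str


def parse_response(body: str) -> tuple[list[SearchResult], int, int]:
    """Return (results, took_ms, total) from a JSON response body."""
    data = json.loads(body)
    results = [
        SearchResult(r["title"], r["link"], r["snippet"])
        for r in data.get("results", [])
    ]
    meta = data.get("meta", {})
    return results, meta.get("took_ms", 0), meta.get("total", len(results))


if __name__ == "__main__":
    body = json.dumps({
        "results": [{
            "title": "Things to Do in Boston | Attractions, Tours …",
            "link": "https://www.meetboston.com/things-to-do/",
            "snippet": "Step into history on the Freedom Trail, grab a lobster roll …",
        }],
        "meta": {"took_ms": 45, "total": 10},
    })
    results, took_ms, total = parse_response(body)
    print(results[0].link, took_ms, total)
```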

Why teams switch.

Gets Smarter Over Time

Every new question adds to the cache. The more people use it, the faster every model gets.

Web Search Built In

Full hybrid BM25 + vector search across 10M+ pages. One key covers both asking and searching.

MCP Native

Drop-in tool for Claude Desktop and any MCP-compatible agent. One config, instant setup.

Model-Agnostic

Works with Llama, Mistral, GPT, Claude — any model that can call an API.

Start building.

Free tier. No credit card. API key in 30 seconds.