Why CandleKeep Doesn't Use RAG — And Why That's the Point
Everyone assumed CandleKeep uses RAG. It doesn't. Here's why agentic search — the same approach Anthropic chose for Claude Code — produces fundamentally better results for book-length content.

A story of why CandleKeep doesn't use RAG
Six months ago, I was staring at my bookshelf — 50 books on design patterns, product management, UX, software architecture — and I wanted my AI agents to access all of it. Not vague recollections from training data, but the actual content. The specific frameworks. The exact examples that shaped how I think and build.
The obvious approach was RAG — Retrieval Augmented Generation. Chunk the books, generate embeddings, build a vector index, run semantic similarity searches. That's what everyone does. The tutorials are everywhere. The infrastructure is mature.
I didn't do it.
I have a methodology I call HDD — Human-Driven Development. The core idea: before you build a tool for an agent, watch how a human expert does the same thing. Don't start with architecture diagrams. Start with observation.
So I asked myself: how do I actually research something in a book?
I don't scan every word. I don't build a mental index of every sentence. I walk to the shelf, look at the spines, pick up a promising book, flip to the table of contents, jump to the relevant chapter, read a few pages, and if it's not what I need — I grab another book.
It's an active process of exploration, not passive retrieval.
That distinction turned out to be everything.
Instead of indexes and embeddings, CandleKeep gives agents the same tools I have as a reader:
- Browse the shelf — list all books with titles, authors, descriptions
- Check the table of contents — see structure before reading anything
- Read specific pages — request exactly what you need (`read book_id:12-18`)
- Follow threads — if one book references a concept, check another for depth
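To make the tool surface concrete, here is a minimal sketch of reader-style tools over an in-memory library. Every name here (`Book`, `list_books`, `read_toc`, `read_pages`, the sample data) is illustrative, not CandleKeep's actual API; the point is how little surface area the agent needs.

```python
# A reader's toolkit: browse, check the TOC, read pages. Nothing else.
from dataclasses import dataclass, field

@dataclass
class Book:
    id: int
    title: str
    author: str
    toc: dict[str, range]                       # chapter title -> page range
    pages: dict[int, str] = field(default_factory=dict)

LIBRARY = {
    1: Book(1, "Design Patterns", "GoF",
            toc={"Observer": range(293, 304), "Strategy": range(315, 324)},
            pages={293: "The Observer pattern defines a one-to-many dependency..."}),
}

def list_books() -> list[str]:
    """Browse the shelf: titles and authors only, no content."""
    return [f"{b.id}: {b.title} by {b.author}" for b in LIBRARY.values()]

def read_toc(book_id: int) -> dict[str, str]:
    """See structure before reading anything."""
    b = LIBRARY[book_id]
    return {ch: f"pp. {r.start}-{r.stop - 1}" for ch, r in b.toc.items()}

def read_pages(book_id: int, start: int, end: int) -> str:
    """Request exactly the pages you need."""
    b = LIBRARY[book_id]
    return "\n".join(b.pages.get(p, f"[page {p} empty]")
                     for p in range(start, end + 1))
```

No index, no embeddings: the three functions together are the whole retrieval surface, and everything else is the model deciding what to call next.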
The agent decides what to read next based on what it's already found. Each reading decision is informed by the previous one. The LLM's reasoning loop is the retrieval algorithm.
Here's a concrete example of why this matters. Say you ask the agent which design pattern fits a distributed system. It opens a book, reads the chapter on Observer Pattern, and midway through finds a sentence: "For distributed systems with scale requirements, see the Event Sourcing Pattern in Chapter 12."
An agent that reads like a human jumps to Chapter 12, reads it, and returns an answer that synthesizes both patterns — with page references for each. A RAG system? It returns the chunk about Observer and stops. The cross-reference to Chapter 12 is just more text inside the chunk. The system has no mechanism to follow it.
This is the fundamental difference. RAG retrieves fragments. Agentic search reads, reasons, and follows threads.
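The read-reason-follow loop can be sketched in a few lines. In the real system the decision about what to read next is made by the LLM; here a regex stands in for the model spotting a cross-reference, and `PAGES` is toy data:

```python
# Sketch of the read-reason-follow loop. A regex plays the role of the
# model noticing "see ... Chapter N" and deciding to jump there.
import re

PAGES = {
    "Observer": "Observer decouples publishers from subscribers. "
                "For distributed systems with scale requirements, "
                "see the Event Sourcing Pattern in Chapter 12.",
    "Chapter 12": "Event Sourcing stores state as an append-only log of events.",
}

def follow_threads(start: str, max_hops: int = 3) -> list[str]:
    """Read a section, follow any cross-reference it contains, repeat."""
    read, section = [], start
    for _ in range(max_hops):
        text = PAGES[section]
        read.append(section)
        ref = re.search(r"see .*?(Chapter \d+)", text)
        if not ref or ref.group(1) not in PAGES:
            break
        section = ref.group(1)   # the hop a chunk-based retriever never makes
    return read
```

Starting from "Observer", the loop ends up having read both sections. A chunk retriever would have returned the first text and stopped, with the pointer to Chapter 12 sitting unused inside the chunk.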
After my last LinkedIn post about running a D&D campaign with CandleKeep, four people independently asked about the retrieval mechanism:
- One asked about Needle in a Haystack
- One asked if it's Graph RAG
- One asked about special indexing
- One asked about Elasticsearch-style search
They all assumed there was a sophisticated retrieval pipeline underneath. There isn't. And there's a very good reason for that.
This isn't just my contrarian take. Boris Cherny, one of the engineers behind Claude Code at Anthropic, shared that early versions of Claude Code used RAG with a local vector database — and they quickly discovered that agentic search works better.
Their reasons map almost exactly to what I found building CandleKeep:
Staleness. Indexes go stale the moment content changes. New books get added, existing ones get updated, and your carefully built embeddings are immediately out of date. With agentic search, the agent always reads the current version.
Reliability. When you need a specific rule from the D&D Player's Handbook, you need the exact text — not the three chunks with the highest cosine similarity. Fuzzy matching produces fuzzy answers. Agents that read the actual pages produce precise ones.
Simplicity. No vector database to maintain. No embedding pipeline to run. No chunking strategy to optimize. CandleKeep's API is: list books, read TOC, read pages. The complexity lives in the model's reasoning, not in external infrastructure.
Security and Privacy. No content gets sent to external embedding services. Everything stays between the agent and the library.
Here's the part most people miss.
RAG was introduced as a concept in 2020. By 2022 it was the industry standard for grounding AI in external knowledge. Vector databases — Pinecone, Weaviate, Chroma, FAISS — became essential infrastructure. The entire ecosystem of chunking strategies, embedding models, and retrieval pipelines grew into a multi-billion dollar industry.
All of it was built around a single constraint: language models couldn't use tools.
If your model can only process text placed in front of it, someone has to do the finding. You need a retrieval system — something to search, rank, and select the right chunks before the model ever sees them. RAG was an elegant solution to a real limitation.
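For contrast, here is the retrieval step RAG performs on the model's behalf, reduced to a toy: score chunks against the query, hand back the top ones. Real systems use learned embeddings and a vector database; bag-of-words cosine similarity stands in here, and the chunks are invented examples.

```python
# Toy RAG retrieval: rank pre-chunked text by similarity, return the best.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_chunks(query: str, chunks: list[str], k: int = 1) -> list[str]:
    q = Counter(query.lower().split())
    return sorted(chunks,
                  key=lambda c: cosine(q, Counter(c.lower().split())),
                  reverse=True)[:k]

chunks = [
    "The Observer pattern suits event-driven systems; see Chapter 12 for scale.",
    "The Singleton pattern restricts instantiation to one object.",
]
# The top chunk mentions Chapter 12, but nothing in this pipeline reads it:
# ranking is where retrieval ends, because the model couldn't search further.
```

This is the whole shape of the approach: the intelligence lives in the ranking function, and the model passively consumes whatever the ranking surfaces.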
But that limitation no longer exists.
Starting in 2023, models gained the ability to call functions and use tools. By 2025, tool use was reliable enough for production systems. In 2026, autonomous tool use is the standard operating mode for AI agents.
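What "models can use tools" means in practice: you declare functions, the model emits a call, your code executes it and returns the result. Below is how the three reader tools might be declared in the JSON-Schema style that most chat APIs accept; the exact field names vary by provider, so treat this as a shape, not a spec.

```python
# Hypothetical tool declarations for the reader workflow. The model sees
# these descriptions and decides, turn by turn, which one to invoke.
READER_TOOLS = [
    {
        "name": "list_books",
        "description": "List all books with titles and authors.",
        "input_schema": {"type": "object", "properties": {}},
    },
    {
        "name": "read_toc",
        "description": "Return the table of contents of one book.",
        "input_schema": {
            "type": "object",
            "properties": {"book_id": {"type": "integer"}},
            "required": ["book_id"],
        },
    },
    {
        "name": "read_pages",
        "description": "Return a page range from one book.",
        "input_schema": {
            "type": "object",
            "properties": {
                "book_id": {"type": "integer"},
                "start": {"type": "integer"},
                "end": {"type": "integer"},
            },
            "required": ["book_id", "start", "end"],
        },
    },
]
```

Once the model can issue these calls itself, the external retrieval pipeline has nothing left to do.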
RAG was the best answer to "how do we get relevant information to a model that can't search for itself." Agentic search is the answer to "what if the model can search for itself?"
Which brings us back to the four questions from that LinkedIn post.
Needle in a Haystack — The agent reads the table of contents, identifies the most promising chapter, reads it, and if the answer isn't there, expands the search. It's exactly how you'd find a specific rule in a 300-page rulebook. You don't read all 300 pages. You navigate.
Graph RAG — You don't need a graph when content has natural structure. Books are already organized into parts, chapters, sections, and pages. Why build an artificial knowledge graph on top of an organizational structure that already exists?
Index type — The agent is the index. It reasons about what to read based on titles, TOCs, and what it's already found. The "index" is rebuilt from scratch for every query, perfectly tailored to the question being asked.
Context window management — The agent reads only what it needs. Five pages from one book, three from another. The context window stays lean because the agent is selective — not because a retrieval pipeline pre-filtered the content.
Agentic search isn't free. It costs more tokens per query than a pre-built index lookup. The agent might need 3-4 rounds of reading before finding what it needs, each round consuming context.
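A back-of-envelope comparison makes the trade-off concrete. Every number below is an illustrative assumption, not a measurement:

```python
# Rough token cost per query: pre-built index vs. agentic reading.
TOKENS_PER_PAGE = 500                       # assumed average page size

rag_query = 3 * TOKENS_PER_PAGE             # three pre-retrieved chunks

agentic_query = (
    200                                     # book list
    + 2 * 300                               # two tables of contents
    + 4 * 2 * TOKENS_PER_PAGE               # four rounds, two pages each
)

ratio = agentic_query / rag_query           # roughly 3x more tokens
```

Under these assumptions agentic reading costs a few times more per query, which is exactly the price being paid for following threads instead of stopping at the first match.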
For narrow, repetitive queries against a stable corpus — the same FAQ answering the same 50 questions — a traditional index is probably more efficient.
But for the use case CandleKeep was built for — agents doing genuine research across diverse books, following threads between sources, building understanding iteratively — agentic search produces fundamentally better results. Because it's not just finding text. It's reading.
Stop thinking about retrieval as a search problem.
Think about it as a reading problem.
Humans don't index their books. They browse, skim tables of contents, read relevant chapters, and follow threads across sources. They build understanding iteratively. Each thing they read informs what they read next.
That's exactly what agentic search does. And that's why CandleKeep was built this way.
- HDD: Human-Driven Development — The methodology behind observing human experts before building agent tools
- Contextual Retrieval — Anthropic's research on improving RAG
- Why Claude Code Dropped Vector DB-Based RAG — Analysis of Anthropic's architectural decision
- Agentic RAG: A Survey — Academic survey on the shift from passive retrieval to agentic approaches