## The Single-Shot Problem
Traditional RAG is a one-shot game. User asks a question. System converts it to a query. System retrieves top-k chunks. System generates an answer from those chunks.
One shot. No feedback loop. No self-correction.
This works for simple factual questions. "What's our return policy?" Retrieve, read, answer. Done.
It falls apart for anything complex. "Compare our Q3 2025 performance across all regions and identify which products underperformed relative to their targets." That requires multiple retrievals across different document types, cross-referencing results, identifying gaps, and synthesizing information from potentially dozens of chunks.
No single query captures all of that. And if your first retrieval misses something, there's no mechanism to try again.
Agentic RAG solves this by putting an agent in the retrieval loop.
## What Agentic RAG Actually Means
Instead of a fixed retrieve-then-generate pipeline, you have an agent that can:
1. **Analyze** the user's question and decompose it into sub-questions
2. **Plan** which searches to run and in what order
3. **Retrieve** using different strategies (semantic search, keyword search, metadata filters)
4. **Evaluate** whether the retrieved context is sufficient
5. **Reformulate** queries when results are inadequate
6. **Synthesize** across multiple retrieval rounds

It is worth reading about [hybrid search backends](/blog/hybrid-search-rag-production) alongside this.
The agent has retrieval as a tool, not as a fixed pipeline step. It decides when to search, what to search for, and whether to search again.
## The Architecture
```python
import json

from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import Tool

# Define retrieval tools the agent can use
tools = [
    Tool(
        name="semantic_search",
        description="Search the knowledge base using natural language. "
                    "Best for conceptual questions and finding related content.",
        func=lambda q: vector_store.similarity_search(q, k=5),
    ),
    Tool(
        name="keyword_search",
        description="Search using exact keywords. Best for finding specific "
                    "documents, codes, names, or identifiers.",
        func=lambda q: bm25_index.search(q, k=5),
    ),
    Tool(
        name="metadata_filter",
        description="Filter documents by metadata. Accepts JSON with fields: "
                    "department, document_type, date_range, author.",
        func=lambda q: metadata_store.filter(json.loads(q)),
    ),
    Tool(
        name="lookup_document",
        description="Retrieve a specific document by ID. Use when you know "
                    "which document you need.",
        func=lambda doc_id: document_store.get(doc_id),
    ),
]

agent = AgentExecutor(
    agent=create_react_agent(llm, tools, prompt),
    tools=tools,
    max_iterations=10,  # prevent infinite loops
    verbose=True,
)
```
The agent doesn't just have one "search" tool. It has multiple retrieval strategies and can choose the right one for each sub-question.
## Query Decomposition: The First Win
The simplest form of agentic RAG is query decomposition. Break a complex question into simpler ones, retrieve for each, combine.
```python
DECOMPOSE_PROMPT = """
Given this complex question, break it down into simpler sub-questions
that can each be answered independently from a knowledge base.

Question: {question}

Sub-questions (one per line):
"""

async def agentic_retrieve(question: str, retriever, llm):
    # Step 1: Decompose
    sub_questions = await llm.generate(
        DECOMPOSE_PROMPT.format(question=question)
    )

    # Step 2: Retrieve for each sub-question
    all_contexts = []
    for sub_q in sub_questions.split("\n"):
        if not sub_q.strip():
            continue  # skip blank lines in the model's output
        results = await retriever.search(sub_q.strip(), top_k=3)
        all_contexts.extend(results)

    # Step 3: Deduplicate and rank
    unique_contexts = deduplicate(all_contexts)

    # Step 4: Generate answer from combined context
    return await llm.generate(
        ANSWER_PROMPT.format(
            question=question,
            context=format_contexts(unique_contexts),
        )
    )
```
"Compare Q3 performance across regions" becomes: "What was Q3 2025 performance in EMEA?", "What was Q3 2025 performance in North America?", "What was Q3 2025 performance in APAC?", "What were the Q3 2025 targets per region?"
Each sub-question retrieves relevant chunks. The combined context gives the LLM what it needs to actually compare.
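The `deduplicate` helper used above is left undefined. A minimal sketch, assuming retrieved chunks are plain strings, that drops duplicates while preserving retrieval order:

```python
def deduplicate(contexts: list[str]) -> list[str]:
    """Drop duplicate chunks while preserving first-seen order."""
    seen: set[str] = set()
    unique = []
    for chunk in contexts:
        key = chunk.strip().lower()  # normalize whitespace and case before comparing
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```

A production version would typically hash chunk IDs or use near-duplicate detection rather than exact string matching, since overlapping chunks rarely match verbatim.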
## Self-Evaluation: Knowing When Retrieval Failed
The real power of agentic RAG is the feedback loop. The agent evaluates its own retrieval results.
```python
EVALUATION_PROMPT = """
Given the original question and the retrieved context, evaluate:
1. Does the context contain enough information to answer the question?
2. What information is missing?
3. What additional searches would help?

Question: {question}
Retrieved Context: {context}

Evaluation:
"""

async def retrieval_with_evaluation(question, retriever, llm, max_rounds=3):
    all_context = []
    for round_num in range(max_rounds):
        if round_num == 0:
            query = question
        else:
            # Use evaluation to generate a better query
            eval_result = await llm.generate(
                EVALUATION_PROMPT.format(
                    question=question,
                    context=format_contexts(all_context),
                )
            )
            if "sufficient" in eval_result.lower():
                break
            query = extract_follow_up_query(eval_result)
        results = await retriever.search(query, top_k=5)
        all_context.extend(results)
    return deduplicate(all_context)
```
Round 1: "What's our data retention policy for EU customers?" Retrieves general data retention policy. Agent evaluates: "I found the general policy but nothing EU-specific. Need to search for GDPR-related retention requirements."
Round 2: Searches for "GDPR data retention requirements." Finds the EU-specific addendum. Agent evaluates: "Now I have both general policy and EU-specific requirements. Sufficient."
Without self-evaluation, the system would have answered with just the general policy, missing the EU-specific nuance that was the whole point of the question.
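The loop above calls an `extract_follow_up_query` helper that is not shown. One way to sketch it, assuming the evaluation prompt is extended to ask the model to end with a line like `FOLLOW-UP: <query>` (that marker convention is an assumption, not part of the prompt above):

```python
def extract_follow_up_query(evaluation: str, marker: str = "FOLLOW-UP:") -> str:
    """Pull the next search query out of the model's evaluation text.

    Looks for a line beginning with the marker; falls back to the last
    non-empty line when the model ignores the format.
    """
    lines = [ln.strip() for ln in evaluation.splitlines() if ln.strip()]
    for ln in reversed(lines):
        if ln.upper().startswith(marker):
            return ln[len(marker):].strip()
    return lines[-1] if lines else ""
```

Parsing free-form model output is fragile; structured output (JSON mode or tool calling) is the more robust choice when the LLM provider supports it.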
## Tool Selection: Right Strategy for Each Sub-Problem
Different sub-questions need different retrieval approaches.
```python
ROUTING_PROMPT = """
For each sub-question, select the best retrieval strategy:
1. semantic_search - for conceptual questions, "how does X work?"
2. keyword_search - for specific terms, codes, names, identifiers
3. metadata_filter - for filtering by date, department, type
4. lookup_document - when you know the specific document

Sub-question: {sub_question}

Best strategy and query:
"""
```

The related post on [agent memory patterns](/blog/agent-memory-patterns) goes further on this point.
"Who approved the budget change in Q3?" doesn't need semantic search. It needs a metadata filter on document type (approval/decision), date range (Q3), and maybe a keyword search for "budget."
"What's our general approach to cost optimization?" needs semantic search, not keywords.
The agent makes this decision per sub-question, which is exactly what a human researcher would do.
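Wiring that decision up can be sketched as a small dispatcher. The compressed prompt template, the `llm.generate` interface, and the `'strategy: query'` reply format are all assumptions of this sketch:

```python
ROUTING_TEMPLATE = (
    "Select the best retrieval strategy for this sub-question and reply "
    "as 'strategy: query'.\nSub-question: {sub_question}"
)

async def route_and_retrieve(sub_question: str, llm, tools: dict):
    """Ask the model for a strategy, then dispatch to the matching tool.

    `tools` maps strategy names (e.g. 'semantic_search') to async
    search callables.
    """
    reply = await llm.generate(ROUTING_TEMPLATE.format(sub_question=sub_question))
    strategy, _, query = reply.partition(":")
    strategy = strategy.strip().lower()
    if strategy not in tools:
        # Fall back to semantic search when the reply doesn't parse
        strategy, query = "semantic_search", sub_question
    return await tools[strategy](query.strip())
```

The fallback branch matters: routing replies fail to parse often enough in practice that every dispatcher needs a safe default.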
## The Routing Pattern
For organizations with multiple knowledge bases (common in enterprise), the agent also decides WHERE to search.
```python
tools = [
    Tool(name="search_policies", func=policy_store.search,
         description="Company policies, SOPs, compliance docs"),
    Tool(name="search_technical", func=tech_store.search,
         description="Technical documentation, architecture, API docs"),
    Tool(name="search_financial", func=finance_store.search,
         description="Financial reports, budgets, forecasts"),
    Tool(name="search_hr", func=hr_store.search,
         description="HR policies, benefits, org structure"),
    Tool(name="search_projects", func=project_store.search,
         description="Project plans, status updates, meeting notes"),
]
```
"Can an intern on Project Phoenix access the production database?" requires searching HR (intern policies), technical (database access control), and project docs (Project Phoenix team structure). The agent routes to all three.
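Once the agent has picked its targets, the searches can run concurrently rather than sequentially. A minimal sketch, assuming each store exposes an async search callable (that interface is an assumption):

```python
import asyncio

async def search_stores(question: str, stores: dict, names: list[str],
                        top_k: int = 5) -> list[tuple[str, str]]:
    """Fan a question out to the selected knowledge bases concurrently.

    Returns (store_name, hit) pairs so downstream synthesis can cite
    which knowledge base each chunk came from.
    """
    async def search_one(name: str):
        hits = await stores[name](question, top_k)
        return [(name, hit) for hit in hits]

    # gather() preserves the order of the names list in its results
    grouped = await asyncio.gather(*(search_one(n) for n in names))
    return [item for group in grouped for item in group]
```

Tagging results with their source store also makes the final answer auditable: the user can see that the HR policy and the database ACL came from different systems.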
## Guardrails: Preventing Infinite Loops
Agentic RAG can go wrong. The agent can loop endlessly, reformulating queries that never find what it's looking for (because the information doesn't exist). It can run up LLM costs with excessive iterations. It can hallucinate that it needs more information when it already has enough.
Essential guardrails:
```python
from dataclasses import dataclass

@dataclass
class AgenticRAGConfig:
    max_iterations: int = 5                # hard stop on retrieval rounds
    max_tool_calls: int = 15               # total tool invocations
    timeout_seconds: int = 30              # wall-clock limit
    min_confidence_to_answer: float = 0.7  # below this, say "I don't know"
    cost_budget_usd: float = 0.50          # per-query cost cap
```
The "I don't know" path is critical. If after 5 rounds of retrieval the agent still can't find sufficient context, it should say so explicitly rather than generating a low-confidence answer.
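Caps in a config object do nothing unless something enforces them during the run. A minimal tracker sketch, with the field names assumed from the `AgenticRAGConfig` above:

```python
import time

class BudgetExceeded(Exception):
    """Raised when an agent run exceeds its configured limits."""

class GuardrailTracker:
    """Enforce tool-call, cost, and wall-clock caps during an agent run."""

    def __init__(self, config):
        self.config = config
        self.tool_calls = 0
        self.cost_usd = 0.0
        self.start = time.monotonic()

    def record_tool_call(self, cost_usd: float = 0.0) -> None:
        """Call once per tool invocation; raises when any cap is breached."""
        self.tool_calls += 1
        self.cost_usd += cost_usd
        if self.tool_calls > self.config.max_tool_calls:
            raise BudgetExceeded("tool-call cap reached")
        if self.cost_usd > self.config.cost_budget_usd:
            raise BudgetExceeded("cost budget exhausted")
        if time.monotonic() - self.start > self.config.timeout_seconds:
            raise BudgetExceeded("wall-clock timeout")
```

Catching `BudgetExceeded` at the top of the agent loop is a natural place to trigger the explicit "I don't know" response rather than forcing a low-confidence answer.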
## When to Use Agentic RAG
Not every query needs an agent. Simple factual lookups ("What's the office WiFi password?") don't benefit from query decomposition and multi-round retrieval. They just get slower and more expensive.
**Use agentic RAG for:** Complex multi-part questions. Comparative questions. Questions that span multiple document types or knowledge bases. Research-style queries where the user expects a synthesized answer. For a deeper look, see [reranking retrieved results](/blog/cross-encoder-reranking-rag).
**Use standard RAG for:** Simple factual lookups. Single-topic questions. High-volume, low-complexity query patterns.
A practical approach: route queries based on complexity. Simple queries go through the fast path (single retrieval, single generation). Complex queries go through the agentic path.
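That complexity router can start as a crude heuristic before graduating to an LLM classifier. A sketch (the cue list and word-count threshold are illustrative assumptions, not tuned values):

```python
def needs_agentic_path(question: str) -> bool:
    """Heuristic pre-filter: route multi-part questions to the agentic path.

    Surface cues catch many comparative and cross-cutting questions;
    real systems often back this up with a cheap LLM classifier.
    """
    multi_part_cues = (" and ", "compare", "versus", " vs ", "across",
                       "relative to", "difference between")
    q = question.lower()
    return len(q.split()) > 20 or any(cue in q for cue in multi_part_cues)
```

Misrouting is cheap in one direction and expensive in the other: a simple question sent down the agentic path just wastes some tokens, while a complex question sent down the fast path gets a shallow answer. Biasing the threshold toward the agentic path is usually the safer default.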
The best RAG systems aren't uniform. They adapt their retrieval strategy to the question. Just like a good researcher would.