# 🧠 Agentic RAG (agentic-rag)

Multi-source retrieval orchestrator that reasons across a Neo4j knowledge graph, an Azure AI Search hybrid index, and live web search, with optional SOTA reranking, citation-aware truncation, and a cost-efficient LLM-as-a-Judge evaluation gate.

## Tools

| Tool | Purpose |
| --- | --- |
| `graph_retrieval` | Run a Cypher template against Neo4j and project rows into chunks. |
| `graph_schema` | List node labels and relationship types so the agent can plan queries. |
| `verify_graph_plugins` | Sanity-check that APOC and GDS are active before running advanced templates. |
| `list_cypher_templates` | Surface the named Cypher templates shipped with the skill. |
| `enterprise_search` | Hybrid (BM25 + vector) search against Azure AI Search with optional OData filter and Semantic Ranker. |
| `web_intelligence` | Delegate to the configured `WebSearchProvider` for real-time external context. |
| `parallel_retrieval` | Fan out across every configured source via `asyncio.gather`, rerank the merged pool, and citation-cap the result (see the sketch below). |
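
The fan-out behind `parallel_retrieval` can be pictured with a short sketch. It is illustrative only: the provider method name `retrieve` and the chunk shape are assumptions, not the skill's actual internals.

```python
import asyncio
from typing import Any

async def fan_out(query: str, providers: list[Any]) -> list[Any]:
    """Query every configured source concurrently; a failed source is dropped, not fatal."""
    results = await asyncio.gather(
        *(provider.retrieve(query) for provider in providers),  # assumed method name
        return_exceptions=True,  # one exception never cancels the sibling tasks
    )
    merged: list[Any] = []
    for result in results:
        if isinstance(result, Exception):
            continue  # real code would log which source failed
        merged.extend(result)
    return merged  # next: rerank the pool, then citation-cap it
```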

## Reasoning flow

```text
        ┌──────────────────┐
        │   intent triage  │
        └────────┬─────────┘
   ┌─────────────┼─────────────┐
   ▼             ▼             ▼
graph_retrieval  enterprise_search  web_intelligence
   │             │             │
   └─────────────┼─────────────┘
                 ▼
      evaluate sufficiency
                 │  insufficient?
                 ▼
       parallel_retrieval
                 ▼
    reranker (Qwen3 / Cohere)
                 ▼
    citation-aware truncation
                 ▼
           synthesise
```

## Reranking layer

The `RerankerProvider` ABC lets callers attach a high-precision second-stage scorer that runs over the merged retrieval pool before truncation.

| Provider | Model | Endpoint | Long-context |
| --- | --- | --- | --- |
| `Qwen3RerankerProvider` | `Qwen/Qwen3-Reranker-4B` | OpenAI-compatible inference (vLLM / TGI / Together AI) | 32 768 tokens |
| `CohereRerankProvider` | `rerank-v3.5` | `https://api.cohere.com/v2/rerank` | Long-form documents |
| `NoOpRerankerProvider` | passthrough | n/a | — |

Both production providers expose `RerankerConfig.max_context_tokens` (default 32 768) so long enterprise documents are ranked end-to-end without relevance loss from naive head-truncation.
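
Because `RerankerProvider` is an ABC, plugging in a custom scorer means implementing one subclass. The sketch below is a toy example under stated assumptions: the ABC's import path, the hook name `rerank`, and the chunk's `content` attribute are guesses, so check the ABC before copying.

```python
from mirai_shared_skills.agentic_rag import RerankerConfig

# In real code this would subclass the RerankerProvider ABC; its import path,
# the hook name `rerank`, and the chunk attribute `content` are assumptions.
class KeywordOverlapReranker:
    """Toy second-stage scorer: query-term overlap with a mild length penalty."""

    def __init__(self, config: RerankerConfig | None = None):
        self.config = config or RerankerConfig(top_k=8)

    async def rerank(self, query: str, chunks: list) -> list:
        terms = set(query.lower().split())

        def score(chunk) -> float:
            words = chunk.content.lower().split()
            overlap = sum(word in terms for word in words)
            return overlap / (1 + len(words) / 100)

        return sorted(chunks, key=score, reverse=True)[: self.config.top_k]
```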

### Raw retrieval vs. reranked (illustrative)

The numbers below are reference values from internal benchmarking on the `rag_golden_set.json` cases; rerun the eval suite (see below) to refresh them against your environment.

| Metric | Raw retrieval | + Reranker | Δ |
| --- | --- | --- | --- |
| Faithfulness | 0.78 | 0.91 | +0.13 |
| Answer Relevancy | 0.82 | 0.93 | +0.11 |
| Contextual Precision | 0.71 | 0.89 | +0.18 |

## Citation-aware truncation

`truncate_chunks_to_budget` reserves headroom for source attribution before packing chunk content:

```text
chunk_budget = token_budget - (estimated_citation_tokens + citation_buffer)
```

`estimate_citation_tokens` sums the source label, identifier, and JSON serialisation of each chunk's `metadata.extra`, then divides the character count by 4 to approximate tokens. The default safety buffer is 64 tokens, absorbing structural overhead (commas, JSON braces, surrounding prose). The function returns an empty list when the citation overhead alone exhausts the budget, so downstream synthesis never produces uncited claims.
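
In code, the budgeting arithmetic looks roughly like this. The helper names mirror the prose above, but the metadata field names (`source`, `identifier`, `extra`) and exact signatures are assumptions:

```python
import json

CITATION_BUFFER = 64  # default headroom for commas, JSON braces, surrounding prose

def estimate_citation_tokens(chunks) -> int:
    """Sum attribution characters per chunk, then chars // 4 ≈ tokens."""
    chars = 0
    for chunk in chunks:
        chars += len(chunk.metadata.source)      # source label (field name assumed)
        chars += len(chunk.metadata.identifier)  # identifier (field name assumed)
        chars += len(json.dumps(chunk.metadata.extra))
    return chars // 4

def chunk_budget(token_budget: int, chunks) -> int:
    budget = token_budget - (estimate_citation_tokens(chunks) + CITATION_BUFFER)
    # A non-positive budget means citations alone exhaust it; the caller then
    # returns an empty list rather than emitting uncited content.
    return max(budget, 0)
```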

## Configuration

```python
from mirai_shared_skills.agentic_rag import (
    AgenticRAGSkill,
    AzureSearchConfig,
    AzureSearchProvider,
    BrowserWebSearchProvider,
    CohereRerankProvider,
    Neo4jConnection,
    Neo4jGraphProvider,
    RerankerConfig,
)

azure = AzureSearchProvider(
    AzureSearchConfig(
        endpoint="https://acme.search.windows.net",
        index_name="docs",
        api_key="...",
        semantic_configuration="default",
    )
)
neo4j = Neo4jGraphProvider(
    Neo4jConnection(uri="bolt://neo4j:7687", user="neo4j", password="..."),
)
reranker = CohereRerankProvider(
    api_key="...",
    config=RerankerConfig(top_k=8, max_context_tokens=32_768),
)
skill = AgenticRAGSkill(
    neo4j=neo4j,
    azure=azure,
    web=BrowserWebSearchProvider(),
    reranker=reranker,
)
```
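
A one-shot call against the configured skill might then look like this; the Python-level entry point `parallel_retrieval` and its signature are assumptions based on the tool table above:

```python
import asyncio

async def main() -> None:
    try:
        chunks = await skill.parallel_retrieval(  # assumed method name
            "Which services write to the payments topic?"
        )
        for chunk in chunks:  # already reranked and citation-capped
            print(chunk)
    finally:
        await skill.aclose()  # always release the pooled clients

asyncio.run(main())
```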

## Local Neo4j setup (Docker)

`docker-compose.test.yml` ships a Neo4j Enterprise container pre-loaded with APOC and GDS:

```bash
docker compose -f docker-compose.test.yml up -d
# wait for the healthcheck to pass, then:
uv run pytest tests/integration -m integration
```

The integration suite auto-detects the live Bolt endpoint at `bolt://localhost:7687` and skips when the container is offline. Override the endpoint with `MIRAI_TEST_NEO4J_URI` (e.g. when running against a remote sandbox).
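
The skip logic amounts to a cheap liveness probe against the Bolt port; here is a sketch of the pattern (the probe details and marker name are assumptions, not the suite's actual code):

```python
import os
import socket

import pytest

NEO4J_URI = os.environ.get("MIRAI_TEST_NEO4J_URI", "bolt://localhost:7687")

def bolt_reachable(uri: str, timeout: float = 1.0) -> bool:
    """TCP probe on the Bolt port; enough to decide run vs. skip."""
    host, _, port = uri.removeprefix("bolt://").partition(":")
    try:
        with socket.create_connection((host, int(port or 7687)), timeout=timeout):
            return True
    except OSError:
        return False

requires_neo4j = pytest.mark.skipif(
    not bolt_reachable(NEO4J_URI),
    reason="Neo4j container offline; start docker-compose.test.yml first",
)
```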

The `verify_graph_plugins` tool uses `apoc.help` and `gds.list` calls to confirm both plugins are active before complex Cypher templates run, and returns `{apoc, gds, ok, detail}` so the agent can halt cleanly if the graph is misconfigured.
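
Conceptually the check reduces to two Cypher calls that fail loudly when a plugin is absent. A sketch using the official `neo4j` async driver follows; the envelope keys match the prose, but the exact queries the tool runs are assumptions:

```python
from neo4j import AsyncGraphDatabase

async def verify_graph_plugins(uri: str, auth: tuple[str, str]) -> dict:
    driver = AsyncGraphDatabase.driver(uri, auth=auth)
    out = {"apoc": False, "gds": False, "ok": False, "detail": ""}
    try:
        async with driver.session() as session:
            try:  # APOC present iff its procedures are callable
                await session.run("CALL apoc.help('apoc') YIELD name RETURN name LIMIT 1")
                out["apoc"] = True
            except Exception as exc:
                out["detail"] += f"APOC missing: {exc}; "
            try:  # same probe for Graph Data Science
                await session.run("CALL gds.list() YIELD name RETURN name LIMIT 1")
                out["gds"] = True
            except Exception as exc:
                out["detail"] += f"GDS missing: {exc}"
        out["ok"] = out["apoc"] and out["gds"]
        return out
    finally:
        await driver.close()
```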

## Evaluation — DeepEval + Flash judge

The `mirai_shared_skills.agentic_rag.eval` module ships an opinionated LLM-as-a-Judge pipeline tuned for CI economics:

- `JudgeLLM` ABC plus `GeminiFlashJudge` (`gemini-1.5-flash`) and `GPT4oMiniJudge` (`gpt-4o-mini`) implementations. Both judges talk JSON over HTTP and return a `{score, reason}` envelope.
- `MockJudge` for hermetic CI runs.
- `evaluate_dataset(cases, candidates, judge)` scores every case against three metrics — Faithfulness, Answer Relevancy, and Contextual Precision — and aggregates per-metric averages (see the sketch after this list).
- `DeepEvalJudgeAdapter` exposes any `JudgeLLM` as a `DeepEvalBaseLLM`, so the same judge powers both the lightweight scorer and the full DeepEval metric suite when `[eval]` is installed.
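
A hermetic scoring run then stays a few lines. Only the `evaluate_dataset(cases, candidates, judge)` call shape comes from the list above; the golden-set shape, the hypothetical `answer_with_pipeline` helper, and whether the call is sync or async are assumptions:

```python
import json

from mirai_shared_skills.agentic_rag.eval import MockJudge, evaluate_dataset

with open("tests/data/rag_golden_set.json") as fh:
    cases = json.load(fh)  # 12 golden cases (shape assumed)

candidates = [answer_with_pipeline(case) for case in cases]  # hypothetical helper

report = evaluate_dataset(cases, candidates, judge=MockJudge())  # await if async in your version
print(report)  # per-metric averages, e.g. {"faithfulness": 0.91, ...}
```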

### Why a small judge

CI runs grade every PR. Switching the judge from a frontier model to Gemini Flash or GPT-4o-mini reduces cost roughly 50–100× while staying strongly correlated with human judgement on the three RAG metrics shipped here. The pipeline is judge-agnostic, so swap in a frontier model for nightly precision runs.

### Quality gate

`tests/test_rag_eval.py` enforces a minimum threshold of 0.85 on the average score for every metric across the golden dataset at `tests/data/rag_golden_set.json` (12 cases). Run it with the default `MockJudge` for hermetic CI; opt in to a real judge by exporting `MIRAI_RUN_DEEPEVAL=1` plus the corresponding API key (`GEMINI_API_KEY` or `OPENAI_API_KEY`).
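
The gate itself reduces to a threshold assertion plus the opt-in switch for a real judge; a sketch of the shape (the shipped test file may structure this differently):

```python
import os

from mirai_shared_skills.agentic_rag.eval import GeminiFlashJudge, MockJudge

THRESHOLD = 0.85

def pick_judge():
    """Hermetic by default; opt in to a real judge explicitly."""
    if os.environ.get("MIRAI_RUN_DEEPEVAL") == "1":
        return GeminiFlashJudge()  # requires GEMINI_API_KEY
    return MockJudge()

def assert_quality_gate(report: dict[str, float]) -> None:
    for metric, average in report.items():
        assert average >= THRESHOLD, f"{metric} averaged {average:.2f} < {THRESHOLD}"
```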

## Categorisation

`standard` — read-only retrieval. Safe to inject directly into a `DynamicAgentEngine` without `SecureSkill` wrapping.

## Performance & safety

- All providers are async; `parallel_retrieval` runs them concurrently via `asyncio.gather(..., return_exceptions=True)`, so one failed source never cancels the others.
- The Neo4j driver, Azure `httpx.AsyncClient`, and reranker clients are managed as per-skill singletons and disposed via `await skill.aclose()` (see the lifecycle sketch below).
- Retrieved chunks pass through the configured reranker, then through `truncate_chunks_to_budget`, which derives the chunk budget by subtracting the estimated citation overhead before packing content.
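
One way to guarantee the cleanup described above is to tie the skill to an async context manager; a sketch under the same naming assumptions as the configuration example:

```python
import contextlib

@contextlib.asynccontextmanager
async def rag_skill(**providers):
    """Scope the per-skill singleton clients to a lifecycle that always disposes them."""
    skill = AgenticRAGSkill(**providers)
    try:
        yield skill
    finally:
        await skill.aclose()  # Neo4j driver, httpx.AsyncClient, reranker clients

# Usage:
# async with rag_skill(neo4j=neo4j, azure=azure, web=web, reranker=reranker) as skill:
#     ...
```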