Domain Skills — API¶
Domain skills cover specific enterprise capabilities — agentic_rag (with its four provider families), weather, and database. Decision rationale lives in the ADR index; operational guidance lives in the skill pages.
Agentic RAG¶
mirai_shared_skills.agentic_rag ¶
Agentic RAG package — multi-source retrieval orchestrator.
__all__
module-attribute
¶
__all__ = [
"DEFAULT_CITATION_SAFETY_BUFFER_TOKENS",
"DEFAULT_TOKEN_BUDGET",
"QWEN3_LONG_CONTEXT_TOKENS",
"AgenticRAGSkill",
"AzureSearchConfig",
"AzureSearchProvider",
"BrowserWebSearchProvider",
"CohereRerankProvider",
"Neo4jConnection",
"Neo4jGraphProvider",
"Neo4jUnavailableError",
"NoOpRerankerProvider",
"Qwen3RerankerProvider",
"RAGContextChunk",
"RerankerConfig",
"RerankerProvider",
"RetrievalQuery",
"SourceMetadata",
"SourceName",
"WebSearchProvider",
"estimate_citation_tokens",
"truncate_chunks_to_budget",
]
AgenticRAGSkill ¶
AgenticRAGSkill(
*,
neo4j: Neo4jGraphProvider | None = None,
azure: AzureSearchProvider | None = None,
web: WebSearchProvider | None = None,
reranker: RerankerProvider | None = None,
token_budget: int = DEFAULT_TOKEN_BUDGET,
citation_buffer_tokens: int = DEFAULT_CITATION_SAFETY_BUFFER_TOKENS,
)
Bases: BaseSkill
Multi-source retrieval orchestrator (Graph + Hybrid Vector + Web).
Source code in mirai_shared_skills/agentic_rag/skill.py
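The orchestration pattern — fan out to whichever providers are configured, then merge the results — can be sketched in plain Python. Everything below (the provider stubs and `gather_chunks` helper) is illustrative and not part of the package API:

```python
import asyncio

async def graph_search(query: str) -> list[str]:
    # Stand-in for a graph provider call (e.g. Neo4jGraphProvider.execute).
    return [f"graph:{query}"]

async def vector_search(query: str) -> list[str]:
    # Stand-in for a hybrid vector call (e.g. AzureSearchProvider.search).
    return [f"vector:{query}"]

async def gather_chunks(query: str, providers) -> list[str]:
    """Query every configured provider concurrently and merge the chunks."""
    results = await asyncio.gather(*(p(query) for p in providers))
    merged: list[str] = []
    for chunks in results:
        merged.extend(chunks)
    return merged

chunks = asyncio.run(gather_chunks("acme revenue", [graph_search, vector_search]))
print(chunks)  # ['graph:acme revenue', 'vector:acme revenue']
```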
aclose
async
¶
get_tools ¶
Source code in mirai_shared_skills/agentic_rag/skill.py
AzureSearchConfig
dataclass
¶
AzureSearchConfig(
endpoint: str,
index_name: str,
api_key: str,
api_version: str = DEFAULT_API_VERSION,
semantic_configuration: str | None = None,
vector_fields: tuple[str, ...] = DEFAULT_VECTOR_FIELDS,
)
AzureSearchProvider ¶
AzureSearchProvider(
config: AzureSearchConfig,
*,
client: AsyncClient | None = None,
embedder: EmbeddingFn | None = None,
timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
)
Async hybrid-search wrapper over the Azure AI Search REST API.
Source code in mirai_shared_skills/agentic_rag/providers/azure.py
aclose
async
¶
search
async
¶
search(
query: str,
*,
top_k: int = 5,
filters: dict[str, Any] | None = None,
) -> list[RAGContextChunk]
Issue a hybrid search and project the response into context chunks.
Source code in mirai_shared_skills/agentic_rag/providers/azure.py
BrowserWebSearchProvider ¶
BrowserWebSearchProvider(
*,
browser: AgentBrowserSkill | None = None,
url_builder: UrlBuilder = default_duckduckgo_url,
)
Bases: WebSearchProvider
Default WebSearchProvider that delegates to AgentBrowserSkill.
The url_builder callable controls which search engine is queried, so the
same provider works with DuckDuckGo, Bing, Brave, or any private
enterprise search portal.
Source code in mirai_shared_skills/agentic_rag/providers/web.py
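A custom `url_builder` is just a callable from query string to search URL. The exact `UrlBuilder` signature is an assumption here; a hypothetical Bing builder might look like:

```python
from urllib.parse import quote_plus

def bing_url(query: str) -> str:
    # Hypothetical url_builder: swap the template to target any engine.
    return f"https://www.bing.com/search?q={quote_plus(query)}"

print(bing_url("agentic RAG"))  # https://www.bing.com/search?q=agentic+RAG
```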
search
async
¶
Source code in mirai_shared_skills/agentic_rag/providers/web.py
CohereRerankProvider ¶
CohereRerankProvider(
api_key: str,
*,
model: str = DEFAULT_MODEL,
endpoint: str = DEFAULT_ENDPOINT,
config: RerankerConfig | None = None,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_RERANK_TIMEOUT_SECONDS,
)
Bases: RerankerProvider
Cohere Rerank v4 over the public REST API.
Requires a CO_API_KEY. The provider is async-first and reuses a
single httpx.AsyncClient for connection pooling. Pass an existing
client to share pools with surrounding application code.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
aclose
async
¶
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
Neo4jConnection
dataclass
¶
Neo4jGraphProvider ¶
Neo4jGraphProvider(
connection: Neo4jConnection | None = None,
*,
driver: _AsyncDriver | None = None,
)
Async Neo4j wrapper exposing schema introspection plus parameterised queries.
Test seams: callers can inject driver directly so unit tests bypass the
real driver factory. Otherwise the driver is created lazily on first use
and reused (singleton) for the lifetime of the provider instance.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
aclose
async
¶
describe_schema
async
¶
Return labels and relationship types known to the graph.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
execute
async
¶
execute(
cypher: str,
parameters: Mapping[str, Any] | None = None,
*,
top_k: int = 5,
) -> list[RAGContextChunk]
Run a Cypher statement and project the rows into RAGContextChunks.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
verify_plugins
async
¶
Probe the graph for APOC and GDS availability.
Returns a status dict shaped like:
{"apoc": True, "gds": False, "detail": {"apoc": "...", "gds": "..."}}
The probe runs lightweight introspection calls (apoc.help('apoc')
and gds.list()) and never raises — when a procedure is missing the
corresponding flag is set to False and the failure reason is
captured in detail.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
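The never-raises contract can be mimicked with a small probe loop. The flag names and dict shape follow the docstring above, but the helper itself is an illustrative sketch, not the package's implementation:

```python
def probe_plugins(probes: dict) -> dict:
    """Run each probe; record success flags and failure reasons instead of raising."""
    status = {"detail": {}}
    for name, probe in probes.items():
        try:
            probe()
            status[name] = True
            status["detail"][name] = "ok"
        except Exception as exc:  # never propagate — report the reason instead
            status[name] = False
            status["detail"][name] = str(exc)
    return status

def gds_probe():
    raise RuntimeError("gds.list not found")

result = probe_plugins({"apoc": lambda: None, "gds": gds_probe})
print(result["apoc"], result["gds"])  # True False
```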
Neo4jUnavailableError ¶
Bases: RuntimeError
Raised when the optional neo4j dependency is missing.
NoOpRerankerProvider ¶
Bases: RerankerProvider
Passthrough reranker — preserves original ordering and trims to top_k.
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
Qwen3RerankerProvider ¶
Qwen3RerankerProvider(
endpoint: str,
*,
api_key: str | None = None,
model: str = DEFAULT_MODEL,
config: RerankerConfig | None = None,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_RERANK_TIMEOUT_SECONDS,
)
Bases: RerankerProvider
Qwen3-Reranker-4B served via an OpenAI-compatible inference endpoint.
The provider POSTs {query, documents, top_n} to endpoint and expects
a {results: [{index, score}]} envelope. vLLM, TGI, and Together AI all
expose this shape for cross-encoder reranker models.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
aclose
async
¶
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
RAGContextChunk ¶
RerankerConfig
dataclass
¶
RerankerConfig(
top_k: int = 5,
max_context_tokens: int = QWEN3_LONG_CONTEXT_TOKENS,
score_threshold: float | None = None,
extra_headers: dict[str, str] = dict(),
)
Common reranker configuration knobs shared by every backend.
Attributes:
| Name | Type | Description |
|---|---|---|
| `top_k` | `int` | Maximum number of chunks the reranker should return. |
| `max_context_tokens` | `int` | Per-pair input token budget. Defaults to the 32k window supported by Qwen3-Reranker-4B; Cohere Rerank v4 also operates against long-form documents and respects this hint. |
| `score_threshold` | `float \| None` | Optional minimum score; chunks below it are dropped. |
| `extra_headers` | `dict[str, str]` | Extra headers forwarded with every request. |
RerankerProvider ¶
Bases: ABC
Abstract base for any reranker implementation.
rerank
abstractmethod
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Return chunks reordered by relevance to query, truncated to top_k.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
RetrievalQuery ¶
Bases: BaseModel
Caller-supplied query envelope sent to the retrieval providers.
filters
class-attribute
instance-attribute
¶
filters: dict[str, Any] | None = Field(
default=None,
description="Provider-specific filter map (e.g. OData filter for Azure).",
)
model_config
class-attribute
instance-attribute
¶
query
class-attribute
instance-attribute
¶
top_k
class-attribute
instance-attribute
¶
SourceMetadata ¶
Bases: BaseModel
Provenance attached to every retrieved chunk so the LLM can cite sources.
extra
class-attribute
instance-attribute
¶
extra: dict[str, Any] = Field(
default_factory=dict,
description="Free-form provider extras (URL, label, embedding, etc.).",
)
identifier
class-attribute
instance-attribute
¶
score
class-attribute
instance-attribute
¶
score: float | None = Field(
default=None,
description="Provider-reported relevance score, when available.",
)
source
class-attribute
instance-attribute
¶
WebSearchProvider ¶
estimate_citation_tokens ¶
Return the approximate token cost of citing every chunk in chunks.
Each citation reserves space for the source label, the identifier (often a
URL), and the JSON-serialised metadata extras the LLM may surface back to
the user. Token counts are estimated as chars / 4 to stay tokeniser-free.
Source code in mirai_shared_skills/agentic_rag/skill.py
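The chars / 4 heuristic needs no tokenizer. A minimal version of the idea (function and field names are assumptions, not the package's actual helpers) might be:

```python
import json

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token, rounded up.
    return -(-len(text) // 4)

def citation_cost(source: str, identifier: str, extra: dict) -> int:
    """Approximate tokens needed to cite one chunk: label + id + JSON extras."""
    return (
        estimate_tokens(source)
        + estimate_tokens(identifier)
        + estimate_tokens(json.dumps(extra))
    )

print(citation_cost("web", "https://example.com/page", {"title": "Example"}))  # 12
```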
truncate_chunks_to_budget ¶
truncate_chunks_to_budget(
chunks: Sequence[RAGContextChunk],
token_budget: int = DEFAULT_TOKEN_BUDGET,
*,
citation_buffer_tokens: int = DEFAULT_CITATION_SAFETY_BUFFER_TOKENS,
) -> list[RAGContextChunk]
Citation-aware greedy truncation.
Chunks are fitted into a budget reduced by the projected citation overhead:
chunk_budget = token_budget - (estimated_citation_tokens + citation_buffer)
The citation estimate is derived from the source label, identifier, and metadata extras of the candidate chunks. The safety buffer accommodates structural overhead (commas, JSON braces, surrounding prose). When the resulting budget is non-positive the function returns an empty list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chunks` | `Sequence[RAGContextChunk]` | Candidate chunks in priority order. The first chunk is always kept verbatim if any budget remains. | required |
| `token_budget` | `int` | Total tokens the synthesis prompt can spend on retrieval. | `DEFAULT_TOKEN_BUDGET` |
| `citation_buffer_tokens` | `int` | Constant token allowance added to the dynamic citation estimate. Defaults to 64 tokens. | `DEFAULT_CITATION_SAFETY_BUFFER_TOKENS` |
Returns:
| Type | Description |
|---|---|
| `list[RAGContextChunk]` | A list of chunks whose total textual char count fits within the derived character budget. The trailing chunk may be partially truncated; in that case its … |
Source code in mirai_shared_skills/agentic_rag/skill.py
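The budget arithmetic can be sketched independently of the real chunk type. Plain strings stand in for `RAGContextChunk`, and the 4-chars-per-token heuristic mirrors the citation estimator; unlike the real function, this simplified sketch does not special-case the first chunk:

```python
def truncate_to_budget(
    chunks: list[str], token_budget: int, citation_tokens: int, buffer_tokens: int = 64
) -> list[str]:
    """Greedy fit: spend the budget left after citation overhead on chunk text."""
    char_budget = (token_budget - citation_tokens - buffer_tokens) * 4
    if char_budget <= 0:
        return []
    kept: list[str] = []
    for chunk in chunks:
        if len(chunk) <= char_budget:
            kept.append(chunk)
            char_budget -= len(chunk)
        else:
            if char_budget > 0:
                kept.append(chunk[:char_budget])  # trailing chunk partially truncated
            break
    return kept

result = truncate_to_budget(["a" * 100, "b" * 100], token_budget=100, citation_tokens=10)
print(len(result[0]), len(result[1]))  # 100 4
```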
eval ¶
Evaluation suite for the AgenticRAGSkill — judge models, metrics, dataset.
GOLDEN_DATASET_PATH
module-attribute
¶
__all__
module-attribute
¶
__all__ = [
"DEFAULT_GEMINI_MODEL",
"DEFAULT_GPT4O_MINI_MODEL",
"DeepEvalJudgeAdapter",
"EvaluationReport",
"GOLDEN_DATASET_PATH",
"GPT4oMiniJudge",
"GeminiFlashJudge",
"GoldenCase",
"JudgeLLM",
"MetricName",
"MetricScore",
"MockJudge",
"evaluate_dataset",
"load_golden_dataset",
]
DeepEvalJudgeAdapter ¶
Adapter that exposes a JudgeLLM as DeepEval's DeepEvalBaseLLM.
Imported lazily — the adapter is only instantiable when the optional
[eval] extra (deepeval) is installed. The class itself remains
constructible without DeepEval so type-checking stays clean.
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
EvaluationReport
dataclass
¶
EvaluationReport(
cases: int,
averages: dict[MetricName, float],
per_case: list[MetricScore] = list(),
)
GPT4oMiniJudge ¶
GPT4oMiniJudge(
api_key: str,
*,
model: str = DEFAULT_GPT4O_MINI_MODEL,
endpoint: str = DEFAULT_ENDPOINT,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_JUDGE_TIMEOUT_SECONDS,
)
Bases: _HttpJudge
OpenAI gpt-4o-mini judge over the chat completions API.
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
a_generate
async
¶
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
GeminiFlashJudge ¶
GeminiFlashJudge(
api_key: str,
*,
model: str = DEFAULT_GEMINI_MODEL,
endpoint: str = DEFAULT_ENDPOINT,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_JUDGE_TIMEOUT_SECONDS,
)
Bases: _HttpJudge
Gemini 1.5 Flash judge over the Generative Language REST API.
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
DEFAULT_ENDPOINT
class-attribute
instance-attribute
¶
a_generate
async
¶
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
GoldenCase ¶
Bases: BaseModel
A single ground-truth tuple used for offline evaluation.
context
class-attribute
instance-attribute
¶
context: list[str] = Field(
min_length=1,
description="Retrieval chunks that should ground the response.",
)
expected_answer
class-attribute
instance-attribute
¶
expected_answer: str = Field(
min_length=1,
description="The synthesised answer the agent should produce.",
)
expected_keywords
class-attribute
instance-attribute
¶
expected_keywords: list[str] = Field(
default_factory=list,
description="Optional canonical phrases the answer is expected to contain.",
)
id
class-attribute
instance-attribute
¶
model_config
class-attribute
instance-attribute
¶
query
class-attribute
instance-attribute
¶
JudgeLLM ¶
Bases: ABC
Abstract judge with a single async generate entry point.
a_generate
abstractmethod
async
¶
generate ¶
Sync wrapper around a_generate for DeepEval interoperability.
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
MetricName ¶
MetricScore
dataclass
¶
MockJudge ¶
Bases: JudgeLLM
Scripted judge — returns the result of responder(prompt) verbatim.
Used by hermetic unit tests to verify the eval pipeline wiring without issuing real LLM calls.
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
evaluate_dataset
async
¶
evaluate_dataset(
cases: list[GoldenCase],
candidate_answers: dict[str, str],
judge: JudgeLLM,
) -> EvaluationReport
Score every case across the three metrics using judge.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cases` | `list[GoldenCase]` | Golden ground-truth cases. | required |
| `candidate_answers` | `dict[str, str]` | Mapping of … | required |
| `judge` | `JudgeLLM` | The … | required |
Source code in mirai_shared_skills/agentic_rag/eval/metrics.py
load_golden_dataset ¶
Load and validate the JSON-encoded golden dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path \| str \| None` | Optional override; defaults to `GOLDEN_DATASET_PATH`. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[GoldenCase]` | A validated list of `GoldenCase` models. |
Source code in mirai_shared_skills/agentic_rag/eval/dataset.py
dataset ¶
Golden dataset loader for the AgenticRAG evaluation suite.
GOLDEN_DATASET_PATH
module-attribute
¶
__all__
module-attribute
¶
GoldenCase ¶
Bases: BaseModel
A single ground-truth tuple used for offline evaluation.
context
class-attribute
instance-attribute
¶
context: list[str] = Field(
min_length=1,
description="Retrieval chunks that should ground the response.",
)
expected_answer
class-attribute
instance-attribute
¶
expected_answer: str = Field(
min_length=1,
description="The synthesised answer the agent should produce.",
)
expected_keywords
class-attribute
instance-attribute
¶
expected_keywords: list[str] = Field(
default_factory=list,
description="Optional canonical phrases the answer is expected to contain.",
)
id
class-attribute
instance-attribute
¶
model_config
class-attribute
instance-attribute
¶
query
class-attribute
instance-attribute
¶
load_golden_dataset ¶
Load and validate the JSON-encoded golden dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path \| str \| None` | Optional override; defaults to `GOLDEN_DATASET_PATH`. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[GoldenCase]` | A validated list of `GoldenCase` models. |
Source code in mirai_shared_skills/agentic_rag/eval/dataset.py
judge ¶
Cost-efficient judge models for the AgenticRAG evaluation suite.
The evaluation pipeline scores retrievals + answers using small, fast models (Gemini 1.5 Flash, GPT-4o-mini) so high-frequency CI runs stay economically viable while remaining well-aligned with human judgement on RAG metrics.
Three concrete JudgeLLM implementations are shipped:
- `MockJudge`: scripted responses for hermetic unit tests.
- `GeminiFlashJudge`: Google `gemini-1.5-flash` via the Generative Language API.
- `GPT4oMiniJudge`: OpenAI `gpt-4o-mini` via the chat completions API.
A DeepEvalJudgeAdapter bridges any JudgeLLM to DeepEval's
DeepEvalBaseLLM interface so the same judge powers both our lightweight
deterministic eval runner and DeepEval's full metric suite.
__all__
module-attribute
¶
__all__ = [
"DEFAULT_GEMINI_MODEL",
"DEFAULT_GPT4O_MINI_MODEL",
"DEFAULT_JUDGE_TIMEOUT_SECONDS",
"DeepEvalJudgeAdapter",
"GPT4oMiniJudge",
"GeminiFlashJudge",
"JudgeFactory",
"JudgeLLM",
"MockJudge",
]
DeepEvalJudgeAdapter ¶
Adapter that exposes a JudgeLLM as DeepEval's DeepEvalBaseLLM.
Imported lazily — the adapter is only instantiable when the optional
[eval] extra (deepeval) is installed. The class itself remains
constructible without DeepEval so type-checking stays clean.
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
GPT4oMiniJudge ¶
GPT4oMiniJudge(
api_key: str,
*,
model: str = DEFAULT_GPT4O_MINI_MODEL,
endpoint: str = DEFAULT_ENDPOINT,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_JUDGE_TIMEOUT_SECONDS,
)
Bases: _HttpJudge
OpenAI gpt-4o-mini judge over the chat completions API.
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
a_generate
async
¶
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
GeminiFlashJudge ¶
GeminiFlashJudge(
api_key: str,
*,
model: str = DEFAULT_GEMINI_MODEL,
endpoint: str = DEFAULT_ENDPOINT,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_JUDGE_TIMEOUT_SECONDS,
)
Bases: _HttpJudge
Gemini 1.5 Flash judge over the Generative Language REST API.
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
DEFAULT_ENDPOINT
class-attribute
instance-attribute
¶
a_generate
async
¶
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
JudgeLLM ¶
Bases: ABC
Abstract judge with a single async generate entry point.
a_generate
abstractmethod
async
¶
generate ¶
Sync wrapper around a_generate for DeepEval interoperability.
Source code in mirai_shared_skills/agentic_rag/eval/judge.py
metrics ¶
LLM-as-a-judge metrics for the AgenticRAG evaluation suite.
Each metric is graded by a small, cost-efficient JudgeLLM (Gemini Flash or
GPT-4o-mini by default) so high-frequency CI runs stay economically viable.
The judge is instructed to return a strict JSON envelope {"score": float,
"reason": str} which we parse defensively.
Three metrics ship out of the box:
- Faithfulness — does every claim in the answer appear in the context?
- Answer Relevancy — does the answer address the user's question?
- Contextual Precision — does the retrieved context cover the answer?
When DeepEval is installed, the same JudgeLLM instance is reusable through
DeepEvalJudgeAdapter for the full DeepEval metric suite; see
mirai_shared_skills.agentic_rag.eval.judge.DeepEvalJudgeAdapter.
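Defensive parsing of the `{"score": float, "reason": str}` envelope can be sketched as below. The fallback values and clamping range are assumptions for illustration, not the package's actual behavior:

```python
import json
import re

def parse_judge_envelope(raw: str) -> tuple[float, str]:
    """Extract score/reason even when the judge wraps the JSON in prose."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return 0.0, "no JSON object found"
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return 0.0, "malformed JSON"
    score = float(payload.get("score", 0.0))
    return min(max(score, 0.0), 1.0), str(payload.get("reason", ""))

print(parse_judge_envelope('Sure! {"score": 0.8, "reason": "grounded"}'))  # (0.8, 'grounded')
```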
__all__
module-attribute
¶
EvaluationReport
dataclass
¶
EvaluationReport(
cases: int,
averages: dict[MetricName, float],
per_case: list[MetricScore] = list(),
)
MetricName ¶
MetricScore
dataclass
¶
evaluate_dataset
async
¶
evaluate_dataset(
cases: list[GoldenCase],
candidate_answers: dict[str, str],
judge: JudgeLLM,
) -> EvaluationReport
Score every case across the three metrics using judge.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cases` | `list[GoldenCase]` | Golden ground-truth cases. | required |
| `candidate_answers` | `dict[str, str]` | Mapping of … | required |
| `judge` | `JudgeLLM` | The … | required |
Source code in mirai_shared_skills/agentic_rag/eval/metrics.py
models ¶
Strict Pydantic v2 schemas shared by every retrieval provider.
__all__
module-attribute
¶
RAGContextChunk ¶
RetrievalQuery ¶
Bases: BaseModel
Caller-supplied query envelope sent to the retrieval providers.
filters
class-attribute
instance-attribute
¶
filters: dict[str, Any] | None = Field(
default=None,
description="Provider-specific filter map (e.g. OData filter for Azure).",
)
model_config
class-attribute
instance-attribute
¶
query
class-attribute
instance-attribute
¶
top_k
class-attribute
instance-attribute
¶
SourceMetadata ¶
Bases: BaseModel
Provenance attached to every retrieved chunk so the LLM can cite sources.
extra
class-attribute
instance-attribute
¶
extra: dict[str, Any] = Field(
default_factory=dict,
description="Free-form provider extras (URL, label, embedding, etc.).",
)
identifier
class-attribute
instance-attribute
¶
score
class-attribute
instance-attribute
¶
score: float | None = Field(
default=None,
description="Provider-reported relevance score, when available.",
)
source
class-attribute
instance-attribute
¶
providers ¶
Retrieval providers consumed by the AgenticRAGSkill.
__all__
module-attribute
¶
__all__ = [
"AzureSearchProvider",
"BrowserWebSearchProvider",
"CohereRerankProvider",
"Neo4jGraphProvider",
"NoOpRerankerProvider",
"Qwen3RerankerProvider",
"RerankerConfig",
"RerankerProvider",
"WebSearchProvider",
]
AzureSearchProvider ¶
AzureSearchProvider(
config: AzureSearchConfig,
*,
client: AsyncClient | None = None,
embedder: EmbeddingFn | None = None,
timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
)
Async hybrid-search wrapper over the Azure AI Search REST API.
Source code in mirai_shared_skills/agentic_rag/providers/azure.py
aclose
async
¶
search
async
¶
search(
query: str,
*,
top_k: int = 5,
filters: dict[str, Any] | None = None,
) -> list[RAGContextChunk]
Issue a hybrid search and project the response into context chunks.
Source code in mirai_shared_skills/agentic_rag/providers/azure.py
BrowserWebSearchProvider ¶
BrowserWebSearchProvider(
*,
browser: AgentBrowserSkill | None = None,
url_builder: UrlBuilder = default_duckduckgo_url,
)
Bases: WebSearchProvider
Default WebSearchProvider that delegates to AgentBrowserSkill.
The url_builder callable controls which search engine is queried, so the
same provider works with DuckDuckGo, Bing, Brave, or any private
enterprise search portal.
Source code in mirai_shared_skills/agentic_rag/providers/web.py
search
async
¶
Source code in mirai_shared_skills/agentic_rag/providers/web.py
CohereRerankProvider ¶
CohereRerankProvider(
api_key: str,
*,
model: str = DEFAULT_MODEL,
endpoint: str = DEFAULT_ENDPOINT,
config: RerankerConfig | None = None,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_RERANK_TIMEOUT_SECONDS,
)
Bases: RerankerProvider
Cohere Rerank v4 over the public REST API.
Requires a CO_API_KEY. The provider is async-first and reuses a
single httpx.AsyncClient for connection pooling. Pass an existing
client to share pools with surrounding application code.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
aclose
async
¶
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
Neo4jGraphProvider ¶
Neo4jGraphProvider(
connection: Neo4jConnection | None = None,
*,
driver: _AsyncDriver | None = None,
)
Async Neo4j wrapper exposing schema introspection plus parameterised queries.
Test seams: callers can inject driver directly so unit tests bypass the
real driver factory. Otherwise the driver is created lazily on first use
and reused (singleton) for the lifetime of the provider instance.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
aclose
async
¶
describe_schema
async
¶
Return labels and relationship types known to the graph.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
execute
async
¶
execute(
cypher: str,
parameters: Mapping[str, Any] | None = None,
*,
top_k: int = 5,
) -> list[RAGContextChunk]
Run a Cypher statement and project the rows into RAGContextChunks.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
verify_plugins
async
¶
Probe the graph for APOC and GDS availability.
Returns a status dict shaped like:
{"apoc": True, "gds": False, "detail": {"apoc": "...", "gds": "..."}}
The probe runs lightweight introspection calls (apoc.help('apoc')
and gds.list()) and never raises — when a procedure is missing the
corresponding flag is set to False and the failure reason is
captured in detail.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
NoOpRerankerProvider ¶
Bases: RerankerProvider
Passthrough reranker — preserves original ordering and trims to top_k.
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
Qwen3RerankerProvider ¶
Qwen3RerankerProvider(
endpoint: str,
*,
api_key: str | None = None,
model: str = DEFAULT_MODEL,
config: RerankerConfig | None = None,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_RERANK_TIMEOUT_SECONDS,
)
Bases: RerankerProvider
Qwen3-Reranker-4B served via an OpenAI-compatible inference endpoint.
The provider POSTs {query, documents, top_n} to endpoint and expects
a {results: [{index, score}]} envelope. vLLM, TGI, and Together AI all
expose this shape for cross-encoder reranker models.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
aclose
async
¶
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
RerankerConfig
dataclass
¶
RerankerConfig(
top_k: int = 5,
max_context_tokens: int = QWEN3_LONG_CONTEXT_TOKENS,
score_threshold: float | None = None,
extra_headers: dict[str, str] = dict(),
)
Common reranker configuration knobs shared by every backend.
Attributes:
| Name | Type | Description |
|---|---|---|
| `top_k` | `int` | Maximum number of chunks the reranker should return. |
| `max_context_tokens` | `int` | Per-pair input token budget. Defaults to the 32k window supported by Qwen3-Reranker-4B; Cohere Rerank v4 also operates against long-form documents and respects this hint. |
| `score_threshold` | `float \| None` | Optional minimum score; chunks below it are dropped. |
| `extra_headers` | `dict[str, str]` | Extra headers forwarded with every request. |
RerankerProvider ¶
Bases: ABC
Abstract base for any reranker implementation.
rerank
abstractmethod
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Return chunks reordered by relevance to query, truncated to top_k.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
WebSearchProvider ¶
azure ¶
Azure AI Search provider — hybrid retrieval over the REST API.
The provider issues hybrid (vector + BM25) queries with the Semantic Ranker
enabled when semantic_configuration is provided. Filters are passed through
verbatim as OData expressions.
A pluggable embedding callable produces vectors for the user query; tests can inject a stub callable to avoid hitting the real embedding model.
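A stub embedder only has to match the assumed text-to-vector shape. Whether `EmbeddingFn` is sync or async is an assumption here; the deterministic async stub below illustrates the injection seam:

```python
import asyncio

async def stub_embedder(text: str) -> list[float]:
    # Deterministic fake vector — same input, same embedding; no model call.
    return [float(len(text) % 7), float(sum(map(ord, text)) % 100) / 100.0]

vector = asyncio.run(stub_embedder("hybrid search"))
print(len(vector))  # 2
```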
AzureSearchConfig
dataclass
¶
AzureSearchConfig(
endpoint: str,
index_name: str,
api_key: str,
api_version: str = DEFAULT_API_VERSION,
semantic_configuration: str | None = None,
vector_fields: tuple[str, ...] = DEFAULT_VECTOR_FIELDS,
)
AzureSearchProvider ¶
AzureSearchProvider(
config: AzureSearchConfig,
*,
client: AsyncClient | None = None,
embedder: EmbeddingFn | None = None,
timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
)
Async hybrid-search wrapper over the Azure AI Search REST API.
Source code in mirai_shared_skills/agentic_rag/providers/azure.py
aclose
async
¶
search
async
¶
search(
query: str,
*,
top_k: int = 5,
filters: dict[str, Any] | None = None,
) -> list[RAGContextChunk]
Issue a hybrid search and project the response into context chunks.
Source code in mirai_shared_skills/agentic_rag/providers/azure.py
neo4j_graph ¶
Neo4j graph retrieval provider.
neo4j is an optional dependency. The driver is imported lazily so the
catalog can be loaded without the Neo4j package installed; attempting to
execute a query without the driver raises a structured error chunk instead.
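The lazy-import pattern behind `Neo4jUnavailableError` is a standard one. A generic sketch follows, with the module name in the demo deliberately fake so the failure path is exercised; the real provider's wiring may differ:

```python
import importlib

class Neo4jUnavailableError(RuntimeError):
    """Raised when the optional neo4j dependency is missing."""

def load_driver_module(module_name: str = "neo4j"):
    """Import the driver lazily so the module loads without it installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise Neo4jUnavailableError(
            f"optional dependency '{module_name}' is not installed"
        ) from exc

try:
    load_driver_module("definitely_not_installed_driver")
except Neo4jUnavailableError as exc:
    print(exc)  # optional dependency 'definitely_not_installed_driver' is not installed
```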
__all__
module-attribute
¶
Neo4jConnection
dataclass
¶
Neo4jGraphProvider ¶
Neo4jGraphProvider(
connection: Neo4jConnection | None = None,
*,
driver: _AsyncDriver | None = None,
)
Async Neo4j wrapper exposing schema introspection plus parameterised queries.
Test seams: callers can inject driver directly so unit tests bypass the
real driver factory. Otherwise the driver is created lazily on first use
and reused (singleton) for the lifetime of the provider instance.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
aclose
async
¶
describe_schema
async
¶
Return labels and relationship types known to the graph.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
execute
async
¶
execute(
cypher: str,
parameters: Mapping[str, Any] | None = None,
*,
top_k: int = 5,
) -> list[RAGContextChunk]
Run a Cypher statement and project the rows into RAGContextChunks.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
verify_plugins
async
¶
Probe the graph for APOC and GDS availability.
Returns a status dict shaped like:
{"apoc": True, "gds": False, "detail": {"apoc": "...", "gds": "..."}}
The probe runs lightweight introspection calls (apoc.help('apoc')
and gds.list()) and never raises — when a procedure is missing the
corresponding flag is set to False and the failure reason is
captured in detail.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
Neo4jUnavailableError ¶
Bases: RuntimeError
Raised when the optional neo4j dependency is missing.
reranker ¶
Reranker providers — high-precision second-stage scoring for retrieved chunks.
Two production-grade implementations are shipped, each behind a unified
RerankerProvider interface:
- `CohereRerankProvider` — Cohere Rerank v4 over the public REST API.
- `Qwen3RerankerProvider` — open-source SOTA `Qwen3-Reranker-4B` exposed via an OpenAI-compatible inference server (vLLM, TGI, or any compatible host).
Both providers talk over httpx so unit tests stay fully hermetic with respx.
A NoOpRerankerProvider is provided as a passthrough default so the skill can
opt out without conditional logic at the call site.
__all__
module-attribute
¶
__all__ = [
"DEFAULT_RERANK_TIMEOUT_SECONDS",
"QWEN3_LONG_CONTEXT_TOKENS",
"CohereRerankProvider",
"NoOpRerankerProvider",
"Qwen3RerankerProvider",
"RerankerConfig",
"RerankerProvider",
]
CohereRerankProvider ¶
CohereRerankProvider(
api_key: str,
*,
model: str = DEFAULT_MODEL,
endpoint: str = DEFAULT_ENDPOINT,
config: RerankerConfig | None = None,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_RERANK_TIMEOUT_SECONDS,
)
Bases: RerankerProvider
Cohere Rerank v4 over the public REST API.
Requires a CO_API_KEY. The provider is async-first and reuses a
single httpx.AsyncClient for connection pooling. Pass an existing
client to share pools with surrounding application code.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
aclose
async
¶
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
NoOpRerankerProvider ¶
Bases: RerankerProvider
Passthrough reranker — preserves original ordering and trims to top_k.
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
Qwen3RerankerProvider ¶
Qwen3RerankerProvider(
endpoint: str,
*,
api_key: str | None = None,
model: str = DEFAULT_MODEL,
config: RerankerConfig | None = None,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_RERANK_TIMEOUT_SECONDS,
)
Bases: RerankerProvider
Qwen3-Reranker-4B served via an OpenAI-compatible inference endpoint.
The provider POSTs {query, documents, top_n} to endpoint and expects
a {results: [{index, score}]} envelope. vLLM, TGI, and Together AI all
expose this shape for cross-encoder reranker models.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
aclose
async
¶
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
RerankerConfig
dataclass
¶
RerankerConfig(
top_k: int = 5,
max_context_tokens: int = QWEN3_LONG_CONTEXT_TOKENS,
score_threshold: float | None = None,
extra_headers: dict[str, str] = dict(),
)
Common reranker configuration knobs shared by every backend.
Attributes:

| Name | Type | Description |
|---|---|---|
| `top_k` | `int` | Maximum number of chunks the reranker should return. |
| `max_context_tokens` | `int` | Per-pair input token budget. Defaults to the 32k window supported by Qwen3-Reranker-4B; Cohere Rerank v4 also operates against long-form documents and respects this hint. |
| `score_threshold` | `float \| None` | Optional minimum score; chunks below it are dropped. |
| `extra_headers` | `dict[str, str]` | Extra headers forwarded with every request. |
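A sketch of how the `top_k` and `score_threshold` knobs compose (plain dicts stand in for scored chunks; the real providers apply the same filtering after the backend returns scores):

```python
def apply_config(scored_chunks, *, top_k=5, score_threshold=None):
    # Drop chunks under the threshold first, then trim to top_k.
    kept = [
        c for c in scored_chunks
        if score_threshold is None or c["score"] >= score_threshold
    ]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:top_k]


chunks = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.2}, {"id": "c", "score": 0.6}]
result = apply_config(chunks, top_k=2, score_threshold=0.5)
```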
RerankerProvider ¶
Bases: ABC
Abstract base for any reranker implementation.
rerank
abstractmethod
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Return chunks reordered by relevance to query, truncated to top_k.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
web ¶
Web search provider abstraction.
The agentic RAG skill needs to fall back to live web data when internal sources are insufficient. This module exposes:
- `WebSearchProvider`: an abstract interface (just `search(query, top_k)`).
- `BrowserWebSearchProvider`: a default implementation that wraps the existing `AgentBrowserSkill` plus a configurable URL builder, so any search engine (Bing, Brave, Tavily, Perplexity, custom intranet) can be plugged in by changing the URL template.
__all__
module-attribute
¶
__all__ = [
"BrowserWebSearchProvider",
"UrlBuilder",
"WebSearchProvider",
"default_duckduckgo_url",
]
BrowserWebSearchProvider ¶
BrowserWebSearchProvider(
*,
browser: AgentBrowserSkill | None = None,
url_builder: UrlBuilder = default_duckduckgo_url,
)
Bases: WebSearchProvider
Default WebSearchProvider that delegates to AgentBrowserSkill.
The url_builder callable controls which search engine is queried, so the
same provider works with DuckDuckGo, Bing, Brave, or any private
enterprise search portal.
Source code in mirai_shared_skills/agentic_rag/providers/web.py
search
async
¶
Source code in mirai_shared_skills/agentic_rag/providers/web.py
WebSearchProvider ¶
default_duckduckgo_url ¶
Return the DuckDuckGo HTML endpoint for query. Used as the safe default.
Source code in mirai_shared_skills/agentic_rag/providers/web.py
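A custom `UrlBuilder` is just a callable from query string to URL. A sketch for a hypothetical engine (the host and parameter name here are illustrative, not taken from the library):

```python
from urllib.parse import quote_plus


def custom_search_url(query: str) -> str:
    # Hypothetical engine; swap the host and query parameter
    # for Bing, Brave, or a private enterprise portal.
    return f"https://search.example.com/search?q={quote_plus(query)}"
```

Passing this callable as `url_builder` to `BrowserWebSearchProvider` redirects all web fallback searches to that endpoint without touching the provider itself.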
skill ¶
AgenticRAGSkill — multi-source orchestrator for graph, vector, and web retrieval.
__all__
module-attribute
¶
__all__ = [
"AgenticRAGSkill",
"DEFAULT_CITATION_SAFETY_BUFFER_TOKENS",
"DEFAULT_TOKEN_BUDGET",
"estimate_citation_tokens",
"truncate_chunks_to_budget",
]
AgenticRAGSkill ¶
AgenticRAGSkill(
*,
neo4j: Neo4jGraphProvider | None = None,
azure: AzureSearchProvider | None = None,
web: WebSearchProvider | None = None,
reranker: RerankerProvider | None = None,
token_budget: int = DEFAULT_TOKEN_BUDGET,
citation_buffer_tokens: int = DEFAULT_CITATION_SAFETY_BUFFER_TOKENS,
)
Bases: BaseSkill
Multi-source retrieval orchestrator (Graph + Hybrid Vector + Web).
Source code in mirai_shared_skills/agentic_rag/skill.py
aclose
async
¶
get_tools ¶
Source code in mirai_shared_skills/agentic_rag/skill.py
estimate_citation_tokens ¶
Return the approximate token cost of citing every chunk in chunks.
Each citation reserves space for the source label, the identifier (often a
URL), and the JSON-serialised metadata extras the LLM may surface back to
the user. Token counts are estimated as chars / 4 to stay tokeniser-free.
Source code in mirai_shared_skills/agentic_rag/skill.py
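The chars / 4 heuristic can be sketched directly. This is a simplified stand-in (plain dicts in place of `SourceMetadata`), but the accounting follows the description: label, identifier, and JSON-serialised extras all count.

```python
import json


def estimate_citation_tokens(citations):
    # Each citation contributes its source label, identifier, and
    # JSON-serialised extras; tokens ~= chars / 4, tokeniser-free.
    total_chars = 0
    for c in citations:
        total_chars += len(c["source"]) + len(c["identifier"])
        total_chars += len(json.dumps(c.get("extra", {})))
    return total_chars // 4


citations = [{"source": "web", "identifier": "https://example.com/doc",
              "extra": {"title": "Doc"}}]
```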
truncate_chunks_to_budget ¶
truncate_chunks_to_budget(
chunks: Sequence[RAGContextChunk],
token_budget: int = DEFAULT_TOKEN_BUDGET,
*,
citation_buffer_tokens: int = DEFAULT_CITATION_SAFETY_BUFFER_TOKENS,
) -> list[RAGContextChunk]
Citation-aware greedy truncation.
Chunks are fitted into a budget reduced by the projected citation overhead:
chunk_budget = token_budget - (estimated_citation_tokens + citation_buffer)
The citation estimate is derived from the source label, identifier, and metadata extras of the candidate chunks. The safety buffer accommodates structural overhead (commas, JSON braces, surrounding prose). When the resulting budget is non-positive the function returns an empty list.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `chunks` | `Sequence[RAGContextChunk]` | Candidate chunks in priority order. The first chunk is always kept verbatim if any budget remains. | required |
| `token_budget` | `int` | Total tokens the synthesis prompt can spend on retrieval. | `DEFAULT_TOKEN_BUDGET` |
| `citation_buffer_tokens` | `int` | Constant token allowance added to the dynamic citation estimate. Defaults to 64 tokens. | `DEFAULT_CITATION_SAFETY_BUFFER_TOKENS` |

Returns:

| Type | Description |
|---|---|
| `list[RAGContextChunk]` | A list of chunks whose total textual char count fits within the derived character budget. The trailing chunk may be partially truncated; in that case its |
Source code in mirai_shared_skills/agentic_rag/skill.py
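The budget arithmetic can be sketched with a character-based greedy loop. This is a simplified stand-in: plain strings replace chunks, and the citation estimate is passed in precomputed rather than derived from chunk metadata.

```python
def truncate_to_budget(texts, token_budget, *, citation_tokens=0, buffer_tokens=64):
    # chunk_budget = token_budget - (citation estimate + safety buffer),
    # then converted to characters at ~4 chars per token.
    chunk_budget = token_budget - (citation_tokens + buffer_tokens)
    if chunk_budget <= 0:
        return []
    char_budget = chunk_budget * 4
    kept, used = [], 0
    for text in texts:
        remaining = char_budget - used
        if remaining <= 0:
            break
        # The trailing chunk may be partially truncated to fit.
        piece = text[:remaining]
        kept.append(piece)
        used += len(piece)
    return kept
```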
Models¶
mirai_shared_skills.agentic_rag.models ¶
Strict Pydantic v2 schemas shared by every retrieval provider.
__all__
module-attribute
¶
RAGContextChunk ¶
RetrievalQuery ¶
Bases: BaseModel
Caller-supplied query envelope sent to the retrieval providers.
filters
class-attribute
instance-attribute
¶
filters: dict[str, Any] | None = Field(
default=None,
description="Provider-specific filter map (e.g. OData filter for Azure).",
)
model_config
class-attribute
instance-attribute
¶
query
class-attribute
instance-attribute
¶
top_k
class-attribute
instance-attribute
¶
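The shape of the envelope, written out as a plain dict with a provider-specific OData filter (the field names follow the schema above; the filter expression itself is illustrative):

```python
# Query envelope sent to every retrieval provider; "filters" is
# provider-specific — Azure AI Search reads it as an OData filter.
retrieval_query = {
    "query": "data-retention policy for EU customers",
    "top_k": 5,
    "filters": {"filter": "category eq 'policy'"},
}
```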
SourceMetadata ¶
Bases: BaseModel
Provenance attached to every retrieved chunk so the LLM can cite sources.
extra
class-attribute
instance-attribute
¶
extra: dict[str, Any] = Field(
default_factory=dict,
description="Free-form provider extras (URL, label, embedding, etc.).",
)
identifier
class-attribute
instance-attribute
¶
score
class-attribute
instance-attribute
¶
score: float | None = Field(
default=None,
description="Provider-reported relevance score, when available.",
)
source
class-attribute
instance-attribute
¶
Providers¶
mirai_shared_skills.agentic_rag.providers ¶
Retrieval providers consumed by the AgenticRAGSkill.
__all__
module-attribute
¶
__all__ = [
"AzureSearchProvider",
"BrowserWebSearchProvider",
"CohereRerankProvider",
"Neo4jGraphProvider",
"NoOpRerankerProvider",
"Qwen3RerankerProvider",
"RerankerConfig",
"RerankerProvider",
"WebSearchProvider",
]
AzureSearchProvider ¶
AzureSearchProvider(
config: AzureSearchConfig,
*,
client: AsyncClient | None = None,
embedder: EmbeddingFn | None = None,
timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
)
Async hybrid-search wrapper over the Azure AI Search REST API.
Source code in mirai_shared_skills/agentic_rag/providers/azure.py
aclose
async
¶
search
async
¶
search(
query: str,
*,
top_k: int = 5,
filters: dict[str, Any] | None = None,
) -> list[RAGContextChunk]
Issue a hybrid search and project the response into context chunks.
Source code in mirai_shared_skills/agentic_rag/providers/azure.py
BrowserWebSearchProvider ¶
BrowserWebSearchProvider(
*,
browser: AgentBrowserSkill | None = None,
url_builder: UrlBuilder = default_duckduckgo_url,
)
Bases: WebSearchProvider
Default WebSearchProvider that delegates to AgentBrowserSkill.
The url_builder callable controls which search engine is queried, so the
same provider works with DuckDuckGo, Bing, Brave, or any private
enterprise search portal.
Source code in mirai_shared_skills/agentic_rag/providers/web.py
search
async
¶
Source code in mirai_shared_skills/agentic_rag/providers/web.py
CohereRerankProvider ¶
CohereRerankProvider(
api_key: str,
*,
model: str = DEFAULT_MODEL,
endpoint: str = DEFAULT_ENDPOINT,
config: RerankerConfig | None = None,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_RERANK_TIMEOUT_SECONDS,
)
Bases: RerankerProvider
Cohere Rerank v4 over the public REST API.
Requires a CO_API_KEY. The provider is async-first and reuses a
single httpx.AsyncClient for connection pooling. Pass an existing
client to share pools with surrounding application code.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
aclose
async
¶
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
Neo4jGraphProvider ¶
Neo4jGraphProvider(
connection: Neo4jConnection | None = None,
*,
driver: _AsyncDriver | None = None,
)
Async Neo4j wrapper exposing schema introspection plus parameterised queries.
Test seams: callers can inject driver directly so unit tests bypass the
real driver factory. Otherwise the driver is created lazily on first use
and reused (singleton) for the lifetime of the provider instance.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
aclose
async
¶
describe_schema
async
¶
Return labels and relationship types known to the graph.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
execute
async
¶
execute(
cypher: str,
parameters: Mapping[str, Any] | None = None,
*,
top_k: int = 5,
) -> list[RAGContextChunk]
Run a Cypher statement and project the rows into RAGContextChunks.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
verify_plugins
async
¶
Probe the graph for APOC and GDS availability.
Returns a status dict shaped like:
{"apoc": True, "gds": False, "detail": {"apoc": "...", "gds": "..."}}
The probe runs lightweight introspection calls (apoc.help('apoc')
and gds.list()) and never raises — when a procedure is missing the
corresponding flag is set to False and the failure reason is
captured in detail.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
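Because the probe never raises, callers can feature-gate Cypher strategies on the status dict. A sketch of such a consumer (`fake_status` stands in for an actual probe result; the strategy labels are illustrative):

```python
def choose_similarity_strategy(status):
    # Prefer GDS procedures; fall back to APOC, then plain Cypher.
    if status.get("gds"):
        return "gds.similarity"
    if status.get("apoc"):
        return "apoc.text"
    return "cypher-only"


fake_status = {"apoc": True, "gds": False,
               "detail": {"apoc": "ok", "gds": "procedure not found"}}
strategy = choose_similarity_strategy(fake_status)
```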
NoOpRerankerProvider ¶
Bases: RerankerProvider
Passthrough reranker — preserves original ordering and trims to top_k.
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
Qwen3RerankerProvider ¶
Qwen3RerankerProvider(
endpoint: str,
*,
api_key: str | None = None,
model: str = DEFAULT_MODEL,
config: RerankerConfig | None = None,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_RERANK_TIMEOUT_SECONDS,
)
Bases: RerankerProvider
Qwen3-Reranker-4B served via an OpenAI-compatible inference endpoint.
The provider POSTs {query, documents, top_n} to endpoint and expects
a {results: [{index, score}]} envelope. vLLM, TGI, and Together AI all
expose this shape for cross-encoder reranker models.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
aclose
async
¶
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
RerankerConfig
dataclass
¶
RerankerConfig(
top_k: int = 5,
max_context_tokens: int = QWEN3_LONG_CONTEXT_TOKENS,
score_threshold: float | None = None,
extra_headers: dict[str, str] = dict(),
)
Common reranker configuration knobs shared by every backend.
Attributes:

| Name | Type | Description |
|---|---|---|
| `top_k` | `int` | Maximum number of chunks the reranker should return. |
| `max_context_tokens` | `int` | Per-pair input token budget. Defaults to the 32k window supported by Qwen3-Reranker-4B; Cohere Rerank v4 also operates against long-form documents and respects this hint. |
| `score_threshold` | `float \| None` | Optional minimum score; chunks below it are dropped. |
| `extra_headers` | `dict[str, str]` | Extra headers forwarded with every request. |
RerankerProvider ¶
Bases: ABC
Abstract base for any reranker implementation.
rerank
abstractmethod
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Return chunks reordered by relevance to query, truncated to top_k.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
WebSearchProvider ¶
azure ¶
Azure AI Search provider — hybrid retrieval over the REST API.
The provider issues hybrid (vector + BM25) queries with the Semantic Ranker
enabled when semantic_configuration is provided. Filters are passed through
verbatim as OData expressions.
A pluggable embedding callable produces vectors for the user query; tests can inject a stub callable to avoid hitting the real embedding model.
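The hybrid request can be sketched as the JSON body such a provider might POST. This mirrors the Azure AI Search "search + vectorQueries" document shape, but the exact field names here are an assumption for illustration, not lifted from the provider source:

```python
def build_hybrid_body(query, vector, *, vector_fields=("content_vector",),
                      top_k=5, odata_filter=None):
    # BM25 text search and vector search run in one request;
    # the service fuses both score sets before returning results.
    body = {
        "search": query,
        "top": top_k,
        "vectorQueries": [
            {
                "kind": "vectorizedQuery",
                "vector": vector,
                "fields": ",".join(vector_fields),
                "k": top_k,
            }
        ],
    }
    if odata_filter:
        # Filters pass through verbatim as OData expressions.
        body["filter"] = odata_filter
    return body


body = build_hybrid_body("renewal terms", [0.1, 0.2], odata_filter="year ge 2023")
```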
AzureSearchConfig
dataclass
¶
AzureSearchConfig(
endpoint: str,
index_name: str,
api_key: str,
api_version: str = DEFAULT_API_VERSION,
semantic_configuration: str | None = None,
vector_fields: tuple[str, ...] = DEFAULT_VECTOR_FIELDS,
)
AzureSearchProvider ¶
AzureSearchProvider(
config: AzureSearchConfig,
*,
client: AsyncClient | None = None,
embedder: EmbeddingFn | None = None,
timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
)
Async hybrid-search wrapper over the Azure AI Search REST API.
Source code in mirai_shared_skills/agentic_rag/providers/azure.py
aclose
async
¶
search
async
¶
search(
query: str,
*,
top_k: int = 5,
filters: dict[str, Any] | None = None,
) -> list[RAGContextChunk]
Issue a hybrid search and project the response into context chunks.
Source code in mirai_shared_skills/agentic_rag/providers/azure.py
neo4j_graph ¶
Neo4j graph retrieval provider.
neo4j is an optional dependency. The driver is imported lazily so the
catalog can be loaded without the Neo4j package installed; attempting to
execute a query without the driver raises a structured error chunk instead.
__all__
module-attribute
¶
Neo4jConnection
dataclass
¶
Neo4jGraphProvider ¶
Neo4jGraphProvider(
connection: Neo4jConnection | None = None,
*,
driver: _AsyncDriver | None = None,
)
Async Neo4j wrapper exposing schema introspection plus parameterised queries.
Test seams: callers can inject driver directly so unit tests bypass the
real driver factory. Otherwise the driver is created lazily on first use
and reused (singleton) for the lifetime of the provider instance.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
aclose
async
¶
describe_schema
async
¶
Return labels and relationship types known to the graph.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
execute
async
¶
execute(
cypher: str,
parameters: Mapping[str, Any] | None = None,
*,
top_k: int = 5,
) -> list[RAGContextChunk]
Run a Cypher statement and project the rows into RAGContextChunks.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
verify_plugins
async
¶
Probe the graph for APOC and GDS availability.
Returns a status dict shaped like:
{"apoc": True, "gds": False, "detail": {"apoc": "...", "gds": "..."}}
The probe runs lightweight introspection calls (apoc.help('apoc')
and gds.list()) and never raises — when a procedure is missing the
corresponding flag is set to False and the failure reason is
captured in detail.
Source code in mirai_shared_skills/agentic_rag/providers/neo4j_graph.py
Neo4jUnavailableError ¶
Bases: RuntimeError
Raised when the optional neo4j dependency is missing.
reranker ¶
Reranker providers — high-precision second-stage scoring for retrieved chunks.
Two production-grade implementations are shipped, each behind a unified
RerankerProvider interface:
- `CohereRerankProvider` — Cohere Rerank v4 over the public REST API.
- `Qwen3RerankerProvider` — open-source SOTA `Qwen3-Reranker-4B` exposed via an OpenAI-compatible inference server (vLLM, TGI, or any compatible host).
Both providers talk over httpx so unit tests stay fully hermetic with respx.
A NoOpRerankerProvider is provided as a passthrough default so the skill can
opt out without conditional logic at the call site.
__all__
module-attribute
¶
__all__ = [
"DEFAULT_RERANK_TIMEOUT_SECONDS",
"QWEN3_LONG_CONTEXT_TOKENS",
"CohereRerankProvider",
"NoOpRerankerProvider",
"Qwen3RerankerProvider",
"RerankerConfig",
"RerankerProvider",
]
CohereRerankProvider ¶
CohereRerankProvider(
api_key: str,
*,
model: str = DEFAULT_MODEL,
endpoint: str = DEFAULT_ENDPOINT,
config: RerankerConfig | None = None,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_RERANK_TIMEOUT_SECONDS,
)
Bases: RerankerProvider
Cohere Rerank v4 over the public REST API.
Requires a CO_API_KEY. The provider is async-first and reuses a
single httpx.AsyncClient for connection pooling. Pass an existing
client to share pools with surrounding application code.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
aclose
async
¶
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
NoOpRerankerProvider ¶
Bases: RerankerProvider
Passthrough reranker — preserves original ordering and trims to top_k.
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
Qwen3RerankerProvider ¶
Qwen3RerankerProvider(
endpoint: str,
*,
api_key: str | None = None,
model: str = DEFAULT_MODEL,
config: RerankerConfig | None = None,
client: AsyncClient | None = None,
timeout_seconds: float = DEFAULT_RERANK_TIMEOUT_SECONDS,
)
Bases: RerankerProvider
Qwen3-Reranker-4B served via an OpenAI-compatible inference endpoint.
The provider POSTs {query, documents, top_n} to endpoint and expects
a {results: [{index, score}]} envelope. vLLM, TGI, and Together AI all
expose this shape for cross-encoder reranker models.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
aclose
async
¶
rerank
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
RerankerConfig
dataclass
¶
RerankerConfig(
top_k: int = 5,
max_context_tokens: int = QWEN3_LONG_CONTEXT_TOKENS,
score_threshold: float | None = None,
extra_headers: dict[str, str] = dict(),
)
Common reranker configuration knobs shared by every backend.
Attributes:

| Name | Type | Description |
|---|---|---|
| `top_k` | `int` | Maximum number of chunks the reranker should return. |
| `max_context_tokens` | `int` | Per-pair input token budget. Defaults to the 32k window supported by Qwen3-Reranker-4B; Cohere Rerank v4 also operates against long-form documents and respects this hint. |
| `score_threshold` | `float \| None` | Optional minimum score; chunks below it are dropped. |
| `extra_headers` | `dict[str, str]` | Extra headers forwarded with every request. |
RerankerProvider ¶
Bases: ABC
Abstract base for any reranker implementation.
rerank
abstractmethod
async
¶
rerank(
query: str,
chunks: Sequence[RAGContextChunk],
*,
top_k: int | None = None,
) -> list[RAGContextChunk]
Return chunks reordered by relevance to query, truncated to top_k.
Source code in mirai_shared_skills/agentic_rag/providers/reranker.py
web ¶
Web search provider abstraction.
The agentic RAG skill needs to fall back to live web data when internal sources are insufficient. This module exposes:
- `WebSearchProvider`: an abstract interface (just `search(query, top_k)`).
- `BrowserWebSearchProvider`: a default implementation that wraps the existing `AgentBrowserSkill` plus a configurable URL builder, so any search engine (Bing, Brave, Tavily, Perplexity, custom intranet) can be plugged in by changing the URL template.
__all__
module-attribute
¶
__all__ = [
"BrowserWebSearchProvider",
"UrlBuilder",
"WebSearchProvider",
"default_duckduckgo_url",
]
BrowserWebSearchProvider ¶
BrowserWebSearchProvider(
*,
browser: AgentBrowserSkill | None = None,
url_builder: UrlBuilder = default_duckduckgo_url,
)
Bases: WebSearchProvider
Default WebSearchProvider that delegates to AgentBrowserSkill.
The url_builder callable controls which search engine is queried, so the
same provider works with DuckDuckGo, Bing, Brave, or any private
enterprise search portal.
Source code in mirai_shared_skills/agentic_rag/providers/web.py
search
async
¶
Source code in mirai_shared_skills/agentic_rag/providers/web.py
WebSearchProvider ¶
default_duckduckgo_url ¶
Return the DuckDuckGo HTML endpoint for query. Used as the safe default.
Source code in mirai_shared_skills/agentic_rag/providers/web.py
Weather¶
mirai_shared_skills.weather ¶
Weather domain — a Standard (read-only) shared skill.
Fetches forecasts for a location through a public HTTP endpoint. The skill is
inherently safe and does not require downstream SecureSkill wrapping.
Database¶
mirai_shared_skills.database ¶
Raw database operations — comprehensive, including destructive tooling.
RawDatabaseSkill is intentionally written without security guards. The whole
catalog is consumed by downstream clients that wrap raw skills with a
SecureSkill policy mapping. The catalog therefore exposes the full surface
area of database tooling, including destructive operations like drop_table,
without arbitrarily limiting utility during the development phase.
RawDatabaseSkill ¶
Bases: BaseSkill
Raw skill exposing comprehensive read and state-mutating database tools.