
Testing Skills

This guide covers how to write tests for skills in mirai-shared-skills. The repo enforces three test tiers: unit, integration, and Docker-required. Each tier needs different extras and backends, and the tiers are selected with pytest markers.

Test layout

tests/
├── conftest.py                    # registry-clearing fixture, shared fakes
├── auth_gates/
│   ├── test_skill.py              # unit
│   └── test_credential_handoff.py
├── agentic_rag/
│   ├── test_skill.py              # unit (mocked providers)
│   ├── test_token_budget.py       # unit (pure function)
│   ├── providers/
│   │   ├── test_azure_search.py   # unit + respx HTTP mocks
│   │   └── test_neo4j_graph.py    # integration (Docker)
│   └── eval/
│       └── test_harness.py        # eval harness, [eval] extra
├── browser/
│   └── test_skill.py              # unit + respx
├── ...

Tier 1: Unit tests (no live backends)

Run with uv sync --extra dev && uv run pytest tests/. These should be the bulk of your tests.

  • For HTTP-using skills: use respx to mock httpx calls.
  • For provider-using skills (e.g. AgenticRAGSkill): inject mocks such as MagicMock(spec=GraphProvider). Never use real providers.
  • For pure helpers (e.g. estimate_citation_tokens): plain pytest parametrize.
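The signature of estimate_citation_tokens isn't shown here. The sketch below therefore uses a hypothetical estimate_tokens helper with an assumed 4-characters-per-token heuristic. The point is the parametrize pattern: one table of inputs and expected outputs, no fixtures and no mocks.

```python
import math

import pytest


def estimate_tokens(text: str) -> int:
    """Hypothetical stand-in for a pure helper such as estimate_citation_tokens.

    Assumes a rough 4-characters-per-token heuristic; the real helper may differ.
    """
    return math.ceil(len(text) / 4)


@pytest.mark.parametrize(
    ("text", "expected"),
    [
        ("", 0),  # empty input costs nothing
        ("abcd", 1),  # exactly one token's worth of characters
        ("abcde", 2),  # partial tokens round up
        ("x" * 400, 100),
    ],
)
def test_estimate_tokens(text: str, expected: int) -> None:
    assert estimate_tokens(text) == expected
```

Each tuple becomes its own test ID in the pytest output, so a failing edge case is easy to spot.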

Example: testing WeatherSkill with respx:

import respx
from httpx import Response

from mirai_shared_skills.weather import WeatherSkill

@respx.mock
async def test_weather_lookup() -> None:
    respx.get("https://api.weather.test/london").mock(
        return_value=Response(200, json={"temp_c": 12.5}),
    )
    skill = WeatherSkill(api_base="https://api.weather.test")
    result = await skill._get_current("london")
    assert result["temp_c"] == 12.5

Example: testing AgenticRAGSkill with mocked providers:

from unittest.mock import AsyncMock, MagicMock

from mirai_shared_skills.agentic_rag import AgenticRAGSkill
from mirai_shared_skills.agentic_rag.providers import GraphProvider, VectorSearchProvider

async def test_rag_token_budget_drops_low_rank_chunks(
    big_chunks: list[object], small_chunks: list[object]
) -> None:
    # big_chunks / small_chunks: placeholder fixtures (not shown) returning chunk
    # objects whose combined estimated_tokens exceed the 500-token budget.
    graph = MagicMock(spec=GraphProvider)
    graph.expand = AsyncMock(return_value=big_chunks)
    vector = MagicMock(spec=VectorSearchProvider)
    vector.search = AsyncMock(return_value=small_chunks)

    skill = AgenticRAGSkill(graph=graph, vector=vector, token_budget=500)
    result = await skill._retrieve("query")

    assert sum(c.estimated_tokens for c in result.chunks) <= 500
    assert result.dropped_count > 0
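The reason for spec= is that a bare MagicMock accepts any attribute, so a misspelled provider method silently returns another mock. The self-contained sketch below illustrates this. It uses a hypothetical GraphProvider stand-in and a hypothetical retrieve function rather than the package's classes, and shows both the awaited-call check and the AttributeError that spec= raises on a typo.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


class GraphProvider:
    """Hypothetical stand-in for the real GraphProvider interface."""

    async def expand(self, query: str) -> list[str]:
        raise NotImplementedError


async def retrieve(graph: GraphProvider, query: str) -> list[str]:
    """Hypothetical code under test: it simply delegates to the provider."""
    return await graph.expand(query)


def run_demo() -> tuple[list[str], bool]:
    graph = MagicMock(spec=GraphProvider)
    graph.expand = AsyncMock(return_value=["chunk-a", "chunk-b"])

    chunks = asyncio.run(retrieve(graph, "query"))
    graph.expand.assert_awaited_once_with("query")

    # spec= rejects attributes the real interface lacks, so a typo in the
    # code under test fails loudly instead of yielding a fresh MagicMock.
    try:
        _ = graph.expnad  # deliberate typo
        typo_caught = False
    except AttributeError:
        typo_caught = True
    return chunks, typo_caught
```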

Tier 2: Integration tests (@pytest.mark.integration)

Run with uv sync --all-extras && uv run pytest -m integration. These tests exercise real provider SDKs against in-memory or local backends.

  • Mark with @pytest.mark.integration so the default pytest tests/ run skips them.
  • Don't hit production endpoints. Use local backends or recorded fixtures.

Tier 3: Docker-required tests

Some integration tests need a running Neo4j or other DB. Use docker-compose.test.yml:

docker compose -f docker-compose.test.yml up -d
uv run pytest tests/agentic_rag/providers/test_neo4j_graph.py -m integration
docker compose -f docker-compose.test.yml down -v
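If someone runs the Neo4j tests without bringing the stack up, it helps to skip with a clear message instead of failing on a connection error. The sketch below assumes that docker-compose.test.yml publishes Neo4j's Bolt port (7687 by default) on localhost. NEO4J_TEST_HOST, NEO4J_TEST_PORT and the require_neo4j fixture are hypothetical names, not settings the repo is known to use.

```python
import os
import socket

import pytest

# Assumed defaults; NEO4J_TEST_HOST / NEO4J_TEST_PORT are hypothetical overrides.
NEO4J_HOST = os.environ.get("NEO4J_TEST_HOST", "localhost")
NEO4J_PORT = int(os.environ.get("NEO4J_TEST_PORT", "7687"))


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


@pytest.fixture(scope="session")
def require_neo4j() -> None:
    """Skip the requesting test when the Docker Neo4j isn't reachable."""
    if not port_is_open(NEO4J_HOST, NEO4J_PORT):
        pytest.skip(
            f"Neo4j not reachable at {NEO4J_HOST}:{NEO4J_PORT}; start it with "
            "docker compose -f docker-compose.test.yml up -d"
        )
```

A test opts in by taking require_neo4j as a parameter. The check only proves that the port accepts connections, not that Neo4j has finished starting up.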

CI always runs Tier 1. Tiers 2 and 3 run nightly or on a manual workflow_dispatch.

Registry fixtures

Skills register descriptors at import time (see ADR-0004), so registry state persists across every test in a session. Tests that touch the registry should empty it for the test and restore it afterwards:

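from collections.abc import Iterator
from types import ModuleType

import pytest

from mirai_shared_skills import _registry

@pytest.fixture
def empty_registry() -> Iterator[ModuleType]:
    saved = dict(_registry._REGISTRY)
    _registry._REGISTRY.clear()
    yield _registry
    # Teardown runs after the test, even if the test failed.
    _registry._REGISTRY.clear()
    _registry._REGISTRY.update(saved)

def test_register_and_find(empty_registry: ModuleType) -> None:
    # make_fake_descriptor: a test helper (not shown) that builds a minimal descriptor.
    descriptor = make_fake_descriptor("fake-skill")
    empty_registry.register(descriptor)
    assert empty_registry.find("fake")[0].name == "fake-skill"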

Linting + type checking

CI also runs:

uv run ruff check . && uv run ruff format --check .
uv run mypy mirai_shared_skills tests

Mypy is in --strict mode for the package. Tests are checked too — keep them typed.

Docs build as a test

The CI pipeline runs uv run mkdocs build --strict to catch doc breakage. If your skill ships new references or new ADR cross-links, run this locally:

uv run mkdocs build --strict

Broken internal links, missing files, and --8<-- snippets that don't resolve all surface in strict mode as "Aborted with N warnings in strict mode!".