ADR-0009: Skill Metadata for Compliance and Production Routing
- Date: 2026-05-06
- Authors: Matteo Rizzo
- Status: Accepted
- Approval State: Approved (Approved by: Matteo Rizzo on 2026-05-06)
- Implementation State: Completed
1. Context and Problem Statement
`SkillDescriptor` (see ADR-0004) shipped with seven fields: `name`, `description`, `instructions`, `category`, `import_path`, `tools`, `references`. That schema was sufficient when the catalog was a handful of demo skills and downstream clients composed the bundle by hand. Two pressures changed the requirement:
- EU AI Act Article 16 (August 2026 deadline) makes downstream operators responsible for declaring the risk profile of every "AI system component" they ship. Operators need a structured way to enumerate which catalog skills carry destructive or data-egress risk versus pure read-only metadata access. Scanning import paths or grepping docstrings is not a defensible audit position.
- Stub skills shipped to production by accident. `weather-api` and `database-operations` are deliberately mock implementations during the development phase: `weather-api` returns a hardcoded "22 °C and sunny" string regardless of input, and `database-operations` returns no-op JSON envelopes for every mutation. When a downstream agent's `SemanticRouter` happens to pick them, the user gets confidently wrong answers. There was no first-class way to tell the catalog "this exists, but do not route to it in production".
`category` (`standard` vs `raw`, per ADR-0001) addresses neither concern. `standard`/`raw` distinguishes "safe-by-default" from "must-be-wrapped-in-`SecureSkill`", which is a runtime safety property, not a regulatory risk tier or a production-readiness flag. A skill can be `standard` (read-only) and still need to be excluded from production routing because it is a stub; a skill can be `raw` (destructive) and also carry a `high` regulatory tier because it is destructive. The values sometimes line up, but the axes are orthogonal: each answers a different question and is set independently.
2. Decision Drivers (Forces)
- Regulatory forward-compat. The EU AI Act risk tiers (minimal / limited / high) are the lingua franca downstream compliance teams already use. Aligning the catalog field with those terms means a future attestation report can read directly from `find-skills` output.
- Single source of truth. The risk classification belongs to the catalog, not to every downstream client's local copy. Adding the field to `SkillDescriptor` keeps it versioned alongside the skill itself.
- No silent stubs in production. Stub-flagged skills must be invisible to the default `find-skills` discovery path; an operator who explicitly wants to inspect them should still be able to.
- Backward compatibility. Existing downstream code that calls `all_descriptors()` or `find()` must continue to work without modification. New behaviour is opt-in via a keyword argument.
- No new ADR axis. The existing `category` and the new `risk_tier`/`stub` fields cover three orthogonal concerns. We do not want to invent a fourth field next quarter; see Considered Options for rejected alternatives.
3. Considered Options
- Option 1: A single `risk_level` field that subsumes `category`, `risk_tier`, and stub-ness.
- Option 2: Two new orthogonal fields, `risk_tier: Literal["minimal", "limited", "high"]` and `stub: bool`, added to `SkillDescriptor` (chosen).
- Option 3: Out-of-band metadata file (e.g. `risk_classifications.yaml` shipped beside the catalog).
- Option 4: Skill-side decorator (`@high_risk`, `@stub`) that the registry inspects via attribute lookup.
4. Decision Outcome
Chosen option: Option 2 (two orthogonal fields on `SkillDescriptor`), because each concern has a distinct audience (compliance reviewers vs. production operators), and typed fields (a `Literal` for the tier, a `bool` for the stub flag) keep the catalog schema searchable, validated when descriptors are built at import, and immune to drift between the description and reality.
The `SkillDescriptor` dataclass gains:

```python
from dataclasses import dataclass
from typing import Literal

RiskTier = Literal["minimal", "limited", "high"]


@dataclass(frozen=True, slots=True)
class SkillDescriptor:
    # ... existing fields ...
    risk_tier: RiskTier = "minimal"
    stub: bool = False
```
`risk_tier` defaults to `"minimal"`, so newly added skills must opt in to higher tiers explicitly; no skill is ever placed in an elevated tier implicitly. `stub` defaults to `False`, so a missing flag never accidentally hides a real skill.
Concrete annotations on the catalog at acceptance time:
| Skill | `category` | `risk_tier` | `stub` | Rationale |
|---|---|---|---|---|
| `find-skills` | standard | minimal | false | Pure metadata read |
| `authentication-gates` | standard | minimal | false | Frozen schema, signal-only |
| `pdf-extraction` | standard | minimal | false | Local file read; no egress |
| `agent-browser` | standard | limited | false | Outbound HTTP; data egress surface |
| `agentic-rag` | standard | limited | false | Touches enterprise indexes + public web |
| `execution-debugging` | raw | high | false | Arbitrary command execution (sandboxed) |
| `database-operations` | raw | high | true | Stub mutations; production must replace |
| `weather-api` | standard | minimal | true | Mock implementation |
`all_descriptors()`, `find()`, and the discovery skill's `list_skills` / `search_skills` tools accept `include_stubs: bool`. The default is `True` for `all_descriptors` and `find`, preserving back-compat for admin tooling, and `False` for the agent-facing discovery tools, so a production router never picks a stub.
4.1. Validation / Compliance
- `risk_tier` is a `Literal`, so an unknown tier name is a type error caught by mypy. `dataclasses` does not enforce `Literal` at runtime, and `frozen=True` only rejects reassignment after construction (`FrozenInstanceError`), so rejecting an unknown tier at construction time falls to the descriptor's `__post_init__` check.
- The discovery skill returns `risk_tier` and `stub` in the JSON envelope of `list_skills`, `search_skills`, and `load_skill_instructions`, so a compliance pipeline can enumerate the catalog by tier without importing the skill modules.
- A registry-completeness test asserts that every shipped skill sets `risk_tier` explicitly. Relying on the default is acceptable only in third-party extensions, not in production code.
5. Pros and Cons of the Options
Option 1: Single `risk_level` field
- Pros: One concept, one field.
- Cons: Conflates regulatory risk with stub-ness; breaks the "stubs default to hidden" rule (you would need a magic value like `"stub-high"`); makes future axes (e.g. multi-region availability) require yet another magic value.
Option 2 (chosen): Two orthogonal fields
- Pros: Each field has a distinct audience and lifecycle. Compliance reads `risk_tier`; production ops reads `stub`. Defaults are safe (`"minimal"` + `False`). Searchable via the existing registry helpers.
- Cons: Two fields to remember. Mitigated by the `bootstrap()` test that asserts both are present on every shipped descriptor.
Option 3: Out-of-band YAML
- Pros: Compliance team can edit without touching code.
- Cons: Two sources of truth; drift inevitable; type checking impossible.
Option 4: Skill-side decorators
- Pros: Co-located with the skill class.
- Cons: Doubles the surface (decorator + descriptor); the registry already exists as the single source of truth. Decorators on a class do not survive serialization to the discovery JSON envelope without parallel logic in `SkillDiscoverySkill`.
6. Consequences
- Positive Consequences:
  - Compliance teams can grep `risk_tier` once instead of import-walking the catalog every quarter.
  - `find-skills` hides stubs from production routers automatically. The two stubs that ship today (`weather-api`, `database-operations`) are no longer auto-discoverable; admin tooling sets `include_stubs=True` explicitly.
  - Future telemetry can label spans by risk tier so dashboards split high-risk skill executions out of the default mean-latency view.
- Negative Consequences / Trade-offs:
  - Every new skill should choose a tier explicitly, but nothing enforces this at compile time: a PR that adds a skill without setting `risk_tier` silently inherits the `"minimal"` default. We accept that as a code-review concern rather than a compile-time error so prototypes can ship faster.
- Risks & Mitigations:
  - Risk: a downstream client overrides `risk_tier` after registration, masking a high-tier skill. Mitigation: the descriptor is a frozen dataclass, so in-place mutation is rejected and re-registration must replace the descriptor wholesale rather than patch a single field.
  - Risk: the `stub` field becomes a dumping ground for "things we have not gotten around to". Mitigation: documentation guidance in `docs/guides/adding-skills.md` explicitly limits `stub=True` to placeholder implementations destined to be replaced; ADR-0001 (raw vs standard) remains the security axis.
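The first positive consequence, reading `risk_tier` from the catalog instead of import-walking it, can be sketched as a small report over the discovery skill's JSON output. This assumes a hypothetical envelope shape: a top-level `skills` array whose objects carry `name`, `risk_tier`, and `stub` keys. The real `list_skills` payload may be structured differently.

```python
import json
from collections import defaultdict

# Hypothetical list_skills(include_stubs=True) payload; the shape is an assumption.
ENVELOPE = """
{"skills": [
  {"name": "find-skills", "risk_tier": "minimal", "stub": false},
  {"name": "agent-browser", "risk_tier": "limited", "stub": false},
  {"name": "execution-debugging", "risk_tier": "high", "stub": false},
  {"name": "database-operations", "risk_tier": "high", "stub": true}
]}
"""


def skills_by_tier(envelope: str) -> dict[str, list[str]]:
    """Group skill names by risk tier, marking stubs so auditors can see them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for skill in json.loads(envelope)["skills"]:
        label = skill["name"] + (" (stub)" if skill["stub"] else "")
        groups[skill["risk_tier"]].append(label)
    return dict(groups)


report = skills_by_tier(ENVELOPE)
```

No skill module is imported, so the report can run in a compliance pipeline that has only the discovery output.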
7. Implementation Plan & Status Updates
- Target Milestone/Release: v0.2.0 (current).
- Implementation Notes:
  - 2026-05-06: `SkillDescriptor` extended; every shipped descriptor annotated; discovery skill updated to surface both fields and honour `include_stubs`. Tests updated.
8. References / Related Documents
- `mirai_shared_skills/_registry.py`: `SkillDescriptor`, `all_descriptors`, `find`.
- `mirai_shared_skills/discovery/skill.py`: `SkillDiscoverySkill` with `include_stubs` plumbing.
- ADR-0001: Standard vs Raw Skill Categorization (orthogonal runtime safety axis).
- ADR-0004: In-Process Skill Descriptor Registry (registry that holds the new fields).
- EU AI Act Article 16 (risk-tier vocabulary).