🛠️ Execution & Debugging (execution-debugging)
ExecutionDebuggingSkill runs a shell command in an isolated sandbox directory and returns captured STDOUT, STDERR, and log files. Its purpose is to make agent-issued code/command execution verifiable: the agent can read what it just ran and assess outcomes objectively rather than asserting success.
This is the canonical worked example of a raw skill (see ADR-0001) — arbitrary subprocess execution is the most powerful primitive in the catalog and MUST be wrapped in SecureSkill before reaching an agent.
When to use it
- The agent needs to verify a build, run a test, or check a deployment outcome.
- You're debugging an agent's reasoning chain and want to inspect what shell commands it issued.
- You're building a CI replay loop where the agent's commands need to be reproducible (the bundled sandbox_runner.py script supports this).
Tools
| Tool | Purpose |
|---|---|
| run_command | Executes a shell-quoted command in a fresh temp dir; returns captured streams. |
| read_log_file | Tails a log file and returns the last max_bytes of content. |
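The behaviour described in the table can be approximated with the standard library alone. The sketch below is illustrative, not the skill's actual implementation: the function bodies, the `timeout` parameter, the returned dictionary keys, and the choice to split the command with `shlex.split` instead of invoking a shell are all assumptions. It also assumes a POSIX-style environment.

```python
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path


def run_command(command: str, timeout: float = 30.0) -> dict:
    """Run a shell-quoted command in a fresh temp dir and capture its streams."""
    # Assumption: one new directory per call, left in place for the caller.
    sandbox = tempfile.mkdtemp(prefix="sandbox_")
    # shlex.split honours shell quoting without handing the string to a shell.
    argv = shlex.split(command)
    proc = subprocess.run(
        argv, cwd=sandbox, capture_output=True, text=True, timeout=timeout
    )
    return {
        "sandbox": sandbox,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def read_log_file(path: str, max_bytes: int = 4096) -> str:
    """Return at most the last max_bytes of a file, decoded leniently."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)  # jump to end of file to learn its size
        size = fh.tell()
        fh.seek(max(0, size - max_bytes))
        return fh.read().decode("utf-8", errors="replace")
```

Reading the tail in binary mode and decoding with `errors="replace"` keeps the call safe when the byte window cuts a multi-byte character in half.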
Configuration
The skill takes no environment variables. The sandbox directory is created per call via tempfile.mkdtemp and torn down by the calling code (or left for forensic review).
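Because teardown is left to the caller, a small context manager is one way to make the "clean up or keep for forensics" decision explicit. This is a minimal sketch under assumptions: `sandbox_dir` and its `keep` flag are hypothetical names, not part of the skill.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def sandbox_dir(keep: bool = False):
    """Yield a fresh sandbox path; remove it on exit unless keep is True."""
    path = Path(tempfile.mkdtemp(prefix="sandbox_"))
    try:
        yield path
    finally:
        if not keep:
            # ignore_errors avoids masking the original exception on cleanup.
            shutil.rmtree(path, ignore_errors=True)
```

Passing `keep=True` leaves the directory on disk so its contents can be inspected after a failed run.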
Example
from mirai_core.core.types import SecureSkill, SecurityLevel
from mirai_shared_skills.execution import ExecutionDebuggingSkill

# REQUIRED: every raw skill must be wrapped before reaching an agent.
gated = SecureSkill(
    ExecutionDebuggingSkill(),
    policy={
        "run_command": SecurityLevel.REQUIRES_HITL,
        "read_log_file": SecurityLevel.SAFE,
    },
)
CI/CD replay
The skill ships mirai_shared_skills/execution/scripts/sandbox_runner.py — a CLI that replays the exact command the agent issued, using the same sandbox setup, so post-incident review can reproduce agent behavior.
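The interface of sandbox_runner.py is not documented here, so the following is only a hypothetical illustration of the replay idea: store the command and its observed exit code, re-run it in a fresh directory, and compare. The `record` and `replay` helpers and the JSON layout are assumptions made for this sketch.

```python
import json
import shlex
import subprocess
import sys
import tempfile


def record(command: str, returncode: int) -> str:
    """Serialize what the agent ran and the exit code it observed."""
    return json.dumps({"command": command, "returncode": returncode})


def replay(record_json: str, timeout: float = 30.0) -> bool:
    """Re-run a recorded command in a fresh temp dir; True if exit codes match."""
    rec = json.loads(record_json)
    with tempfile.TemporaryDirectory(prefix="replay_") as sandbox:
        proc = subprocess.run(
            shlex.split(rec["command"]),
            cwd=sandbox,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return proc.returncode == rec["returncode"]
```

Comparing only exit codes is a deliberately coarse check. A stricter replay could also diff STDOUT and STDERR, at the cost of false mismatches on timestamps and temp paths.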
Security considerations
This skill is raw per ADR-0001. Subprocess execution is unbounded by default. Recommended policy:
- run_command → REQUIRES_HITL at minimum; BLOCKED for any production agent.
- read_log_file → SAFE (read-only).
If the agent is allowed to call run_command autonomously, you accept that any command, including destructive ones, can run unattended.
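To make the three levels concrete, here is a minimal sketch of how a policy gate could decide whether a tool call proceeds. It does not reproduce SecureSkill: the `Level` enum is a hypothetical stand-in for SecurityLevel, and `gate`, its `approve` callback, and the deny-by-default rule for unlisted tools are assumptions.

```python
from enum import Enum
from typing import Callable


class Level(Enum):
    """Hypothetical stand-in for SecurityLevel, used only in this sketch."""
    SAFE = "safe"
    REQUIRES_HITL = "requires_hitl"
    BLOCKED = "blocked"


def gate(tool: str, policy: dict, approve: Callable[[str], bool]) -> bool:
    """Return True if the named tool call may proceed under the policy."""
    # Assumption: tools missing from the policy are denied, not allowed.
    level = policy.get(tool, Level.BLOCKED)
    if level is Level.SAFE:
        return True
    if level is Level.REQUIRES_HITL:
        # A human reviewer decides; approve() models that decision.
        return approve(tool)
    return False
```

With `run_command` mapped to `Level.REQUIRES_HITL`, the call goes through only when the `approve` callback returns True, which is the human-in-the-loop checkpoint the recommended policy relies on.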
Related
- ADR-0001: Standard vs Raw Skill Categorization (why this skill is raw).
- agent-core ADR-0012: Declarative Security Policies (the SecureSkill wrapper).
- Database Operations (the other raw skill in the catalog).