🛠️ Execution & Debugging (`execution-debugging`)

ExecutionDebuggingSkill runs a shell command in an isolated sandbox directory and returns captured STDOUT, STDERR, and log files. Its purpose is to make agent-issued code/command execution verifiable: the agent can read what it just ran and assess outcomes objectively rather than asserting success.

This is the canonical worked example of a raw skill (see ADR-0001) — arbitrary subprocess execution is the most powerful primitive in the catalog and MUST be wrapped in SecureSkill before reaching an agent.

When to use it

The agent needs to verify a build, run a test, or check a deployment outcome.
You're debugging an agent's reasoning chain and want to inspect what shell commands it issued.
You're building a CI replay loop where the agent's commands need to be reproducible (the bundled sandbox_runner.py script supports this).

Tools

Tool	Purpose
`run_command`	Executes a shell-quoted command in a fresh temp dir; returns captured streams.
`read_log_file`	Tails a log file and returns the last `max_bytes` of content.
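The tailing behaviour described for `read_log_file` can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function name `tail_bytes`, the default of 4096 bytes, and the choice to decode as UTF-8 with replacement are all assumptions made for the example.

```python
import os


def tail_bytes(path: str, max_bytes: int = 4096) -> str:
    """Return up to the last `max_bytes` bytes of a file, decoded as text.

    Hypothetical helper: it mirrors the documented behaviour of
    `read_log_file`, but the name and defaults are assumptions.
    """
    if max_bytes <= 0:
        return ""
    with open(path, "rb") as fh:
        # Find the file size, then seek back at most max_bytes from the end.
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(size - max_bytes, 0))
        data = fh.read()
    # A byte-offset cut can split a multi-byte character, so replace
    # undecodable bytes instead of raising.
    return data.decode("utf-8", errors="replace")
```

Bounding the read by bytes rather than lines keeps the cost predictable even when a log contains a single very long line.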

Configuration

The skill reads no environment variables. Each call creates its own sandbox directory with tempfile.mkdtemp; the calling code is responsible for tearing it down, or for leaving it in place for forensic review.
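The per-call sandbox lifecycle can be illustrated with the standard library alone. The sketch below is an assumption about how such a runner might look, not the code behind `run_command`: `run_in_sandbox`, `SandboxResult`, the `sandbox-` prefix, and the 30-second timeout are hypothetical names and values chosen for the example.

```python
import shlex
import shutil
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class SandboxResult:
    sandbox: str                # path of the temp dir the command ran in
    returncode: Optional[int]   # None if the command timed out
    stdout: str
    stderr: str


def run_in_sandbox(command: str, timeout: float = 30.0) -> SandboxResult:
    """Run a shell-quoted command in a fresh temp dir and capture its streams.

    Hypothetical helper. The sandbox is NOT removed here: as described
    above, teardown is left to the caller.
    """
    sandbox = tempfile.mkdtemp(prefix="sandbox-")
    try:
        proc = subprocess.run(
            shlex.split(command),  # parse the quoted string; no shell is spawned
            cwd=sandbox,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return SandboxResult(sandbox, None, "", f"timed out after {timeout}s")
    return SandboxResult(sandbox, proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    cmd = f"{shlex.quote(sys.executable)} -c {shlex.quote('print(42)')}"
    result = run_in_sandbox(cmd)
    print(result.returncode, result.stdout.strip())
    # The caller decides: remove the sandbox, or keep it for forensic review.
    shutil.rmtree(result.sandbox, ignore_errors=True)
```

Returning the sandbox path instead of deleting it inside the helper keeps the keep-or-remove decision with the caller.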

Example

from mirai_core.core.types import SecureSkill, SecurityLevel
from mirai_shared_skills.execution import ExecutionDebuggingSkill

# REQUIRED: every raw skill must be wrapped before reaching an agent.
gated = SecureSkill(
    ExecutionDebuggingSkill(),
    policy={
        "run_command": SecurityLevel.REQUIRES_HITL,
        "read_log_file": SecurityLevel.SAFE,
    },
)

CI/CD replay

The skill ships mirai_shared_skills/execution/scripts/sandbox_runner.py — a CLI that replays the exact command the agent issued, using the same sandbox setup, so post-incident review can reproduce agent behavior.

Security considerations

This skill is raw per ADR-0001: subprocess execution is unbounded by default. Recommended policy:

run_command → REQUIRES_HITL at minimum; BLOCKED for any production agent.
read_log_file → SAFE (read-only).

If the agent is allowed to call run_command autonomously, you accept that any command, including destructive ones, can run unattended.
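One way to keep the recommended policy consistent across deployments is to derive it from a single flag. The sketch below is illustrative only: the `SecurityLevel` enum is a local stand-in for the one in `mirai_core` (its member values, and the assumption that `BLOCKED` is a member, are guesses based on this page), and `execution_policy` is a hypothetical helper.

```python
from enum import Enum


class SecurityLevel(Enum):
    """Local stand-in for mirai_core's SecurityLevel (assumed members)."""
    SAFE = "safe"
    REQUIRES_HITL = "requires_hitl"
    BLOCKED = "blocked"


def execution_policy(production: bool) -> dict:
    """Build the policy mapping recommended above for this skill."""
    return {
        # Never autonomous: human approval at minimum, blocked in production.
        "run_command": (
            SecurityLevel.BLOCKED if production else SecurityLevel.REQUIRES_HITL
        ),
        # Read-only tool.
        "read_log_file": SecurityLevel.SAFE,
    }
```

With the real enum imported instead of the stand-in, the returned mapping has the same shape as the `policy` argument used in the example on this page.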

See also

ADR-0001: Standard vs Raw Skill Categorization — why this skill is raw.
agent-core ADR-0012: Declarative Security Policies — the SecureSkill wrapper.
Database Operations — the other raw skill in the catalog.
