🛠️ Execution & Debugging (execution-debugging)

ExecutionDebuggingSkill runs a shell command in an isolated sandbox directory and returns captured STDOUT, STDERR, and log files. Its purpose is to make agent-issued code/command execution verifiable: the agent can read what it just ran and assess outcomes objectively rather than asserting success.

This is the canonical worked example of a raw skill (see ADR-0001) — arbitrary subprocess execution is the most powerful primitive in the catalog and MUST be wrapped in SecureSkill before reaching an agent.

When to use it

  • The agent needs to verify a build, run a test, or check a deployment outcome.
  • You're debugging an agent's reasoning chain and want to inspect what shell commands it issued.
  • You're building a CI replay loop where the agent's commands need to be reproducible (the bundled sandbox_runner.py script supports this).

Tools

Tool            Purpose
run_command     Executes a shell-quoted command in a fresh temp dir; returns captured streams.
read_log_file   Tails a log file and returns the last max_bytes of content.
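To make the read_log_file behaviour concrete, here is a minimal sketch of tailing the last max_bytes of a file. The function name tail_bytes, the 4096-byte default, and the lenient UTF-8 decoding are assumptions for illustration; the skill's actual implementation may differ.

```python
import os


def tail_bytes(path: str, max_bytes: int = 4096) -> str:
    """Return up to the last max_bytes of a file (hypothetical helper)."""
    if max_bytes <= 0:
        return ""
    with open(path, "rb") as fh:
        # Find the file size, then seek back at most max_bytes from the end.
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - max_bytes))
        data = fh.read()
    # A byte-based cut may split a multi-byte character, so decode leniently.
    return data.decode("utf-8", errors="replace")
```

Reading in binary mode and seeking from the end keeps the cost proportional to max_bytes rather than to the size of the log.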

Configuration

The skill takes no environment variables. The sandbox directory is created per call via tempfile.mkdtemp and torn down by the calling code (or left for forensic review).
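The per-call sandbox described above could look roughly like the following sketch. It assumes the command is split with shlex and run without a shell; the function name run_in_sandbox, the timeout parameter, and the returned dictionary shape are hypothetical and not taken from the skill's source. The sketch only isolates the working directory, not the process itself.

```python
import shlex
import subprocess
import tempfile


def run_in_sandbox(command: str, timeout: float = 30.0) -> dict:
    """Run a shell-quoted command in a fresh temp dir (hypothetical helper)."""
    # A new directory per call, so runs cannot see each other's files.
    sandbox = tempfile.mkdtemp(prefix="sandbox_")
    completed = subprocess.run(
        shlex.split(command),   # split the quoted string into argv, no shell
        cwd=sandbox,            # the command starts inside the sandbox dir
        capture_output=True,    # collect STDOUT and STDERR
        text=True,              # decode the streams to str
        timeout=timeout,        # assumption: the sketch bounds runtime
        check=False,            # report failures via returncode, do not raise
    )
    return {
        "sandbox": sandbox,
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

Because the sketch never deletes the directory, the caller decides whether to remove it (for example with shutil.rmtree(result["sandbox"])) or keep it for forensic review.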

Example

from mirai_core.core.types import SecureSkill, SecurityLevel
from mirai_shared_skills.execution import ExecutionDebuggingSkill

# REQUIRED: every raw skill must be wrapped before reaching an agent.
gated = SecureSkill(
    ExecutionDebuggingSkill(),
    policy={
        "run_command": SecurityLevel.REQUIRES_HITL,
        "read_log_file": SecurityLevel.SAFE,
    },
)

CI/CD replay

The skill ships mirai_shared_skills/execution/scripts/sandbox_runner.py — a CLI that replays the exact command the agent issued, using the same sandbox setup, so post-incident review can reproduce agent behavior.

Security considerations

This skill is raw per ADR-0001: subprocess execution is unbounded by default. Recommended policy:

  • run_command: REQUIRES_HITL at minimum; BLOCKED for any production agent.
  • read_log_file: SAFE (read-only).

If the agent is allowed to call run_command autonomously, you accept that any command, including destructive ones, can run unattended.
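The effect of the recommended policy can be illustrated with a small stand-alone sketch. The Level enum and is_allowed function below are hypothetical stand-ins written for this example; they are not the mirai_core SecurityLevel or SecureSkill implementation, and the default-deny choice for unlisted tools is an assumption.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical stand-in for the security levels named above."""
    SAFE = "safe"
    REQUIRES_HITL = "requires_hitl"
    BLOCKED = "blocked"


def is_allowed(tool: str, policy: dict, approved_by_human: bool = False) -> bool:
    """Decide whether a tool call may proceed under a per-tool policy."""
    # Assumption: tools missing from the policy are denied by default.
    level = policy.get(tool, Level.BLOCKED)
    if level is Level.SAFE:
        return True
    if level is Level.REQUIRES_HITL:
        # Human-in-the-loop: proceed only with explicit approval.
        return approved_by_human
    return False


policy = {
    "run_command": Level.REQUIRES_HITL,
    "read_log_file": Level.SAFE,
}
```

Under this sketch, is_allowed("run_command", policy) is False until a human approves the call, while read_log_file proceeds without approval.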