Supervisor Protocol

Agent Mesh exposes a rich approval API so that external supervisor agents can review and resolve approval requests on behalf of humans.

The supervisor is not built into agent-mesh. It's any external process — Python script, Go service, LangChain agent, Claude API call — that polls for pending approvals and resolves them. Agent-mesh provides the protocol; you bring the logic.

Why

Human-in-the-loop doesn't scale:

Multiple concurrent agents generate approval requests faster than a human can review them
Overnight runs stall at approval gates while the human sleeps
80% of approvals are routine ("yes, you can write to the project directory") — only 20% need real judgment

A supervisor agent handles the routine and escalates the rest.

Trust hierarchy

Level 0: Policy engine         — static rules, instant, no judgment
Level 1: Supervisor agent      — dynamic evaluation, fast, bounded judgment
Level 2: Human                 — full judgment, slow, expensive attention

Level	Handles	Example
Policy (L0)	Black/white rules	`allow` reads, `deny` deletes
Supervisor (L1)	Gray area, routine	write to project dir, draft email to known contact
Human (L2)	High-stakes, ambiguous	send to external recipient, write outside project scope

The goal is not to remove the human. It's to protect human attention for decisions that actually need it.

Architecture

                          ┌─────────────────────────────┐
                          │        agent-mesh            │
                          │                              │
Worker Agent ────────────>│  policy ──> approval store   │
                          │                  │           │
                          │                  v           │
                          │         GET /approvals       │
                          │              │               │
                          └──────────────┼───────────────┘
                                         │
                                         v
                               ┌───────────────────┐
                               │ Supervisor Agent   │
                               │                    │
                               │  evaluate risk     │
                               │  check context     │
                               │  decide:           │
                               │   - approve        │──> POST /approvals/{id}/approve
                               │   - deny           │──> POST /approvals/{id}/deny
                               │   - escalate       │──> notify human
                               │                    │
                               └───────────────────┘
                                         │
                                    (escalation only)
                                         │
                                         v
                                       Human

API reference

List pending approvals

GET /approvals?status=pending

Optional tool filter for domain-specific supervisors:

GET /approvals?status=pending&tool=filesystem.*
GET /approvals?status=pending&tool=gmail.*

The tool parameter supports glob patterns (Go filepath.Match syntax).

Response:

[
  {
    "id": "a1b2c3d4e5f67890",
    "agent_id": "claude",
    "tool": "filesystem.write_file",
    "params": {"path": "/home/user/project/main.go", "content": "..."},
    "policy_rule": "claude:rule-2",
    "status": "pending",
    "created_at": "2026-04-10T14:30:00Z",
    "remaining": "4m30s",
    "injection_risk": false
  }
]

When supervisor.expose_content: false is configured, large or content-like param values are replaced with structural metadata:

{
  "params": {
    "path": "/home/user/project/main.go",
    "content": {
      "content_length": 1234,
      "content_sha256": "a1b2c3d4...",
      "content_type_detected": "text/plain"
    }
  }
}

Get approval detail (with context)

GET /approvals/{id}

Returns the approval plus the agent's recent activity and active grants — context for evaluation:

{
  "id": "a1b2c3d4e5f67890",
  "agent_id": "claude",
  "tool": "filesystem.write_file",
  "params": {"path": "/home/user/project/main.go"},
  "policy_rule": "claude:rule-2",
  "status": "pending",
  "created_at": "2026-04-10T14:30:00Z",
  "remaining": "4m30s",
  "injection_risk": false,
  "recent_traces": [
    {
      "trace_id": "abc123...",
      "tool": "filesystem.read_file",
      "policy": "allow",
      "timestamp": "2026-04-10T14:29:55Z"
    },
    {
      "trace_id": "def456...",
      "tool": "filesystem.list_directory",
      "policy": "allow",
      "timestamp": "2026-04-10T14:29:50Z"
    }
  ],
  "active_grants": [
    {
      "id": "g_789",
      "agent": "claude",
      "tools": "filesystem.read_*",
      "expires_at": "2026-04-10T15:00:00Z",
      "remaining": "30m0s",
      "granted_by": "http:127.0.0.1:9090"
    }
  ]
}

Approve

POST /approvals/{id}/approve

Body (all fields optional):

{
  "resolved_by": "agent:supervisor",
  "reasoning": "write to project source directory, valid Go content, consistent with recent activity",
  "confidence": 0.95
}

Field	Type	Description
`resolved_by`	string	Who resolved it. Defaults to `http:<remote_addr>`
`reasoning`	string	Explanation of the decision (stored in traces)
`confidence`	float64	0.0–1.0, how confident the supervisor is (stored in traces)

Responses:

Status	Meaning
200	Approved. The original tool call is forwarded to the backend.
404	Approval ID not found
409	Already resolved

Deny

POST /approvals/{id}/deny

Same body format as approve. On deny, the agent receives a 403 response.

Configuration

Content isolation

When expose_content is false, the approval API replaces raw content values with structural metadata. This reduces the prompt injection attack surface — the supervisor evaluates structure, not content.

supervisor:
  expose_content: false    # default: true

Content-like keys (content, body, data, file_content, text) and any string value longer than 256 bytes are redacted to:

{
  "content_length": 1234,
  "content_sha256": "a1b2c3d4...",
  "content_type_detected": "text/plain"
}

Detected content types: application/json, text/xml, text/html, text/plain, application/octet-stream.

Injection detection

Every approval view includes an injection_risk boolean. It is true when tool params contain patterns commonly used in prompt injection attacks:

"ignore previous instructions"
"you are now"
"system prompt:"
"override policy"
"pre-approved"
"do not deny" / "do not escalate"
"pretend you are" / "act as"
"IMPORTANT: approve/ignore/override"
"confidence should be"

This is a best-effort heuristic, not a security guarantee. It raises the cost of attack and flags suspicious requests for closer review.

Building a supervisor

A supervisor is any program that implements this loop:

import requests
import time

MESH_URL = "http://localhost:9090"
CONFIDENCE_THRESHOLD = 0.8

while True:
    # 1. Poll for pending approvals (optionally filter by tool)
    pending = requests.get(f"{MESH_URL}/approvals",
        params={"status": "pending", "tool": "filesystem.*"}).json()

    for approval in pending:
        # 2. Get full context
        detail = requests.get(f"{MESH_URL}/approvals/{approval['id']}").json()

        # 3. Evaluate
        verdict = evaluate(detail)

        # 4. Act
        if verdict["confidence"] >= CONFIDENCE_THRESHOLD:
            requests.post(f"{MESH_URL}/approvals/{approval['id']}/{verdict['action']}", json={
                "resolved_by": "agent:supervisor",
                "reasoning": verdict["reasoning"],
                "confidence": verdict["confidence"],
            })
        else:
            # Escalate to human — leave pending
            notify_human(detail, verdict["reasoning"])

    time.sleep(2)

Evaluation guidelines

The supervisor should evaluate structural properties, not interpret content semantically:

Good (structural)	Bad (semantic)
"path is within project directory"	"content says it's pre-approved"
"file type matches extension"	"the code looks safe"
"agent has been writing to this area recently"	"the comment says this is a fix"
"request rate is normal"	"the agent seems trustworthy"

Structural evaluation is resistant to prompt injection. Semantic evaluation is vulnerable.

Context signals available

From GET /approvals/{id}:

Signal	Field	Use
What tool	`tool`	Scope check
What params	`params` (or metadata if redacted)	Path analysis, size check
Which agent	`agent_id`	Trust level per agent
Which policy rule	`policy_rule`	Why it needs approval
Injection risk	`injection_risk`	Auto-escalate if true
Recent activity	`recent_traces`	Pattern consistency check
Active grants	`active_grants`	What's already trusted
Time remaining	`remaining`	Urgency signal

Escalation criteria

Recommended reasons to escalate to a human instead of auto-resolving:

Condition	Why
Confidence below threshold	Supervisor is unsure
`injection_risk: true`	Suspicious content in params
Path outside project scope	Higher risk write
Tool is high-stakes	Email send, database write, deploy
Rate anomaly	Unusual number of requests from the agent
Agent is unknown	No recent trace history to establish pattern

Observability

Every supervisor decision is recorded in traces:

# All supervisor-resolved traces
curl http://localhost:9090/traces | \
  jq '[.[] | select(.supervisor_reasoning != null)]'

# Average supervisor confidence
curl http://localhost:9090/traces | \
  jq '[.[] | select(.supervisor_confidence > 0) | .supervisor_confidence] | add / length'

# Low-confidence approvals (potential miscalibration)
curl http://localhost:9090/traces | \
  jq '[.[] | select(.supervisor_confidence > 0 and .supervisor_confidence < 0.85)]'

Trace fields added by supervisor resolution:

Field	Type	Description
`supervisor_reasoning`	string	Why the supervisor approved/denied
`supervisor_confidence`	float64	How confident the supervisor was (0.0–1.0)
`approved_by`	string	`"agent:supervisor"` (vs `"http:127.0.0.1:..."` for humans)

The spectrum of autonomy

A single tool can move along this spectrum over time as trust is established:

deny <──── human_approval <──── supervisor <──── allow

"never"    "human decides"    "agent decides,     "always"
                               human on escalation"

Example progression for gmail.send:

Day 1: deny — not ready
Day 30: human_approval — human reviews every send
Day 90: supervisor — agent reviews, escalates external recipients
Day 180: allow for internal recipients — trusted pattern

The policies evolve as trust is established through trace data.

Multi-supervisor topology

Multiple supervisors can coexist with different scopes using the ?tool= filter:

Worker Agents ──> agent-mesh ──> Approval Store
                                      │
                         ┌────────────┼────────────┐
                         v            v            v
                   Supervisor A  Supervisor B    Human
                   (filesystem)  (gmail)        (everything else)

# Supervisor A — only filesystem approvals
GET /approvals?status=pending&tool=filesystem.*

# Supervisor B — only gmail approvals
GET /approvals?status=pending&tool=gmail.*

Domain-specific supervision improves judgment quality: the filesystem supervisor understands code patterns, the gmail supervisor understands communication norms.

Reference implementation

A complete supervisor implementation is available in the agent7 repository at backend/app/services/supervisor/.

Features

Rule engine — first-match-wins rules with a simple DSL (starts_with, equals, contains). Fast path for known patterns (0ms).
LLM fallback — when no rule matches, calls Ollama for evaluation (~20s). Configurable model, system prompt, and confidence threshold.
Process manager — auto-spawns agent-mesh when it's down, monitors health, restarts on crash.
Memory integration — stores decisions in memory-mcp (via agent-mesh), recalls them on startup for context continuity across sessions.
JSONL audit trail — every decision logged with reasoning, confidence, rule matched, and evaluation time.

Quick start

cd ~/agent7
python -m backend.app.services.supervisor --config supervisor.yaml

See docs/deployment-modes.md for full setup options.