System Design — agent-mesh + mem7 + agent7

May 2026. Living document.

The stack in one sentence

agent-mesh enforces what agents can do. mem7 remembers what happened. agent7 shows it all and lets humans intervene.

Design principles

  1. Each project works alone. agent-mesh without mem7 still enforces policies. mem7 without agent-mesh is a standalone memory server. agent7 without the others is a governance dashboard. Integration is opt-in, not required.
  2. Open-source runtime, product on top. agent-mesh and mem7 are MIT/Apache — free, adoptable, no lock-in. agent7 is the management plane that makes them manageable at scale. That's where the business model sits.
  3. Decisions are facts. When a human approves an action or a supervisor auto-resolves a request, that decision is stored as a queryable fact in mem7. It doesn't vanish into a log file.
  4. Write path matters more than read path. The system becomes valuable when events automatically flow between components — not when a human manually checks dashboards.

Components

agent-mesh (runtime / data plane)

Go binary. Sidecar proxy between agents and their tools.

| What it does | How |
|---|---|
| Policy enforcement | YAML rules: allow, deny, human_approval per tool per agent |
| Approval queue | Pending requests, resolve via API or Claude Code prompt |
| Rate limiting + loop detection | Per-agent, per-tool, configurable |
| Tracing | JSONL trace files, OTEL export, session tracking |
| Grants | Temporary sudo-like bypass for specific agents |
| Supervisor protocol | External process polls pending approvals, auto-resolves routine ones |

Transports: MCP stdio (Claude Code, Cursor) · MCP Streamable HTTP at POST /mcp (Anthropic Managed Agents, remote clients) · HTTP REST (POST /tool/{name})
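
The policy rules in the table above are plain YAML. A minimal sketch, where the field names (rules, rate_limits, and their keys) are assumptions; only the allow / deny / human_approval verdicts and the per-agent, per-tool scoping are documented:

```yaml
# Illustrative agent-mesh policy sketch. Field names are assumptions;
# the three verdicts and per-agent/per-tool scoping are as described above.
rules:
  - agent: claude-code
    tool: gmail.read_email
    action: allow
  - agent: claude-code
    tool: gmail.send_email
    action: human_approval   # queued until a human or supervisor resolves it
  - agent: "*"
    tool: fs.delete
    action: deny
rate_limits:
  - agent: claude-code
    tool: gmail.send_email
    max_per_minute: 5        # example value
```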

Current state: v0.9.0, stable, docs polished.

mem7 (memory substrate)

Go binary. MCP server for persistent, searchable, governed memory.

| What it does | How |
|---|---|
| Store/recall/search/forget | 7 MCP tools, markdown source of truth + SQLite FTS5 index |
| Hybrid search | BM25 + dense cosine (Ollama/OpenAI) + LLM reranking |
| Structured recall | memory_context returns JSON for SDK consumption |
| Tag-scoped access | Any agent reads/writes its own observations via tags |
| Temporal range queries | since / until filters on RFC3339 timestamps |
| Python SDK | pip install mem7, provider-agnostic, wraps all tools via HTTP |

Current state: v0.4.1, 71% LoCoMo benchmark, SDK + SSE transport + daemon mode shipped.
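
Whatever the client (SDK or raw HTTP), a store call reduces to one small payload. A sketch of the request body, assuming a JSON-RPC 2.0 envelope over the /rpc endpoint; the envelope shape is an assumption, while the memory_store tool name and the key/value/tags/agent fields come from this document:

```python
import json

def memory_store_request(key: str, value: str, tags: list[str], agent: str) -> dict:
    # Assumed JSON-RPC 2.0 envelope; the tool name and param fields
    # are the ones described in this document.
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "memory_store",
        "params": {"key": key, "value": value, "tags": tags, "agent": agent},
    }

# The decision record from the integrated-flow example:
req = memory_store_request(
    key="decision.gmail.send_email.20260507",
    value="approved by Marc, recipient: X, subject: Y",
    tags=["decision", "approval", "gmail"],
    agent="agent-mesh",
)
body = json.dumps(req)  # what a client would POST to mem7's /rpc
```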

agent7 (management plane / product)

Next.js 16 + FastAPI + PostgreSQL. Dashboard, governance engine, supervisor UI.

| What it does | How |
|---|---|
| Agent catalog | Registry of declared agents with metadata, scoring, lifecycle |
| Governance engine | Rules, scoring (3-axis), severity escalation, validation verdicts |
| Trace viewer | Ingests agent-mesh JSONL, aggregates stats, detects ghosts/orphans |
| Memory viewer | Reads mem7 via SDK, displays stored facts and decisions |
| Human approval UI | Shows pending approvals from agent-mesh, human clicks approve/reject |
| Dependency graph | Declared (YAML) + inferred (traces), impact analysis |
| Diff engine | Breaking change detection on agent config changes |

Current state: Early stage, dashboard + traces + sessions + memory debug view working. No version tag yet.


The integrated flow

Today: tool call with human approval

Developer → Claude Code → agent-mesh → gmail.send_email
                    policy: human_approval
                    Claude Code shows permission prompt
                    Developer says "yes"
                    agent-mesh resolves → email sent
                    agent-mesh writes trace → JSONL
                    Decision is gone. Next time, same question.

Target: tool call with governed memory

Developer → Claude Code → agent-mesh → gmail.send_email
                    policy: human_approval
             ┌──────────────────┼──────────────────┐
             │                  │                   │
      Supervisor (L1)     Human (L2)          agent7 UI
      in agent-mesh       via Claude Code     shows pending
             │            or agent7 UI              │
             │                  │                   │
      checks mem7:              │                   │
      "has Marc approved        │                   │
       gmail.send before?"      │                   │
             │                  │                   │
      if yes (3+ times) ───► auto-approve           │
      if no ────────────► escalate to human ◄───────┘
                    human says "yes"
                    agent-mesh resolves → email sent
                    agent-mesh writes to mem7:
                      key: "decision.gmail.send_email.20260507"
                      value: "approved by Marc, recipient: X, subject: Y"
                      tags: [decision, approval, gmail]
                      agent: "agent-mesh"
                    agent7 displays decision in audit trail

What changes

| Step | Before | After |
|---|---|---|
| Approval source | Always human | Supervisor checks mem7 first, escalates if unsure |
| Decision storage | Lost in terminal history | Stored as fact in mem7, queryable |
| Learning | None; same prompt every time | Supervisor learns from past decisions |
| Audit | Grep JSONL logs | agent7 dashboard, filterable, searchable |
| Visibility | Only the developer who approved | Any team member via agent7 |

Data flow between components

                    agent7 (management plane)
                    ┌────────────────────────────┐
                    │  dashboard, approval UI,   │
                    │  governance, audit trail   │
                    └──────┬──────────┬──────────┘
                           │          │
              reads via    │          │  reads via
              HTTP API     │          │  Python SDK
                           │          │
                           ▼          ▼
┌─────────────────┐    ┌─────────────────┐
│   agent-mesh    │───►│     mem7        │
│   (runtime)     │    │   (memory)      │
│                 │    │                 │
│ • traces JSONL  │    │ • facts         │
│ • approvals     │    │ • decisions     │
│ • policies      │    │ • observations  │
└────────┬────────┘    └─────────────────┘
         │                     ▲
         │  writes decisions   │
         └─────────────────────┘
         (opt-in, if mem7 configured)

Write paths (who writes what where)

| Writer | Target | What | When |
|---|---|---|---|
| Any agent | mem7 | Observations, facts, context | During execution, via MCP store |
| agent-mesh | mem7 | Approval/rejection decisions | On approval resolve (opt-in) |
| agent-mesh | JSONL | Tool call traces | Every tool call (always) |
| Supervisor | mem7 | Auto-resolve rationale | On auto-approve (opt-in) |
| Human via agent7 | agent-mesh API | Approve/reject action | Clicking in UI |
| agent7 | PostgreSQL | Aggregated stats, governance scores | On trace ingestion, on sync |

Read paths (who reads what from where)

| Reader | Source | What | When |
|---|---|---|---|
| Supervisor | mem7 | Past decisions for same pattern | Before deciding to auto-approve |
| Any agent | mem7 | Stored facts, context, history | During execution, via MCP search |
| agent7 | agent-mesh API | Pending approvals | Polling for approval UI |
| agent7 | agent-mesh JSONL | Trace history | On ingestion (CLI or API push) |
| agent7 | mem7 (SDK) | Stored decisions, facts | For memory viewer, audit trail |
| agent7 | PostgreSQL | Scores, rules, lifecycle | For governance dashboard |

Approval architecture (detailed)

Trust hierarchy

Level 0: Policy engine (agent-mesh)
         Static rules. Instant. No judgment.
         allow reads, deny deletes, require approval for sends.

Level 1: Built-in supervisor (in agent-mesh, Go)
         mem7 lookup. ~100ms. Pattern matching only.
         Checks past decisions: 3+ approvals, 0 rejections → auto-approve.
         Escalates unknowns to Level 1+.

Level 1+: External supervisor (in agent7, Python)
          Rule engine + Ollama LLM. ~2s rules, ~20s LLM. Bounded judgment.
          Handles novel cases, complex conditions, injection detection.
          Escalates unknowns to Level 2.

Level 2: Human
         Full judgment. Slow. Expensive attention.
         Sees pending in Claude Code prompt OR agent7 UI.
         Decision written back to mem7 via agent-mesh.

Two supervisor layers. Level 1 is built into agent-mesh — a MemoryReader that queries mem7 before the approval queue. It's a pre-filter, not a full resolver. Level 1+ is the external Python supervisor in agent7 that implements the supervisor protocol — it polls pending approvals and resolves them with rule evaluation + LLM. The supervisor/ package inside agent-mesh handles content redaction and injection detection on the protocol's outbound payloads (RedactParams, DetectInjection) — a separate concern from both layers.

Supervisor decision logic (pseudocode)

def evaluate(request, mem7_client):
    # Check mem7 for similar past decisions on this tool/agent pair
    past = mem7_client.context(
        f"{request.tool} {request.agent}",
        tags=["decision"],
        limit=10,
    )

    approved_count = sum(1 for m in past if "approved" in m.value)
    rejected_count = sum(1 for m in past if "rejected" in m.value)

    # Pattern: consistently approved, never rejected → auto-approve
    if approved_count >= 3 and rejected_count == 0:
        return AutoApprove(reason=f"approved {approved_count} times before")

    # Pattern: rejected within the last 7 days → auto-deny
    recent_rejects = [m for m in past if "rejected" in m.value and is_recent(m, days=7)]
    if recent_rejects:
        return AutoDeny(reason="rejected within the last 7 days")

    # No clear precedent → escalate to human
    return Escalate(reason="no clear precedent")

Where the approval UI lives

Claude Code terminal — the developer gets the prompt inline. This is the current flow, works for solo use.

agent7 web UI — for team use, overnight runs, or when multiple agents generate approvals faster than one human can handle. agent7 polls agent-mesh's pending queue, displays context, human clicks. agent7 POSTs back to agent-mesh /approval/resolve.

Both are thin clients of the agent-mesh approval API. If agent7 goes down, Claude Code still works. If both go down, requests queue in agent-mesh until someone resolves them (fail-safe, not fail-open).
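
That thin-client contract is small enough to sketch. The two endpoint paths (/approval/pending, /approval/resolve) are the documented ones; the payload field names and the injected http transport are assumptions for illustration:

```python
# Minimal sketch of agent7's approval thin client. Endpoint paths are from
# the design above; payload field names are assumptions. `http` is injected
# so the loop can run without a live agent-mesh.

def run_approval_cycle(http, decide):
    """Poll pending approvals once; resolve each with the human's decision."""
    pending = http("GET", "/approval/pending")   # list of pending approval dicts
    for item in pending:
        verdict = decide(item)                   # e.g. the human clicking in the UI
        http("POST", "/approval/resolve",
             {"id": item["id"], "decision": verdict})
    return len(pending)

# Fake transport standing in for agent-mesh:
calls = []
def fake_http(method, path, body=None):
    calls.append((method, path, body))
    if path == "/approval/pending":
        return [{"id": "a1", "tool": "gmail.send_email"}]
    return {}

n = run_approval_cycle(fake_http, decide=lambda item: "approve")
```

Because both Claude Code and agent7 speak this same API, losing agent7 only removes one client; pending requests stay queued in agent-mesh.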


Deployment

Solo developer (current setup)

laptop
├── agent-mesh (Go binary, sidecar)
│   ├── config.yaml (policies)
│   └── traces/ (JSONL)
├── mem7 (Go binary, MCP stdio via agent-mesh)
│   └── ~/.mem7/ (markdown + SQLite)
└── Claude Code / Cursor (agent)

No agent7 needed. agent-mesh + mem7 provide full governance + memory. The developer is the human-in-the-loop via terminal prompts.

Small team

shared server
├── agent-mesh (central, HTTP mode)
├── mem7 serve (HTTP, shared memory)
└── agent7 (dashboard + supervisor)
    └── PostgreSQL

developer laptops
└── agents connect to shared agent-mesh

agent7 adds value: team visibility, approval UI for shared agents, governance scoring, audit trail.

Enterprise (future)

agent7 SaaS (hosted)
├── governance engine
├── approval UI
├── audit + compliance
└── policy push → customer agent-mesh

customer infra
├── agent-mesh (sidecar per agent)
├── mem7 (per-team or central)
└── agents (any framework)

Anthropic Managed Agents (cloud harness + governed tools)

Anthropic cloud (harness, sandbox, multi-agent orchestration)
├── Managed Agent coordinator (Opus)
│   ├── sub-agents (reviewer, tester, security)
│   └── mcp_toolset → MCP Streamable HTTP
└── POST /mcp ──────────────────────────────►
                                              your infra
                                              ├── agent-mesh (policies, approval, traces)
                                              │   └── POST /rpc ──► mem7 (decisions)
                                              └── upstream MCP servers (ollama, arch7, ...)

What Anthropic provides: cloud sandbox, session durability, multi-agent coordination (threads), outcome grading (rubrics), vault credential management.

What agent-mesh adds: fine-grained deny policies (Managed Agents only have allow/ask), per-agent rate limiting, temporal grants, decision persistence to mem7, OTEL traces, governed memory cross-session.

Integration point: agent-mesh POST /mcp endpoint serves MCP Streamable HTTP. Managed Agent's MCP connector discovers tools via tools/list, calls them via tools/call. Policies apply transparently. Vault injects Authorization: Bearer agent:<id> for per-agent policy evaluation.
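
Concretely, a tools/call from the Managed Agent side is one JSON-RPC POST. The method name and params shape follow the MCP specification; the Authorization header is the vault-injected per-agent identity described above:

```python
import json

# What a Managed Agent's MCP connector would POST to agent-mesh's /mcp
# endpoint. tools/call and its {"name", "arguments"} params follow the
# MCP spec; the bearer token format is the vault injection described above.

def tools_call(tool: str, arguments: dict, agent_id: str, rpc_id: int = 1):
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer agent:{agent_id}",  # drives per-agent policy evaluation
    }
    body = {
        "jsonrpc": "2.0",
        "id": rpc_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return headers, json.dumps(body)

headers, body = tools_call("gmail.send_email",
                           {"to": "x@example.com", "subject": "hi"},
                           agent_id="reviewer")
```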

agent7 role: same as for any deployment — dashboard, approval UI, governance scoring, audit trail. The Managed Agent sessions generate traces and decisions that flow to agent7 like any other agent.


Implementation priorities

Phase 1: Decision write path (agent-mesh → mem7) ✓

Shipped in agent-mesh v0.8.7.

When an approval resolves in agent-mesh, the decision is stored in mem7 if configured.

  • Config field: memory.url + optional memory.token
  • On resolve: async POST to mem7 /rpc with memory_store
  • Key format: decision.<tool>.<approval_id>
  • Tags: [decision, approved|denied, <tool>, agent:<id>]
  • Value: human-readable summary — "approved by X — agent:Y tool:Z reason:..."
  • Metrics: agent_mesh_mem7_writes_{attempted,succeeded,failed}_total
  • Graceful degradation: failing mem7 never blocks approvals
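
These conventions pin down to a small formatter. A sketch, where the Decision type and function name are mine; the key, tag, and value formats follow the bullets above, with the value's punctuation approximated:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    # Illustrative type; field names are assumptions.
    approval_id: str
    tool: str
    agent: str
    approved: bool
    resolver: str
    reason: str

def to_memory_record(d: Decision) -> dict:
    """Format a resolved approval per the conventions above:
    key decision.<tool>.<approval_id>, tags [decision, approved|denied,
    <tool>, agent:<id>], human-readable value."""
    verdict = "approved" if d.approved else "denied"
    return {
        "key": f"decision.{d.tool}.{d.approval_id}",
        "value": f"{verdict} by {d.resolver} - agent:{d.agent} tool:{d.tool} reason:{d.reason}",
        "tags": ["decision", verdict, d.tool, f"agent:{d.agent}"],
        "agent": "agent-mesh",
    }

rec = to_memory_record(Decision(
    approval_id="a1b2", tool="gmail.send_email", agent="claude-code",
    approved=True, resolver="Marc", reason="routine status email",
))
# rec["key"] == "decision.gmail.send_email.a1b2"
```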

Phase 2: Built-in supervisor reads mem7 ✓

Shipped in agent-mesh v0.9.1.

agent-mesh queries mem7 before submitting to the approval queue. This is the built-in Level 1 supervisor — a pre-filter that handles routine patterns.

  • MemoryReader queries mem7 memory_search with tool name + agent + tags=["decision"]
  • Counts past approvals/rejections from search results
  • Auto-approve if >= min_approvals (default 3) with 0 rejections
  • Escalate if ambiguous, rejected, or mem7 is down
  • Auto-approved decisions traced as supervisor:mem7 and written back to mem7
  • Config: supervisor.auto_approve (default true), supervisor.min_approvals (default 3)
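
Pulled together, the opt-in wiring from Phases 1 and 2 is a few config lines. The memory.url, memory.token, supervisor.auto_approve, and supervisor.min_approvals fields are the documented ones; the nesting and example values are assumptions:

```yaml
# Opt-in mem7 wiring + built-in supervisor thresholds (nesting illustrative).
memory:
  url: http://127.0.0.1:8080     # example address; agent-mesh POSTs memory_store to /rpc
  token: ${MEM7_TOKEN}           # optional
supervisor:
  auto_approve: true             # default true
  min_approvals: 3               # default 3; zero rejections still required
```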

Complements the external Python supervisor (in agent7): the built-in handles routine patterns (~100ms); the external supervisor handles novel cases with rule engine + Ollama LLM evaluation (~20s). Both escalate unknowns to humans.

Phase 3: agent7 reads mem7 via SDK

Replace the current ad-hoc memory debug view with proper SDK integration.

agent7 changes:

  • pip install mem7 in backend dependencies
  • Backend service: Mem7Client wrapping the SDK, configured via env var
  • API routes: /api/v1/memory/search, /api/v1/memory/decisions
  • Frontend: memory viewer page, decision audit trail with filters

Estimated effort: ~500 lines Python + frontend.

Phase 4: agent7 approval UI as thin client

agent7 displays pending approvals from agent-mesh and lets humans resolve them.

agent7 changes:

  • Poll agent-mesh /approval/pending (API already exists)
  • Display: tool call details, agent identity, past decisions from mem7 (context)
  • Action: approve/reject button → POST agent-mesh /approval/resolve
  • agent-mesh handles the mem7 write (phase 1); agent7 doesn't write to mem7 directly

Estimated effort: ~400 lines Python + frontend.


What this enables (end state)

  1. Adaptive governance. Policies start strict (human approval for everything). As the system collects approved patterns in mem7, the supervisor auto-approves routine actions. Governance gets less intrusive over time without getting less safe.

  2. Cross-agent memory. Agent A stores an observation. Agent B searches and finds it. The supervisor checks if a decision was already made. All through the same mem7 store, scoped by tags.

  3. Audit without effort. Every decision is a fact in mem7. Every tool call is a trace in agent-mesh. agent7 joins them: "this agent called this tool, it was approved because of this past decision, here's the full chain." Compliance teams query, they don't grep.

  4. Progressive adoption. Day 1: install agent-mesh, get policies and tracing. Day 30: add mem7, get persistent memory. Day 60: add agent7, get visibility and team governance. Each step is independently valuable.


Three projects, one system. Independent by default, powerful together.