# Agent Loop

Every interaction with Agentcy — chat, scheduled task, channel message, sub-agent — runs through the same loop. The loop is small, but each turn weaves together six subsystems: memory, knowledge graph, an LLM, the catalog meta-tools, policy + approval gates, and connectors. The diagram below shows one full turn end-to-end.

Code entry point: `agentcy-chat::agent_loop::AgentLoop::run`.

## How the pieces fit
| Stage | Crate | What happens | Why it matters |
|---|---|---|---|
| 1. Input | agentcy-api | A user message arrives via chat, channel, webhook, or cron | All entry points converge on the same loop — the rules don't fork by channel |
| 2. Context build | agentcy-memory, agentcy-graph | Memory recall + graph context + transcript become the LLM prompt | The agent answers with company-specific facts, not just general knowledge |
| 3. LLM turn | agentcy-agent | LLM streams tokens and tool calls; sees only 4 catalog meta-tools | Keeps context small even with 300+ connector tools available |
| 4. Stream | agentcy-chat | Every event becomes an SSE chunk to the caller | UI / API client sees tokens, tool calls, approvals as they happen |
| 5. Tool call gates | kg-policy, agentcy-chat::approval | Route → Policy → Approval → Backend → Result | Zero-trust by default — write tools never bypass policy |
| 6. After turn | agentcy-memory, agentcy-chat::compaction | Memory writes + auto-compaction | Long conversations stay tractable; facts persist across sessions |
The rest of this page walks each numbered stage in the diagram.
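Condensed into code, one turn's control flow looks roughly like this — a hedged, runnable toy where every name is a stand-in (the real entry point is `AgentLoop::run` in `agentcy-chat`):

```rust
// Toy sketch of one turn's control flow; all names are stand-ins.
enum Event {
    ToolCall(&'static str), // stage 3: model asks for a tool
    EndTurn,                // stage 7b: finish_reason = end_turn
}

fn llm_turn(step: usize) -> Event {
    // Stand-in model: one tool call, then done.
    if step == 0 { Event::ToolCall("execute_connector_tool") } else { Event::EndTurn }
}

fn main() {
    let _prompt = "system + <memory> + <graph_context> + transcript"; // stage 2
    let max_tool_iterations: usize = 20; // CHAT_MAX_TOOL_ITERATIONS
    for step in 0..max_tool_iterations {
        match llm_turn(step) {
            // stage 5: route → policy → approval → backend, result fed back
            Event::ToolCall(name) => println!("gate + execute + append {name}"),
            Event::EndTurn => break,
        }
    }
    println!("memory writes + compaction"); // stage 6
}
```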
## 1 · Input

A "turn" begins when a message arrives. Sources are unified by agentcy-api:

- Chat — `/api/v1/chat/conversations/:id/messages` (browser, mobile, SDK)
- Channels — Slack, WhatsApp, email, Teams (via OpenFang gateway)
- Webhooks — `POST /api/v1/triggers/:id/fire` from external systems
- Cron — scheduled triggers (cron string → fire as message)
- Sub-agent — a parent agent's `dispatch_sub_agent` invocation

All of them produce the same `ChatMessage { role, content, metadata }`, which feeds stage 2.
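A minimal sketch of that shape, assuming serde-style types (the field names come from the prose above; everything else is illustrative):

```rust
use serde::{Deserialize, Serialize};
use serde_json::Value;

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
enum Role {
    User,
    Assistant,
    Tool,
    System,
}

#[derive(Debug, Serialize, Deserialize)]
struct ChatMessage {
    role: Role,
    content: String,
    /// Entry-point specifics (Slack channel, trigger id, parent agent, …)
    /// live here so the loop itself stays channel-agnostic.
    metadata: Value,
}
```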
## 2 · Context build (every turn)

Before the LLM sees anything, the loop assembles the prompt from three sources:

### 2a · Memory recall

```rust
let recalled = agentcy_memory::recall(user_message, /* top_k */ 5).await?;
```

agentcy-memory runs a vector search against memvid (the per-org embedding store) and returns the top-k facts. They are injected as a system note:

```text
<memory>
- 2026-03-12: User confirmed "deploys go to staging first, never prod directly"
- 2026-04-02: 'checkout-svc' rolled back twice this month — see PR 412
- preference: terse Slack updates, full detail in PR comments
</memory>
```

Memories come from explicit writes (the agent calling `remember(kind, text)`) and implicit captures (decision points, tool results flagged as durable). See Concept: Memory System.
### 2b · Graph context

```rust
let entities = agentcy_graph::context_for(realm, message).await?;
```

agentcy-graph queries Neo4j scoped to the conversation's realm. It resolves entities mentioned in the message (services, repos, dashboards) and returns a small slice of the graph as a system note:

```text
<graph_context realm="prod">
service: checkout-svc → owns: github.com/acme/checkout
service: checkout-svc → emits: grafana.checkout-latency
incident: INC-2207 → caused by: checkout-svc · resolved: 2026-04-15
</graph_context>
```

This is what makes the agent company-aware — it knows your services, your repos, and your incident history before any tool call.
### 2c · System prompt + transcript

The full prompt is then assembled:

```text
[system] org policies + active realm + catalog instructions
[system] <memory>...</memory>
[system] <graph_context>...</graph_context>
[user] ...prior turns (may be compacted)...
[tool] ...prior tool_results...
[user] <current message>
```

If the transcript exceeds the token budget, §6b compaction runs first.
## 3 · LLM turn

agentcy-agent is the multi-provider abstraction. The LLM sees the assembled prompt plus exactly four catalog meta-tools:

```rust
list_connectors() -> Vec<ConnectorSummary>
search_connector_tools(query: String, top_k: u32) -> Vec<ToolSpec>
execute_connector_tool(connector: String, tool: String, args: Value) -> ToolResult
request_connector_access(connector: String, reason: String) -> RequestId
```

Why four meta-tools instead of all 300+? Because:

- Context stays small — even with 26 connectors and hundreds of tools, the prompt only lists four.
- Scoping is real — `list_connectors` returns only what the realm + policy + user role allows.
- Discovery is semantic — `search_connector_tools` does vector search over `ToolSpec.description`, so the agent finds `kubernetes.restart_deployment` when asked to "restart the checkout service" (sketched below).
- `request_connector_access` is the polite escape hatch when a tool exists but isn't enabled — it opens an admin ticket in the Activity feed.
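To make "discovery is semantic" concrete, here is a toy ranking pass: cosine similarity between a query embedding and each tool description's embedding. The vectors and tool names are fabricated stand-ins for the real store behind `search_connector_tools`:

```rust
/// Cosine similarity between two equal-length embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b))
}

fn main() {
    // Fabricated 3-dim embeddings standing in for real ToolSpec.description vectors.
    let tools: [(&str, [f32; 3]); 2] = [
        ("kubernetes.restart_deployment", [0.9, 0.1, 0.0]),
        ("github.merge_pull_request", [0.1, 0.8, 0.1]),
    ];
    let query: [f32; 3] = [0.85, 0.15, 0.05]; // "restart the checkout service"

    let mut ranked: Vec<_> = tools.iter().map(|(n, v)| (cosine(&query, v), *n)).collect();
    ranked.sort_by(|a, b| b.0.total_cmp(&a.0)); // best match first
    println!("{ranked:?}"); // kubernetes.restart_deployment wins
}
```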
### Provider matrix
| Provider | Streaming | Tools | Vision | Local? |
|---|---|---|---|---|
| Anthropic (Claude) | ✓ | ✓ | ✓ | — |
| OpenAI (GPT) | ✓ | ✓ | ✓ | — |
| Google (Gemini) | ✓ | ✓ | ✓ | — |
| Ollama | ✓ | model-dependent | model-dependent | ✓ |
| vLLM | ✓ | model-dependent | model-dependent | ✓ |
| LM Studio | ✓ | model-dependent | model-dependent | ✓ |
Configure in Settings → Models or via env (OPENAI_API_KEY, ANTHROPIC_API_KEY, …). Default model is org-scoped, overridable per conversation.
## 4 · Stream (SSE)

Every chat response streams as server-sent events. The loop wraps each chunk the LLM emits:

```text
event: message_start { "message_id": "msg_..." }
event: content_delta { "delta": "text tokens" }
event: tool_call_start { "tool_call_id": "tc_...", "name": "...", "connector": "..." }
event: tool_call_delta { "tool_call_id": "tc_...", "args_delta": "..." }
event: approval_required { "approval_id": "ap_...", "tool": "...", "args": {...} }
event: tool_result { "tool_call_id": "tc_...", "result": {...} }
event: message_end { "finish_reason": "end_turn", "usage": {...} }
```

See REST API: Chat streaming for the full contract.
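A minimal consuming client, sketched with `reqwest` (its `stream` feature enabled) and hand-rolled frame splitting — the URL and payload are illustrative, and a production client should use a proper SSE parser:

```rust
use futures_util::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical local endpoint; see REST API: Chat streaming for the contract.
    let url = "http://localhost:8080/api/v1/chat/conversations/conv_1/messages";
    let resp = reqwest::Client::new()
        .post(url)
        .json(&serde_json::json!({ "content": "restart checkout-svc" }))
        .send()
        .await?;

    let mut stream = resp.bytes_stream();
    let mut buf = String::new();
    while let Some(chunk) = stream.next().await {
        buf.push_str(&String::from_utf8_lossy(&chunk?));
        // SSE frames end with a blank line; handle each completed frame.
        while let Some(end) = buf.find("\n\n") {
            let frame: String = buf.drain(..end + 2).collect();
            println!("{}", frame.trim_end()); // e.g. "event: content_delta\ndata: {...}"
        }
    }
    Ok(())
}
```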
## 5 · Tool call gates

When the LLM emits a `tool_call`, four gates run in order — route, policy, approval, backend. Any failure short-circuits the rest.

### 5a · Route

The loop dispatches based on tool name:

- `list_connectors`, `search_connector_tools`, `execute_connector_tool`, `request_connector_access` → handled by `ToolCatalog` (in-memory registry)
- Anything else → looked up in `ConnectorToolProvider` and dispatched to the relevant connector crate (sketch below)
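A sketch of that dispatch step under the two rules above — the enum and function names are assumptions, not the real `agentcy-chat` code:

```rust
// Where a tool call gets dispatched; names here are illustrative.
enum Backend {
    Catalog,           // the four meta-tools, served from the in-memory registry
    Connector(String), // everything else, via a ConnectorToolProvider lookup
}

fn route(tool_name: &str) -> Backend {
    match tool_name {
        "list_connectors"
        | "search_connector_tools"
        | "execute_connector_tool"
        | "request_connector_access" => Backend::Catalog,
        other => Backend::Connector(other.to_string()),
    }
}
```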
### 5b · Policy gate (kg-policy)

Every call is evaluated against the active Rego policies:

```rego
package agentcy

deny[msg] {
    input.tool == "kubernetes.delete_deployment"
    not has_role("platform-admin")
    msg := "delete operations require platform-admin role"
}

has_role(r) {
    input.user.roles[_] == r
}
```

Inputs to the policy: realm, user, tool, args, connector, time_of_day. Denials are audited (`policy_audit_log` table) and surfaced to the LLM as a `tool_error` so it can adapt rather than crash:
```text
event: tool_result
data: {
  "tool_call_id": "tc_...",
  "error": {
    "code": "policy_denied",
    "message": "delete operations require platform-admin role",
    "policy_id": "pol_..."
  }
}
```

See Concept: Zero-Trust Policies.
### 5c · Approval gate

Read tools auto-approve. Write tools block until a human says yes:

- The loop creates an approval record in Postgres
- Emits `approval_required` on the SSE stream
- Blocks on a Tokio oneshot channel keyed by `approval_id` (sketched below)
- Resumes when the UI / API posts to `POST /api/v1/chat/conversations/:id/approvals/:approval_id` with `{"approved": true|false}`
- If the timeout (`approval_timeout_secs`, default 300) fires first, the call denies with `approval_timeout`
The frontend ConnectorApprovalRenderer is the canonical UX. See How-To: Approval Flows.
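The blocking step can be sketched with `tokio::sync::oneshot` — a hedged approximation of the mechanics above, not the actual `agentcy-chat::approval` code; `ApprovalHub` and its methods are assumptions:

```rust
use std::{collections::HashMap, sync::Mutex, time::Duration};
use tokio::sync::oneshot;

/// Pending approvals, keyed by approval_id. Illustrative only.
#[derive(Default)]
struct ApprovalHub {
    pending: Mutex<HashMap<String, oneshot::Sender<bool>>>,
}

impl ApprovalHub {
    /// Loop side: park the tool call until a human decides or the timeout fires.
    async fn wait(&self, approval_id: String, timeout: Duration) -> Result<bool, &'static str> {
        let (tx, rx) = oneshot::channel();
        self.pending.lock().unwrap().insert(approval_id, tx);
        match tokio::time::timeout(timeout, rx).await {
            Ok(Ok(approved)) => Ok(approved),        // human answered true/false
            Ok(Err(_)) => Err("approval_cancelled"), // sender dropped
            Err(_) => Err("approval_timeout"),       // approval_timeout_secs elapsed
        }
    }

    /// HTTP side: called when POST .../approvals/:approval_id arrives.
    fn resolve(&self, approval_id: &str, approved: bool) {
        if let Some(tx) = self.pending.lock().unwrap().remove(approval_id) {
            let _ = tx.send(approved);
        }
    }
}
```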
### 5d · Backend

The tool finally runs. The backend is one of:

- Connector — one of 15 source types (github, aws, gcp, vercel, supabase, sql, mongodb, kubernetes, openapi, mcp, csv, json, remote-execution, …). Each connector crate implements `ConnectorToolProvider` with its own auth + rate-limiting (trait sketch below).
- Knowledge graph (Neo4j) — for tools that read/write the graph: `graph.search`, `graph.related`, `graph.upsert_entity`. All graph access is realm-filtered so cross-tenant leakage is impossible.
- Worker job — long-running tools dispatch to `agentcy-worker` (Redis queue). The loop returns a `job_id` immediately and follows up with `tool_result` once the worker finishes.
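As a sketch, a connector backend could plug in through a trait along these lines — the real `ConnectorToolProvider` signature isn't shown in this doc, so every name except the trait's is an assumption:

```rust
use async_trait::async_trait;
use serde_json::Value;

/// Illustrative stand-ins for the catalog's types.
pub struct ToolSpec {
    pub name: String,
    pub description: String, // what search_connector_tools embeds
}

#[derive(Debug)]
pub struct ToolError(pub String);

#[async_trait]
pub trait ConnectorToolProvider: Send + Sync {
    /// Tools this connector exposes, already realm-filtered.
    fn tool_specs(&self) -> Vec<ToolSpec>;

    /// Run one tool. By this point route → policy → approval have all passed.
    async fn execute(&self, tool: &str, args: Value) -> Result<Value, ToolError>;
}
```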
### 5e · Tool result

The result (or `tool_error`) is:

- Appended to the transcript so the next iteration sees it
- Streamed to the caller as `tool_result`
- Looped back to stage 3 — the LLM gets another turn
## 6 · After the turn

### 6a · Memory write

If the agent decides a fact is durable, it explicitly calls `remember(kind, text)`:

```rust
remember("user_preference", "shaked prefers terse Slack updates")
remember("decision", "we chose Postgres over DynamoDB for billing v2")
remember("incident_root", "INC-2207 root cause: stale Redis key on checkout-svc")
```

The write goes to memvid (vectors) and Neo4j (typed edges). Both are realm-scoped. The next turn's memory recall (2a) and graph context (2b) will see these writes — that's the feedback loop that lets the agent learn within a session and across sessions.
### 6b · Compaction

When a long conversation crosses the token budget, agentcy-chat::compaction:

- Walks the transcript oldest-first
- Summarizes turn pairs into a single `<compacted>...</compacted>` system note
- Preserves: user decisions, tool results flagged durable, the active task description
- Discards: chitchat, redundant tool results

Compaction typically halves the context, and it logs itself to the audit trail so operators can tune the policy.
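A toy illustration of that oldest-first walk — `Msg`, the token estimator, and truncation-as-summary are all stand-ins for the real LLM-backed pass:

```rust
// Toy compaction: fold the two oldest adjacent non-durable messages into one
// <compacted> note until the transcript fits the budget. The real pass asks
// the LLM for the summary and preserves decisions + durable tool results.
#[derive(Clone)]
struct Msg {
    text: String,
    durable: bool, // decisions, durable tool results, the active task
}

fn tokens(t: &[Msg]) -> usize {
    t.iter().map(|m| m.text.split_whitespace().count()).sum()
}

fn compact(transcript: &mut Vec<Msg>, budget: usize) {
    while tokens(transcript) > budget {
        // Oldest-first: find two adjacent non-durable messages.
        let Some(i) = (0..transcript.len().saturating_sub(1))
            .find(|&i| !transcript[i].durable && !transcript[i + 1].durable)
        else {
            break; // nothing left that is safe to discard
        };
        let a = transcript.remove(i);
        let b = transcript.remove(i);
        let note = format!(
            "<compacted>{} … {}</compacted>",
            a.text.chars().take(40).collect::<String>(),
            b.text.chars().take(40).collect::<String>()
        );
        transcript.insert(i, Msg { text: note, durable: true });
    }
}
```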
## 7 · Optional

### 7a · Sub-agent dispatch

When Orchestrator (OpenFang) is enabled, any agent can hand a task off:

```rust
// Call shape (illustrative, not literal Rust syntax):
dispatch_sub_agent(
    name: "code_reviewer",
    task: "Review PR #412 for SQL injection patterns",
    input: { "pr_url": "..." }
)
```

The child runs its own loop with its own tool catalog and its own realm scope — useful when a subtask should only see a narrow tool set. The child returns a summary, which appears as a `tool_result` in the parent's transcript.
See `frontend/app/(dashboard)/orchestrator/` and `backend/crates/agentcy-api/src/routes/orchestration_gateway/`.
### 7b · message_end

When the LLM emits `finish_reason: end_turn`, the loop:

- Closes the SSE stream with `message_end { usage }`
- Records cost (tokens × per-token rate) to `chat_usage`
- Persists a final transcript snapshot for replay
## How memory, graph, and connectors work together
A common question: "If the graph already has my data, why do I need connectors?" Three roles:
| | Connectors (15) | Knowledge graph (Neo4j) | Memory (memvid + edges) |
|---|---|---|---|
| Source of truth | The vendor (GitHub, AWS, …) | Imported snapshot of vendor state + Agentcy-only relationships | Things the agent learned during conversations |
| Latency | Live API call (slower) | Indexed (millisecond reads) | Indexed (millisecond reads) |
| Freshness | Always current | Last ingestion | Real-time (writes during turn) |
| Use it for | Actions ("restart the deploy") · live state | Joins, traversals, "what depends on what" | Preferences, decisions, incident history |
| Realm-scoped | Per-source config | Per-node property | Per-org by default |
The agent loop uses all three:
- Stage 2a/2b pull from memory + graph to prime the LLM
- Stage 3 lets the LLM decide whether to also hit a live connector
- Stage 5d runs the connector or graph query
- Stage 6a writes durable facts back to memory + graph
That's the full flywheel: observe (ingest into graph) → recall (memory + graph in context) → act (connectors, gated) → remember (memory + graph writes) → observe again on the next turn.
## Loop limits

Runtime caps prevent runaway loops:

| Setting | Default | Meaning |
|---|---|---|
| `CHAT_MAX_TOOL_ITERATIONS` | 20 | Max tool calls in one turn before forcing `end_turn` |
| `CHAT_MAX_TOKENS_PER_TURN` | 32,000 | Hard ceiling — triggers compaction or termination |
| `approval_timeout_secs` | 300 | How long the loop blocks waiting for a human |
| `policy_eval_timeout_ms` | 250 | Per-call policy evaluation budget |
All four are org-configurable in Settings → Security.
## Gotchas

- Don't bypass the catalog. It's tempting to allow-list direct connector tools into the system prompt; resist — you'll leak tools across realms and trip policy tests.
- Streaming retries are tricky. If the LLM stream dies mid-tool-call, the loop marks the call failed and retries with a `retry_hint`. Don't parse partial streams client-side.
- Approval timeouts leak sessions. If your UI doesn't handle `approval_timeout` events, users see a spinner forever. Use `ConnectorApprovalRenderer`.
- Memory writes are realm-scoped. A fact remembered in realm `prod` is invisible in realm `staging`. Often desired; sometimes surprising.
## Next
- Concept: Memory System — how recall + writes work end-to-end
- Concept: Knowledge Graph — realms, ingestion, traversals
- Concept: Zero-Trust Policies — Rego rules, audit log, role definitions
- How-To: Chat API & Streaming — opening a conversation, handling SSE
- How-To: Tool Calling & the Catalog — invoking tools by hand, pinning connectors
- How-To: Approval Flows — approval UX end-to-end