Agent Loop

Every interaction with Agentcy — chat, scheduled task, channel message, sub-agent — runs through the same loop. The loop is small but each turn weaves together six subsystems: memory, knowledge graph, an LLM, the catalog meta-tools, policy + approval gates, and connectors. The diagram below shows one full turn end-to-end.

[Diagram] Agent loop full cycle: input → context build (memory recall, graph context, system prompt) → LLM with four catalog meta-tools → tool call gates (route, policy, approval) → backend (connector or knowledge graph) → tool_result loops back to the LLM and streams to SSE; after the turn, memory writes and compaction run.

Code entry point: agentcy-chat::agent_loop::AgentLoop::run.

How the pieces fit

| Stage | Crate | What happens | Why it matters |
| --- | --- | --- | --- |
| 1. Input | agentcy-api | A user message arrives via chat, channel, webhook, or cron | All entry points converge on the same loop — the rules don't fork by channel |
| 2. Context build | agentcy-memory, agentcy-graph | Memory recall + graph context + transcript become the LLM prompt | The agent answers with company-specific facts, not just general knowledge |
| 3. LLM turn | agentcy-agent | LLM streams tokens and tool calls; sees only the four catalog meta-tools | Keeps context small even with 300+ connector tools available |
| 4. Stream | agentcy-chat | Every event becomes an SSE chunk to the caller | UI / API client sees tokens, tool calls, approvals as they happen |
| 5. Tool call gates | kg-policy, agentcy-chat::approval | Route → Policy → Approval → Backend → Result | Zero-trust by default — write tools never bypass policy |
| 6. After turn | agentcy-memory, agentcy-chat::compaction | Memory writes + auto-compaction | Long conversations stay tractable; facts persist across sessions |

The rest of this page walks each numbered stage in the diagram.

1 · Input

A "turn" begins when a message arrives. Sources are unified by agentcy-api:

  • Chat — /api/v1/chat/conversations/:id/messages (browser, mobile, SDK)
  • Channels — Slack, WhatsApp, email, Teams (via OpenFang gateway)
  • Webhooks — POST /api/v1/triggers/:id/fire from external systems
  • Cron — scheduled triggers (cron string → fire as message)
  • Sub-agent — a parent agent's dispatch_sub_agent invocation

All of them produce the same ChatMessage { role, content, metadata }, which feeds stage 2.
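
A minimal sketch of that shape, assuming serde for (de)serialization — the field types are illustrative, not the real agentcy-api definition:

rust
use serde::{Deserialize, Serialize};
use serde_json::Value;

/// The unified message every entry point produces.
#[derive(Debug, Serialize, Deserialize)]
struct ChatMessage {
    role: String,    // "user" | "assistant" | "tool" | "system"
    content: String, // raw message text
    metadata: Value, // source-specific extras: channel id, trigger id, parent agent, …
}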

2 · Context build (every turn)

Before the LLM sees anything, the loop assembles the prompt from three sources:

2a · Memory recall

rust
let recalled = agentcy_memory::recall(user_message, /* top_k */ 5).await?;

agentcy-memory runs a vector search against memvid (per-org embedding store) and returns the top-k facts. They are injected as a system note:

<memory>
- 2026-03-12: User confirmed "deploys go to staging first, never prod directly"
- 2026-04-02: 'checkout-svc' rolled back twice this month — see PR 412
- preference: terse Slack updates, full detail in PR comments
</memory>

Memories come from explicit writes (the agent calling remember(kind, text)) and implicit captures (decision points, tool results flagged as durable). See Concept: Memory System.

2b · Graph context

rust
let entities = agentcy_graph::context_for(realm, message).await?;

agentcy-graph queries Neo4j scoped to the conversation's realm. It resolves entities mentioned in the message (services, repos, dashboards) and returns a small slice of the graph as a system note:

<graph_context realm="prod">
service: checkout-svc → owns: github.com/acme/checkout
service: checkout-svc → emits: grafana.checkout-latency
incident: INC-2207 → caused by: checkout-svc · resolved: 2026-04-15
</graph_context>

This is what makes the agent company-aware — it knows your services, your repos, your incident history, before any tool call.

2c · System prompt + transcript

The full prompt is then assembled:

[system]   org policies + active realm + catalog instructions
[system]   <memory>...</memory>
[system]   <graph_context>...</graph_context>
[user]     ...prior turns (may be compacted)...
[tool]     ...prior tool_results...
[user]     <current message>

If transcript tokens exceed the budget, §6b compaction runs first.
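
As a sketch, the assembly is a straightforward concatenation in that order (Msg and the parameter names are illustrative stand-ins, not the real agentcy-chat types):

rust
/// Illustrative prompt assembly in the order shown above.
enum Msg { System(String), User(String), Tool(String) }

fn build_prompt(
    base_system: &str,    // org policies + active realm + catalog instructions
    memory_note: &str,    // <memory>…</memory> from 2a
    graph_note: &str,     // <graph_context>…</graph_context> from 2b
    transcript: Vec<Msg>, // prior turns and tool_results, possibly compacted (§6b)
    current: &str,        // the message that started this turn
) -> Vec<Msg> {
    let mut prompt = vec![
        Msg::System(base_system.to_string()),
        Msg::System(memory_note.to_string()),
        Msg::System(graph_note.to_string()),
    ];
    prompt.extend(transcript);
    prompt.push(Msg::User(current.to_string()));
    prompt
}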

3 · LLM turn

agentcy-agent is the multi-provider abstraction. The LLM sees the assembled prompt plus exactly four catalog meta-tools:

rust
list_connectors() -> Vec<ConnectorSummary>
search_connector_tools(query: String, top_k: u32) -> Vec<ToolSpec>
execute_connector_tool(connector: String, tool: String, args: Value) -> ToolResult
request_connector_access(connector: String, reason: String) -> RequestId

Why four meta-tools instead of all 300+? Because:

  • Context stays small — even with 26 connectors and hundreds of tools, the prompt only lists four.
  • Scoping is real — list_connectors returns only what the realm + policy + user role allows.
  • Discovery is semantic — search_connector_tools does vector search over ToolSpec.description, so the agent finds kubernetes.restart_deployment when asked to "restart the checkout service".
  • request_connector_access is the polite escape hatch when a tool exists but isn't enabled — opens an admin ticket in the Activity feed.
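
In practice a task is a two-step flow: semantic discovery, then gated execution. Roughly, using the signatures above (the argument values and the return shape here are assumptions):

rust
// Discover: vector search over tool descriptions.
let specs = search_connector_tools("restart the checkout service".to_string(), 3);
// e.g. specs[0] describes kubernetes.restart_deployment

// Execute: goes through the full gate chain in stage 5.
let result = execute_connector_tool(
    "kubernetes".to_string(),
    "restart_deployment".to_string(),
    serde_json::json!({ "name": "checkout-svc" }),
);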

Provider matrix

| Provider | Streaming | Tools | Vision | Local? |
| --- | --- | --- | --- | --- |
| Anthropic (Claude) | ✓ | ✓ | ✓ | ✗ |
| OpenAI (GPT) | ✓ | ✓ | ✓ | ✗ |
| Google (Gemini) | ✓ | ✓ | ✓ | ✗ |
| Ollama | ✓ | model-dependent | model-dependent | ✓ |
| vLLM | ✓ | model-dependent | model-dependent | ✓ |
| LM Studio | ✓ | model-dependent | model-dependent | ✓ |

Configure in Settings → Models or via env (OPENAI_API_KEY, ANTHROPIC_API_KEY, …). Default model is org-scoped, overridable per conversation.

4 · Stream (SSE)

Every chat response is streamed as server-sent events. These are the chunks the LLM emits and the loop wraps:

text
event: message_start       { "message_id": "msg_..." }
event: content_delta       { "delta": "text tokens" }
event: tool_call_start     { "tool_call_id": "tc_...", "name": "...", "connector": "..." }
event: tool_call_delta     { "tool_call_id": "tc_...", "args_delta": "..." }
event: approval_required   { "approval_id": "ap_...", "tool": "...", "args": {...} }
event: tool_result         { "tool_call_id": "tc_...", "result": {...} }
event: message_end         { "finish_reason": "end_turn", "usage": {...} }

See REST API: Chat streaming for the full contract.

5 · Tool call gates

When the LLM emits a tool_call, four stages run in order: route, policy, approval, backend. Any failure short-circuits.

5a · Route

The loop dispatches based on tool name:

  • list_connectors, search_connector_tools, execute_connector_tool, request_connector_access → handled by ToolCatalog (in-memory registry)
  • Anything else → looked up in ConnectorToolProvider and dispatched to the relevant connector crate
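
In code this is a single match on the tool name; a sketch, with catalog and connector_provider as illustrative handler names:

rust
let name = call.name.clone();
match name.as_str() {
    // The four meta-tools never leave the in-memory ToolCatalog registry.
    "list_connectors" | "search_connector_tools"
    | "execute_connector_tool" | "request_connector_access" => catalog.handle(call).await,
    // Everything else resolves through ConnectorToolProvider.
    _ => connector_provider.dispatch(call).await,
}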

5b · Policy gate (kg-policy)

Every call evaluates against active Rego policies:

rego
package agentcy

deny[msg] {
    input.tool == "kubernetes.delete_deployment"
    not has_role("platform-admin")
    msg := "delete operations require platform-admin role"
}

# Negating a wildcard expression directly is unsafe in Rego, so role
# membership goes through a helper rule.
has_role(r) {
    input.user.roles[_] == r
}

Inputs to the policy: realm, user, tool, args, connector, time_of_day. Denials are audited (policy_audit_log table) and surfaced to the LLM as a tool_error so it can adapt rather than crash:

json
event: tool_result
data: {
  "tool_call_id": "tc_...",
  "error": {
    "code": "policy_denied",
    "message": "delete operations require platform-admin role",
    "policy_id": "pol_..."
  }
}

See Concept: Zero-Trust Policies.

5c · Approval gate

Read tools auto-approve. Write tools block until a human says yes:

  1. Loop creates an approval record in Postgres
  2. Emits approval_required on the SSE stream
  3. Blocks on a Tokio oneshot channel keyed by approval_id
  4. Resumes when the UI / API posts to POST /api/v1/chat/conversations/:id/approvals/:approval_id with {"approved": true|false}
  5. If the timeout (approval_timeout_secs, default 300) fires first, the call is denied with approval_timeout

The frontend ConnectorApprovalRenderer is the canonical UX. See How-To: Approval Flows.
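
The blocking mechanics in step 3 reduce to a map of pending oneshot senders. A minimal sketch, assuming a std Mutex around the map (ApprovalStore and its methods are illustrative, not the real agentcy-chat::approval API):

rust
use std::{collections::HashMap, sync::Mutex, time::Duration};
use tokio::sync::oneshot;

/// Illustrative store of pending approvals, keyed by approval_id.
struct ApprovalStore {
    pending: Mutex<HashMap<String, oneshot::Sender<bool>>>,
}

impl ApprovalStore {
    /// Loop side: park the tool call until a human decides or the timeout fires.
    async fn wait_for(&self, approval_id: String, timeout: Duration) -> Result<bool, &'static str> {
        let (tx, rx) = oneshot::channel();
        self.pending.lock().unwrap().insert(approval_id, tx);
        match tokio::time::timeout(timeout, rx).await {
            Ok(Ok(approved)) => Ok(approved),  // human posted {"approved": true|false}
            Ok(Err(_)) => Err("approval_channel_closed"),
            Err(_) => Err("approval_timeout"), // approval_timeout_secs elapsed
        }
    }

    /// HTTP side: resolve when POST .../approvals/:approval_id arrives.
    fn resolve(&self, approval_id: &str, approved: bool) {
        if let Some(tx) = self.pending.lock().unwrap().remove(approval_id) {
            let _ = tx.send(approved);
        }
    }
}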

5d · Backend

The tool finally runs. The backend is one of:

  • Connector — one of 15 source types (github, aws, gcp, vercel, supabase, sql, mongodb, kubernetes, openapi, mcp, csv, json, remote-execution, …). Each connector crate implements ConnectorToolProvider with its own auth + rate-limiting.
  • Knowledge graph (Neo4j) — for tools that read/write the graph: graph.search, graph.related, graph.upsert_entity. All graph access is realm-filtered so cross-tenant leakage is impossible.
  • Worker job — long-running tools dispatch to agentcy-worker (Redis queue). The loop returns a job_id immediately and follows up with tool_result once the worker finishes.

5e · Tool result

The result (or tool_error) is:

  1. Appended to the transcript so the next iteration sees it
  2. Streamed to the caller as tool_result
  3. Looped back to stage 3 — the LLM gets another turn
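
Stages 3–5 therefore form an inner loop, capped by CHAT_MAX_TOOL_ITERATIONS (see "Loop limits" below). A self-contained skeleton, with closures standing in for the LLM and the gates; every name here is illustrative, and the real loop is agentcy-chat::agent_loop::AgentLoop::run:

rust
enum Step {
    Text(String),     // assistant tokens (stage 4 streams these as SSE)
    ToolCall(String), // a tool_call to route through stages 5a–5d
    EndTurn,          // finish_reason: end_turn (stage 7b)
}

fn run_turn(
    mut next_step: impl FnMut(&[String]) -> Step, // stand-in for the LLM (stage 3)
    mut run_gates: impl FnMut(String) -> String,  // stand-in for stages 5a–5d
    max_tool_iterations: usize,                   // CHAT_MAX_TOOL_ITERATIONS
) -> Vec<String> {
    let mut transcript = Vec::new();
    for _ in 0..max_tool_iterations {
        match next_step(&transcript) {
            Step::Text(tokens) => transcript.push(tokens),
            Step::ToolCall(call) => {
                // Any gate failure surfaces here as a tool_error payload.
                let result = run_gates(call);
                transcript.push(result); // stage 5e: the result loops back to the LLM
            }
            Step::EndTurn => break, // the loop also forces this when the cap is hit
        }
    }
    transcript
}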

6 · After the turn

6a · Memory write

If the agent decides a fact is durable, it explicitly calls remember(kind, text):

rust
remember("user_preference", "shaked prefers terse Slack updates");
remember("decision",        "we chose Postgres over DynamoDB for billing v2");
remember("incident_root",   "INC-2207 root cause: stale Redis key on checkout-svc");

The write goes to memvid (vectors) and Neo4j (typed edges). Both are realm-scoped. The next turn's memory recall (2a) and graph context (2b) will see these writes — that's the feedback loop that makes the agent learn over a session and across sessions.

6b · Compaction

Long conversations cross the token budget. agentcy-chat::compaction:

  1. Walks the transcript oldest-first
  2. Summarizes turn pairs into a single <compacted>...</compacted> system note
  3. Preserves: user decisions, tool results flagged durable, the active task description
  4. Discards: chitchat, redundant tool results

Compaction typically halves the context, and each run is logged to the audit trail so operators can tune the policy.
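
A toy version of the walk, with Turn and summarize as stand-ins for the real agentcy-chat::compaction internals:

rust
struct Turn { text: String, durable: bool }

fn token_count(t: &[Turn]) -> usize {
    t.iter().map(|x| x.text.split_whitespace().count()).sum()
}

fn summarize(a: &str, b: &str) -> String {
    // Stand-in for the LLM summarization call; keeps only a short digest.
    let digest: Vec<&str> = a.split_whitespace().chain(b.split_whitespace()).take(12).collect();
    format!("<compacted>{}…</compacted>", digest.join(" "))
}

fn compact(transcript: &mut Vec<Turn>, budget: usize) {
    let mut i = 0;
    // Walk oldest-first, folding non-durable turn pairs into a single note.
    while token_count(transcript) > budget && i + 1 < transcript.len() {
        if transcript[i].durable || transcript[i + 1].durable {
            i += 1; // preserve decisions, durable tool results, the active task
            continue;
        }
        let note = summarize(&transcript[i].text, &transcript[i + 1].text);
        transcript.splice(i..=i + 1, [Turn { text: note, durable: false }]);
    }
}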

7 · Optional

7a · Sub-agent dispatch

When Orchestrator (OpenFang) is enabled, any agent can hand a task off:

rust
dispatch_sub_agent(
    "code_reviewer",                             // name
    "Review PR #412 for SQL injection patterns", // task
    json!({ "pr_url": "..." }),                  // input
)

The child runs its own loop with its own tool catalog and its own realm scope — useful when a subtask should only see a narrow tool set. The child returns a summary, which appears as a tool_result in the parent's transcript.

See frontend/app/(dashboard)/orchestrator/ and backend/crates/agentcy-api/src/routes/orchestration_gateway/.

7b · message_end

When the LLM emits finish_reason: end_turn, the loop:

  • Closes the SSE stream with message_end { usage }
  • Records cost (tokens × per-token rate) to chat_usage
  • Persists final transcript snapshot for replay
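
The cost line is simple arithmetic; an illustration with placeholder rates (real per-token rates are model-specific and configured per org):

rust
/// Example cost bookkeeping for chat_usage; the rates are placeholders.
fn turn_cost_usd(input_tokens: u64, output_tokens: u64) -> f64 {
    input_tokens as f64 * 3.0e-6         // e.g. $3 per 1M input tokens
        + output_tokens as f64 * 15.0e-6 // e.g. $15 per 1M output tokens
}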

How memory, graph, and connectors work together

A common question: "If the graph already has my data, why do I need connectors?" Three roles:

| | Connectors (15) | Knowledge graph (Neo4j) | Memory (memvid + edges) |
| --- | --- | --- | --- |
| Source of truth | The vendor (GitHub, AWS, …) | Imported snapshot of vendor state + Agentcy-only relationships | Things the agent learned during conversations |
| Latency | Live API call (slower) | Indexed (millisecond reads) | Indexed (millisecond reads) |
| Freshness | Always current | Last ingestion | Real-time (writes during turn) |
| Use it for | Actions ("restart the deploy") · live state | Joins, traversals, "what depends on what" | Preferences, decisions, incident history |
| Realm-scoped | Per-source config | Per-node property | Per-org by default |

The agent loop uses all three:

  1. Stage 2a/2b pull from memory + graph to prime the LLM
  2. Stage 3 lets the LLM decide whether to also hit a live connector
  3. Stage 5d runs the connector or graph query
  4. Stage 6a writes durable facts back to memory + graph

That's the full flywheel: observe (ingest into graph) → recall (memory + graph in context) → act (connectors, gated) → remember (memory + graph writes) → observe again on the next turn.

Loop limits

Runtime caps prevent runaway loops:

| Setting | Default | Meaning |
| --- | --- | --- |
| CHAT_MAX_TOOL_ITERATIONS | 20 | Max tool calls in one turn before forcing end_turn |
| CHAT_MAX_TOKENS_PER_TURN | 32 000 | Hard ceiling — triggers compaction or termination |
| approval_timeout_secs | 300 | How long the loop blocks waiting for a human |
| policy_eval_timeout_ms | 250 | Per-call policy evaluation budget |

All four are org-configurable in Settings → Security.

Gotchas

  • Don't bypass the catalog. Tempting to allow-list direct connector tools into the system prompt; resist — you'll leak tools across realms and trip policy tests.
  • Streaming retries are tricky. If the LLM stream dies mid-tool-call, the loop marks the call failed and retries with retry_hint. Don't parse partial streams client-side.
  • Approval timeouts leak sessions. If your UI doesn't handle approval_timeout events, users see a spinner forever. Use ConnectorApprovalRenderer.
  • Memory writes are realm-scoped. A fact remembered in realm prod is invisible in realm staging. Often desired; sometimes surprising.
