Agent Loop

Every interaction with Agentcy — chat, scheduled task, channel message, sub-agent — runs through the same loop. The loop is small but each turn weaves together six subsystems: memory, knowledge graph, an LLM, the catalog meta-tools, policy + approval gates, and connectors. The diagram below shows one full turn end-to-end.

[Diagram] Agent loop full cycle: input → context build (memory recall, graph context, system prompt) → LLM with four catalog meta-tools → tool call gates (route, policy, approval) → backend (connector or knowledge graph) → tool_result loops back to the LLM and streams to SSE; after the turn, memory writes and compaction run.

Code entry point: agentcy-chat::agent_loop::AgentLoop::run.

How the pieces fit

| Stage | Crate | What happens | Why it matters |
| --- | --- | --- | --- |
| 1. Input | agentcy-api | A user message arrives via chat, channel, webhook, or cron | All entry points converge on the same loop — the rules don't fork by channel |
| 2. Context build | agentcy-memory, agentcy-graph | Memory recall + graph context + transcript become the LLM prompt | The agent answers with company-specific facts, not just general knowledge |
| 3. LLM turn | agentcy-agent | LLM streams tokens and tool calls; sees only the four catalog meta-tools | Keeps context small even with 300+ connector tools available |
| 4. Stream | agentcy-chat | Every event becomes an SSE chunk to the caller | UI / API client sees tokens, tool calls, approvals as they happen |
| 5. Tool call gates | kg-policy, agentcy-chat::approval | Route → Policy → Approval → Backend → Result | Zero-trust by default — write tools never bypass policy |
| 6. After turn | agentcy-memory, agentcy-chat::compaction | Memory writes + auto-compaction | Long conversations stay tractable; facts persist across sessions |

The rest of this page walks each numbered stage in the diagram.

1 · Input

A "turn" begins when a message arrives. Sources are unified by agentcy-api:

  • Chat — /api/v1/chat/conversations/:id/messages (browser, mobile, SDK)
  • Channels — Slack, WhatsApp, email, Teams (via OpenFang gateway)
  • Webhooks — POST /api/v1/triggers/:id/fire from external systems
  • Cron — scheduled triggers (cron string → fire as message)
  • Sub-agent — a parent agent's dispatch_sub_agent invocation

All of them produce the same ChatMessage { role, content, metadata }, which feeds stage 2.
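
A minimal sketch of that shape, assuming serde for (de)serialization — the field types are illustrative, not the real agentcy-api definition:

rust
use serde::{Deserialize, Serialize};
use serde_json::Value;

/// The unified message every entry point produces.
#[derive(Debug, Serialize, Deserialize)]
struct ChatMessage {
    role: String,    // "user" | "assistant" | "tool" | "system"
    content: String, // raw message text
    metadata: Value, // source-specific extras: channel id, trigger id, parent agent, …
}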

2 · Context build (every turn)

Before the LLM sees anything, the loop assembles the prompt from three sources:

2a · Memory recall

rust
let recalled = agentcy_memory::recall(user_message, /* top_k */ 5).await?;

agentcy-memory runs a vector search against memvid (per-org embedding store) and returns the top-k facts. They are injected as a system note:

<memory>
- 2026-03-12: User confirmed "deploys go to staging first, never prod directly"
- 2026-04-02: 'checkout-svc' rolled back twice this month — see PR 412
- preference: terse Slack updates, full detail in PR comments
</memory>

Memories come from explicit writes (the agent calling remember(kind, text)) and implicit captures (decision points, tool results flagged as durable). See Concept: Memory System.

2b · Graph context

rust
let entities = agentcy_graph::context_for(realm, message).await?;

agentcy-graph queries Neo4j scoped to the conversation's realm. It resolves entities mentioned in the message (services, repos, dashboards) and returns a small slice of the graph as a system note:

<graph_context realm="prod">
service: checkout-svc → owns: github.com/acme/checkout
service: checkout-svc → emits: grafana.checkout-latency
incident: INC-2207 → caused by: checkout-svc · resolved: 2026-04-15
</graph_context>

This is what makes the agent company-aware — it knows your services, your repos, your incident history, before any tool call.

2c · System prompt + transcript

The full prompt is then assembled:

[system]   org policies + active realm + catalog instructions
[system]   <memory>...</memory>
[system]   <graph_context>...</graph_context>
[user]     ...prior turns (may be compacted)...
[tool]     ...prior tool_results...
[user]     <current message>

If transcript tokens exceed the budget, §6b compaction runs first.
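
As a sketch, the assembly is a straightforward concatenation in that order (Msg and the parameter names are illustrative stand-ins, not the real agentcy-chat types):

rust
/// Illustrative prompt assembly in the order shown above.
enum Msg { System(String), User(String), Tool(String) }

fn build_prompt(
    base_system: &str,    // org policies + active realm + catalog instructions
    memory_note: &str,    // <memory>…</memory> from 2a
    graph_note: &str,     // <graph_context>…</graph_context> from 2b
    transcript: Vec<Msg>, // prior turns and tool_results, possibly compacted (§6b)
    current: &str,        // the message that started this turn
) -> Vec<Msg> {
    let mut prompt = vec![
        Msg::System(base_system.to_string()),
        Msg::System(memory_note.to_string()),
        Msg::System(graph_note.to_string()),
    ];
    prompt.extend(transcript);
    prompt.push(Msg::User(current.to_string()));
    prompt
}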

3 · LLM turn

agentcy-agent is the multi-provider abstraction. The LLM sees the assembled prompt plus exactly four catalog meta-tools:

rust
list_connectors() -> Vec<ConnectorSummary>
search_connector_tools(query: String, top_k: u32) -> Vec<ToolSpec>
execute_connector_tool(connector: String, tool: String, args: Value) -> ToolResult
request_connector_access(connector: String, reason: String) -> RequestId

Why four meta-tools instead of all 300+? Because:

  • Context stays small — even with 26 connectors and hundreds of tools, the prompt only lists four.
  • Scoping is real — list_connectors returns only what the realm + policy + user role allows.
  • Discovery is semantic — search_connector_tools does vector search over ToolSpec.description, so the agent finds kubernetes.restart_deployment when asked to "restart the checkout service".
  • request_connector_access is the polite escape hatch when a tool exists but isn't enabled — opens an admin ticket in the Activity feed.
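
In practice a task is a two-step flow: semantic discovery, then gated execution. Roughly, using the signatures above (the argument values and the return shape here are assumptions):

rust
// Discover: vector search over tool descriptions.
let specs = search_connector_tools("restart the checkout service".to_string(), 3);
// e.g. specs[0] describes kubernetes.restart_deployment

// Execute: goes through the full gate chain in stage 5.
let result = execute_connector_tool(
    "kubernetes".to_string(),
    "restart_deployment".to_string(),
    serde_json::json!({ "name": "checkout-svc" }),
);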

Provider matrix

| Provider | Streaming | Tools | Vision | Local? |
| --- | --- | --- | --- | --- |
| Anthropic (Claude) | ✓ | ✓ | ✓ | ✗ |
| OpenAI (GPT) | ✓ | ✓ | ✓ | ✗ |
| Google (Gemini) | ✓ | ✓ | ✓ | ✗ |
| Ollama | ✓ | model-dependent | model-dependent | ✓ |
| vLLM | ✓ | model-dependent | model-dependent | ✓ |
| LM Studio | ✓ | model-dependent | model-dependent | ✓ |

Configure in Settings → Models or via env (OPENAI_API_KEY, ANTHROPIC_API_KEY, …). Default model is org-scoped, overridable per conversation.

4 · Stream (SSE)

Every chat response is streamed as server-sent events. These are the chunks the LLM emits and the loop wraps:

text
event: message_start       { "message_id": "msg_..." }
event: content_delta       { "delta": "text tokens" }
event: tool_call_start     { "tool_call_id": "tc_...", "name": "...", "connector": "..." }
event: tool_call_delta     { "tool_call_id": "tc_...", "args_delta": "..." }
event: approval_required   { "approval_id": "ap_...", "tool": "...", "args": {...} }
event: tool_result         { "tool_call_id": "tc_...", "result": {...} }
event: message_end         { "finish_reason": "end_turn", "usage": {...} }

See REST API: Chat streaming for the full contract.

5 · Tool call gates

When the LLM emits a tool_call, four stages run in order: route, policy, approval, backend. Any failure short-circuits.

5a · Route

The loop dispatches based on tool name:

  • list_connectors, search_connector_tools, execute_connector_tool, request_connector_access → handled by ToolCatalog (in-memory registry)
  • Anything else → looked up in ConnectorToolProvider and dispatched to the relevant connector crate
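
In code this is a single match on the tool name; a sketch, with catalog and connector_provider as illustrative handler names:

rust
let name = call.name.clone();
match name.as_str() {
    // The four meta-tools never leave the in-memory ToolCatalog registry.
    "list_connectors" | "search_connector_tools"
    | "execute_connector_tool" | "request_connector_access" => catalog.handle(call).await,
    // Everything else resolves through ConnectorToolProvider.
    _ => connector_provider.dispatch(call).await,
}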

5b · Policy gate (kg-policy)

Every call evaluates against active Rego policies:

rego
package agentcy

deny[msg] {
    input.tool == "kubernetes.delete_deployment"
    not has_role("platform-admin")
    msg := "delete operations require platform-admin role"
}

# Negating a wildcard expression directly is unsafe in Rego, so role
# membership goes through a helper rule.
has_role(r) {
    input.user.roles[_] == r
}

Inputs to the policy: realm, user, tool, args, connector, time_of_day. Denials are audited (policy_audit_log table) and surfaced to the LLM as a tool_error so it can adapt rather than crash:

json
event: tool_result
data: {
  "tool_call_id": "tc_...",
  "error": {
    "code": "policy_denied",
    "message": "delete operations require platform-admin role",
    "policy_id": "pol_..."
  }
}

See Concept: Zero-Trust Policies.

5c · Approval gate

Read tools auto-approve. Write tools block until a human says yes:

  1. Loop creates an approval record in Postgres
  2. Emits approval_required on the SSE stream
  3. Blocks on a Tokio oneshot channel keyed by approval_id
  4. Resumes when the UI / API posts to POST /api/v1/chat/conversations/:id/approvals/:approval_id with {"approved": true|false}
  5. If the timeout (approval_timeout_secs, default 300) fires first, the call is denied with approval_timeout

The frontend ConnectorApprovalRenderer is the canonical UX. See How-To: Approval Flows.
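
The blocking mechanics in step 3 reduce to a map of pending oneshot senders. A minimal sketch, assuming a std Mutex around the map (ApprovalStore and its methods are illustrative, not the real agentcy-chat::approval API):

rust
use std::{collections::HashMap, sync::Mutex, time::Duration};
use tokio::sync::oneshot;

/// Illustrative store of pending approvals, keyed by approval_id.
struct ApprovalStore {
    pending: Mutex<HashMap<String, oneshot::Sender<bool>>>,
}

impl ApprovalStore {
    /// Loop side: park the tool call until a human decides or the timeout fires.
    async fn wait_for(&self, approval_id: String, timeout: Duration) -> Result<bool, &'static str> {
        let (tx, rx) = oneshot::channel();
        self.pending.lock().unwrap().insert(approval_id, tx);
        match tokio::time::timeout(timeout, rx).await {
            Ok(Ok(approved)) => Ok(approved),  // human posted {"approved": true|false}
            Ok(Err(_)) => Err("approval_channel_closed"),
            Err(_) => Err("approval_timeout"), // approval_timeout_secs elapsed
        }
    }

    /// HTTP side: resolve when POST .../approvals/:approval_id arrives.
    fn resolve(&self, approval_id: &str, approved: bool) {
        if let Some(tx) = self.pending.lock().unwrap().remove(approval_id) {
            let _ = tx.send(approved);
        }
    }
}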

5d · Backend

The tool finally runs. The backend is one of:

  • Connector — one of 15 source types (github, aws, gcp, vercel, supabase, sql, mongodb, kubernetes, openapi, mcp, csv, json, remote-execution, …). Each connector crate implements ConnectorToolProvider with its own auth + rate-limiting.
  • Knowledge graph (Neo4j) — for tools that read/write the graph: graph.search, graph.related, graph.upsert_entity. All graph access is realm-filtered so cross-tenant leakage is impossible.
  • Worker job — long-running tools dispatch to agentcy-worker (Redis queue). The loop returns a job_id immediately and follows up with tool_result once the worker finishes.

5e · Tool result

The result (or tool_error) is:

  1. Appended to the transcript so the next iteration sees it
  2. Streamed to the caller as tool_result
  3. Looped back to stage 3 — the LLM gets another turn
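
Stages 3–5 therefore form an inner loop, capped by CHAT_MAX_TOOL_ITERATIONS (see "Loop limits" below). A self-contained skeleton, with closures standing in for the LLM and the gates; every name here is illustrative, and the real loop is agentcy-chat::agent_loop::AgentLoop::run:

rust
enum Step {
    Text(String),     // assistant tokens (stage 4 streams these as SSE)
    ToolCall(String), // a tool_call to route through stages 5a–5d
    EndTurn,          // finish_reason: end_turn (stage 7b)
}

fn run_turn(
    mut next_step: impl FnMut(&[String]) -> Step, // stand-in for the LLM (stage 3)
    mut run_gates: impl FnMut(String) -> String,  // stand-in for stages 5a–5d
    max_tool_iterations: usize,                   // CHAT_MAX_TOOL_ITERATIONS
) -> Vec<String> {
    let mut transcript = Vec::new();
    for _ in 0..max_tool_iterations {
        match next_step(&transcript) {
            Step::Text(tokens) => transcript.push(tokens),
            Step::ToolCall(call) => {
                // Any gate failure surfaces here as a tool_error payload.
                let result = run_gates(call);
                transcript.push(result); // stage 5e: the result loops back to the LLM
            }
            Step::EndTurn => break, // the loop also forces this when the cap is hit
        }
    }
    transcript
}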

6 · After the turn

6a · Memory write

If the agent decides a fact is durable, it explicitly calls remember(kind, text):

rust
remember("user_preference", "shaked prefers terse Slack updates");
remember("decision",        "we chose Postgres over DynamoDB for billing v2");
remember("incident_root",   "INC-2207 root cause: stale Redis key on checkout-svc");

The write goes to memvid (vectors) and Neo4j (typed edges). Both are realm-scoped. The next turn's memory recall (2a) and graph context (2b) will see these writes — that's the feedback loop that makes the agent learn over a session and across sessions.

6b · Compaction

Long conversations cross the token budget. agentcy-chat::compaction:

  1. Walks the transcript oldest-first
  2. Summarizes turn pairs into a single <compacted>...</compacted> system note
  3. Preserves: user decisions, tool results flagged durable, the active task description
  4. Discards: chitchat, redundant tool results

Compaction typically halves the context, and each run is logged to the audit trail so operators can tune the policy.
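
A toy version of the walk, with Turn and summarize as stand-ins for the real agentcy-chat::compaction internals:

rust
struct Turn { text: String, durable: bool }

fn token_count(t: &[Turn]) -> usize {
    t.iter().map(|x| x.text.split_whitespace().count()).sum()
}

fn summarize(a: &str, b: &str) -> String {
    // Stand-in for the LLM summarization call; keeps only a short digest.
    let digest: Vec<&str> = a.split_whitespace().chain(b.split_whitespace()).take(12).collect();
    format!("<compacted>{}…</compacted>", digest.join(" "))
}

fn compact(transcript: &mut Vec<Turn>, budget: usize) {
    let mut i = 0;
    // Walk oldest-first, folding non-durable turn pairs into a single note.
    while token_count(transcript) > budget && i + 1 < transcript.len() {
        if transcript[i].durable || transcript[i + 1].durable {
            i += 1; // preserve decisions, durable tool results, the active task
            continue;
        }
        let note = summarize(&transcript[i].text, &transcript[i + 1].text);
        transcript.splice(i..=i + 1, [Turn { text: note, durable: false }]);
    }
}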

7 · Optional

7a · Sub-agent dispatch

When Orchestrator (OpenFang) is enabled, any agent can hand a task off:

rust
dispatch_sub_agent(
    "code_reviewer",                             // name
    "Review PR #412 for SQL injection patterns", // task
    json!({ "pr_url": "..." }),                  // input
)

The child runs its own loop with its own tool catalog and its own realm scope — useful when a subtask should only see a narrow tool set. The child returns a summary, which appears as a tool_result in the parent's transcript.

See frontend/app/(dashboard)/orchestrator/ and backend/crates/agentcy-api/src/routes/orchestration_gateway/.

7b · message_end

When the LLM emits finish_reason: end_turn, the loop:

  • Closes the SSE stream with message_end { usage }
  • Records cost (tokens × per-token rate) to chat_usage
  • Persists final transcript snapshot for replay
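
The cost line is simple arithmetic; an illustration with placeholder rates (real per-token rates are model-specific and configured per org):

rust
/// Example cost bookkeeping for chat_usage; the rates are placeholders.
fn turn_cost_usd(input_tokens: u64, output_tokens: u64) -> f64 {
    input_tokens as f64 * 3.0e-6         // e.g. $3 per 1M input tokens
        + output_tokens as f64 * 15.0e-6 // e.g. $15 per 1M output tokens
}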

How memory, graph, and connectors work together

A common question: "If the graph already has my data, why do I need connectors?" Three roles:

| | Connectors (15) | Knowledge graph (Neo4j) | Memory (memvid + edges) |
| --- | --- | --- | --- |
| Source of truth | The vendor (GitHub, AWS, …) | Imported snapshot of vendor state + Agentcy-only relationships | Things the agent learned during conversations |
| Latency | Live API call (slower) | Indexed (millisecond reads) | Indexed (millisecond reads) |
| Freshness | Always current | Last ingestion | Real-time (writes during turn) |
| Use it for | Actions ("restart the deploy") · live state | Joins, traversals, "what depends on what" | Preferences, decisions, incident history |
| Realm-scoped | Per-source config | Per-node property | Per-org by default |

The agent loop uses all three:

  1. Stage 2a/2b pull from memory + graph to prime the LLM
  2. Stage 3 lets the LLM decide whether to also hit a live connector
  3. Stage 5d runs the connector or graph query
  4. Stage 6a writes durable facts back to memory + graph

That's the full flywheel: observe (ingest into graph) → recall (memory + graph in context) → act (connectors, gated) → remember (memory + graph writes) → observe again on the next turn.

Loop limits

Runtime caps prevent runaway loops:

| Setting | Default | Meaning |
| --- | --- | --- |
| CHAT_MAX_TOOL_ITERATIONS | 20 | Max tool calls in one turn before forcing end_turn |
| CHAT_MAX_TOKENS_PER_TURN | 32 000 | Hard ceiling — triggers compaction or termination |
| approval_timeout_secs | 300 | How long the loop blocks waiting for a human |
| policy_eval_timeout_ms | 250 | Per-call policy evaluation budget |

All four are org-configurable in Settings → Security.

Gotchas

  • Don't bypass the catalog. Tempting to allow-list direct connector tools into the system prompt; resist — you'll leak tools across realms and trip policy tests.
  • Streaming retries are tricky. If the LLM stream dies mid-tool-call, the loop marks the call failed and retries with retry_hint. Don't parse partial streams client-side.
  • Approval timeouts leak sessions. If your UI doesn't handle approval_timeout events, users see a spinner forever. Use ConnectorApprovalRenderer.
  • Memory writes are realm-scoped. A fact remembered in realm prod is invisible in realm staging. Often desired; sometimes surprising.
