
RAG & Semantic Search

agentcy-rag is Agentcy's retrieval layer. It embeds every graph node into a vector, stores the vectors in Neo4j 5's HNSW indexes, and exposes semantic search. Embeddings are generated locally via fastembed-rs — no external API calls.

This page is the operator/integrator view. For the agent-facing side, see Agent Loop.

Query the graph semantically

```bash
curl -X POST http://localhost:8080/api/v1/rag/search \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{
    "query":   "why is the payment service flaky",
    "top_k":   10,
    "realms":  ["infrastructure","development"]
  }'
```

Response:

```json
{
  "hits": [
    {
      "node_id":"n_01HABC",
      "labels":["Service"],
      "score":0.874,
      "properties":{ "name":"payments", "runtime":"go-1.22" },
      "source":{ "connector":"kubernetes", "external_id":"…" },
      "realm":"infrastructure"
    },
    {
      "node_id":"n_01HAB9",
      "labels":["PullRequest"],
      "score":0.811,
      "properties":{ "number":412, "title":"revert payments DB_POOL_SIZE" },
      "realm":"development"
    }
  ],
  "took_ms": 38
}
```
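When consuming this response programmatically, it is often useful to bucket hits by realm before handing them to downstream logic. A minimal Python sketch, assuming only the response shape shown above (the field names come from the example, not an exhaustive schema):

```python
# Group /rag/search hits by realm, preserving the score-descending order
# the API returns them in.
def hits_by_realm(response: dict) -> dict[str, list[dict]]:
    buckets: dict[str, list[dict]] = {}
    for hit in response.get("hits", []):
        buckets.setdefault(hit["realm"], []).append(hit)
    return buckets

example = {
    "hits": [
        {"node_id": "n_01HABC", "labels": ["Service"], "score": 0.874,
         "realm": "infrastructure"},
        {"node_id": "n_01HAB9", "labels": ["PullRequest"], "score": 0.811,
         "realm": "development"},
    ],
    "took_ms": 38,
}

buckets = hits_by_realm(example)
```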

Re-rank with filters

Add filters to narrow before scoring:

```json
{
  "query": "flaky payment",
  "filters": {
    "labels":   ["Service","PullRequest","Incident"],
    "after":    "2026-04-20T00:00:00Z",
    "exclude":  { "archived": true }
  },
  "top_k": 20
}
```

Re-rank order is: vector similarity → BM25 over name+title+description → recency boost.
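To make the three-stage order concrete, here is an illustrative Python sketch of one way such a pipeline could combine scores. The weights, the linear combination, and the 30-day half-life are our assumptions for illustration, not agentcy-rag's actual constants:

```python
from datetime import datetime, timezone

def rerank_score(vec_sim: float, bm25: float, updated_at: datetime,
                 now: datetime, half_life_days: float = 30.0) -> float:
    """Vector similarity first, then a BM25 text score, then a recency boost."""
    age_days = (now - updated_at).total_seconds() / 86400
    recency = 0.5 ** (age_days / half_life_days)  # exponential decay toward 0
    return vec_sim + 0.3 * bm25 + 0.1 * recency   # illustrative weights

now = datetime(2026, 5, 1, tzinfo=timezone.utc)
fresh = rerank_score(0.8, 1.0, datetime(2026, 4, 30, tzinfo=timezone.utc), now)
stale = rerank_score(0.8, 1.0, datetime(2025, 5, 1, tzinfo=timezone.utc), now)
```

With identical vector and BM25 scores, the node updated yesterday outranks the year-old one.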

Hybrid search (vector + full-text)

```bash
curl -X POST http://…/rag/hybrid \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{
    "query": "checkout 500 error",
    "text_weight": 0.4,
    "vector_weight": 0.6,
    "top_k": 20
  }'
```

Useful when the query has strong keywords (error codes, IDs) that pure vectors might miss.
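Assuming the endpoint combines the two normalized scores linearly with the supplied weights (the normalization step is our assumption), the effect of the weights can be sketched as:

```python
def hybrid_score(vec: float, text: float,
                 vector_weight: float = 0.6, text_weight: float = 0.4) -> float:
    """Linear fusion of a vector-similarity score and a full-text score."""
    return vector_weight * vec + text_weight * text

# A keyword-heavy query ("checkout 500 error"): weak vector match,
# strong full-text match on the literal error code.
keyword_hit = hybrid_score(vec=0.30, text=0.95)
semantic_hit = hybrid_score(vec=0.80, text=0.10)
```

Even with the vector weight higher, the strong full-text match wins, which is why hybrid search helps for error codes and IDs.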

What gets embedded

agentcy-rag embeds the canonical text representation of a node:

<labels> | <name> | <description or title> | <key props as k=v pairs>

Re-embedding happens on every change to a text-field property (name, title, description) and on label changes. Other property changes skip re-embedding to save compute.
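A minimal sketch of both rules: building the canonical text from the template above, and deciding whether a write triggers re-embedding. The delimiter handling and the choice of key properties are assumptions for illustration:

```python
TEXT_FIELDS = {"name", "title", "description"}

def canonical_text(labels: list[str], props: dict,
                   key_props: tuple = ("runtime", "number")) -> str:
    """<labels> | <name> | <description or title> | <key props as k=v pairs>"""
    name = props.get("name", "")
    desc = props.get("description") or props.get("title") or ""
    kv = " ".join(f"{k}={props[k]}" for k in key_props if k in props)
    return " | ".join([",".join(labels), name, desc, kv])

def needs_reembed(changed_keys: set, labels_changed: bool = False) -> bool:
    """Re-embed only when a text field or a label changed."""
    return labels_changed or bool(TEXT_FIELDS & changed_keys)

text = canonical_text(["Service"], {"name": "payments", "runtime": "go-1.22"})
```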

Manually trigger re-embedding of a label or realm (admin-only):

```bash
curl -X POST http://…/rag/reindex \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{"labels":["Service"],"realm":"infrastructure"}'
```

Model choice

Default embedding model is bge-small-en (384 dims). Alternatives:

```env
RAG_EMBED_MODEL=bge-base-en        # 768 dims, slower, better recall
RAG_EMBED_MODEL=nomic-embed-text   # 768 dims, tuned for retrieval
```

Changing the model requires a full reindex (you can't mix dimensions). The API rejects mixed-dim writes with a clear error.
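The mixed-dimension guard can be sketched as a simple length check. The model-to-dimension mapping follows the docs above; the error message wording is ours:

```python
MODEL_DIMS = {"bge-small-en": 384, "bge-base-en": 768, "nomic-embed-text": 768}

def check_dims(vector: list[float], model: str) -> None:
    """Reject a write whose vector length doesn't match the index's model."""
    expected = MODEL_DIMS[model]
    if len(vector) != expected:
        raise ValueError(
            f"vector has {len(vector)} dims but index built for {model} "
            f"expects {expected}; run a full reindex after changing models")

check_dims([0.0] * 384, "bge-small-en")  # ok: dimensions match
```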

Performance

Benchmarks on a single API process (M2 Pro, 1M nodes):

| Op                      | p50    | p99   |
|-------------------------|--------|-------|
| embed 1 query           | 18 ms  | 45 ms |
| HNSW search, top_k=10   | 6 ms   | 20 ms |
| hybrid search, top_k=10 | 22 ms  | 70 ms |
| full reindex, 1M nodes  | ~4 min | n/a   |

Batch embedding for pipeline writes uses fastembed batching (default 64).
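The batching itself is just chunking the write queue; a sketch with the default size of 64 (the real work happens inside fastembed, which this stands in for):

```python
def chunked(items: list, size: int = 64):
    """Yield successive fixed-size batches from a list of texts to embed."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 150 pending node texts -> batches of 64, 64, and 22.
batches = list(chunked([f"node-{i}" for i in range(150)]))
```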

Agent integration

The agent loop uses /rag/search implicitly when the LLM calls search_connector_tools: it's the same vector path under the hood, just restricted to ToolSpec rows. Agents can also call /rag/search directly via the rag.search tool when the features.rag flag is enabled and the appropriate connector exposes it (via MCP or a custom skill).

A common pattern: before calling a tool, the agent runs rag.search for related incidents or PRs and injects the results as context into its own plan.
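That pattern can be sketched as follows; `rag_search` here is a hypothetical stand-in for the agent's actual rag.search tool call, and the plan shape is illustrative:

```python
def plan_with_context(query: str, rag_search) -> dict:
    """Fetch related hits first, then fold them into the agent's plan."""
    hits = rag_search(query, top_k=5)
    context = [f'{h["labels"][0]}: {h.get("title", h.get("name", ""))}'
               for h in hits]
    return {"query": query, "context": context}

# Stubbed search result, shaped like the /rag/search hits shown earlier.
fake_hits = [{"labels": ["PullRequest"],
              "title": "revert payments DB_POOL_SIZE"}]
plan = plan_with_context("why is payments flaky",
                         lambda q, top_k: fake_hits)
```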

Gotchas

  • Embedding model changes = reindex. There's no live migration between dims.
  • Very short nodes (label + id) don't embed well. Enrich them — add description via a pipeline step.
  • HNSW is approximate. For high-stakes queries, bump search_ef (env RAG_HNSW_SEARCH_EF, default 64) or use hybrid search.
  • Embeddings stay in Neo4j. Back up Neo4j, not a separate vector store. There's no external Pinecone/Weaviate.

Built by AgentcyLabs. For in-house deployment or Agentcy Cloud (PaaS) access, visit agentcylabs.com.