# RAG & Semantic Search
agentcy-rag is Agentcy's retrieval layer. It embeds every graph node into a vector, stores the vectors in Neo4j 5's HNSW indexes, and exposes semantic search. Embeddings are generated locally via fastembed-rs — no external API calls.
This page is the operator/integrator view. For the agent-facing side, see Agent Loop.
## Query the graph semantically
```bash
curl -X POST http://localhost:8080/api/v1/rag/search \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{
    "query": "why is the payment service flaky",
    "top_k": 10,
    "realms": ["infrastructure","development"]
  }'
```

Response:
```json
{
  "hits": [
    {
      "node_id": "n_01HABC",
      "labels": ["Service"],
      "score": 0.874,
      "properties": { "name": "payments", "runtime": "go-1.22" },
      "source": { "connector": "kubernetes", "external_id": "…" },
      "realm": "infrastructure"
    },
    {
      "node_id": "n_01HAB9",
      "labels": ["PullRequest"],
      "score": 0.811,
      "properties": { "number": 412, "title": "revert payments DB_POOL_SIZE" },
      "realm": "development"
    }
    …
  ],
  "took_ms": 38
}
```

## Re-rank with filters
Add filters to narrow before scoring:
```json
{
  "query": "flaky payment",
  "filters": {
    "labels": ["Service","PullRequest","Incident"],
    "after": "2026-04-20T00:00:00Z",
    "exclude": { "archived": true }
  },
  "top_k": 20
}
```

Re-rank order is: vector similarity → BM25 over name+title+description → recency boost.
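As a sketch, the filter clauses could be applied as a predicate over candidates before any scoring; the `updated_at` field and the exact matching rules here are assumptions, not documented behavior:

```python
def passes_filters(node: dict, filters: dict) -> bool:
    """Return True if a node survives the pre-scoring filters.

    `node` mirrors the hit shape from /rag/search; `updated_at` is an
    assumed ISO-8601 timestamp property backing the "after" clause.
    """
    # labels: keep the node only if it carries at least one requested label
    if filters.get("labels") and not set(node["labels"]) & set(filters["labels"]):
        return False
    # after: ISO-8601 strings in the same zone compare correctly as strings
    after = filters.get("after")
    if after and node.get("updated_at", "") <= after:
        return False
    # exclude: drop nodes whose properties match any excluded key/value pair
    for key, value in filters.get("exclude", {}).items():
        if node.get("properties", {}).get(key) == value:
            return False
    return True
```

Only survivors then flow into the similarity → BM25 → recency stages.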
## Hybrid search (vector + full-text)
```bash
curl -X POST http://…/rag/hybrid \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{
    "query": "checkout 500 error",
    "text_weight": 0.4,
    "vector_weight": 0.6,
    "top_k": 20
  }'
```

Useful when the query has strong keywords (error codes, IDs) that pure vectors might miss.
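Conceptually the weights produce a linear blend of the two scores; this sketch assumes both are already normalized to [0, 1], which the docs don't spell out:

```python
def hybrid_score(vector_score: float, text_score: float,
                 vector_weight: float = 0.6, text_weight: float = 0.4) -> float:
    """Blend a vector-similarity score with a full-text (BM25) score.

    Inputs are assumed normalized to [0, 1]; the default weights mirror
    the request body above.
    """
    return vector_weight * vector_score + text_weight * text_score
```

With these weights, a hit with a strong keyword match (0.5 vector, 0.95 text → 0.68) outranks a middling vector-only hit (0.7 vector, 0.1 text → 0.46), which is exactly the error-code case described above.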
## What gets embedded
agentcy-rag embeds the canonical text representation of a node:
```
<labels> | <name> | <description or title> | <key props as k=v pairs>
```

Re-embedding happens on every change to one of the text fields (name, title, description) and on label changes. Other property changes skip re-embedding to save compute.
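A sketch of how that canonical text might be assembled, and of the re-embed decision; the property names and separator handling are assumptions based on the template above:

```python
TEXT_FIELDS = {"name", "title", "description"}

def embed_text(node: dict) -> str:
    """Assemble `<labels> | <name> | <description or title> | <k=v pairs>`."""
    props = node.get("properties", {})
    labels = ":".join(node["labels"])
    name = props.get("name", "")
    desc = props.get("description") or props.get("title") or ""
    rest = " ".join(f"{k}={v}" for k, v in props.items() if k not in TEXT_FIELDS)
    return " | ".join(part for part in (labels, name, desc, rest) if part)

def needs_reembed(changed_props: set[str], labels_changed: bool = False) -> bool:
    """Re-embed only on label changes or edits to a text field."""
    return labels_changed or bool(TEXT_FIELDS & changed_props)
```

So an edit to `runtime` alone would leave the vector untouched, while renaming the service would trigger a re-embed.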
Manually trigger re-embedding of a label or realm (admin-only):
```bash
curl -X POST http://…/rag/reindex \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{"labels":["Service"],"realm":"infrastructure"}'
```

## Model choice
The default embedding model is bge-small-en (384 dims). Alternatives:
```env
RAG_EMBED_MODEL=bge-base-en       # 768 dims, slower, better recall
RAG_EMBED_MODEL=nomic-embed-text  # 768 dims, tuned for retrieval
```

Changing the model requires a full reindex (you can't mix dimensions). The API rejects mixed-dim writes with a clear error.
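The mixed-dim guard amounts to a width check at write time; the error class and message below are illustrative, not the API's actual payload:

```python
class DimensionMismatch(ValueError):
    """Illustrative stand-in for the API's mixed-dim write error."""

def check_dims(index_dims: int, vector: list[float]) -> None:
    """Reject a vector whose width doesn't match the HNSW index."""
    if len(vector) != index_dims:
        raise DimensionMismatch(
            f"vector has {len(vector)} dims but index expects {index_dims}; "
            "changing RAG_EMBED_MODEL requires a full reindex"
        )
```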
## Performance
Benchmarks on a single API process (M2 Pro, 1M nodes):
| Op | p50 | p99 |
|---|---|---|
| embed 1 query | 18 ms | 45 ms |
| HNSW search, top_k=10 | 6 ms | 20 ms |
| hybrid search, top_k=10 | 22 ms | 70 ms |
| full reindex 1M nodes | — | ~4 min |
Batch embedding for pipeline writes uses fastembed batching (default 64).
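Batching here is plain fixed-size chunking; a minimal sketch with the default batch of 64:

```python
from typing import Iterator

def batches(texts: list[str], batch_size: int = 64) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `batch_size` texts for the embedder."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]
```

Each chunk would then go through the embedder in one call rather than one call per node.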
## Agent integration
The agent loop uses `/rag/search` implicitly when the LLM calls `search_connector_tools` — it's the same vector path under the hood, just restricted to ToolSpec rows. Agents can also call `/rag/search` directly via the `rag.search` tool when `features.rag` is enabled and the appropriate connector exposes it (via MCP or a custom skill).
A common pattern: before calling a tool, the agent runs `rag.search` for related incidents or PRs and injects the results into its own plan.
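That pattern might look like the following, with `search` standing in for whatever `rag.search` binding the agent has; all names here are illustrative:

```python
def plan_with_context(query: str, search, top_k: int = 5) -> str:
    """Prepend related hits to the task before the agent plans tool calls.

    `search(query, top_k)` is any callable returning hits in the
    /rag/search response shape.
    """
    lines = []
    for hit in search(query, top_k):
        props = hit.get("properties", {})
        title = props.get("name") or props.get("title") or hit["node_id"]
        lines.append(f"- {hit['labels'][0]}: {title}")
    context = "\n".join(lines) or "- (no related nodes found)"
    return f"Related context:\n{context}\n\nTask: {query}"
```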
## Gotchas
- Embedding model changes = reindex. There's no live migration between dims.
- Very short nodes (label + id) don't embed well. Enrich them by adding a `description` via a pipeline step.
- HNSW is approximate. For high-stakes queries, bump `search_ef` (env `RAG_HNSW_SEARCH_EF`, default 64) or use hybrid search.
- Embeddings stay in Neo4j. Back up Neo4j, not a separate vector store; there is no external Pinecone/Weaviate.
## Next
- Concept: Knowledge Graph & Realms — where hits come from.
- Concept: Memory System — the other retrieval surface.
- Reference: REST API — `/rag` routes.