AWS — EKS

Use this when your org has standardized on Kubernetes. The base setup follows the generic Kubernetes guide; this page covers the AWS-specific extras: ALB ingress, IRSA, External DNS, EBS-backed PVs, and how to wire managed databases.

Reference Architecture

Cluster Prerequisites

These add-ons make life easier — install them once per cluster:

| Add-on | Purpose | Install |
| --- | --- | --- |
| AWS Load Balancer Controller | Provisions ALBs from Ingress resources | helm install ... aws-load-balancer-controller |
| EBS CSI Driver | Backs PVCs for self-hosted Postgres / Neo4j / Redis | EKS managed add-on |
| External DNS | Auto-creates Route 53 records from Ingress | helm install external-dns |
| External Secrets Operator | Pulls secrets from AWS Secrets Manager | helm install external-secrets |
| cert-manager | TLS certs for any ingress not on ALB | helm install cert-manager |
| Karpenter (optional) | Bin-packs nodes for cost | helm install karpenter |

Service Account → IAM (IRSA)

Agentcy uses IRSA so the API pod can read secrets and (optionally) talk to AWS-only connectors (S3, Bedrock, etc.) without long-lived keys.

bash
eksctl create iamserviceaccount \
    --cluster agentcy \
    --namespace agentcy \
    --name agentcy-api \
    --attach-policy-arn arn:aws:iam::aws:policy/SecretsManagerReadWrite \
    --approve

SecretsManagerReadWrite also grants write and delete access, which the API pod does not need. For finer-grained access, attach a custom policy that lists the exact secret ARNs and any connector resources.
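As a minimal sketch of such a custom policy, the snippet below writes a read-only policy document scoped to secrets under the agentcy/ prefix. The account ID, the output file name and the policy name AgentcySecretsRead are placeholders chosen for illustration, not values defined by the chart.

```shell
set -eu

# Hypothetical placeholders: replace with your own account and region.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="${REGION:-us-east-1}"
POLICY_FILE="${POLICY_FILE:-agentcy-secrets-policy.json}"

# Read-only access to secrets named agentcy/...
# Secrets Manager appends a random suffix to each ARN, hence the trailing wildcard.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": "arn:aws:secretsmanager:${REGION}:${ACCOUNT_ID}:secret:agentcy/*"
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"
# Next, create the policy and pass its ARN to eksctl instead of SecretsManagerReadWrite:
#   aws iam create-policy --policy-name AgentcySecretsRead --policy-document "file://$POLICY_FILE"
```

If the pod also talks to S3 or Bedrock connectors, add separate statements for those resources rather than widening this one.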

Managed Databases

RDS Postgres + pgvector

Provision RDS Postgres 16 in a private subnet group. After creation:

sql
-- via psql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

Allow inbound TCP 5432 on the RDS security group from the EKS node security group (or from the cluster security group that EKS attaches to the nodes).
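One way to add that rule from the command line is sketched below. Both security group IDs are hypothetical placeholders, and the function name is made up for this example.

```shell
# Hypothetical IDs: look up the real ones with `aws ec2 describe-security-groups`.
RDS_SG_ID="${RDS_SG_ID:-sg-0123456789abcdef0}"
NODE_SG_ID="${NODE_SG_ID:-sg-0fedcba9876543210}"

# Allow inbound Postgres (TCP 5432) on the RDS security group,
# with the node security group as the traffic source.
allow_postgres_from_nodes() {
    aws ec2 authorize-security-group-ingress \
        --group-id "$RDS_SG_ID" \
        --protocol tcp \
        --port 5432 \
        --source-group "$NODE_SG_ID"
}

# Run it once the IDs above are correct:
#   allow_postgres_from_nodes
```

Referencing a source security group instead of a CIDR range keeps the rule valid when nodes are replaced and their IPs change.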

ElastiCache Redis

A cache.t4g.small single-node cluster is plenty for Team-tier traffic. For HA pick a Multi-AZ replication group.

Context Engine — Basic (Neo4j / AuraDB)

Create an AuraDB project and instance in the same region as your VPC, or in one your VPC can reach over a private connection. Pass the neo4j+s:// URI through to the API pods.

If you must keep the Basic graph in-cluster, the chart's bundled Neo4j subchart supports EBS-backed persistence. Don't run Community-edition Neo4j across multiple replicas — it isn't designed for that.

Context Engine — Advanced (kyma)

The Helm chart ships a kyma subchart that you enable with kyma.enabled: true. It runs as a StatefulSet (or as a Deployment, since compute is stateless) and writes Arrow extents to an S3 bucket that the chart can provision or that you create out-of-band. The pods reach the bucket through the IRSA role set in objectStore.irsaRoleArn, so no static keys are needed.

yaml
contextEngine: advanced

kyma:
  enabled: true
  image:
    repository: ghcr.io/agentcylabs/kyma
    tag: latest   # pin a released version in production; latest is not reproducible
  replicaCount: 2
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi
  objectStore:
    type: s3
    bucket: agentcy-kyma-prod
    region: us-east-1
    irsaRoleArn: arn:aws:iam::123:role/agentcy-kyma-s3
  catalog:
    # kyma catalog lives in the same RDS instance, separate database
    databaseSecretRef:
      name: agentcy-kyma-catalog
      key: url
  otlp:
    enabled: true
    nlb: true   # exposes :4317 via an NLB so emitters can ship OTLP directly

The API pod gets CONTEXT_ENGINE=advanced and points at http://kyma:8080. Reads stream over Arrow Flight gRPC; KQL/SQL/Cypher all work against the same engine. See getkyma.dev for OTLP/Kafka/file-drop ingest paths.
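If you create the kyma bucket out-of-band, a sketch along these lines may help. The bucket name and region are copied from the values above; the function name is made up for this example, and the public-access block is a suggested hardening step rather than something the chart is known to require.

```shell
# Values mirror kyma.objectStore above; adjust to your environment.
BUCKET="${BUCKET:-agentcy-kyma-prod}"
REGION="${REGION:-us-east-1}"

create_kyma_bucket() {
    if [ "$REGION" = "us-east-1" ]; then
        # us-east-1 is the default location and rejects an explicit LocationConstraint.
        aws s3api create-bucket --bucket "$BUCKET" --region "$REGION"
    else
        aws s3api create-bucket --bucket "$BUCKET" --region "$REGION" \
            --create-bucket-configuration "LocationConstraint=$REGION"
    fi
    # Block every form of public access; kyma reaches the bucket through IRSA only.
    aws s3api put-public-access-block --bucket "$BUCKET" \
        --public-access-block-configuration \
        "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
}

# Run it once the values above are correct:
#   create_kyma_bucket
```

The role in irsaRoleArn then needs object read and write permissions on this bucket only.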

Helm Values for EKS

yaml
ingress:
  enabled: true
  className: alb
  annotations:
    kubernetes.io/ingress.class: alb   # deprecated in favor of className; only older controller versions need it
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443},{"HTTP":80}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123:certificate/xxxxx
    external-dns.alpha.kubernetes.io/hostname: agentcy.example.com
  hosts:
    - host: agentcy.example.com
      paths:
        - path: /api
          backend: api
        - path: /
          backend: frontend

api:
  serviceAccount:
    create: false
    name: agentcy-api    # IRSA-bound SA from eksctl above
  podAnnotations:
    eks.amazonaws.com/skip-containers: "envoy"

# Bundled databases off — using managed services
postgresql: { enabled: false }
neo4j:      { enabled: false }
redis:      { enabled: false }

externalSecrets:
  enabled: true
  secretStoreRef:
    name: aws-secrets
    kind: ClusterSecretStore
  remoteSecrets:
    databaseUrl:  { remoteRef: agentcy/postgres-url }
    redisUrl:     { remoteRef: agentcy/redis-url }
    neo4jUri:     { remoteRef: agentcy/neo4j, property: uri }
    neo4jUser:    { remoteRef: agentcy/neo4j, property: username }
    neo4jPassword:{ remoteRef: agentcy/neo4j, property: password }
    llmApiKey:    { remoteRef: agentcy/llm-api-key }
    jwtSecret:    { remoteRef: agentcy/jwt }
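The entries with a property field assume that the remote secret is a JSON object and that property selects one key from it, which is how External Secrets Operator treats remoteRef.property. Under that assumption, the agentcy/neo4j secret could be created as sketched below; the function names and example values are hypothetical.

```shell
# Build the JSON body for the agentcy/neo4j secret. The keys uri, username and
# password mirror the property values in remoteSecrets above.
# Values are inserted verbatim, so this sketch does not escape quotes or backslashes.
neo4j_secret_json() {
    printf '{"uri":"%s","username":"%s","password":"%s"}' "$1" "$2" "$3"
}

# Store it in Secrets Manager under the name the chart values reference.
create_neo4j_secret() {
    aws secretsmanager create-secret \
        --name agentcy/neo4j \
        --secret-string "$(neo4j_secret_json "$1" "$2" "$3")"
}

# Example (hypothetical values):
#   create_neo4j_secret 'neo4j+s://<instance-id>.databases.neo4j.io' neo4j "$NEO4J_PASSWORD"
```

Entries without a property, such as agentcy/postgres-url, can be plain-string secrets.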

Define the ClusterSecretStore once per cluster:

yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
            namespace: external-secrets

Install

bash
helm repo add agentcy https://charts.agentcylabs.com
helm repo update

kubectl create namespace agentcy

helm install agentcy agentcy/agentcy \
    --namespace agentcy \
    --values eks-values.yaml

Auto-scaling

The chart enables HPA on the API by default. On EKS, add Karpenter (or the Cluster Autoscaler) so new nodes appear when HPA scales up:

yaml
api:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 12
    targetCPUUtilizationPercentage: 65
    targetMemoryUtilizationPercentage: 75

TLS, Domains, Multi-tenant

  • One domain per environment (simplest): set ingress.hosts[0].host and let External DNS create the Route 53 record.
  • Wildcard for white-label: request an ACM wildcard certificate (*.app.example.com), set tls.hosts: ["*.app.example.com"], and let the API's tenant_resolver map each Host header to a tenant.

CI/CD

GitOps with Argo CD or Flux pointed at a manifests repo is the cleanest pattern. Generate manifests via helm template and commit them, or use Argo's Helm plugin.
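The helm template route can be sketched as follows. The release name, chart and values file match the install command on this page; the output directory and the function name are arbitrary choices for this example.

```shell
# Render the chart to plain manifests that a GitOps repo can track.
render_manifests() {
    out_dir="${1:-manifests}"
    mkdir -p "$out_dir"
    helm template agentcy agentcy/agentcy \
        --namespace agentcy \
        --values eks-values.yaml \
        > "$out_dir/agentcy.yaml"
}

# Usage:
#   render_manifests manifests
#   git add manifests/agentcy.yaml && git commit -m "Render agentcy chart"
```

Because secrets come from External Secrets Operator, the rendered file should contain references to Secrets Manager rather than secret values, but review it before committing.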

For pull-based image updates use Flux Image Automation or Keel — both will bump the image tag in your manifests when a new version is pushed.

Cost Notes

| Component | Indicative monthly cost (us-east-1) |
| --- | --- |
| EKS control plane | $73 |
| 3 × m6g.large nodes (24/7) | ~$135 |
| ALB | ~$20 |
| RDS db.t4g.medium | ~$60 |
| ElastiCache cache.t4g.small | ~$25 |
| AuraDB Pro (8 GB) | ~$65 |
| Floor | ~$380/mo |

Karpenter typically cuts node cost 30–50% for bursty agent workloads.
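To see how the floor is built up, and what the quoted Karpenter range would mean for it, the arithmetic can be reproduced with the table's own figures. This is only a back-of-the-envelope sketch: it applies the percentage to the node line alone and ignores data transfer, storage and other usage-based charges.

```shell
# Monthly figures copied from the table above, in whole US dollars.
eks=73; nodes=135; alb=20; rds=60; redis=25; aura=65

total=$((eks + nodes + alb + rds + redis + aura))
echo "Floor: \$${total}/mo"

# Apply the 30% and 50% ends of the Karpenter range to the node line only.
# Shell arithmetic is integer-only, so savings are truncated to whole dollars.
for pct in 30 50; do
    saved=$((nodes * pct / 100))
    echo "With ${pct}% node savings: \$$((total - saved))/mo"
done
```

The components sum to $378, which the table rounds to roughly $380.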

Troubleshooting

Ingress provisions but ALB target health is unhealthy

With target-type: ip, the ALB sends traffic and health checks straight to the pod IPs. Confirm that the pod readiness probe passes (kubectl get pods -n agentcy) and that the node SG allows traffic from the ALB SG on the container port.
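A quick way to compare the two sides is sketched below. The target group ARN is a hypothetical placeholder and the function name is made up for this example.

```shell
# Hypothetical placeholder: copy the real ARN from `aws elbv2 describe-target-groups`.
TARGET_GROUP_ARN="${TARGET_GROUP_ARN:-arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/example/0123456789abcdef}"

check_alb_targets() {
    # Pod readiness and pod IPs, which should match the registered targets.
    kubectl get pods -n agentcy -o wide
    # Per-target health state and the reason code reported by the load balancer.
    aws elbv2 describe-target-health \
        --target-group-arn "$TARGET_GROUP_ARN" \
        --query 'TargetHealthDescriptions[].{target:Target.Id,state:TargetHealth.State,reason:TargetHealth.Reason}' \
        --output table
}

# Usage:
#   check_alb_targets
```

A timeout reason usually points at security groups, while a response-code mismatch points at the health check path or the application itself.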

403 from EBS CSI when creating PVCs

Attach the AmazonEBSCSIDriverPolicy managed policy to the node group's IAM role, or install the EBS CSI driver as an EKS managed add-on with its own IRSA role.

Pod has unbound immediate PersistentVolumeClaims

Set a storageClassName on the PVC (e.g., gp3) and make sure that class exists. Recent EKS versions do not mark any StorageClass as default, and the EBS CSI add-on does not create a gp3 class unless it is configured to.
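If the cluster has no gp3 class yet, a minimal one for the EBS CSI driver could look like the sketch below. The file name is arbitrary, and marking the class as the default is optional.

```shell
SC_FILE="${SC_FILE:-gp3-storageclass.yaml}"

cat > "$SC_FILE" <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    # Makes gp3 the cluster default so PVCs without a storageClassName still bind.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
# Delay provisioning until a pod is scheduled, so the volume lands in the pod's AZ.
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF

echo "Wrote $SC_FILE"
# Apply it with:
#   kubectl apply -f gp3-storageclass.yaml
```

With WaitForFirstConsumer, a PVC stays Pending until a pod that uses it is scheduled, which is expected behavior.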

Secrets Manager AccessDenied

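The IAM role bound to the agentcy-api ServiceAccount needs secretsmanager:GetSecretValue on the specific secret ARNs. Check the pod logs (kubectl logs); the AWS SDK error names the secret it failed to read. kubectl describe pod confirms whether the IRSA variables (AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE) were injected.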
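Two small helpers for narrowing this down are sketched here. The Deployment name deploy/agentcy-api is an assumption about what the chart creates; list the real names with kubectl get deploy -n agentcy.

```shell
# Print the IAM role bound to the ServiceAccount through its IRSA annotation.
# Empty output means the annotation is missing and the pod has no role to assume.
irsa_role_arn() {
    kubectl get serviceaccount agentcy-api -n agentcy \
        -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

# Show recent API log lines that mention AccessDenied.
# deploy/agentcy-api is an assumed Deployment name, not one confirmed by the chart.
access_denied_logs() {
    kubectl logs -n agentcy deploy/agentcy-api --tail=200 | grep -i 'AccessDenied' || true
}

# Usage:
#   irsa_role_arn
#   access_denied_logs
```

If the role ARN looks right, compare the secret named in the log line against the Resource list in the role's policy.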
