FinOps cost watch

Run cloud cost like an SRE runs latency. The agent pulls billing data daily, watches for anomalies, attributes spend to teams via tags and namespaces, and posts a digest to #finops with concrete recommendations. When a spike is detected, the webhook fires within minutes.

[Architecture diagram: AWS, GCP, and k8s sources feed the FinOps agent (daily + spike triggers); the Context Graph supplies tribal knowledge (tag → team map, prior anomalies, accepted waste); output is a cost digest posted to Slack #finops.]

At a glance

  • Inputs: AWS Cost Explorer, GCP Billing exports (BigQuery), Kubernetes (resource requests vs usage).
  • Trigger: cron daily 06:00 UTC + spike webhook (Cost Explorer Anomaly Detection / GCP Recommender).
  • Output: Daily Slack digest in #finops, on-spike @oncall-finops page.
  • Gates: read-only by default — write actions (e.g. tagging untagged resources) require approval.

Stack

| Layer      | What we use |
| ---------- | ----------- |
| Triggers   | Cron daily; AWS Cost Anomaly Detection webhook; GCP Recommender webhook |
| Connectors | AWS (Cost Explorer + tagging), GCP (Billing + BigQuery), Kubernetes |
| Agent      | FinOps agent with skills for spend attribution + savings-plan math |
| Policies   | Rego: read-only by default; tagging requires approval; deny modifying budget alerts without a ticket reference |
| Memory     | Each anomaly is recorded — repeat anomalies don't re-page |

What you'll build

  1. A daily task pulls yesterday's spend, broken down by service and team tag.
  2. The agent computes day-over-day, week-over-week, and month-over-month deltas.
  3. Anomalies above a threshold (default: 25% WoW or $X absolute) become Slack alerts.
  4. The agent attributes each spike to the team that caused it via tags, namespaces, or IAM owner.
  5. The digest includes savings recommendations: rightsizing, savings plans, idle-resource hygiene.
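The delta math in steps 2–3 can be sketched in a few lines. This is a minimal illustration, not the agent's internals: the `{date: cost}` series shape, the function names, and the thresholds (mirroring the 25% WoW / $500 defaults) are all assumptions.

```python
from datetime import date, timedelta

WOW_THRESHOLD = 0.25   # 25% week-over-week growth (assumed default)
ABS_THRESHOLD = 500.0  # $500 absolute week-over-week increase (assumed default)

def deltas(series: dict, day: date) -> dict:
    """Return D-1 / W-1 / M-1 relative deltas for `day`, skipping missing history."""
    out = {}
    for label, offset in (("d1", 1), ("w1", 7), ("m1", 30)):
        prev = series.get(day - timedelta(days=offset))
        if prev:
            out[label] = (series[day] - prev) / prev
    return out

def is_anomaly(series: dict, day: date) -> bool:
    """Flag a day whose spend grew past either the relative or absolute threshold."""
    prev = series.get(day - timedelta(days=7))
    if prev is None:
        return False  # not enough history to compare against
    growth = (series[day] - prev) / prev
    return growth > WOW_THRESHOLD or (series[day] - prev) > ABS_THRESHOLD
```

Note the guard on missing history: on a fresh deployment with under a week of data, nothing is flagged rather than comparing against zero.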

Prerequisites

  • AWS Cost Explorer enabled (24-48h of history minimum)
  • GCP Billing export to BigQuery enabled
  • Kubernetes namespaces tagged with team= labels
  • Realm scoped to finops

Worked example

```rego
# policies/finops.rego
package agentcy

import rego.v1

# Read-only by default.
default allow := false

allow if {
    input.tool in {
        "aws.get_cost_and_usage",
        "aws.list_tagged_resources",
        "gcp.bigquery_query",
        "kubernetes.list_namespaces",
        "kubernetes.list_pods",
        "kubernetes.describe_node"
    }
}

# Tagging requires approval.
allow if {
    input.tool == "aws.tag_resources"
    input.approval.granted
}
```

Task:

```yaml
name: finops-daily
schedule: "0 6 * * *"
realm: finops
agent: finops-agent
prompt: |
  Pull yesterday's spend from AWS Cost Explorer and GCP Billing.
  Break down by service and by team tag (or namespace for k8s).
  Compute D-1, W-1, M-1 deltas.
  For any anomaly > 25% WoW or > $500 absolute:
    1. Identify the team that owns the resource.
    2. Suggest 2-3 concrete remediations (rightsizing, idle, savings plan).
    3. Cite the data source for each claim.
  Post a digest to #finops. If any anomaly exceeds a 100% spike, page #oncall-finops.
```
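The routing rule the prompt encodes (digest line vs. #finops alert vs. on-call page) boils down to a small decision function. This sketch uses hypothetical names; the thresholds mirror the defaults stated above:

```python
def route(wow_delta: float, abs_delta: float) -> str:
    """Map an anomaly's deltas to a destination (assumed thresholds from the prompt)."""
    if wow_delta > 1.00:                        # >100% spike: page on-call
        return "page:#oncall-finops"
    if wow_delta > 0.25 or abs_delta > 500.0:   # anomaly: alert in channel
        return "alert:#finops"
    return "digest"                             # normal movement: digest line only
```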

What good looks like

Wednesday 06:00 — FinOps daily

Yesterday's total: $28,431 (D-1 +4%, W-1 +12%)

🔴 Anomalies (3)

  • eks-cluster/prod namespace checkout — $4,200 (W-1 +180%) → Cause: HPA scaled to 40 pods after PR #2103 (code-review flagged this) → Suggest: revisit HPA target CPU (currently 40%, baseline was 70%)
  • s3://logs-archive egress — $890 (D-1 +400%) → New cross-region replication enabled — was that intentional?

🟡 Savings on the table (2)

  • 12 m5.xlarge instances idle > 7 days, candidate for rightsizing → ~$340/mo
  • Compute Savings Plan would save ~$1.2k/mo at current run rate

[Full breakdown →]

Variations

  • Per-team channels — fan out per-team digests to that team's #finops-team-X channel using the same agent.
  • PR comments — when a PR adds resources estimated > $X/mo, drop a comment (How-To: GitHub Channel).
  • Slack-driven exploration — bind to #ask-finops; "@agentcy why did spend spike yesterday in eu-west-1?" gets answered with citations.

Troubleshooting

  • Cost data lag. AWS Cost Explorer is up to 24h delayed. Don't fire the daily task before 06:00 UTC.
  • Missing tags. The first run will surface "untagged: $X". Use that list to drive a tagging campaign before scaling the recipe.
  • Noisy alerts. Tune the anomaly threshold and add memory-based suppression — once you've alerted on a spike, don't re-alert until it resolves or worsens by 50%.
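The suppression rule in the last bullet can be sketched as follows. The in-memory `seen` dict stands in for the agent's memory layer (an assumption, not the real store):

```python
seen: dict[str, float] = {}  # anomaly key -> spend at last alert

def should_alert(key: str, spend: float) -> bool:
    """Alert on first sight, then stay quiet unless the spike worsens by 50%."""
    last = seen.get(key)
    if last is not None and spend <= last * 1.5:
        return False      # already alerted; not 50% worse yet
    seen[key] = spend     # first alert, or spike worsened past the bar
    return True

def mark_resolved(key: str) -> None:
    """Clearing the record re-arms the alert for a future recurrence."""
    seen.pop(key, None)
```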


Built by AgentcyLabs. For in-house deployment or Agentcy Cloud (PaaS) access, visit agentcylabs.com.