FinOps cost watch

Run cloud cost like an SRE runs latency. The agent pulls billing data daily, watches for anomalies, attributes spend to teams via tags and namespaces, and posts a digest to #finops with concrete recommendations. When a spike is detected, the webhook fires within minutes.

[Architecture diagram: AWS, GCP, and k8s sources feed the FinOps agent (daily + spike triggers); the Context Graph supplies tribal knowledge (tag → team map, prior anomalies, accepted waste); output is a cost digest posted to Slack #finops.]

At a glance

  • Inputs: AWS Cost Explorer, GCP Billing exports (BigQuery), Kubernetes (resource requests vs usage).
  • Trigger: cron daily 06:00 UTC + spike webhook (Cost Explorer Anomaly Detection / GCP Recommender).
  • Output: Daily Slack digest in #finops, on-spike @oncall-finops page.
  • Gates: read-only by default — write actions (e.g. tagging untagged resources) require approval.

Stack

| Layer      | What we use |
| ---------- | ----------- |
| Triggers   | Cron daily; AWS Cost Anomaly Detection webhook; GCP Recommender webhook |
| Connectors | AWS (Cost Explorer + tagging), GCP (Billing + BigQuery), Kubernetes |
| Agent      | FinOps agent with skills for spend attribution + savings-plan math |
| Policies   | Rego: read-only by default; tagging requires approval; deny modifying budget alerts without a ticket reference |
| Memory     | Each anomaly is recorded — repeat anomalies don't re-page |

What you'll build

  1. A daily task pulls yesterday's spend, broken down by service and team tag.
  2. The agent computes day-over-day, week-over-week, and month-over-month deltas.
  3. Anomalies above a threshold (default: 25% WoW or $X absolute) become Slack alerts.
  4. The agent attributes each spike to the team that caused it via tags, namespaces, or IAM owner.
  5. The digest includes savings recommendations: rightsizing, savings plans, idle-resource hygiene.
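The delta math in steps 2–3 can be sketched in a few lines. This is a minimal illustration, not the agent's internals: the `{date: cost}` series shape, the function names, and the thresholds (mirroring the 25% WoW / $500 defaults) are all assumptions.

```python
from datetime import date, timedelta

WOW_THRESHOLD = 0.25   # 25% week-over-week growth (assumed default)
ABS_THRESHOLD = 500.0  # $500 absolute week-over-week increase (assumed default)

def deltas(series: dict, day: date) -> dict:
    """Return D-1 / W-1 / M-1 relative deltas for `day`, skipping missing history."""
    out = {}
    for label, offset in (("d1", 1), ("w1", 7), ("m1", 30)):
        prev = series.get(day - timedelta(days=offset))
        if prev:
            out[label] = (series[day] - prev) / prev
    return out

def is_anomaly(series: dict, day: date) -> bool:
    """Flag a day whose spend grew past either the relative or absolute threshold."""
    prev = series.get(day - timedelta(days=7))
    if prev is None:
        return False  # not enough history to compare against
    growth = (series[day] - prev) / prev
    return growth > WOW_THRESHOLD or (series[day] - prev) > ABS_THRESHOLD
```

Note the guard on missing history: on a fresh deployment with under a week of data, nothing is flagged rather than comparing against zero.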

Prerequisites

  • AWS Cost Explorer enabled (24-48h of history minimum)
  • GCP Billing export to BigQuery enabled
  • Kubernetes namespaces tagged with team= labels
  • Realm scoped to finops

Worked example

```rego
# policies/finops.rego
package agentcy

import rego.v1

# Read-only by default.
default allow := false

allow if {
    input.tool in {
        "aws.get_cost_and_usage",
        "aws.list_tagged_resources",
        "gcp.bigquery_query",
        "kubernetes.list_namespaces",
        "kubernetes.list_pods",
        "kubernetes.describe_node"
    }
}

# Tagging requires approval.
allow if {
    input.tool == "aws.tag_resources"
    input.approval.granted
}
```

Task:

```yaml
name: finops-daily
schedule: "0 6 * * *"
realm: finops
agent: finops-agent
prompt: |
  Pull yesterday's spend from AWS Cost Explorer and GCP Billing.
  Break down by service and by team tag (or namespace for k8s).
  Compute D-1, W-1, M-1 deltas.
  For any anomaly > 25% WoW or > $500 absolute:
    1. Identify the team that owns the resource.
    2. Suggest 2-3 concrete remediations (rightsizing, idle, savings plan).
    3. Cite the data source for each claim.
  Post a digest to #finops. If any anomaly exceeds a 100% spike, page #oncall-finops.
```
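The routing rule the prompt encodes (digest line vs. #finops alert vs. on-call page) boils down to a small decision function. This sketch uses hypothetical names; the thresholds mirror the defaults stated above:

```python
def route(wow_delta: float, abs_delta: float) -> str:
    """Map an anomaly's deltas to a destination (assumed thresholds from the prompt)."""
    if wow_delta > 1.00:                        # >100% spike: page on-call
        return "page:#oncall-finops"
    if wow_delta > 0.25 or abs_delta > 500.0:   # anomaly: alert in channel
        return "alert:#finops"
    return "digest"                             # normal movement: digest line only
```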

What good looks like

Wednesday 06:00 — FinOps daily

Yesterday's total: $28,431 (D-1 +4%, W-1 +12%)

🔴 Anomalies (3)

  • eks-cluster/prod namespace checkout — $4,200 (W-1 +180%) → Cause: HPA scaled to 40 pods after PR #2103 (code-review flagged this) → Suggest: revisit HPA target CPU (currently 40%, baseline was 70%)
  • s3://logs-archive egress — $890 (D-1 +400%) → New cross-region replication enabled — was that intentional?

🟡 Savings on the table (2)

  • 12 m5.xlarge instances idle > 7 days, candidate for rightsizing → ~$340/mo
  • Compute Savings Plan would save ~$1.2k/mo at current run rate

[Full breakdown →]

Variations

  • Per-team channels — fan out per-team digests to that team's #finops-team-X channel using the same agent.
  • PR comments — when a PR adds resources estimated > $X/mo, drop a comment (How-To: GitHub Channel).
  • Slack-driven exploration — bind to #ask-finops; "@agentcy why did spend spike yesterday in eu-west-1?" gets answered with citations.

Troubleshooting

  • Cost data lag. AWS Cost Explorer is up to 24h delayed. Don't fire the daily task before 06:00 UTC.
  • Missing tags. The first run will surface "untagged: $X". Use that list to drive a tagging campaign before scaling the recipe.
  • Noisy alerts. Tune the anomaly threshold and add memory-based suppression — once you've alerted on a spike, don't re-alert until it resolves or worsens by 50%.
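The suppression rule in the last bullet can be sketched as follows. The in-memory `seen` dict stands in for the agent's memory layer (an assumption, not the real store):

```python
seen: dict[str, float] = {}  # anomaly key -> spend at last alert

def should_alert(key: str, spend: float) -> bool:
    """Alert on first sight, then stay quiet unless the spike worsens by 50%."""
    last = seen.get(key)
    if last is not None and spend <= last * 1.5:
        return False      # already alerted; not 50% worse yet
    seen[key] = spend     # first alert, or spike worsened past the bar
    return True

def mark_resolved(key: str) -> None:
    """Clearing the record re-arms the alert for a future recurrence."""
    seen.pop(key, None)
```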


Built by AgentcyLabs. For in-house deployment or Agentcy Cloud (PaaS) access, visit agentcylabs.com.