FinOps cost watch
Run cloud cost like an SRE runs latency. The agent pulls billing data daily, watches for anomalies, attributes spend to teams via tags + namespaces, and posts a digest to #finops with concrete recommendations. On-spike webhook fires within minutes.
Flow: Sources (AWS, GCP, Kubernetes; daily + spikes) → Agentcy FinOps agent, backed by Context Graph tribal knowledge (tag → team map · prior anomalies · accepted waste) → cost digest → Slack #finops.
At a glance
- Inputs: AWS Cost Explorer, GCP Billing exports (BigQuery), Kubernetes (resource requests vs usage).
- Trigger: cron daily 06:00 UTC + spike webhook (Cost Explorer Anomaly Detection / GCP Recommender).
- Output: Daily Slack digest in #finops; on-spike page to @oncall-finops.
- Gates: read-only by default — write actions (e.g. tagging untagged resources) require approval.
Stack
| Layer | What we use |
|---|---|
| Triggers | Cron daily; AWS Cost Anomaly Detection webhook; GCP Recommender webhook |
| Connectors | AWS (Cost Explorer + tagging), GCP (Billing + BigQuery), Kubernetes |
| Agent | FinOps agent with skills for spend attribution + savings-plan math |
| Policies | Rego: read-only by default; tagging requires approval; deny modifying budget alerts without a ticket reference |
| Memory | Each anomaly is recorded — repeat anomalies don't re-page |
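The trigger row above can be sketched as a minimal spike-webhook receiver. This is a hedged sketch: the JSON fields (`wow_pct`, `absolute_usd`) and the port are assumptions for illustration, not the actual AWS Cost Anomaly Detection or GCP Recommender payload schema, and the thresholds mirror this recipe's defaults (> 25% WoW alerts, > 100% pages).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def classify_spike(wow_pct: float, absolute_usd: float) -> str:
    """Map a spike to an action using the recipe's default thresholds."""
    if wow_pct > 100:
        return "page"    # page #oncall-finops
    if wow_pct > 25 or absolute_usd > 500:
        return "alert"   # post to #finops
    return "ignore"

class SpikeWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        # Payload field names are hypothetical, not a real provider schema.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        event = json.loads(body)
        action = classify_spike(event.get("wow_pct", 0.0),
                                event.get("absolute_usd", 0.0))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(action.encode())

# To run locally: HTTPServer(("", 8080), SpikeWebhook).serve_forever()
```

In production you would put this behind the same gateway that authenticates the provider's webhook signatures rather than exposing a bare HTTP server.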
What you'll build
- A daily task that pulls yesterday's spend, broken down by service and team tag.
- The agent computes day-over-day, week-over-week, and month-over-month deltas.
- Anomalies above a threshold (default: 25% WoW or $X absolute) become Slack alerts.
- The agent attributes the spike to the team that caused it via tags / namespaces / IAM owner.
- The digest includes savings recommendations: rightsizing, savings plans, idle resource hygiene.
Prerequisites
- AWS Cost Explorer enabled (at least 24–48h of history)
- GCP Billing export to BigQuery enabled
- Kubernetes namespaces carrying team= labels
- Realm scoped to finops
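The daily Cost Explorer pull can be sketched like this. The `get_cost_and_usage` parameter shapes are boto3's real API; the `team` cost-allocation tag is an assumption carried over from this recipe's tagging scheme:

```python
from datetime import date, timedelta

def cost_and_usage_params(day: date) -> dict:
    """Build GetCostAndUsage parameters for one day of spend,
    grouped by service and by the (assumed) `team` cost-allocation tag."""
    return {
        "TimePeriod": {
            "Start": day.isoformat(),
            "End": (day + timedelta(days=1)).isoformat(),  # End is exclusive
        },
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [
            {"Type": "DIMENSION", "Key": "SERVICE"},
            {"Type": "TAG", "Key": "team"},
        ],
    }

# Usage (requires AWS credentials and the `ce` service):
# import boto3
# ce = boto3.client("ce")
# resp = ce.get_cost_and_usage(
#     **cost_and_usage_params(date.today() - timedelta(days=1)))
```

Remember the cost-data lag noted under Troubleshooting: yesterday's numbers may still be incomplete when the 06:00 UTC run fires.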
Worked example
```rego
# policies/finops.rego
package agentcy

import rego.v1

# Read-only by default.
default allow := false

allow if {
	input.tool in {
		"aws.get_cost_and_usage",
		"aws.list_tagged_resources",
		"gcp.bigquery_query",
		"kubernetes.list_namespaces",
		"kubernetes.list_pods",
		"kubernetes.describe_node",
	}
}

# Tagging requires approval.
allow if {
	input.tool == "aws.tag_resources"
	input.approval.granted
}
```

Task:

```yaml
name: finops-daily
schedule: "0 6 * * *"
realm: finops
agent: finops-agent
prompt: |
  Pull yesterday's spend from AWS Cost Explorer and GCP Billing.
  Break down by service and by team tag (or namespace for k8s).
  Compute D-1, W-1, M-1 deltas.
  For any anomaly > 25% WoW or > $500 absolute:
    1. Identify the team that owns the resource.
    2. Suggest 2-3 concrete remediations (rightsizing, idle, savings plan).
    3. Cite the data source for each claim.
  Post a digest to #finops. If any anomaly spikes > 100%, page #oncall-finops.
```

What good looks like
Wednesday 06:00 — FinOps daily
Yesterday's total: $28,431 (D-1 +4%, W-1 +12%)
🔴 Anomalies (3)
- eks-cluster/prod namespace checkout — $4,200 (W-1 +180%)
  → Cause: HPA scaled to 40 pods after PR #2103 (code-review flagged this)
  → Suggest: revisit HPA target CPU (currently 40%, baseline was 70%)
- s3://logs-archive egress — $890 (D-1 +400%)
  → New cross-region replication enabled — was that intentional?

🟡 Savings on the table (2)
- 12 m5.xlarge instances idle > 7 days, candidate for rightsizing → ~$340/mo
- Compute Savings Plan would save ~$1.2k/mo at current run rate
[Full breakdown →]
Variations
- Per-team channels — fan out per-team digests to that team's #finops-team-X channel using the same agent.
- PR comments — when a PR adds resources estimated > $X/mo, drop a comment (How-To: GitHub Channel).
- Slack-driven exploration — bind to #ask-finops; "@agentcy why did spend spike yesterday in eu-west-1?" gets answered with citations.
Troubleshooting
- Cost data lag. AWS Cost Explorer is up to 24h delayed. Don't fire the daily task before 06:00 UTC.
- Missing tags. The first run will surface "untagged: $X". Use that list to drive a tagging campaign before scaling the recipe.
- Noisy alerts. Tune the anomaly threshold and add memory-based suppression — once you've alerted on a spike, don't re-alert until it resolves or worsens by 50%.
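The suppression rule from "Noisy alerts" can be sketched as below. The in-memory dict stands in for the agent's memory system; the function name and key format are illustrative assumptions:

```python
# anomaly key -> spend level at the time we last alerted
_last_alerted: dict[str, float] = {}

def should_alert(key: str, current_usd: float, is_anomalous: bool) -> bool:
    """Alert on a fresh anomaly; suppress repeats until it resolves
    or worsens by 50% over the level we already alerted on."""
    if not is_anomalous:
        _last_alerted.pop(key, None)  # resolved: next spike alerts again
        return False
    prior = _last_alerted.get(key)
    if prior is not None and current_usd < prior * 1.5:
        return False                  # already alerted, not 50% worse: suppress
    _last_alerted[key] = current_usd
    return True
```

Persisting `_last_alerted` in the agent's memory (rather than process state) is what makes repeat anomalies stay quiet across daily runs.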
Next
- Concept: Memory System — alert suppression
- Connector: AWS
- Connector: GCP