v0.1 · pre-launch · Design partners wanted · Apply

Kubernetes efficiency, across every cloud you run.

KubeHero is a unified cost and efficiency plane for AKS, GKE, and EKS. Find idle CPU, forgotten namespaces, and underused GPUs — then enforce a Kubernetes-native hard spend ceiling.

Stage
Pre-launch
Release
Q2 2026
License
Open core
Clouds
AKS · GKE · EKS
Deploy
Helm / SaaS
Agent
eBPF, read-only
example data · anonymized
cluster-prod-us-east-1 · 18 nodes · 162 pods · live
Right-sized · Overcommit · Wasting
node-01 67% · node-02 52% · node-03 73% · node-04 63% · node-05 74% · node-06 52%
node-07 51% · node-08 34% · node-09 31% · node-10 25% · node-11 65% · node-12 88%
node-13 35% · node-14 74% · node-15 54% · node-16 53% · node-17 62% · node-18 79%
Recoverable · 36 pods requesting more than they use · $141,670 / mo
last scan · just now

Kubernetes is a scheduler, not an economist.

It does exactly what you ask. And what most teams ask for is 6× more capacity than they actually use. Here's what that looks like at the pod, node, and cluster layer.

01·DIAG-01

Requests are fiction.

Developers set CPU/memory requests once, to avoid the 3AM page. Industry studies report real utilization at ~13% of what pods request. The other 87% is paid-for air.

02·DIAG-02

Limits are scar tissue.

That 16 vCPU limit on a service that uses 0.4 vCPU on average? Someone set it during an incident six months ago. Nobody touches it because nobody knows why it's there.
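For illustration, here is roughly what both diagnoses look like in a manifest. The CPU numbers echo the vectordb-ingress example used elsewhere on this page; the fragment itself is hypothetical, not a customer config.

resources:
  requests:
    cpu: "16"      # what the scheduler reserves, and what the bill reflects
  limits:
    cpu: "16"      # the scar-tissue limit from the last incident
# observed usage (eBPF, 7-day window): ~0.4 cores, roughly 2.5% of the request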

03·DIAG-03

GPUs are the silent killer.

A single idle A100 burns ~$32/hour. H100s are worse. Most clusters have 30–60% GPU idle time that never shows up in a dashboard until finance opens the invoice.

04·DIAG-04

The autoscaler doesn't know your budget.

Karpenter and the cluster autoscaler optimize for scheduling, not spend. A bad deploy can spawn 400 nodes before anyone notices. By the time Slack lights up, you owe $18k.

One plane for every cluster.
Every dollar accounted for.

KubeHero runs a lightweight DaemonSet on every cluster and streams compressed telemetry to a control plane you host, or that we host for you. No invasive sidecars. No re-architecting. No vendor lock-in.
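A minimal sketch of what the agent's Helm values could look like; the keys below are illustrative, not the actual chart schema:

agent:
  kind: daemonset            # one read-only collector per node
  readOnly: true             # eBPF probes observe; nothing in the cluster is mutated
  telemetry:
    compress: true           # batched and compressed before leaving the cluster
    endpoint: https://control-plane.example.com   # your control plane, or the hosted one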

01·CAP-01

eBPF-accurate telemetry.

Kernel-level pod attribution, not the 30s-averaged guesswork you get from metrics-server. Per-pod CPU, memory pressure, syscalls, I/O — at second granularity.

02·CAP-02

Unified cloud pricing.

Live EC2 + Savings Plans + Spot for EKS, committed-use for GKE, Spot VMs and Reserved Instances for AKS — one mental model, one cost-per-second number per pod.
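As a rough illustration of what "one cost-per-second number per pod" means, assuming simple vCPU-share attribution on an on-demand m5.2xlarge (8 vCPU, $0.384/hr in us-east-1); the real number also folds in Savings Plans, committed use, and Spot:

node: $0.384 / hr ÷ 3,600 ≈ $0.000107 per second
pod: cpu.request=2 → 2/8 of the node ≈ $0.0000267 per second ≈ $70 / month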

03·CAP-03

GPU- and TPU-native.

DCGM-integrated GPU telemetry, tensor core utilization, per-process VRAM, MIG slice efficiency. TPU utilization via GCP SDK. Idle A100? Flagged in 60 seconds.

04·CAP-04

Policy engine + spend ceiling.

Budget CRDs. Automated rightsizing recommendations. A circuit-breaker that evicts runaway pods, caps HPA, or quarantines node pools before a bad deploy melts the card.

kubehero — scan cluster-prod-us-east-1
$ kubehero scan --cluster prod-us-east-1 --report waste
  ↳ connecting to control plane · ok
  ↳ querying 187 nodes · 2,341 pods · 7d window
WASTE REPORT cluster-prod-us-east-1
─────────────────────────────────────────────────────
● vectordb-ingress cpu.request=16 used=0.41 $8,640/mo recoverable
● model-server-a100 gpu=8 util=12% $18,200/mo recoverable
⚠ jobs-etl-nightly limit=32cpu burst=2.1 overcommit risk: HIGH
✓ frontend-gateway cpu.request=2 used=1.6 right-sized
─────────────────────────────────────────────────────
total 47 pods flagged · $38,940/mo recoverable · run `kubehero rightsize` to apply
$
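What `kubehero rightsize` applies is, in effect, a resources patch on the workload. A hypothetical sketch for the vectordb-ingress finding above, assuming a 16 → 4 vCPU recommendation like the one shown in the alert feed later on this page:

spec:
  template:
    spec:
      containers:
        - name: vectordb-ingress
          resources:
            requests:
              cpu: "4"   # was 16; observed use ≈ 0.41 cores, so 4 still leaves generous headroom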

One pane of glass.
AKS, GKE, EKS — side by side.

The dashboard is built for operators, not dashboards-as-art. Spend rolls up from pod to cluster to fleet. Drill down until you see the exact workload wasting the money, then ship the fix — or arm the spend ceiling.

kubehero / fleet · live
Monthly spend
$608,240
+4.2% · vs previous 30d
Recoverable
$184,320
30.3% · of fleet spend
GPU idle share
41.8%
40× A100 / 32× H100 · rolling 7d mean
Policies
12 armed
0 active · spend ceiling: armed
/// clusters — 6 · sort · cost · desc
Cluster · Cloud · Region · Nodes · GPU · Cost / day · Recoverable · State
aks-westeu-prod-01 · AKS · westeurope · 142 · 8× A100 · $4,820 · $1,920 · overcommit
aks-ne-staging · AKS · northeurope · 24 · – · $480 · $110 · healthy
gke-usc1-prod · GKE · us-central1 · 88 · – · $2,140 · $380 · healthy
gke-euw4-batch · GKE · europe-west4 · 62 · 16× L4 · $1,680 · $540 · overcommit
eks-use1-prod · EKS · us-east-1 · 210 · 32× H100 · $12,940 · $5,180 · overrun
eks-usw2-dev · EKS · us-west-2 · 38 · – · $620 · $180 · healthy
fleet / aks-westeu-prod-01 · AKS · westeurope · 142 nodes · 8× A100 · live
/// node pools — 3 pools · 52 nodes · 8 GPU
system × 4
Standard_D4s_v5
CPU
28%
MEM
62%
System + addons
app-burst × 40
Standard_D16as_v5 · Spot
CPU
74%
MEM
55%
Stateless workloads
gpu-inference × 8
Standard_NC24ads_A100_v4
CPU
22%
MEM
38%
GPU
18%
A100 · inference
insight · GPU pool running at 18% mean utilization · 6 of 8 A100s idle > 4h/day · open rightsizing plan
/// top waste — 7d · $28,960 / mo
01 · model-server-a100 · $18,200 / mo
ns: ml-inference · gpu=8 util=12%
apply
02 · vectordb-ingress · $8,640 / mo
ns: retrieval · cpu.req=16 used=0.41
apply
03 · jobs-etl-nightly · overcommit risk: high
ns: data · limit=32cpu burst=2.1
review
04 · frontend-gateway · right-sized
ns: edge · cpu.req=2 used=1.6
last scan · 12s ago · kubehero rightsize --apply →
/// workflow — 004.b

Watch it work — end to end.

Connect a cloud account, stream telemetry, evaluate policies, act — all in under five minutes on a real cluster. Pause any step to read.

demo · kubehero workflow · step 01 / 04

Connect any cluster in under five minutes.

Helm install the agent, paste an OIDC role ARN, and KubeHero discovers every AWS account, Azure subscription, and GCP project in scope.

AWS accounts · 4
Azure subscriptions · 2
GCP projects · 3
Discovered clusters · 6
AWS · ✓ connected
Account · 742190-prod
region · us-east-1 · us-west-2
clusters · eks-use1-prod · eks-usw2-dev
GPU · 32× H100
Azure · ✓ connected
Subscription · 81c7…fe9b
region · westeurope · northeurope
clusters · aks-westeu-prod-01 · aks-ne-staging
GPU · 8× A100 (NC24ads_v4)
GCP · ✓ connected
Project · kubehero-prod-euw4
region · us-central1 · europe-west4
clusters · gke-usc1-prod · gke-euw4-batch
GPU · 16× L4
3 clouds · 6 clusters · 502 nodes · 12,480 pods · mTLS · OIDC · read-only by default
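A sketch of what the connect step above might look like as Helm values; the keys and the role ARN are placeholders, not the real schema:

clouds:
  aws:
    roleArn: arn:aws:iam::111122223333:role/kubehero-readonly   # assumed via OIDC, read-only
  azure:
    subscriptions: ["<subscription-id>"]
  gcp:
    projects: ["<project-id>"]
discovery:
  clusters: auto           # enumerate every EKS / AKS / GKE cluster in scope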
/// operator console — 004.d

Live telemetry, not yesterday's PDF.

Panels update every second from a real ClickHouse feed. Hover the burn-rate chart to scrub back through the window.

kubehero / operator console · fleet: prod-* · last 2h · refresh 1s
live
Fleet burn rate · USD / hr · prod clusters · 120s window
$4,481 · avg $4,480 · healthy
−120s · −60s · now
Top workload waste · $k / mo recoverable · rolling 7d
model-server-a100 · $18.0k
ns: ml-inference · rank 1
vectordb-ingress · $9.3k
ns: retrieval · rank 2
etl-nightly · $5.7k
ns: data · rank 3
frontend-gateway · $4.3k
ns: edge · rank 4
api-ingress · $3.1k
ns: edge · rank 5
GPU utilization heatmap · 8 GPUs × 48s · darker = idle, bright = loaded
gpu-01 90% · gpu-02 58% · gpu-03 53% · gpu-04 33%
gpu-05 44% · gpu-06 31% · gpu-07 35% · gpu-08 0%
idle 0–25% · light 25–55% · mid 55–85% · loaded 85%+
Alert feed · live · signed · SIEM exportable
09:14:00 · burn rate 1.3× on prod-us-east-1
09:14:01 · new recommendation · vectordb-ingress · cpu 16→4
09:14:02 · rightsizing applied · model-server-a100
09:14:03 · ceiling crossed · prod-monthly · 82% of $100k
09:14:04 · cluster discovered · gke-euw4-batch · 62 nodes
source · eBPF + DCGM · via collector DaemonSet · query = ch.pod_cost_1s · WHERE cluster ~ "prod-*"
/// spend attribution — 004.e

Follow the money, from namespace to invoice.

Ribbon thickness is $/mo. Hover a node or a flow — everything else dims so you can see exactly which team's workload is running on which cloud, and what it costs.

kubehero / spend attribution · namespace → workload → cloud · rolling 30d
total $163k · hover or click to filter
model-server-a100 · ns ml-inference · AWS · $45.0k
model-server-a100 · ns ml-inference · Azure · $37.0k
vectordb-ingress · ns retrieval · AWS · $22.0k
etl-nightly · ns data · GCP · $16.0k
vectordb-ingress · ns retrieval · GCP · $12.0k
frontend-gateway · ns edge · AWS · $8.0k
api-ingress · ns edge · Azure · $7.0k
etl-nightly · ns data · AWS · $6.0k
frontend-gateway · ns edge · GCP · $3.0k
api-ingress · ns edge · AWS · $2.0k
click a column label or bar to pin · esc to clear · source · ch.pod_cost_1d · GROUP BY ns, workload, cloud
/// live edge — 004.c

Three things legacy FinOps tools can't do.

Sub-minute telemetry, retroactive Savings Plan re-attribution, and an enforcement layer with humanArm: true. Flexera, Cloudability, and their peers are structurally incapable of any of these.

/// 001 · flexera · 24h stale · showing 2026-04-22

Live burn rate

$4,820.40 / hr
–60s · now
delta vs 1h avg · +$380
last tick · 2s ago
resolution · 1s
/// 002 · flexera · no re-attribution · SP applies forward only

Savings Plan replay

–17.8% · retroactive · 22d window
09:14:02 · 1Y compute Savings Plan committed — $960,000
09:14:03 · re-attribution started · 28.4M rows
09:17:48 · cost restated back to 2026-04-01 · –17.8%
before · $0.1872 / pod-hr
after · $0.1538 / pod-hr
/// 003 · flexera · alerts only · no enforcement layer

Ceiling policies

3 armed · 0 active
prod-monthly-ceiling · armed
kind: Budget · scope: prod-* clusters · eval 4s ago
prod-burn-rate-2x · armed
kind: CeilingPolicy · scope: prod-us-east-1 · eval 4s ago
gpu-inference-cap · standby
kind: CeilingPolicy · scope: ns:ml-inference · eval 12s ago
human-arm required · kubehero cap --arm →

Declare what you refuse to spend.
KubeHero enforces it.

Most cost tools report yesterday's damage. KubeHero lets you define a hard ceiling as a Kubernetes CRD and acts in real time when a bad deploy, a runaway cron, or a forgotten dev namespace starts to overrun the budget.

01 · Scale HPAs down to safe minimum
02 · Evict non-SLO workloads
03 · Quarantine offending node pools
04 · Page on-call, post to Slack
apiVersion: kubehero.io/v1
kind: BudgetPolicy
spec:
  ceiling: $8400/hr                            # hard spend ceiling for the scoped clusters
  hardStop: true                               # enforce, don't just alert
  humanArm: true                               # a human must arm the policy before it can act
  escalation: [hpa, evict, quarantine, page]   # the four steps above, least disruptive first
Simulated budget breach
Demo · no real clusters harmed
Burn rate
$11,217/hr
Ceiling
$8,400/hr
Overage
+$2,817/hr
Step 01 of 02 · Arm the policy
Arming will expose the execute control. This demo will evict simulated workloads and cannot be undone mid-flight (cooldown applies).
> awaiting execution...

Free until it pays for itself.

Three ways to run KubeHero. Start free, move to Cloud when you want the hosted brain, or self-host with a commercial license when compliance demands it. No seat taxes. No surprise bills.

01·TIER-01
Free
OSS · self-hosted
$0
forever · Apache 2.0
  • eBPF agent (DaemonSet)
  • Basic dashboard & CLI
  • 3 clusters · 7-day retention
  • Community Discord
  • GitHub issues
Clone on GitHub
Recommended
02·TIER-02
Cloud
hosted control plane
$10
per node / month · first 25 nodes free
  • Everything in Free
  • Managed control plane
  • Unlimited clusters · 90-day retention
  • Slack / PagerDuty / OpsGenie integrations
  • Budget CRDs + spend ceiling
  • Email support · 24h SLA
Request access
03·TIER-03
Enterprise
self-hosted · BSL commercial
Custom
air-gap capable
  • Everything in Cloud
  • SSO (SAML, OIDC) + SCIM
  • Multi-tenant RBAC
  • Unlimited retention
  • On-prem / air-gapped deploy
  • Dedicated solutions engineer
  • 99.95% SLA
Talk to us

Onboarding design partners now.

We work directly with a small group of operators running real AKS, GKE, or EKS footprints — especially teams managing a GPU fleet. Design partners get hands-on setup, monthly roadmap input, and first-year pricing locked in.

01 · Production clusters across one or more of AKS / GKE / EKS
02 · Monthly cloud spend above $50K, or a meaningful GPU/TPU fleet
03 · A human who owns K8s cost end-to-end and can give us 30 min / week
No newsletter, no promo spam. Unsubscribe any time.