Real-time GPU cost governance on Confluent Cloud for Apache Flink: detection, forecasting, and a closed remediation loop driven by energy per 1k useful tokens (J/1k). Measured on IBM Granite-3.3-8B running on an NVIDIA L4.
real-time schema-registry gpu stream-processing forecasting apache-flink apache-kafka anomaly-detection finops confluent-cloud flink-sql opentelemetry stream-governance vllm cost-governance llm-inference nvidia-dcgm gpu-cost
Updated Jun 22, 2026 - Python