No More Chaos: Audit and Inspect Kubernetes at Scale
Join us for Kubernetes Forums Seoul, Sydney, Bengaluru and Delhi - learn more at kubecon.io
Don’t miss KubeCon + CloudNativeCon 2020 events in Amsterdam March 30 - April 2, Shanghai July 28-30 and Boston November 17-20! Learn more at kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects
No More Chaos: Audit and Inspect Kubernetes at Scale - 陈杰, 阿里云 & 马金晶, 蚂蚁金服(杭州)网络技术有限公司
Accuracy in fault detection and efficiency of issue analysis are important for availability and stability of Kubernetes clusters.While there are huge number of resources, events, and metrics in Kubernetes. In our cluster, we noticed Kubernetes generates thousands of metrics data per second which makes it challenging to figure out the root cause from this ocean of data, not to mention analysis,data visualizion and alarms.In this talk, we will share experince and practices of auditing and inspecting Kubernetes at web scale. We’ll firstly talk about the how we design data metrics to reflect the stability of Kubernetes and how we consume these metrics and set out streaming alarm.We will use real cases to demo how we aggregate and analyze these metrics data.Finally,we will share the practices in Alibaba of building a automiatic system for real-time data inspection and analysis for Kubernetes.
