DevOps Nov 18, 2019

Improving Performance of Deep Learning Workloads With Volcano


Improving Performance of Deep Learning Workloads With Volcano - Ti Zhou, Baidu Inc

Baidu has internally improved the performance of large-scale deep learning workloads by using the Volcano project. Volcano's CRD-based computing resource model makes it possible to use resources more efficiently and to configure computing models more flexibly. Volcano provides a unified abstraction over underlying scheduling capabilities such as gang scheduling, fair share, priority queues, and job suspend/resume, compensating for functionality missing from native Job-based training operators. After adopting Volcano, Baidu's internal resource utilization increased by 15%, and training task completion speed increased by 10%. This talk introduces Volcano's overall functionality, how an existing training operator was adapted to support Volcano, and a comparison of deep learning training task performance before and after adopting Volcano.
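For context, the CRD-based resource model and gang scheduling described above are typically consumed through Volcano's Job custom resource. A minimal sketch of such a manifest is shown below; the job name, queue name, image, and replica counts are illustrative assumptions, not details from the talk:

```yaml
# Hypothetical Volcano Job for a distributed training workload.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: dl-training-job        # illustrative name
spec:
  schedulerName: volcano       # hand pods to the Volcano scheduler
  minAvailable: 3              # gang scheduling: start only when 3 pods can run
  queue: training              # illustrative queue; queues enable fair share
  tasks:
    - replicas: 1
      name: master
      template:
        spec:
          containers:
            - name: master
              image: example.com/dl-trainer:latest   # illustrative image
          restartPolicy: Never
    - replicas: 2
      name: worker
      template:
        spec:
          containers:
            - name: worker
              image: example.com/dl-trainer:latest   # illustrative image
          restartPolicy: Never
```

Here `minAvailable` is what distinguishes gang scheduling from the default scheduler's pod-at-a-time behavior: no pod of the job starts until the whole gang can be placed, which avoids deadlocks where partially scheduled training jobs hold resources without making progress.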

https://sched.co/UaZi