As AI adoption grows, managing resource efficiency and cost in cloud-native environments becomes increasingly critical. Shashidhar Shenoy and Achyut Sarma Boggaram will discuss model pruning as an optimization technique and how it integrates with Kubernetes-native tools. They will cover resource scheduling strategies, autoscaling configurations, and best practices for deploying pruned AI models in Kubernetes environments. Model pruning is still an emerging practice for AI inference in the cloud, so this session will examine its benefits, trade-offs, and technical considerations for platform teams seeking to optimize AI workloads. Attendees will learn how to scale AI applications more efficiently while reducing resource usage and the associated costs.

Learn more: https://platformcon.com/sessions/optimizing-ai-workloads-in-kubernetes-pruning-for-efficiency-and-scale

Optimizing AI workloads in Kubernetes: Pruning for efficiency and scale