MLOps Engineer

Overview

We are seeking a Mid-Level MLOps Engineer to build, operate, and evolve our Kubeflow-based ML platform on Azure. This role focuses on enabling reliable, scalable, and cost-efficient ML workflows by designing CI/CD pipelines, managing Kubernetes-based ML infrastructure, improving platform observability, and supporting MLE and Data Science teams across the model lifecycle. The ideal candidate is hands-on, comfortable working across infrastructure and ML workflows, and motivated to operationalize best practices in MLOps.

Responsibilities

Platform & Infrastructure:
• Deploy, configure, and operate Kubeflow components on Azure Kubernetes Service (AKS)
• Support Kubernetes workloads for training, inference, and batch pipelines
• Manage container images, registries, and ML runtime environments
• Assist with Kubeflow and Kubernetes upgrades under senior guidance

CI/CD & Automation:
• Build and maintain CI/CD pipelines for ML workflows and platform services
• Automate model training, validation, and deployment pipelines
• Implement reproducibility and versioning for data, models, and pipelines

Observability & Reliability:
• Implement logging, monitoring, and alerting at the platform level
• Diagnose and resolve workflow, pipeline, and infrastructure failures
• Support SLAs and reliability objectives for ML platforms

Collaboration & Enablement:
• Work closely with MLEs and Data Scientists to onboard workflows onto Kubeflow
• Provide best practices, templates, and documentation for ML teams
• Collaborate with Infra and Security teams on access control and compliance needs

Cost Awareness & Optimization:
• Assist with collecting and reporting costs at the Kubeflow namespace or workflow level
• Identify optimization opportunities related to compute usage and scheduling

Qualifications

• 3–6 years of experience in MLOps, DevOps, or Platform Engineering
• Hands-on experience with Kubeflow, Kubernetes, Terraform (IaC), and containerized ML workloads
• Strong experience with Azure cloud services (AKS, ACR, Storage, Networking, IAM, AD Groups)
• Proficiency in Python and familiarity with ML frameworks (TensorFlow, PyTorch, scikit-learn)
• Experience building CI/CD pipelines (GitHub Actions, Azure DevOps, Argo, etc.)
• Understanding of ML lifecycle management (training, inference, monitoring, retraining)
• Familiarity with observability tools (Prometheus, Grafana, Azure Monitor, Datadog)
• Strong collaboration and communication skills

What makes us different?

• Hybrid work model: a combination of remote and collaborative office experience to enable innovation
• Entrepreneurial environment in a leading international company
• Professional growth possibilities and learning opportunities
• Variety of benefits to support your physical, emotional, and financial wellbeing
• Volunteering opportunities to help external communities

About PepsiCo

We believe that culture should be at the cornerstone of everything we do at PepsiCo. We are agile, innovative, and not afraid of failure. We want our team to come to work every day excited to explore new ways to bring enjoyment, refreshment, and fun to the world. PepsiCo Positive (pep+) is the future of our organization: a strategic end-to-end transformation, with sustainability at the center of how we will create growth and value by operating within planetary boundaries and inspiring positive change for the planet and people. So, if you’re ready to be a part of a playground for those who think big, we’d love to chat.

*We encourage diversity of applicants across gender, age, ethnicity, nationality, sexual orientation, social background, religion or belief, and disability.

#LI-Hybrid
