Mgr, Engineering Program Management, AI Platforms & Infrastructure

We are looking for an experienced Engineering Program Management (EPM) Manager to lead strategy, execution, and delivery across our AI/ML platform and infrastructure programs. In this role, you will drive cross-functional initiatives spanning Apple's massive-scale GPU/TPU compute infrastructure, Foundation Model inference platforms, and hybrid-cloud AI systems. You will partner closely with engineering and operations leaders to translate complex technical requirements into actionable roadmaps. Crucially, you will be responsible for growing and scaling a high-performing EPM team to meet the rapidly expanding demands of Apple's generative AI and machine learning platforms.

Minimum Qualifications

10+ years of experience in product or program management, including at least 3 years in a people management or lead EPM role.
Proven experience building and scaling teams, with the organizational savvy to expand team scope and influence across a highly matrixed environment.
Extensive experience managing strategic relationships with top-tier cloud vendors and external partners, including infrastructure planning, contract alignment, and SLA enforcement.
Strong strategic thinking, with the ability to balance long-term platform roadmap priorities against near-term inference and training execution demands.
Track record of delivering massive-scale cost optimization and operational efficiency programs in hybrid-cloud environments.
Excellent communication and stakeholder management skills, with the ability to translate complex technical infrastructure concepts for both deeply technical engineering teams and executive audiences.
Experience in multi-tenant, high-performance compute environments running large-scale Foundation Models or similar ML workloads.
BS/MS in EE/CS/CE or equivalent.

Preferred Qualifications

Deep technical background in AI/ML infrastructure, cloud operations, or distributed compute platforms, with direct experience in GPU/TPU capacity management and provisioning.
Familiarity with large-scale distributed training frameworks (e.g., PyTorch, Megatron-LM, JAX) and their infrastructure implications at scale.
Familiarity with FinOps practices in large-scale GPU/TPU environments.
Experience navigating large-scale organizational change and team restructuring.
