Senior Cloud Platform Engineer

The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, and drive efficiency and innovation, fundamentally transforming their businesses and operations at scale.

SambaNova Suite™ is the first full-stack generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned on customer data for greater accuracy. Once a model has been adapted with their data, customers retain ownership of it in perpetuity, so they can turn generative AI into one of their most valuable assets.

About SambaNova Systems: Join the company that's building the future of AI computing. At SambaNova, we are disrupting the AI and high-performance computing space with our integrated hardware and software platform. Our DataScale systems and SambaFlow software are pushing the boundaries of what's possible with generative AI and large language models. We are a team of passionate innovators tackling some of the world's most challenging computational problems.

The Opportunity: Our Cloud Operations team is evolving into a high-velocity Internal Platform Team, and we need a visionary engineer to lead this transformation. This isn't just about keeping the lights on; it's about building the robust, scalable, and automated foundation that empowers our entire engineering organization to develop, deploy, and scale our groundbreaking AI software stack.

You will be the cornerstone engineer, taking ownership of our core infrastructure. You'll have the autonomy to design, build, and migrate systems onto a modern, DevOps-centric platform using the best tools for the job. Your work will directly impact the productivity of every engineer at SambaNova.

What You'll Do:

Lead the Transformation: Architect, build, and maintain our next-generation internal developer platform, automating and streamlining our cloud and on-prem infrastructure.

Master of Infrastructure as Code (IaC): Design, write, and manage Terraform modules to provision and manage resources across AWS, GCP, and Azure, ensuring consistency and reproducibility.

Orchestrate at Scale: Build and manage highly available, secure, and performant Kubernetes clusters that serve as the primary runtime for our diverse AI workloads.

Bridge the Gap: Design and implement robust networking solutions (VPCs, load balancers, firewalls, service meshes) that seamlessly connect our multi-cloud and hybrid environments.

Developer Enablement: Collaborate with AI and software engineering teams to understand their needs, provide golden paths to production, and build internal tools that accelerate their development cycles.

Drive Reliability: Implement best practices for observability (monitoring, logging, tracing) to ensure system reliability and performance, and participate in the on-call rotation.

What We're Looking For (Must-Haves):

5+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Infrastructure roles.

Deep, hands-on expertise with Kubernetes (EKS, GKE, or self-managed) in production environments. You understand pods, operators, CNIs, and CSI.

Proven mastery of Infrastructure as Code, particularly with Terraform, to manage complex, multi-cloud environments.

Strong proficiency with at least one major cloud provider (AWS, GCP, or Azure), with a solid understanding of the core services (compute, storage, networking, IAM).

A solid foundation in networking fundamentals (TCP/IP, DNS, HTTP, load balancing) and security best practices in the cloud.

A developer mindset: proficiency in at least one programming language (e.g., Python, Go, Java) to automate tasks and build tools. You understand the SDLC.

A systematic problem-solving approach, combined with strong communication skills and a sense of ownership.

What Will Make You Stand Out (Nice-to-Haves):

  • Experience in a hybrid environment bridging cloud and on-premises/data center infrastructure.
  • Experience managing infrastructure for data-intensive or ML/AI workloads.
  • Knowledge of building and maintaining CI/CD pipelines (e.g., GitLab CI, Jenkins, ArgoCD).
  • Experience with service mesh technologies (e.g., Istio, Linkerd).
  • Contributions to open-source projects or a public portfolio of code (GitHub).

Why SambaNova?

Massive Impact: You will be a key part of a critical platform with high visibility and direct impact on our product and engineers.

Cutting-Edge Technology: Work with a world-class team on one of the most advanced AI stacks in the industry.

Autonomy and Growth: We trust you to make technical decisions. This is a greenfield opportunity to build something remarkable from the ground up.

Competitive compensation, including equity, excellent benefits, and a flexible work environment.

How to Apply: If you are a versatile engineer who loves to build scalable systems and wants to use those skills to power the future of AI, we want to hear from you.

Base Salary Range:

$245,000 to $325,000 USD

Submission Guidelines
Please note that in order to be considered an applicant for any position at SambaNova Systems, you must submit an application form for each position for which you believe you are qualified.

EEO Policy
SambaNova Systems is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to age (40 and over), color, disability, gender identity, genetic information, marital status, military or veteran status, national origin/ancestry, race, religion, creed, sex (including pregnancy, childbirth, breastfeeding), sexual orientation, or any other status protected by applicable federal, state, or local laws.

Benefits Summary for US-Based, Full-Time Employment Positions
SambaNova offers a competitive total rewards package, including base salary, equity, and benefits. We cover 95% of the premium for employee medical insurance and 77% of the premium for dependents, and we offer a Health Savings Account (HSA) with an employer contribution. We also offer Dental, Vision, Short/Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans, in addition to Flexible Spending Account (FSA) options such as Health Care, Limited Purpose, and Dependent Care. Our library of well-being benefits available to you and your dependents includes a full subscription to Headspace, a Gympass+ membership with access to physical gyms, a One Medical membership, counseling services through an Employee Assistance Program, and much more.
