Senior Software Engineer, Infra

Responsibilities

Own the design, build, and operation of Kubernetes cluster management tooling and automation that keeps our compute platform reliable and self-healing at scale.
Build developer-facing tooling and workflows that improve how engineers across Coinbase interact with Kubernetes, with a heavy emphasis on integrating AI-driven processes and support.
Deliver net-new compute capabilities for service owners, such as one-off jobs, cron scheduling, deployment strategies, EFS support, and automated right-sizing.
Drive operational excellence by automating toil, reducing on-call burden, and continuously improving platform observability and incident response.
Partner with Security, Reliability, and Observability teams to ensure the compute platform meets Coinbase's standards for security, uptime, and performance.

Requirements

5+ years of software engineering experience, including 3+ years building and operating Kubernetes or similar compute orchestration systems (e.g., Mesos, Nomad, ECS).
Hands-on experience with AWS and/or GCP infrastructure services (e.g., EC2, EKS, IAM, VPC, networking) in a production environment at scale.
Demonstrated ability to design, implement, and operate distributed infrastructure systems, including diagnosing complex failures and driving them to root-cause resolution.
Hands-on experience with the CNCF ecosystem (e.g., Helm, Prometheus, ArgoCD, Envoy) and a track record of applying these tools to solve real infrastructure problems.
Proven ability to apply AI tooling to infrastructure workflows, improving automation, developer productivity, or operational efficiency.
Responsible use of generative AI, maintaining human oversight to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.

Benefits

Equity
Bonus eligibility
Medical insurance
Dental insurance
Vision insurance