Senior DevOps Engineer — Customer Deployments & Infrastructure

At Shakudo, we're building the world's first operating system for data and AI. We use the term "operating system" in the truest sense: just like iOS, Windows, or Linux, Shakudo's end-to-end OS provides ever-evolving, fully automated, best-in-class open-source components tailored to each business's unique needs.

We are seeking a Senior DevOps Engineer to join our Engineering team and take ownership of deploying, configuring, and operating Shakudo in customer environments. This is a hands-on infrastructure role for someone who can work across Kubernetes, Helm charts, and cloud and on-premises environments, and who can act as a trusted technical advisor to customers: diagnosing problems, designing deployment architectures, and ensuring Shakudo runs reliably in production.

In this role, you will own the deployment lifecycle from architecture to operations: assessing customer infrastructure, deploying Shakudo into complex environments, resolving production issues, and turning recurring problems into product improvements. This is not a traditional internal DevOps role — it is a mix of DevOps engineering, Kubernetes platform engineering, and solution architecture where success is measured by deployment reliability, customer satisfaction, and operational excellence.

Responsibilities

  • Own the deployment and operation of Shakudo across customer Kubernetes environments
  • Design, develop, customize, and troubleshoot Helm charts for complex production deployments
  • Work deeply with Kubernetes primitives including Deployments, StatefulSets, Services, Ingress, StorageClasses, Secrets, ConfigMaps, RBAC, NetworkPolicies, CRDs, and operators
  • Debug Kubernetes issues across scheduling, networking, storage, permissions, DNS, ingress, certificates, and workload reliability
  • Build repeatable deployment patterns that work across different customer infrastructure environments
  • Assess customer infrastructure and recommend the right deployment architecture for Shakudo
  • Work with customer platform, DevOps, security, and infrastructure teams to deploy Shakudo into their environments
  • Support deployments across AWS, GCP, Azure, hybrid cloud, and on-premises Kubernetes clusters
  • Design for enterprise constraints such as private networking, IAM/RBAC, security controls, observability, compliance requirements, and restricted environments
  • Help customers make the right trade-offs across reliability, scalability, performance, cost, and operational complexity
  • Build and maintain infrastructure-as-code using tools such as Terraform and related cloud-native tooling
  • Operate managed cloud services that interface with Shakudo Kubernetes clusters, including databases, storage, networking, secrets, and identity services
  • Support GPU infrastructure and specialized compute environments for data and AI workloads
  • Improve deployment automation, release processes, upgrade workflows, monitoring, and operational runbooks
  • Identify recurring deployment issues and turn them into product improvements, automation, or reusable patterns
  • Monitor, debug, and resolve production issues in customer environments
  • Lead root-cause analysis for infrastructure, deployment, and platform reliability issues
  • Execute product upgrades, maintenance windows, rollouts, and customer-specific configuration changes
  • Improve observability, alerting, logging, and operational visibility across deployments
  • Ensure customer environments are stable, secure, scalable, and maintainable
  • Act as a trusted technical advisor to customers during deployment and production operations
  • Explain infrastructure decisions clearly to both technical and non-technical stakeholders
  • Collaborate with Solution Engineering, Product Engineering, and Customer Engineering teams to translate customer requirements into robust deployment architectures
  • Document deployment designs, customer-specific configurations, best practices, and troubleshooting guides
  • Represent the voice of the customer internally and influence product and platform improvements

Qualifications

  • 5+ years of experience in DevOps, Platform Engineering, Infrastructure Engineering, SRE, or a related role
  • Strong hands-on experience with Kubernetes in production environments
  • Strong hands-on experience developing, maintaining, and troubleshooting Helm charts
  • Experience deploying and operating software in customer or enterprise environments
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Experience with infrastructure-as-code tools such as Terraform
  • Strong understanding of Kubernetes networking, storage, ingress, RBAC, secrets management, observability, and cluster operations
  • Ability to troubleshoot complex infrastructure issues across application, Kubernetes, cloud, and network layers
  • Familiarity with Python, Go, Bash, or TypeScript for automation and tooling
  • Strong communication skills and comfort working directly with customer technical teams
  • Ability to operate independently, make sound technical decisions, and drive deployments to completion

A Plus

  • Experience with data platforms, AI infrastructure, MLOps, or GPU workloads
  • Experience with Kubernetes operators, CRDs, GitOps, Argo CD, Flux, or similar deployment tooling
  • Experience with enterprise security requirements, private networking, identity providers, SSO, and compliance-driven environments
  • Experience deploying software into air-gapped, restricted, or customer-managed infrastructure
  • Prior experience in a customer-facing infrastructure, solution engineering, or solution architecture role
  • Contributions to open-source Kubernetes, DevOps, or infrastructure projects

Why Shakudo Stands Out

  • Work with cutting-edge technologies in machine learning and high-performance computing
  • Contribute to a platform that transforms how organizations leverage data and AI
  • Join a dynamic team that values innovation, efficiency, and diversity

This is a work-from-office role based in Bangalore (HSR Layout). Shakudo has offices in Toronto, San Francisco, and Bangalore.