Platform Engineer

Are you the kind of engineer who sees a cluster not as plumbing but as the foundation on which real lives depend? At Sand, you'll own the Kubernetes platform that ministries of health use to make decisions for millions of people. It's expanding country by country: a growing fleet that spans managed cloud and self-managed, sovereign on-premises clusters, because where health data must live is the law, not a preference. Every new deployment is a fresh reliability problem, not a copy-paste. When a cluster wobbles, a health worker loses the data they need. You'll automate relentlessly, codify everything with GitOps, and maintain a single operating standard across the whole estate. If you're as curious about the lives behind the systems as you are about the systems themselves, this is your foundation to build on.

About Sand

Sand Technologies is a global Physical AI company using data and AI to make critical industries work better. We partner with governments, cities, and enterprises to improve how essential systems operate across healthcare, water, energy, telecommunications, and infrastructure.

Our work delivers proven real-world impact. We have built AI systems that help manage London’s water supply, supported telecom network planning across hundreds of cities, and developed digital healthcare platforms serving tens of millions of people across Africa. From intelligent command centers to AI-powered infrastructure platforms, we help organizations sense, analyze, and act in complex environments.

Our people are ambitious, curious, and relentlessly practical. Our teams work alongside clients in the field, solving hard problems and deploying solutions that last. With colleagues across Africa, Europe, the UK, and the US, we operate across the full stack - from research and engineering to deployment and capability building.

Our mission is simple: to harness AI to solve humanity’s most pressing challenges.

About the role

As our platforms scale across industries and deployment environments, we’re looking for a Platform Engineer who lives and breathes Kubernetes. You’ll own the foundations that everything else at Sand runs on — designing, operating, and hardening the clusters, pipelines, and reliability practices that let our product and delivery teams ship critical systems with confidence.

This is a deeply hands-on site reliability role. You’ll work across managed cloud Kubernetes and self-managed on-premises clusters, treat infrastructure as code, automate relentlessly through GitOps, and build the observability and operational discipline that keeps mission-critical workloads healthy.

What you’ll do

Kubernetes platform ownership: Design, deploy, and operate production Kubernetes clusters across AWS EKS and on-premises / bare-metal environments — owning cluster lifecycle, upgrades, multi-cluster topology, RBAC, networking, ingress, storage, and workload security.
Platform engineering: Build golden paths and self-service capabilities — Helm charts, operators/CRDs, reusable manifests, and internal tooling — that make it fast and safe for product and delivery teams to ship onto the platform.
GitOps & automation: Implement declarative, GitOps-driven delivery (Argo CD / Flux) and treat all infrastructure as code so environments are reproducible, auditable, and recoverable.
Reliability & SRE: Define SLOs and error budgets, lead incident response and blameless post-mortems, drive capacity and autoscaling strategy, and continuously improve the resilience and security posture of critical workloads.
Observability: Build and own monitoring, logging, tracing, and alerting (Prometheus, Grafana, OpenTelemetry, and similar) so the health of every deployed system is clear and actionable.
CI/CD pipelines: Develop and maintain CI/CD pipelines that streamline build, test, and deployment across services and environments.
R&D and problem-solving: Investigate infrastructure-level challenges, prototype solutions, and bring them into production.
Debugging & issue resolution: Troubleshoot and resolve infrastructure, networking, and cluster issues promptly to protect system integrity and performance.

Who you are

Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field — or equivalent practical experience.
Minimum of 5 years in cloud/platform engineering, with significant hands-on time operating Kubernetes in production.
Proven track record of designing and running scalable, secure, and reliable cloud- and container-based platforms.
Experience managing on-premises and bare-metal deployments of Kubernetes clusters.
Excellent problem-solving skills, with the ability to research deeply and develop innovative solutions to complex infrastructure challenges.
Strong communication skills and a genuine ability to collaborate across cross-functional, distributed teams.
Willingness to travel across the African continent to support our in-country teams when needed.

How we work

Due to the highly collaborative and internationally distributed nature of our work, successful candidates must be comfortable operating in small teams while contributing to larger, globally coordinated efforts. A strong sense of ownership, self-motivation, and discipline in maintaining clear and consistent communication through virtual collaboration tools and video conferencing are essential.