Senior Infrastructure & Operations Engineer (Kubernetes / Platform Reliability)

We’re looking for a senior infrastructure and operations engineer to own and evolve our platform reliability. You’ll design, operate, and maintain our Kubernetes-based infrastructure, build reliable monitoring and alerting pipelines, and ensure our systems remain stable under real-world load and failure conditions. This is a hands-on role for someone who has deep experience running production systems at scale and who focuses on making infrastructure predictable and stable. You’ll work across Kubernetes, networking, CI/CD, Cloudflare, and observability to create a platform that engineers can trust.

What You’ll Do

  • Design, deploy, and maintain production Kubernetes clusters.

  • Own cluster reliability, upgrades, security, and performance.

  • Build and operate monitoring, logging, and alerting pipelines.

  • Ensure full-stack observability across infrastructure and services.

  • Design and maintain CI/CD pipelines that are fast, reproducible, and safe.

  • Improve deployment strategies (rollouts, canaries, rollbacks).

  • Automate infrastructure provisioning and configuration.

  • Investigate and resolve production incidents.

  • Improve system resilience, redundancy, and recovery strategies.

  • Define SLOs/SLIs and track reliability targets.

  • Optimize and maintain our Cloudflare setup (caching, routing, security, edge behavior).

  • Work closely with engineering teams to improve operational practices.

  • Identify and remove single points of failure.

What We’re Looking For
Must-have

  • Senior-level experience operating production infrastructure.

  • Deep, hands-on expertise with Kubernetes (cluster internals, networking, storage, security).

  • Strong networking fundamentals (TCP/IP, routing, DNS, TLS, load balancing).

  • Experience debugging distributed systems and network-related issues.

  • Experience optimizing CDN and edge setups, including Cloudflare.

  • Strong experience building monitoring and observability systems.

  • Experience with metrics, logs, traces, and alerting pipelines.

  • Experience designing reliable CI/CD pipelines.

  • Strong Linux fundamentals.

  • Experience with infrastructure as code and automation.

  • Ability to debug issues across the entire stack.

  • Experience handling incidents and conducting postmortems.

Nice-to-have

  • Experience with multi-cluster or multi-region setups.

  • Experience with high-throughput or data-heavy systems.

  • Experience with Elasticsearch or large-scale data infrastructure.

  • Experience with service meshes.

  • Experience with cost optimization and capacity planning.

  • Experience in regulated or reliability-focused environments.

How You Work

  • You assume infrastructure will fail and design accordingly.

  • You prioritize reliability, visibility, and recoverability.

  • You build systems that engineers trust in production.

  • You automate carefully and deliberately.

  • You are calm and methodical during incidents.

  • You focus on long-term stability over short-term fixes.

  • You document and standardize important processes.

Example Problems You Might Work On

  • Hardening Kubernetes clusters for high availability and safe upgrades.

  • Debugging network latency or connectivity issues across services.

  • Optimizing Cloudflare caching, routing, and edge security rules.

  • Building monitoring pipelines that provide reliable signals.

  • Designing alerting that is actionable and low-noise.

  • Improving deployment reliability and rollback safety.

  • Removing single points of failure in production systems.

  • Ensuring observability across all services and data pipelines.

Why Join Range

  • Join one of the fastest-growing sectors in Web3 as stablecoins reach mass adoption.

  • Competitive compensation with meaningful equity upside.

  • Strong potential for growth and leadership opportunities.

  • Remote-first culture with bi-yearly international off-sites.

  • Opportunities for global conference travel and ecosystem engagement.

  • Health and wellness benefits.

How to Apply
Send us:

  • A short introduction and your background.

  • Examples of infrastructure or platform work you’ve led.

  • Any public write-ups, repositories, or talks (if available).

We’re particularly interested in engineers who have built and operated Kubernetes platforms, improved network reliability, and optimized CDN/edge setups such as Cloudflare in production.
