Senior Infrastructure & Operations Engineer (Kubernetes / Platform Reliability)
We’re looking for a senior infrastructure and operations engineer to own and evolve our platform reliability. You’ll design, operate, and maintain our Kubernetes-based infrastructure, build reliable monitoring and alerting pipelines, and ensure our systems remain stable under real-world load and failure conditions. This is a hands-on role for someone who has deep experience running production systems at scale and who focuses on making infrastructure predictable and stable. You’ll work across Kubernetes, networking, CI/CD, Cloudflare, and observability to create a platform that engineers can trust.
What You’ll Do
Design, deploy, and maintain production Kubernetes clusters.
Own cluster reliability, upgrades, security, and performance.
Build and operate monitoring, logging, and alerting pipelines.
Ensure full-stack observability across infrastructure and services.
Design and maintain CI/CD pipelines that are fast, reproducible, and safe.
Improve deployment strategies (rollouts, canaries, rollbacks).
Automate infrastructure provisioning and configuration.
Investigate and resolve production incidents.
Improve system resilience, redundancy, and recovery strategies.
Define SLOs/SLIs and track reliability targets.
Optimize and maintain our Cloudflare setup (caching, routing, security, edge behavior).
Work closely with engineering teams to improve operational practices.
Identify and remove single points of failure.
What We’re Looking For
Must-have
Senior-level experience operating production infrastructure.
Deep, hands-on expertise with Kubernetes (cluster internals, networking, storage, security).
Strong networking fundamentals (TCP/IP, routing, DNS, TLS, load balancing).
Experience debugging distributed systems and network-related issues.
Experience optimizing CDN and edge setups, including Cloudflare.
Strong experience building monitoring and observability systems.
Experience with metrics, logs, traces, and alerting pipelines.
Experience designing reliable CI/CD pipelines.
Strong Linux fundamentals.
Experience with infrastructure as code and automation.
Ability to debug issues across the entire stack.
Experience handling incidents and conducting postmortems.
Nice-to-have
Experience with multi-cluster or multi-region setups.
Experience with high-throughput or data-heavy systems.
Experience with Elasticsearch or large-scale data infrastructure.
Experience with service meshes.
Experience with cost optimization and capacity planning.
Experience in regulated or reliability-focused environments.
How You Work
You assume infrastructure will fail and design accordingly.
You prioritize reliability, visibility, and recoverability.
You build systems that engineers trust in production.
You automate carefully and deliberately.
You are calm and methodical during incidents.
You focus on long-term stability over short-term fixes.
You document and standardize important processes.
Example Problems You Might Work On
Hardening Kubernetes clusters for high availability and safe upgrades.
Debugging network latency or connectivity issues across services.
Optimizing Cloudflare caching, routing, and edge security rules.
Building monitoring pipelines that provide reliable signals.
Designing alerting that is actionable and low-noise.
Improving deployment reliability and rollback safety.
Removing single points of failure in production systems.
Ensuring observability across all services and data pipelines.
Why Join Range
Join one of the fastest-growing sectors in Web3 as stablecoins reach mass adoption.
Competitive compensation with meaningful equity upside.
Strong potential for professional growth and leadership opportunities.
Remote-first culture with bi-yearly international off-sites.
Opportunities for global conference travel and ecosystem engagement.
Health and wellness benefits.
How to Apply
Send us:
A short introduction and your background.
Examples of infrastructure or platform work you’ve led.
Any public write-ups, repositories, or talks (if available).
We’re particularly interested in engineers who have built and operated Kubernetes platforms, improved network reliability, and optimized CDN/edge setups such as Cloudflare in production.