Senior DevOps Engineer

You will scale and operate infrastructure that powers millions of trades daily across CeFi and DeFi venues, and you will be responsible for its performance, reliability, and security. You will participate in a rotational on-call schedule and handle real-time incidents. You will build and maintain infrastructure-as-code with Terraform and configuration management with Ansible, and use observability tools such as Prometheus and Grafana for incident response and capacity planning. You will optimize throughput and reduce latency at the compute, network, storage, and application layers, and implement secure CI/CD pipelines, IAM controls, and threat detection. Throughout, you will collaborate closely with developers, traders, and security teams to deliver reliable production systems.

Responsibilities

  • Ensure performance and reliability of mission-critical trading infrastructure by proactively identifying and resolving bottlenecks across compute, network, storage, and application layers
  • Support, operate, and continuously improve highly available, low-latency systems under global trading load, with a strong focus on automation and self-service
  • Participate in a daytime rotational on-call schedule, handle real-time operational incidents, and perform root cause analysis
  • Use metrics and observability tools such as Prometheus and Grafana to detect anomalies, monitor performance trends, and support incident response
  • Build, manage, and scale infrastructure-as-code using Terraform and configuration management tools like Ansible
  • Collaborate with trading, engineering, and security teams to align infrastructure improvements with application demands
  • Implement robust security practices, including IAM, threat detection, and secure CI/CD pipelines
  • Apply low-level optimization strategies to improve throughput, reduce latency, and increase system efficiency

Requirements

  • 5+ years in DevOps or SRE roles focused on high-performance, production-grade systems
  • Proficiency in Linux administration
  • Experience managing AWS cloud infrastructure
  • Proficiency running containerized workloads on Kubernetes
  • Strong scripting and automation skills in Python and Bash
  • Knowledge of Go, JavaScript, or TypeScript is a plus
  • Hands-on experience with observability stacks such as Prometheus and Grafana
  • Hands-on production incident response experience
  • Experience with Google Cloud and Azure is desirable
  • Experience with automation and IaC tools including Ansible and Terraform
  • Understanding of performance engineering, system and network tuning, profiling, and capacity planning
  • Knowledge of cloud networking, security best practices, IAM, and SIEM tools

Benefits

  • Competitive compensation
  • Flexible working environment
  • Work with a global, high-performance team spanning APAC, Europe, and the US
