Senior DevOps Engineer
You will scale and operate infrastructure that powers millions of trades daily across CeFi and DeFi venues, ensuring system performance, reliability, and security. You will participate in a rotational on-call schedule and handle real-time incidents; build and maintain infrastructure-as-code with Terraform and configuration management with Ansible; and instrument and use observability tools such as Prometheus and Grafana for incident response and capacity planning. You will also optimize throughput and reduce latency at the compute, network, storage, and application layers; implement secure CI/CD pipelines, IAM controls, and threat detection; and collaborate closely with developers, traders, and security teams to deliver reliable production systems.
Responsibilities
- Ensure performance and reliability of mission-critical trading infrastructure by proactively identifying and resolving bottlenecks across compute, network, storage, and application layers
- Support, operate, and continuously improve highly available, low-latency systems under global trading load, with a strong focus on automation and self-service
- Participate in a daytime rotational on-call schedule, handling real-time operational incidents and performing root cause analysis
- Use metrics and observability tools such as Prometheus and Grafana to detect anomalies, monitor performance trends, and support incident response
- Build, manage, and scale infrastructure-as-code using Terraform and configuration management tools like Ansible
- Collaborate with trading, engineering, and security teams to align infrastructure improvements with application demands
- Implement robust security practices, including IAM, threat detection, and secure CI/CD pipelines
- Apply low-level optimization strategies to improve throughput, reduce latency, and increase system efficiency
Requirements
- 5+ years in DevOps or SRE roles focused on high-performance, production-grade systems
- Proficiency in Linux administration
- Experience managing AWS cloud infrastructure
- Proficiency running containerized workloads on Kubernetes
- Strong scripting and automation skills in Python and Bash
- Knowledge of Go, JavaScript, or TypeScript is a plus
- Hands-on experience with observability stacks such as Prometheus and Grafana
- Hands-on production incident response experience
- Experience with Google Cloud and Azure is desirable
- Experience with automation and IaC tools including Ansible and Terraform
- Understanding of performance engineering, system and network tuning, profiling, and capacity planning
- Knowledge of cloud networking, security best practices, IAM, and SIEM tools
Benefits
- Competitive compensation
- Flexible working environment
- Work with a global, high-performance team spanning APAC, Europe, and the US