SRE III

SE III / SRE
Responsibilities:

  • You will build and operate distributed, large-scale, cloud-based infrastructure using modern open-source software solutions.

  • You will help build and operate a unified platform across EA, extract and process massive amounts of data spanning 20+ game studios, and use the resulting insights to serve a massive volume of online requests.

  • You will use automation technologies to ensure repeatability, eliminate toil, reduce mean time to detection (MTTD) and mean time to resolution (MTTR), and repair services.

  • You will perform root cause analysis and post-mortems with an eye towards future prevention.

  • You will design and build CI/CD pipelines.

  • You will create monitoring, alerting and dashboarding solutions that improve visibility into EA's application performance and business metrics.

  • You will produce documentation and support tooling for online support teams.

  • You will develop reporting systems that track key metrics, detect anomalies, and forecast future results.

  • You will develop and operate both SQL and NoSQL solutions.

  • You will build complex queries to solve data mining problems.

  • You will develop a large-scale online platform that personalizes the player experience and provides reporting and feedback.

  • You will help interview and hire the best candidates for the team.

  • You will mentor team members and help them grow their skill sets.

  • You will be responsible for driving the team's growth and modernization efforts and projects.


Qualifications:

  • 7+ years of experience with virtualization, containerization, cloud computing (AWS preferred), VMware ecosystems, Kubernetes, or Docker.

  • 7+ years of experience supporting high-availability, production-grade data infrastructure and applications with defined SLIs and SLOs.

  • Systems administration or cloud experience, including a strong understanding of Linux/Unix.

  • Networking experience, including an understanding of standard protocols and components.

  • Automation and orchestration experience with tools such as Terraform, Helm, Chef, and Packer.

  • Experience writing code in Python, Golang, or Java.

  • Experience with monitoring stacks such as Prometheus, Grafana, Loki, and Alertmanager.

  • Experience with distributed systems that serve massive numbers of concurrent requests.

  • Experience working with large-scale systems and data platforms/warehouses.