Senior Software Engineer - Command Center

You will lead incident response and operational reliability work for company-wide services. You will coordinate mitigation efforts during active incidents, define and maintain global dashboards and alerts, and build reliability tooling and processes to reduce customer impact. You will drive post-incident reviews and follow-up tracking, design failure mitigation strategies, and mentor peers while delivering executive-level reliability reporting.

Responsibilities

  • Drive long-term reliability and observability strategy
  • Partner across engineering teams to raise operational excellence
  • Lead incident mitigation and coordinate service owners during active incidents
  • Define and maintain global dashboards and alerts tied to critical user journeys
  • Develop and maintain incident management processes and procedures
  • Own and evolve incident response tooling and adoption
  • Drive post-incident governance, postmortems, and follow-up tracking
  • Design and implement failure mitigation strategies to avoid full-region failovers
  • Define frameworks to improve monitoring, alerting, and observability
  • Deliver executive-level reporting on service quality and reliability
  • Mentor engineers and contribute to hiring and engineering culture

Requirements

  • 5+ years of software engineering experience, including operating production systems
  • 2+ years focused on reliability engineering, infrastructure, distributed systems, or production operations
  • Hands-on incident leadership experience (e.g., incident commander, IMOC, primary on-call)
  • Strong communication and cross-functional collaboration skills during high-severity incidents
  • Deep knowledge of systems reliability, observability frameworks, and fault-tolerant design
  • Experience with multi-region or multi-cluster architectures, capacity planning, and failover strategies
  • Familiarity with modern observability stacks such as OpenTelemetry, Prometheus, and Grafana
  • Demonstrated ability to drive improvements in MTTD, MTTR, availability, or customer impact

Benefits

  • Performance-driven compensation with multipliers, bonus programs, and equity ownership
  • 401(k) matching
  • 100% paid health insurance for employees and 90% coverage for dependents
  • Lifestyle wallet (a benefits spending account) for wellness and learning
  • Employer-paid life and disability insurance
  • Fertility benefits and mental health benefits
  • Paid time off, company holidays, sick time, and parental leave
  • Exceptional office experience with catered meals and events
