Site Reliability Engineer — Info Apps

As an SRE, you won’t just be responding to alerts; you will shape the evolution of our observability strategy, mentor others in incident management, and champion automation. You will help us refine our "Golden Signals" and ensure our Kubernetes-based ecosystem remains world-class.

Minimum Qualifications

Experience: 5+ years in SRE, DevOps, or Infrastructure roles with a proven track record of managing high-traffic, internet-facing production environments.
Kubernetes Expertise: Deep experience building and operating container orchestration systems (EKS, GKE, or vanilla Kubernetes). You should be comfortable troubleshooting from the networking layer up to the application pod.
Observability Champion: Expert knowledge of the four Golden Signals (Latency, Traffic, Errors, and Saturation). Proficiency with tools like Prometheus, Grafana, and Splunk is essential.
Cloud Proficiency: Hands-on experience designing and maintaining resilient infrastructure on public cloud providers (AWS, GCP, or Azure).
Scripting & Automation: Strong ability to code at a scripting level (Python or Go preferred) to automate toil and build self-healing systems.
Incident Leadership: Experience leading incident response, performing Root Cause Analysis (RCA), and running blameless post-mortems to improve system resilience.
Infrastructure as Code: Proficiency in Terraform, CloudFormation, or Pulumi to manage immutable infrastructure.
Education: Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).

Preferred Qualifications

Search & Data: Specialized experience operating and tuning Solr or Elasticsearch at scale.
Networking: Strong understanding of TCP/IP, Load Balancing (ELB/ALB), and Service Mesh (Istio/Linkerd).
Data Systems: Experience with Kafka, Cassandra, or Postgres in a distributed environment.
