Site Reliability Engineer — Human Engineering

The Human Engineering Software team builds tools used across Apple for user studies, research participant management, health data collection, and privacy-preserving analytics. Our infrastructure spans Django backends, Kubernetes clusters (self-hosted and AWS), PostgreSQL, Redis, Kafka, Elasticsearch, and a growing set of internal service integrations.

This is an engineering-forward SRE role: you'll spend as much time designing systems as operating them. You'll work closely with our full-stack engineers to improve how services communicate, how we observe production behavior, and how we ship changes safely. You'll have a seat at the architecture table, and we want you proposing solutions, not just implementing them.

Minimum Qualifications

BS in Computer Science, Engineering, or equivalent practical experience, with 3+ years of experience in distributed systems
Deep experience with Kubernetes in production: cluster operations, networking, storage, and troubleshooting
Strong proficiency designing and operating services in AWS (EC2, EKS, RDS, S3, IAM, VPC)
Hands-on infrastructure-as-code experience (Terraform, Helm, or equivalent)
Proficiency in at least one backend language (Python, Go, or similar); you can write production services, not just scripts
Experience with CI/CD pipeline design and GitOps workflows
Strong understanding of networking fundamentals: DNS, load balancing, TLS, firewall rules, and service discovery
Excellent communication skills: you can explain a complex system to a room of engineers who didn't build it
Experience building internal automation or self-service tooling (Slack bots, CLI tools, workflow orchestration) that reduced manual operational work

Preferred Qualifications

BS in Computer Science, Engineering, or equivalent practical experience, with 5+ years of experience in distributed systems
Experience with event-driven architectures (Kafka, RabbitMQ, or similar messaging systems)
Experience with service mesh or API gateway patterns (Istio, Envoy, Kong, or similar)
Familiarity with Django/Python web applications and their operational characteristics (Celery, Gunicorn, PostgreSQL)
Experience with observability tooling beyond basic monitoring: distributed tracing, SLO frameworks, and structured logging
Background working with sensitive data (health data, PII) and the associated compliance requirements
Experience leading incident response and building on-call culture
Contributions to internal or open-source infrastructure tooling

Similar jobs

Site Reliability Engineer — Human Engineering

Site Reliability Engineer, Physical Infrastructure

Site Reliability Engineering Manager, Apple Data Platform

Site Reliability Engineer

Site Reliability Engineer

Site Reliability Engineer