Site Reliability Engineer
You will keep the production platform reliable, observable, and operable as the system scales. You will operate production, including on-call duties and incident response, define and refine SLIs, SLOs, and error budgets, and help product teams stay within them. You will strengthen observability across metrics, logs, traces, and alerting, and ship infrastructure through code in a GitOps workflow. You will look after PostgreSQL, covering performance tuning, online migrations on large tables, HA/DR, and CDC pipelines. You will mentor engineers on reliability and database fundamentals through code reviews and pairing.
Responsibilities
- Operate production day-to-day - on-call, incident response, postmortems, and the follow-ups that actually close the loop.
- Own reliability practice - define and refine SLIs/SLOs and error budgets, and help product teams live within them.
- Strengthen our observability across metrics, logs, traces, and alerting.
- Ship infrastructure through code in a GitOps workflow - cloud resources and Kubernetes workloads alike.
- Look after PostgreSQL: performance tuning, schema and migration review, online migrations on large tables, HA/DR, and CDC pipelines.
- Mentor engineers on reliability and database fundamentals through code review, design review, and pairing.
Requirements
- 4+ years in SRE, DevOps, Platform/Infrastructure, or backend engineering with significant production operations ownership.
- Hands-on experience operating production services on Kubernetes, and shipping infrastructure as code in a GitOps workflow.
- Solid working knowledge of PostgreSQL in production - query plans, pg_stat_*, indexing and schema trade-offs, and what a safe online migration looks like on a non-trivial table.
- Cloud networking fundamentals (VPCs, routing, L4/L7 load balancing, DNS, TLS) and comfort debugging cross-service connectivity.
- Comfortable with a modern observability stack and proficient with Linux at the operator level.
- Practiced in incident response - calm under pressure, structured debugging, postmortems that drive change.
- At least working proficiency in Go or Python, plus strong written and verbal communication.
- Genuine interest in databases and in growing your PostgreSQL DBA expertise.
Benefits
- Stock options
- Health benefits
- New Hire Home-Office Setup (USD 500 one-time)
- Monthly stipend (USD 150 per month via Brex Card)