Site Reliability Engineer, Enterprise Technology Services
As a Site Reliability and Operations Engineer (SRE), you’ll be part of the action, working closely with application teams to automate operations, optimize infrastructure, and resolve issues in a fast-paced environment. You’ll play a vital role in ensuring that our systems are reliable, scalable, and high-performing.
THIS ROLE IS DESIGNED FOR DRIVEN INDIVIDUALS WHO:
- Love learning new technologies and thrive on solving sophisticated challenges.
- Are independent, motivated, and excited to take on ambitious projects.
- Excel at collaborating with engineering teams and can stay calm under pressure.
- Have a passion for delivering quality, reliable solutions in a dynamic, high-energy workplace.
We are seeking dedicated Site Reliability Engineers (SREs) at all levels of experience, from junior to senior, to join our teams.
Minimum Qualifications
5+ years of experience in Site Reliability Engineering, DevOps, Software Engineering, or a related field
Experience with programming (Java) or scripting (Python / Bash / Lua)
Hands-on experience with one or more databases (relational or NoSQL, such as Oracle or MongoDB)
Education: Bachelor’s or Master’s degree in Computer Science or a related field (or equivalent practical experience)
Preferred Qualifications
Hands-on experience with monitoring and logging tools (e.g., Prometheus, Splunk, Grafana, CloudWatch)
Proficiency in Linux and networking concepts (TLS/SSL, DNS, load balancers, etc.), with troubleshooting skills in large-scale environments
Experience with source control management such as Git, and an understanding of CI/CD, release engineering, and DevOps
Understanding of security standards, policies, and cryptography
Experience with incident and problem management and root cause analysis (RCA)
Strong networking and load balancing experience (Nginx, Envoy, NetScaler) is a huge plus
Experience in webMethods development and administration
Solid understanding of Kubernetes concepts such as networking, storage, Secrets, Deployments, and containers. Experience with AWS or GCP is preferred.
Knowledge of or experience in governance and compliance.
Understanding of SRE principles, including observability, error budgeting, service reliability measurement through SLAs, SLOs, and SLIs, the corresponding telemetry standards and practices, and product feedback.