Site Reliability Engineers

Site Reliability Engineers - Multiple Openings

The Role:

You will join a team focused on Observability, Escalations, Post-mortems, Correction of Errors, and other practices that support the company's goal of cloud resiliency. You will drive reliability processes and best practices, promote the cultural change they require, and enforce their adoption.

The main responsibilities of the position include:

Honor and practice the Resiliency pillar of the Well-Architected Framework in all tasks and responsibilities

Conduct Chaos Engineering experiments and relevant exercises to improve resiliency and fault-tolerance

Research workloads for migrating to the cloud with minimal disruption and impact

Monitor cloud migration projects to ensure seamless transitions

Design, consult, re-platform, and re-factor the observability of current cloud infrastructure

Coordinate with other IT departments and teams regarding observability for both individual and organizational needs

Regularly assess cloud deployments for compliance with the company’s standards and best practices

Investigate and correct areas where observability is lagging

Stay up to date and provide training on new and current technologies, services, tools, methodologies, and practices

Occasionally participate in service capacity planning, software performance analysis, and system tuning

Mentor colleagues in technical skills and knowledge

Analyze and oversee the company’s resiliency, and remediate any gaps

Participate in 24/7 on-call support on a rotation schedule

Main requirements:

BSc/MSc degree in Computer Science or related field

5+ years of cloud services experience, with at least 3 years on AWS cloud

3+ years of experience in SRE or a similar role

Experience with monitoring, APM, logging, and notification tools

Familiarity with incident, problem, and change management procedures and practices

Advanced knowledge of SRE practices and methods

Understanding and practice of Service Levels

Strong troubleshooting skills and the ability to mentor others

Extensive experience with Kubernetes and related technologies, services, and ecosystem

Advanced knowledge of CI/CD and Infrastructure as Code (IaC) concepts and tools, especially Terraform (HCL) and AWS CloudFormation

Experience with versioning tools like Git

Strong organizational and documentation skills

Exceptional time management and research abilities

Advanced Linux, networking, and scripting skills

The following will be considered an advantage:

Experience with platforms like Kafka (MSK)

Experience with RDBMSs, particularly Postgres and MySQL

Knowledge of programming languages such as Python or Go

Benefit from:

Attractive remuneration package and perks

Intellectually stimulating work environment

Continuous personal development and international training opportunities

The Hiring Experience: What Awaits You

Show Your Skills – Online Technical Challenge

Let’s Connect – Intro Chat with Talent Acquisition

Deep Dive – First Interview with Your Future Team

Final Connection – Final Interview

All applications will be treated with strict confidentiality!
