DevOps Engineer / Site Reliability Engineer (SRE)

Role Overview

We are seeking a highly skilled DevOps Engineer / SRE to design, implement, and maintain scalable, secure, and reliable infrastructure. The ideal candidate has strong expertise in Google Cloud Platform (GCP), Kubernetes, Jenkins pipelines, Git, JBoss/WildFly, and Linux, with a solid understanding of automation, monitoring, and CI/CD practices.

Key Responsibilities

Design, implement, and manage cloud infrastructure on Google Cloud Platform (GCP).
Build and maintain CI/CD pipelines using Jenkins and other automation tools.
Deploy, manage, and scale applications using Kubernetes (GKE preferred).
Manage source code and workflows using Git-based repositories (GitHub/GitLab/Bitbucket).
Ensure system reliability, performance, and scalability using SRE principles.
Automate provisioning and configuration using Infrastructure as Code (IaC) tools.
Monitor system health, logs, and metrics using tools such as Prometheus, Grafana, and Google Cloud Monitoring (formerly Stackdriver).
Troubleshoot production issues and provide root cause analysis.
Maintain and optimize Linux-based systems and environments.
Work closely with development teams to improve deployment frequency and system stability.
Implement security best practices across infrastructure and pipelines.

Core Technical Skills

Cloud & Infrastructure

Strong hands-on experience with Google Cloud Platform (GCP):
Compute Engine, GKE, Cloud Storage, IAM, VPC
Experience with Infrastructure as Code tools like:
Terraform / Deployment Manager

Containerization & Orchestration

Deep expertise in:
Docker
Kubernetes (K8s) – deployment, scaling, networking, troubleshooting

CI/CD & Automation

Strong experience with:
Jenkins (Pipeline as Code)
CI/CD pipeline design and optimization
Familiarity with tools like:
ArgoCD / Spinnaker (nice to have)

Version Control

Proficiency in:
Git (branching, merging strategies, pull requests)

Operating Systems

Strong knowledge of:
Linux (Ubuntu, RHEL, CentOS) system administration
Scripting (Bash; Python preferred)

Monitoring & Logging

Experience with:
Prometheus, Grafana
Google Cloud Monitoring / Logging
ELK stack (Elasticsearch, Logstash, Kibana)

SRE Practices

Defining and implementing SLIs, SLOs, and SLAs
Incident management and postmortem analysis
Reliability, observability, and performance tuning
Error budgeting and capacity planning

Nice-to-Have Skills

Experience with microservices architecture
Knowledge of service mesh (Istio, Linkerd)
Exposure to security tools (Vault, IAM roles, secrets management)
Experience with scripting/programming languages like:
Python / Go

Soft Skills

Strong problem-solving and analytical mindset
Excellent collaboration and communication skills
Ability to work in an agile environment
Ownership and accountability for production systems

Education

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience)

Certifications (Optional but Preferred)

Google Cloud Professional DevOps Engineer
Google Associate Cloud Engineer
Certified Kubernetes Administrator (CKA)

Key Deliverables

Highly available, scalable infrastructure
Efficient CI/CD pipelines that deploy with minimal downtime
Reliable monitoring & incident response systems
Continuous improvement of system performance and deployment processes
