DevOps Engineer / Site Reliability Engineer (SRE)

Role Overview

We are seeking a highly skilled DevOps Engineer / SRE to design, implement, and maintain scalable, secure, and reliable infrastructure. The ideal candidate has strong expertise in Google Cloud Platform (GCP), Kubernetes, Jenkins pipelines, Git, JBoss/WildFly, and Linux, with a solid understanding of automation, monitoring, and CI/CD practices.

Key Responsibilities

  • Design, implement, and manage cloud infrastructure on Google Cloud Platform (GCP).
  • Build and maintain CI/CD pipelines using Jenkins and other automation tools.
  • Deploy, manage, and scale applications using Kubernetes (GKE preferred).
  • Manage source code and workflows using Git-based repositories (GitHub/GitLab/Bitbucket).
  • Ensure system reliability, performance, and scalability using SRE principles.
  • Automate provisioning and configuration using Infrastructure as Code (IaC) tools.
  • Monitor system health, logs, and metrics using tools such as Prometheus, Grafana, and Google Cloud Monitoring (formerly Stackdriver).
  • Troubleshoot production issues and provide root cause analysis.
  • Maintain and optimize Linux-based systems and environments.
  • Work closely with development teams to improve deployment frequency and system stability.
  • Implement security best practices across infrastructure and pipelines.

Core Technical Skills

Cloud & Infrastructure

  • Strong hands-on experience with Google Cloud Platform (GCP): Compute Engine, GKE, Cloud Storage, IAM, VPC
  • Experience with Infrastructure as Code tools such as Terraform or Deployment Manager

Containerization & Orchestration

  • Deep expertise in:
      • Docker
      • Kubernetes (K8s): deployment, scaling, networking, troubleshooting

CI/CD & Automation

  • Strong experience with:
      • Jenkins (Pipeline as Code)
      • CI/CD pipeline design and optimization
  • Familiarity with tools such as ArgoCD or Spinnaker (nice to have)

Version Control

  • Proficiency in Git (branching, merging strategies, pull requests)

Operating Systems

  • Strong knowledge of:
      • Linux (Ubuntu, RHEL, CentOS) system administration
      • Scripting (Bash; Python preferred)

Monitoring & Logging

  • Experience with:
      • Prometheus, Grafana
      • Google Cloud Monitoring / Logging
      • ELK stack (Elasticsearch, Logstash, Kibana)

SRE Practices

  • Defining and tracking SLIs, SLOs, and SLAs
  • Incident management and postmortem analysis
  • Reliability, observability, and performance tuning
  • Error budgets and capacity planning

Nice-to-Have Skills

  • Experience with microservices architecture
  • Knowledge of service mesh (Istio, Linkerd)
  • Exposure to security tooling and practices (Vault, IAM roles, secrets management)
  • Experience with scripting/programming languages such as Python or Go

Soft Skills

  • Strong problem-solving and analytical mindset
  • Excellent collaboration and communication skills
  • Ability to work in an agile environment
  • Ownership and accountability for production systems

Education

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience)

Certifications (Optional but Preferred)

  • Google Cloud Professional Cloud DevOps Engineer
  • Google Cloud Associate Cloud Engineer
  • Certified Kubernetes Administrator (CKA)

Key Deliverables

  • Highly available, scalable infrastructure
  • Efficient CI/CD pipelines that deploy with minimal downtime
  • Reliable monitoring & incident response systems
  • Continuous improvement of system performance and deployment processes
