DevOps Engineer / Site Reliability Engineer (SRE)
Role Overview
We are seeking a highly skilled DevOps Engineer / SRE to design, implement, and maintain scalable, secure, and reliable infrastructure. The ideal candidate has strong expertise in Google Cloud Platform (GCP), Kubernetes, Jenkins pipelines, Git, JBoss/WildFly, and Linux, with a solid understanding of automation, monitoring, and CI/CD practices.
Key Responsibilities
- Design, implement, and manage cloud infrastructure on Google Cloud Platform (GCP).
- Build and maintain CI/CD pipelines using Jenkins and other automation tools.
- Deploy, manage, and scale applications using Kubernetes (GKE preferred).
- Manage source code and workflows using Git-based repositories (GitHub/GitLab/Bitbucket).
- Ensure system reliability, performance, and scalability using SRE principles.
- Automate provisioning and configuration using Infrastructure as Code (IaC) tools.
- Monitor system health, logs, and metrics using tools such as Prometheus, Grafana, and Google Cloud Monitoring (formerly Stackdriver).
- Troubleshoot production issues and provide root cause analysis.
- Maintain and optimize Linux-based systems and environments.
- Work closely with development teams to improve deployment frequency and system stability.
- Implement security best practices across infrastructure and pipelines.
Core Technical Skills
Cloud & Infrastructure
- Strong hands-on experience with Google Cloud Platform (GCP): Compute Engine, GKE, Cloud Storage, IAM, VPC
- Experience with Infrastructure as Code tools such as Terraform or Deployment Manager
Containerization & Orchestration
- Deep expertise in:
  - Docker
  - Kubernetes (K8s): deployment, scaling, networking, troubleshooting
CI/CD & Automation
- Strong experience with:
  - Jenkins (Pipeline as Code)
  - CI/CD pipeline design and optimization
- Familiarity with tools such as ArgoCD or Spinnaker (nice to have)
Version Control
- Proficiency in Git (branching, merging strategies, pull requests)
Operating Systems
- Strong knowledge of:
  - Linux (Ubuntu, RHEL, CentOS) system administration
  - Scripting (Bash required; Python preferred)
Monitoring & Logging
- Experience with:
  - Prometheus, Grafana
  - Google Cloud Monitoring / Logging
  - ELK stack (Elasticsearch, Logstash, Kibana)
SRE Practices
- Implementation of SLIs, SLOs, and SLAs
- Incident management and postmortem analysis
- Reliability, observability, and performance tuning
- Error budgets and capacity planning
Nice-to-Have Skills
- Experience with microservices architecture
- Knowledge of service mesh (Istio, Linkerd)
- Exposure to security tooling and practices (Vault, IAM roles, secrets management)
- Experience with programming languages such as Python or Go
Soft Skills
- Strong problem-solving and analytical mindset
- Excellent collaboration and communication skills
- Ability to work in an agile environment
- Ownership and accountability for production systems
Education
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience)
Certifications (Optional but Preferred)
- Google Cloud Professional DevOps Engineer
- Google Associate Cloud Engineer
- Certified Kubernetes Administrator (CKA)
Key Deliverables
- Highly available, scalable infrastructure
- Efficient CI/CD pipelines that deliver releases with minimal downtime
- Reliable monitoring & incident response systems
- Continuous improvement of system performance and deployment processes