SRE/DevOps Engineer
We are looking for an SRE / DevOps Engineer to build and scale enterprise-grade cloud platforms. This is a balanced role (70% engineering, 30% operations) focused on:

- building reliable and scalable cloud infrastructure,
- driving automation and platform engineering,
- improving observability and operational maturity,
- enabling resilient production systems.

This role is ideal for engineers who are curious about how systems behave in production, enjoy debugging and automation, and want to grow into strong Site Reliability Engineers over time.

You will work closely with senior engineers and platform teams to improve reliability, scalability, deployment workflows, and production operations across enterprise-grade cloud platforms.

Key Responsibilities
- Design and implement scalable AWS infrastructure for production systems
- Build Infrastructure-as-Code modules for consistent and reproducible environments
- Develop and maintain CI/CD pipelines for deployment, testing, and validation
- Build automation for deployment workflows, system health verification, smoke testing, recovery validation, and operational efficiency
- Contribute to monitoring, alerting, logging, and observability systems
- Participate in production issue debugging, incident response, and system stability improvements
- Collaborate across engineering teams to improve platform capabilities and operational maturity
- Contribute to scalable and cost-aware cloud infrastructure design alongside senior engineers
- Improve reliability and reduce operational toil through scripting, automation, and reusable tooling
- Work on secure, resilient, and highly available cloud environments
- Support modernization and improvement of existing systems without disrupting production stability

Requirements
- 2–4 years of experience in DevOps / SRE / Cloud Engineering roles
- Strong hands-on experience with:
  - AWS production environments
  - Infrastructure-as-Code (Terraform or CloudFormation)
  - CI/CD pipelines (Jenkins, GitHub Actions, or similar)
- Strong scripting/programming skills in Python or Bash (must)
- Proven experience with debugging production issues, improving system stability, automating infrastructure or operational workflows, and working with cloud-native systems
- Good understanding of distributed systems, cloud architecture, observability, scalability, and cost-aware infrastructure practices
- Familiarity with containerized environments such as Docker, Kubernetes, or ECS
- Strong problem-solving mindset with willingness to learn and take ownership
- Good communication and collaboration skills

Tech Stack
- AWS (RDS, Lambda, EventBridge, ECS/Kubernetes, CloudWatch, IAM, VPC)
- Terraform / CloudFormation (IaC)
- CI/CD: Jenkins, GitHub Actions
- Observability: CloudWatch, Prometheus, Grafana
- Scripting/Development: Python, Bash (Node.js a plus)
- Chaos Engineering tools (AWS FIS, Gremlin, etc.) are good to have, not mandatory

Good to Have
- Exposure to production incident handling or on-call support
- Experience with Kubernetes or ECS
- Exposure to monitoring, alerting, and observability tooling
- Basic understanding of reliability engineering concepts
- Exposure to database operations, backup/recovery, or disaster recovery concepts
- Background in backend engineering before moving to DevOps/SRE
- Curiosity toward automation, reliability engineering, and platform scalability

Benefits
- Opportunity to work on large-scale cloud platforms and mission-critical systems
- Work closely with experienced SRE and platform engineering teams
- Exposure to advanced areas such as Digital Twin, AI/ML systems, and cloud-native architectures
- Opportunity to grow into reliability engineering and platform ownership roles
- Work with a collaborative and engineering-focused team culture
- Be part of a company passionate about solving real engineering problems through technology