Site Reliability Developer 3

Oracle Cloud Infrastructure (OCI) is building the next generation of cloud services to support mission-critical workloads for customers across Japan. As a Site Reliability Developer (IC3), you will help operate and improve the reliability, scalability, and performance of the Japan Sovereign Cloud platform. Working closely with software engineering, cloud operations, and global OCI teams, you will apply software engineering principles to automate operations, resolve complex production issues, and improve service resiliency.

The role begins with a hands-on operational learning period covering 24x7 shift workflows, alerts, incidents, escalation paths, runbooks, and customer-impacting reliability risks. It requires participation in a 24x7 shift rotation and collaboration across both Japanese and international teams. You will partner with shift teams to capture recurring operational issues, make alerts more actionable, maintain operational documentation, and contribute practical fixes through tooling, automation, and process improvements.

Qualifications

- Bachelor’s degree in Computer Science, Engineering, Information Technology, or equivalent practical experience
- Native-level Japanese language proficiency and business-level English communication skills
- 2+ years of experience in Site Reliability Engineering, Systems Engineering, Cloud Operations, DevOps, or Software Development, plus experience supporting Linux-based production environments
- Knowledge of cloud computing, networking, distributed systems, and automation technologies
- Experience with scripting or programming languages such as Python, Java, Go, Shell, or similar
- Willingness to participate in a 24x7 on-call and shift-based operational support model
- Ability to learn day-to-day sovereign cloud operations, follow shift procedures, and identify recurring operational pain points
- Ability to improve runbooks, alert response guidance, and operational handoff quality

Career Level - IC3
