Rakuten Cloud Platform (RCP) Engineer (L3 / Infrastructure Specialist), CICD Factory Department (RS GSD Div)
Job Description:
About Organization
Rakuten Symphony is at the forefront of revolutionizing the telecommunications industry, building the future of mobile networks with innovative, cloud-native, and Open RAN solutions. Our mission is to empower operators globally to deliver cutting-edge services efficiently and at scale.
Job Duties
Kernel-Level Optimization: Conduct deep-dive performance tuning (sysctl, tuna, /proc) to optimize RHEL for real-time (RT) workloads.
RAN Workload Tuning: Configure and tune vCU/vDU components, implementing CPU isolation (isolcpus), IRQ affinity, and NUMA alignment to achieve deterministic performance.
Network Acceleration: Optimize packet processing using DPDK and SR-IOV to bypass kernel bottlenecks for high-throughput RAN traffic.
Advanced Troubleshooting: Lead Root Cause Analysis (RCA) for high-severity incidents using advanced tracing tools (ftrace, trace-cmd, perf).
Automation & Hardening: Develop complex Ansible playbooks to automate kernel hardening and performance profile deployments across large-scale clusters.
Lifecycle Management: Manage vulnerability patching, OS upgrades, and hardware-software interaction (Intel RDT including CAT, BIOS-level tuning).
Virtualization: Administer KVM/QEMU, Red Hat OpenShift Virtualization, and containerized vDU pods within Kubernetes at an expert level.
Technical Skills & Tools
Core: Kubernetes, Linux (RHEL 8/9, Rocky Linux), Docker, Helm, Git
Hardware: Supermicro, Dell iDRAC, HP
Automation: Ansible, Python, Shell
Monitoring/Diagnostics: top, htop, vmstat, iostat, pqos, ftrace, perf
Certifications: Red Hat (RHEL) certification, CKA (Certified Kubernetes Administrator)
Minimum Qualifications
Experience: 8–12 years of relevant experience in telecommunications infrastructure engineering or SRE, with a proven track record in large-scale, complex environments.
Kubernetes & Virtualization: Deep knowledge of bare-metal Kubernetes deployments and virtualization in development/production environments.
Networking: Expertise in host-level networking, DNS/DHCP architecture, TLS management, load balancing, and advanced container networking.
SRE Practices: Proven application of SRE principles, including major incident leadership, capacity/scalability strategy, and advanced change management.
Security: Strong architectural knowledge of security hardening, RBAC models, pod security mechanisms, and secret management.
Technical Depth: In-depth knowledge of process scheduling (SCHED_DEADLINE, SCHED_FIFO), memory management, and I/O schedulers.
Languages:
English (Overall - 3 - Advanced)