IT Service Manager

Incident Management

  • Manage the Incident Management (IM) function, ensuring timely communication and coordination. This includes coverage outside regular business hours when required and participation in a rotational on-call schedule on weekends.

  • Oversee and continuously improve the Incident Management process, ensuring it is well documented, understood, and consistently followed.

  • Monitor performance metrics and dashboards to identify opportunities for process improvement and drive enhancements.

  • Serve as the escalation point for high-impact incidents and lead Major Incident Management activities, including the formal declaration of major incidents and activation of the Major Incident Team (War Room/Virtual Command Centre).

  • Coordinate resolution and recovery efforts, ensuring efficient utilization of resources and clear communication with stakeholders.

  • Update and maintain accurate and detailed incident records within the ITSM tool, capturing key events, decisions, and actions.

Problem Management

  • Manage the Problem Management function to strengthen ITSM/ITSO practices.

  • Perform Root Cause Analysis (RCA) using methodologies such as 5 Whys, Ishikawa, and Kepner-Tregoe.

  • Present RCA findings and permanent fixes in the Problem Management Review Forum, and drive continuous improvement across Technology Service Operations.

  • Train and support team members on Problem Management processes, with a strong emphasis on RCA methodologies, processes, and protocols.

  • Partner with the ServiceNow team to enhance the ITSM Problem Management module in alignment with ITIL 4 standards.

  • Conduct lessons-learned sessions to strengthen prevention and response strategies.

Job Requirements

  • A minimum of 6 years of experience in IT Service Management, working as an Incident Manager or Problem Manager.

  • Lead and manage the Problem Management function to strengthen overall ITSM/ITSO practices and enhance service reliability.

  • Perform structured Root Cause Analysis (RCA) using proven methodologies such as 5 Whys, Ishikawa (Fishbone Diagram), and Kepner-Tregoe to identify the underlying causes of recurring issues.

  • Present RCA findings and permanent corrective actions in the Problem Management Review Forum, driving continuous improvement across Technology Service Operations.

  • Strong RCA capability across multiple methodologies. The client conducts a written business case assessment as part of the screening process.

  • Comfortable coordinating cross-functional war rooms involving L1 Operations, DBAs, Application Teams, Senior Management, and MD-level stakeholders.

  • Collaborate with cross-functional teams to ensure the timely implementation of permanent fixes and preventive measures.

  • Train and mentor team members on Problem Management processes, emphasizing RCA techniques, documentation standards, and adherence to ITIL 4 protocols.

  • Partner with the ServiceNow Platform Team to enhance and optimize the Problem Management module, ensuring alignment with ITIL 4 best practices and organizational requirements.

  • Facilitate lessons-learned sessions following major incidents or problems to strengthen prevention, response, and recovery strategies.

  • Continuously review and refine Problem Management processes to improve efficiency, reduce recurrence, and support proactive service stability.

  • The client requires an individual who has progressed from a technical background into an ITSM-focused role.

Shift & On-Call Requirements

  • Core working hours are 8:30 AM to 5:30 PM SGT.

  • Beyond core hours, the role requires participation in a 24/7 on-call rotation, including weekends, as infrastructure changes and incidents may occur outside standard business hours.

Minimum Technical Literacy Expected

  • Understands what a database is and is familiar with common database failure modes.

  • Understands networking fundamentals, including routing, IP addressing, and circuit dependencies.

  • Can actively participate in technical war room discussions without requiring technical concepts to be explained.

  • Can assess the potential blast radius across multiple application dependencies during a live incident.

  • Knows at least 5 of the 10 recognised RCA methodologies (e.g., 5 Whys, Ishikawa, Kepner-Tregoe) and can apply them effectively. This is a mandatory screening criterion.
