Incident and Escalation Manager

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC

“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments” – Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by our unwavering commitment to innovation, customer success, and market leadership, and by a team of passionate professionals who bring their expertise and dedication to every project. This is an exciting and rewarding role for a driven professional looking to make a lasting impact at a company that is shaping the future of AI and data management.

Job Description

The Incident and Escalation Manager (IEM) plays a critical role within DDN’s Global Services and Support (GSS) organization. This senior operational leader is responsible for managing the most complex and business-critical customer incidents, executive escalations, and systemic problems across DDN’s global customer base.

This role operates as the central command authority during major incidents and critical customer situations, ensuring rapid cross-functional coordination across Support, Engineering, Product, Sales, and Executive leadership.

Beyond incident and escalation response, the IEM drives organizational improvements to incident, escalation, and problem management frameworks, influencing operational strategy, tooling, and service delivery processes across DDN’s GSS organization.

This role requires deep operational judgment, strong technical fluency, and the ability to influence senior stakeholders while restoring customer confidence during high-impact situations affecting mission-critical AI and HPC infrastructure.

Key Responsibilities

Major Incident Command

  • Take command of high-severity incidents and service disruptions impacting mission-critical AI, HPC, and enterprise environments.
  • Serve as Incident Commander for the most complex and visible incidents, coordinating rapid response across Support, Engineering, and Field teams.
  • Drive structured incident response, including:
      • Rapid triage and technical engagement
      • Cross-functional coordination
      • Real-time decision making
      • Service restoration
  • Provide clear and timely communications to executives, account teams, and customers during major incidents.

Executive Escalation Management

  • Own resolution of executive-level customer escalations and critical account situations where standard processes have stalled or failed.
  • Partner with Sales leadership, Customer Success, and Engineering to resolve complex technical or service challenges impacting strategic accounts.
  • Drive structured escalation management, including:
      • Executive situation briefings
      • Customer recovery plans
      • Cross-organizational alignment
  • Restore customer confidence through clear ownership, transparency, and decisive leadership.

Operational Strategy & Program Leadership

  • Drive the evolution of DDN’s Incident, Escalation, and Problem Management programs across the Global Services organization.
  • Define and improve processes, including:
      • Major Incident Management
      • Executive Escalation Management
      • Post-Incident & Escalation Reviews (PIER)
      • Systemic problem tracking and prevention
  • Partner with leadership to establish operational standards, governance models, and response frameworks.
  • Lead initiatives to improve MTTA, MTTR, escalation handling, and service reliability across products and services.

Post-Incident Analysis & Continuous Improvement

  • Lead formal Post-Incident and Escalation Reviews (PIER) for critical events.
  • Drive root cause analysis across technical and operational domains.
  • Identify systemic gaps and ensure corrective and preventive actions are implemented.
  • Analyze incident trends and escalation patterns to deliver data-driven insights and operational improvements.
  • Influence engineering and product teams to address systemic reliability and supportability issues.

Customer Advocacy

  • Serve as a trusted advisor and customer advocate during crisis situations.
  • Represent the voice of the customer internally to drive service improvements and product reliability.
  • Partner with executive leadership to ensure strategic customers receive consistent, high-quality incident management support.

Qualifications

Required

  • 12+ years of experience in Incident Management, Escalation Management, Problem Management, or Technical Operations in the high-tech or enterprise IT space.
  • Proven experience leading high-severity incidents and executive escalations in AI, HPC, or large-scale infrastructure environments.
  • Availability for on-call rotations, including off-hours, weekend, and holiday support.
  • Ability to work across time zones and lead global teams in real-time incidents.
  • Comfortable in high-pressure, customer-facing situations with strong decision-making capabilities.
  • Strong technical background with the ability to grasp complex systems and collaborate with Engineering teams under pressure.
  • Deep knowledge of ITIL frameworks, particularly around Incident, Problem, Change, and Escalation Management.
  • Exceptional communication skills with the ability to manage both technical details and executive-level updates.
  • Analytical thinker with strong data interpretation and reporting skills.
  • Customer-delight mindset with a do-whatever-it-takes attitude.
  • Experience influencing teams and leaders without direct authority.

Preferred

  • Experience supporting AI, HPC, or large-scale data infrastructure environments.
  • Familiarity with parallel file systems such as Lustre, or with distributed storage architectures.
  • Background in managing customer-facing issues in a 24x7 support or cloud services model.
  • Understanding of the software development lifecycle and modern DevOps practices.
  • ITIL v3 or v4 certification.
  • Experience with Salesforce, Jira, Microsoft Office, Slack, Confluence.

DDN

DDN places strong emphasis on the following four characteristics, and every successful employee demonstrates them:

Self-Starter - Takes independent action to identify and solve problems. Seeks out relevant information needed to make decisions. Gets involved with new initiatives.

Success/Achievement Orientation - Delivers quality results consistently. Targets, achieves (or exceeds) measurable results. Sets challenging goals, focuses on critical priorities, and is accountable.

Problem Solving - Recognizes problems and responds with a systematic assessment that identifies and addresses the cause of the issue. Practical, realistic, and resourceful.

Innovative - Builds and improves key business processes that enhance the effectiveness of DDN. Generates new ideas, challenges the status quo, and solves problems creatively.

DataDirect Networks, Inc. is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.

#LI-Remote