Infrastructure Operational Resiliency Planning and Testing Lead

Job Description:

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.

Being a Great Place to Work and providing a culture of caring is core to how we drive Responsible Growth. We are intentional about fostering an inclusive workplace where every teammate has the opportunity to succeed, build a career and contribute to our shared success. This includes attracting and developing exceptional talent, recognizing and rewarding performance, and supporting our teammates’ physical, emotional, and financial wellness through affordable, competitive and flexible benefits.

We value the unique perspectives individuals bring from all backgrounds and career paths - whether shaped by military service, community college education, or a wide range of work and life experiences. These journeys foster resilience, leadership and innovation, strengthening our workforce and positively impacting the communities we serve.

Bank of America is committed to an in-office culture that supports collaboration, engagement, and career development. Our approach includes clear in-office expectations, while providing an appropriate level of flexibility based on role-specific responsibilities and business needs.

At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!

Job Description Summary

This role sits at the intersection of infrastructure resiliency, recovery planning, and enterprise execution. It is designed for a strong technical lead who can coordinate large-scale deliverables across engineering, operations, risk, and control teams while keeping complex work moving toward clear outcomes. The position focuses on Enterprise Critical Infrastructure planning, restoral testing, documentation, and governance, with relevance to public cloud, hybrid infrastructure, storage, backup, disaster recovery, observability, and broader infrastructure operational resiliency. Success in this role requires both delivery discipline and enough technical depth to engage credibly with subject matter experts, challenge assumptions, and turn recovery requirements into practical, audit-ready execution.

Position Summary

The Planning and Testing Lead will drive critical resiliency work across the firm’s infrastructure landscape, partnering with Enterprise Critical Infrastructure owners, application teams, infrastructure SMEs, and control partners to develop plans, coordinate testing, and improve recovery readiness. This is a technical leadership role for someone who understands how modern infrastructure is built and restored across public cloud, hybrid environments, storage, backup, disaster recovery, observability, and SRE-aligned operating models.

The person in this role will lead development and maintenance of ECI Restoral Plans, Restoral Test Plans, and execution documentation, ensuring the work is complete, traceable, and aligned with enterprise standards and control expectations. The ideal candidate is likely to come from infrastructure operations, disaster recovery, platform engineering, or resiliency-focused technical leadership roles and should be comfortable organizing SMEs, navigating cross-team dependencies, and translating technical recovery complexity into clear plans, evidence, and sustainable operating routines.

Key Responsibilities

  • Lead kickoff meetings and recurring working sessions with ECI Owners, infrastructure SMEs, application teams, control partners, and technology leaders to plan, track, and execute ECI planning and testing deliverables.
  • Coordinate development and ongoing maintenance of ECI Restoral Plans, Restoral Test Plans, Restoral Test Execution documentation, test evidence, observations, after-action reporting, and remediation tracking.
  • Provide technical leadership across infrastructure SMEs supporting public cloud, hybrid hosting, storage, backup, and disaster recovery/restoral domains.
  • Drive execution of documentation and testing milestones for assigned ECIs, escalating risks, dependencies, and delivery concerns as needed to meet required timelines.
  • Facilitate inline quality assurance and forum-based quality control reviews; incorporate feedback through checklists, review routines, and feedback trackers.
  • Coordinate review and approval workflows with ECI Owners, senior leaders, infrastructure SMEs, and control partners for restoral plans, test plans, test results, improvement options, and supporting evidence.
  • Support mapping of Prioritized Critical Service dependencies to ECIs and maintain dependency mapping outputs through recurring review routines.
  • Analyze infrastructure recovery sequencing and timing information, identify dependency conflicts or circular dependencies, and partner with stakeholders to document workarounds, recovery order updates, and improvement recommendations.
  • Coordinate ECI restoral testing activities, including representative sample testing, tabletop testing, evidence collection, observations, after-action reporting, and remediation tracking.
  • Support Maximum Tolerable Downtime activities by applying defined methodology inputs, documenting results, and helping identify ECIs requiring risk assessment or improvement options.
  • Partner with ECI Owners and leadership to develop, quality review, socialize, and reach decisions on restoral improvement options and recommendations for ECIs that exceed assigned recovery expectations.
  • Ensure final documentation, approvals, testing evidence, and remediation artifacts are maintained in appropriate repositories and aligned to document management, governance, and audit-ready traceability expectations.
  • Track success metrics and provide status updates to stakeholders and leadership pertaining to target outcomes, delivery, performance, risks, issues, and schedule.
  • Collaborate with sponsors and stakeholders to ensure execution is aligned with deliverable requirements, enterprise change expectations, and resiliency governance objectives.

Required Qualifications

  • 7+ years of experience in technology, infrastructure operations, infrastructure resiliency, disaster recovery, restoration planning, recovery testing, technology risk, or a closely related technical execution role.
  • Experience leading complex, cross-functional technical deliverables across multiple stakeholder groups, including infrastructure SMEs, application teams, control partners, and technology leaders.
  • Working knowledge of public cloud and hybrid infrastructure environments, with emphasis on AWS and/or Azure, hybrid compute, storage, backup, and disaster recovery/restoral capabilities.
  • Experience with technical infrastructure documentation, restoral planning, recovery testing, operational resiliency processes, or infrastructure risk assessments.
  • Ability to provide technical leadership over SMEs, challenge assumptions, organize technical inputs, and translate infrastructure recovery requirements into clear planning, testing, and governance deliverables.
  • Strong understanding of infrastructure dependencies, recovery sequencing, test evidence, observations, after-action reporting, and remediation tracking.
  • Experience facilitating structured working sessions, review forums, and approval processes with technical SMEs, control partners, and senior leaders.
  • Demonstrated ability to manage risks, issues, milestones, and deliverables in a controlled technology environment.
  • Strong analytical, communication, and partnership skills with the ability to work across infrastructure, application, risk, and compliance stakeholders.
  • Ability to work under pressure and manage competing requirements while maintaining quality, control discipline, and delivery focus.

Desired Qualifications

  • Experience with Enterprise Critical Infrastructure planning, restoral testing, or infrastructure operational resiliency governance.
  • Experience with public cloud infrastructure, including AWS and/or Azure, and how cloud-hosted services are recovered, tested, monitored, and governed in a hybrid enterprise environment.
  • Experience with storage, backup, disaster recovery, data protection, infrastructure recovery testing, or recovery evidence validation.
  • Experience with observability, monitoring, SRE practices, service health indicators, or operational readiness measures used to assess infrastructure recovery or restoral health.
  • Experience with risk assessments, control requirements, audit-facing deliverables, and evidence-based governance processes.
  • Experience with dependency mapping, sequencing/timing analysis, technical workarounds, and infrastructure recovery order validation.
  • Experience supporting documentation that requires formal approvals, version control, traceability, and audit-ready evidence.
  • Strong written communication skills with the ability to produce clear, structured, technical documentation for senior technology, risk, and control audiences.
  • Experience with automation, scripting, or infrastructure-as-code approaches such as Terraform, Ansible, PowerShell, or Python to standardize, validate, or scale operational processes.

Enterprise Job Description:
This job is responsible for planning and coordinating the execution of large program deliverables, which requires engagement across multiple organizations. Key responsibilities include communicating target outcomes, coordinating delivery, resource planning, providing visibility of program health, and managing program risks, compliance and financials. Job expectations include ensuring delivery meets the client’s expectations in terms of the target outcomes, timeline, and cost, and facilitating sync points between business and technology leaders and Risk and Compliance partners.

Responsibilities:

  • Leads and coordinates routines to support delivery of large programs, such as kick-offs, status reviews, stakeholder meetings, change controls, and tollgates
  • Broadens relationships with business and technology leaders across multiple organizations, as well as Compliance and Risk
  • Establishes target outcomes in partnership with stakeholders and leaders
  • Tracks success metrics and provides status updates to stakeholders and leadership pertaining to the target outcomes, delivery, performance, risks, issues, and schedule
  • Collaborates with sponsors and stakeholders to ensure that execution is aligned with deliverable requirements
  • Manages program financials and supports resource planning
  • Ensures adherence to Enterprise Change Management standards

Skills:

  • Collaboration
  • Project Management
  • Result Orientation
  • Solution Delivery Process
  • Stakeholder Management
  • Analytical Thinking
  • Business Acumen
  • Influence
  • Risk Management
  • Solution Design
  • Technical Strategy Development
  • Infrastructure Operations
  • Disaster Recovery
  • Operational Resiliency
  • Cloud Infrastructure
  • Storage and Backup
  • Observability
  • SRE Practices

Shift:

1st shift (United States of America)

Hours Per Week:

40
