Senior Manager, Systems Engineering
Key Responsibilities
System Installation & Configuration – Software Administration:
- Mentors and supervises team to monitor and maintain operating systems to ensure optimal installation and performance.
- Oversees the administration of middleware products in environments.
- Ensures the effective deployment, maintenance, and operation of internal applications as needed.
- Monitors regular administration and mentors team to conduct performance trend analyses and manage server capacity to ensure service performance meets or exceeds standards.
- Oversees the application and knowledge-sharing of application monitoring tools to optimize and ensure efficiency.
System Installation & Configuration – Installation and Configuration:
- Mentors team to leverage deep knowledge to support installing and configuring servers, cloud infrastructure, and all software and environments.
- Oversees team efforts to optimize system configurations and backups independently to ensure the optimal performance and stability of the server infrastructure.
- Supervises and coordinates hardware maintenance, auditing, installation, and provisioning, ensuring all tasks are performed efficiently and effectively.
- Leads collaboration efforts with internal technical experts and third-party vendors to resolve integration challenges.
System Installation & Configuration – Identity & Access Management:
- Guides team to drive administration of access privileges in the identity and access management system, ensuring accurate and secure access to IT resources.
- Ensures proactive monitoring and analysis of user activity in the identity and access management system and provides insights on system activity.
- Ensures effective provisioning of access management systems.
Service Lifecycle Management – Batch Processing:
- Guides team to proactively monitor the batch process, ensuring that updates are applied and that any issues that arise are resolved.
- Oversees the use of advanced batch management techniques using different work schedulers to configure jobs and job streams, define dependencies, and report job performance.
- Ensures alignment of scheduling and budgets of batch monitoring services with Service Level Agreements (SLA).
Service Lifecycle Management – Security Maintenance:
- Updates and improves procedures to provide assurance that compute and storage devices are secure.
- Guides team to provide support as needed in maintaining the integrity of privileged accounts and secrets, as well as compute and file system security, for the compute and storage environment.
- Evaluates and provides expertise on high-level service and infrastructure dashboards, taking the lead in addressing identified anomalies.
- Directs and coordinates the team to implement monthly, quarterly, or hotfix patches to address security vulnerabilities or bugs.
Service Lifecycle Management – System & Security Improvements:
- Leads team in developing and implementing enhancements to improve the performance, reliability, and security of systems and environments.
- Guides and coordinates with Service teams to identify and address gaps in operational capabilities and support scalability and resiliency.
Incident Management & Support – Incident Management:
- Supervises team through the end-to-end incident management lifecycle to ensure systems are stable, secure, and performing accurately.
- Coordinates efforts to analyze and interpret incident-based data for team metrics and key performance indicators (KPIs) to identify patterns, root causes, and solutions in system and network incidents.
- Provides mentorship in leading incident review meetings to ensure oversight for operational performance and solution implementation.
- Oversees partnerships with third-party vendors and cross-functional teams (e.g., Development, Cloud Engineering, Product Engineering, other IT teams) to ensure effective collaboration on the implementation and/or resolution of high-severity incidents, risks, or migrations.
- Mentors team to investigate system issues and facilitate high-severity incident triage to drive incident resolution and prevention.
Incident Management & Support – Escalation Cases:
- Supervises team on escalated support cases by ensuring collaboration with internal technical teams and third-party vendors to drive issue resolution for a wide range of production environment problems (e.g., immense growth, scaling, leveraging the cloud, extremely high performance, high availability requirements).
Incident Management & Support – Technical Support:
- Supervises team in maintaining the production environment by guiding the review of system error logs and ticket queues, and coordinating consultations with other groups involved in maintaining the environments.
- Aligns schedules to meet ongoing technical support and service objectives.
- Oversees resolution of complex, critical customer system issues and ensures documentation of technical solutions.
Incident Management & Support – Backups and Disaster Recovery:
- Provides mentorship across teams to foster an understanding of the systems needed to execute and support backup, restore, and disaster recovery processes.
- Reviews disaster recovery drill plans to ensure preparedness and compliance.
Communication & Documentation – Technical Communication:
- Oversees communication of complex technical information to both technical and nontechnical personnel, including senior management.
- Leads and coordinates training programs to educate personnel.
- Guides team to serve as the technical liaison and provide domain-specific expertise to cross-organization projects, programs, and activities.
Communication & Documentation – Documentation & Reporting:
- Guides the creation and maintenance of documentation on ticket updates, code contributions, infrastructure, configurations, processes, and procedures (e.g., Disaster Recovery plans, Standard Operating Procedures, Corrective and Preventative Action Plans).
- Reviews weekly and monthly reports on system performance and incident progress to support operational and management outcomes, and advises on business impacts.
- Ensures accuracy of technical documentation in the area of standards and best practices for internal use.
Additional Responsibilities (as needed)
Cloud Infrastructure Support:
- Leads collaborations with DevOps and Site Reliability Engineering (SRE) teams to manage large-scale infrastructure.
- Mentors team to effectively implement continuous integration and continuous deployment (CI/CD) pipelines.
- Oversees patching and version upgrades to support cloud infrastructure.
Automation:
- Leads improvement efforts to reduce incidents and problems with automation and simplify server management.
- Reviews and provides feedback on reusable frameworks, standards, and automation to support Oracle Cloud Infrastructure.
- Oversees the management of Workload Automation tools including design support, administration, and optimization efforts.
- Coaches team in troubleshooting complex issues with automation tools, agents, and other connectivity to third-party applications.
- Oversees the configuration and management of cloud technologies.
Core Responsibilities
Planning & Execution:
- Manages multiple medium- to large-scale projects or initiatives across teams, ensuring timelines, deliverables, and (when applicable) budgets are monitored and met.
- Provides direction to teams on project work, setting priorities, and aligning with business needs.
- Guides teams on adjusting plans to accommodate resource or timeline changes.
Collaboration & Partnership:
- Drives cross-functional partnerships to align expectations and shared objectives across multiple teams.
- Coaches team members to develop strategic relationships with business leaders, stakeholders, and external partners to foster collaboration and long-term success.
- Promotes inclusivity by actively seeking and listening to diverse perspectives, ensuring others feel heard and respected.
Problem Solving:
- Provides direction to multiple teams on addressing complex operational and/or technical issues, and provides guidance on analyzing complex data and/or information to identify solutions.
- Reviews and provides insights into unresolved or critical issues, helping the team to identify potential solutions.
Continuous Learning:
- Models engaging in continuous learning to deepen expertise and stay ahead of industry trends, integrating best practices into strategic planning.
- Leverages feedback to drive personal and team skill improvements.
- Identifies skill gaps across teams, empowers team members to pursue learning and knowledge-sharing opportunities that build their expertise in new areas, and coaches them to apply learnings to advance the organization.
Continuous Improvement:
- Drives and oversees team efforts to collaborate on, develop, and implement ideas to increase the efficiency and effectiveness of processes, protocols, and workflows within and across teams.
- Guides team to adopt new ideas for alternative approaches and methods and encourages feedback for continued improvement.
Performance and Development:
- Drives performance across teams by providing feedback and coaching in alignment with performance management processes, guidelines, and expectations.
- Discusses development goals with team members, shares opportunities to facilitate career development, and ensures individual goals are aligned with broader organizational goals.
- Develops and manages talent acquisition pipeline by leading candidate interviews, monitoring promotion eligibility, and/or orchestrating talent resources.
Career Level - M3