Performance & Reliability Engineer
You Are:
As a Performance & Reliability Engineer, you will focus on ensuring the stability, scalability, and performance of complex systems. To excel in this position, you should be comfortable leveraging tools and processes such as Python scripting for automation, infrastructure-as-code platforms, and advanced monitoring solutions.
The Work:
- Your day-to-day responsibilities will involve automating deployment processes, managing infrastructure as code, and implementing robust observability frameworks to monitor application health.
- You will be tasked with defining and tracking service level objectives and indicators, proactively identifying areas for improvement, and troubleshooting both application and infrastructure issues to minimize downtime.
- Your expertise will be crucial in driving reliability initiatives and optimizing system performance across diverse environments.
- Familiarity with performance engineering techniques and deployment management strategies will be highly beneficial.
- Experience with SRE methodologies, including the creation and maintenance of SLOs and SLIs, will help you set measurable reliability targets and ensure consistent service delivery.
- Exposure to troubleshooting complex distributed systems and implementing observability best practices will further enhance your ability to maintain high system availability and resilience.
- Design, implement, and maintain automation solutions to improve reliability and efficiency of systems.
- Develop and manage Infrastructure as Code (IaC) for scalable and repeatable deployments.
- Collaborate with development and operations teams to define and track SLOs/SLIs.
- Troubleshoot and resolve application and infrastructure issues to ensure high availability.
- Build and enhance observability frameworks for monitoring, alerting, and performance analysis.
- Drive performance engineering initiatives to optimize system throughput and latency.
- Manage deployment processes and ensure smooth releases with minimal downtime.
- Proactively identify and address reliability risks across the technology stack.
Here’s What You Need:
- 3+ years of experience in Site Reliability Engineering.
- 1+ years of experience with Generative AI.
- 2+ years of experience with SRE Observability.
- 3+ years of experience with automation including scripting for automation, infrastructure-as-code platforms, and advanced monitoring solutions.
Bonus Points If You Have:
- Experience working in a federal or public sector environment.
- Experience with Amazon Web Services (AWS).
As required by local law, Accenture Federal Services provides reasonable ranges of compensation for hired roles based on labor costs in the states of California, Colorado, Hawaii, Illinois, Maine, Maryland, Massachusetts, Minnesota, New Jersey, New York, Vermont, Virginia, Washington, and the District of Columbia, and the city of Cleveland. The base pay range for this position in these locations is shown below. Compensation for roles at Accenture Federal Services varies depending on a wide array of factors, including but not limited to office location, role, skill set, and level of experience. Accenture Federal Services offers a wide variety of benefits. You can find more information on benefits here. We accept applications on an ongoing basis and there is no fixed deadline to apply.