Senior DevOps Engineer
Blip is a leading tech company focused on software engineering solutions for sports entertainment.
We operate at scale. As part of Flutter Entertainment, we play an essential role in the Group's goal of becoming the global leader in online sports betting and iGaming, developing innovative products and platforms for over 14 million monthly customers worldwide.
We are serious about Tech. We are problem-solvers with big ambitions, keeping a people-first mindset at the core of our work. We prioritise flexibility as we strive to deliver the best technological products and tackle the greatest industry challenges.
Recognising that everyone brings their own strengths, backgrounds and new perspectives, we empower you to be yourself. That uniqueness shapes the culture of belonging we are so proud of.
The Role
As a Senior DevOps Engineer, you'll design, build, and mature the observability ecosystem that underpins our platform and services. Your focus will be delivering deep visibility into system behavior by combining system telemetry with user signals to provide a holistic view of performance, reliability, and experience. You'll also explore how AI and machine learning can enhance observability, from intelligent alerting and anomaly detection to accelerating root cause analysis.
This is a hands-on role. You'll partner closely with engineering and product teams to deliver scalable observability capabilities, serve as a subject matter expert in monitoring, alerting, and incident management, and equip teams with self-service insights and tooling. By connecting system behavior to real user impact and leveraging AI-assisted workflows to surface issues faster, you'll drive improvements in reliability, performance, and data-informed decision-making across the organization.
What You’ll Be Doing
Contributing to defining and driving the observability strategy and roadmap across multiple teams, aligning with business priorities and engineering goals.
Designing and improving scalable observability capabilities that provide actionable insights into system health, performance, and user experience.
Establishing and standardizing best practices for monitoring, alerting, incident management, and postmortems across the organization.
Driving operational excellence by evolving incident management, on-call practices, and post-incident learning, ensuring systemic improvements over local fixes.
Leading cross-team initiatives to improve end-to-end reliability, identifying systemic risks and driving their resolution.
Leveraging automation and AI-assisted workflows to accelerate root cause analysis and reduce operational toil at scale.
Partnering with engineering and product leadership to translate observability insights into strategic roadmap decisions.
Identifying trends across system and user signals to proactively detect, prevent, and mitigate large-scale issues.
Optimizing observability platforms for cost, scalability, and long-term sustainability.
Mentoring engineers and raising the reliability and observability maturity across the organization.
What You’ll Bring
Significant hands-on experience in observability engineering, SRE, platform engineering, or related roles, with a track record of driving impact beyond individual teams.
Strong expertise in monitoring and observability, with significant hands-on experience in Datadog.
Experience defining and driving observability or reliability strategy across teams or domains.
Proficiency with Kubernetes, cloud infrastructure (AWS), and infrastructure-as-code tools (Terraform).
Proven ability to influence technical direction and decision-making across multiple teams and stakeholders.
Deep understanding of distributed systems principles (e.g. consistency, availability, partition tolerance) and their real-world trade-offs.
Experience defining and implementing SLOs, SLIs, and alerting strategies, including user-centric and business-aligned metrics.
Strong software engineering fundamentals, with proficiency in at least one modern programming language (e.g. Go, Java, Python, or TypeScript), and the ability to design scalable systems, build tooling and automation, and operate effectively within large, complex codebases, including those that contain AI-generated contributions.
Experience driving large-scale improvements through automation, reducing organizational toil, and eliminating classes of recurring issues.
Strong analytical skills, with the ability to translate technical signals into business and customer impact.
Excellent communication and stakeholder management skills, with the ability to influence both technical and non-technical audiences.
A mindset of ownership, with a focus on long-term impact, scalability, and continuous improvement.
A Sneak Peek Into Our Tech Stack
AWS, Kubernetes, Terraform, Helm, Ansible, Vault, Datadog and PagerDuty
This is what you should have. What do we have, you ask? Well... you can check our amazing perks & benefits right here!
So... are you in?
Equal opportunities
At Blip, we are committed to creating a diverse and inclusive workplace. We strongly encourage people from all backgrounds, ways of thinking, and working to apply.
We are committed to including everyone regardless of their race, disability, age, gender identity, sexual orientation, or religion.
Everyone brings different perspectives and experiences; you don’t have to meet all the requirements listed to apply for this role.
If you need any adjustments to apply for the position and to ensure this role aligns with your needs, please send an email to accommodations@blip.pt.
We will only respond to inquiries related to disabilities.