Senior Platform Engineer
At Omnia Training, we’ve brought together some of the UK’s most innovative defence training organisations under one powerful mission: to transform the British Army’s training system and create the best-trained Army in the world.
Omnia Training is redefining the British Army’s collective training. To do that, we are looking for the best and brightest minds from across the UK. Omnia Training is at the heart of the UK’s bold Land Industrial Strategy.
This is more than a job; it’s a mission. You will be part of a high-impact, collaborative environment, where every person in our team plays a critical role in delivering Omnia Training’s vision: designing, delivering, and transforming collective training.
Please note that this role will require onsite working in Warminster. Due to the nature of this role, Skyral can only consider applications from candidates who live in the UK and are eligible for SC clearance.
What You’ll Be Responsible For:
Deploy and configure systems on MODCloud / D2S / OpenShift.
Manage Kubernetes environments, networking and connectivity, as well as security-aligned configurations.
Build and maintain CI/CD pipelines, GitOps workflows, and environment configurations to ensure repeatable and reliable deployments.
Support running systems by monitoring health, diagnosing platform and infrastructure issues, and resolving deployment failures.
Work closely with integration engineers, customers and modelling engineers.
Support and maintain existing simulation and training systems, as well as existing deployment and virtualisation tools.
Apply SRE practices to improve system reliability, including observability (metrics, logs, tracing), incident response, and root cause analysis.
What We Are Looking For:
This is not a pure cloud or greenfield platform role. You will be working across cloud-native services, legacy systems, and integrated simulation environments, ensuring they operate reliably as a single platform.
Strong systems and infrastructure mindset.
Calm under pressure during outages or failures.
Pragmatic and delivery-focused, with a bias toward keeping systems running.
Strong collaborator across engineering disciplines.
An SRE mindset, focused on reliability, observability, and continuous improvement of running systems.
Key Technical Proficiencies:
Expert working knowledge of Kubernetes, Helm, Terraform, Ansible, and Docker.
Understanding of distributed systems in production.
Experience working within constrained or regulated environments (e.g. MODCloud, D2S, OpenShift) and adapting to their tooling and limitations.
Experience building and operating CI/CD pipelines with automated deployment workflows.
Familiarity with GitOps approaches and tools such as ArgoCD.
Strong understanding of network fundamentals, Zero Trust solutions, service-to-service communication, and distributed system connectivity.
Ability to diagnose issues across infrastructure, networking, and application layers.
Experience supporting or integrating legacy and non-cloud-native systems alongside modern infrastructure.
Experience developing in Go or Python, as well as shell scripting.
Experience applying Site Reliability Engineering (SRE) practices such as monitoring, alerting, incident response, and service reliability improvement.