Member of Technical Staff, Infrastructure Engineer
Who we are
Odyssey is an AI lab pioneering general world models: causal, multimodal systems that learn to predict and interact with the world over long horizons. This foundational technology promises to revolutionize robotics, science, healthcare, education, gaming, defense, and beyond.
Odyssey’s founders previously pioneered the most complex application of physical AI: self-driving cars. They’ve now brought together a world-class research team from DeepMind, Tesla, Waymo, Meta, Apple, and Wayve, who have made significant contributions to language models (DeepMind Gemini), video models (DeepMind Veo), world models (Wayve GAIA), and autonomous systems (Tesla FSD).
Odyssey has raised significant venture capital from GV, Amazon, AMD, EQT, NVIDIA, Natural Capital, In-Q-Tel, Elad Gil, Jeff Dean, Guillermo Rauch, Garry Tan, Kyle Vogt, and researchers from OpenAI, DeepMind, MSL, Recursive, and Thinking Machines.
What we're looking for
We are looking for an engineer who thrives on building the engines that make groundbreaking research and products possible. You think in systems, love performance, and get energy from turning theoretical bottlenecks into beautifully efficient reality. You’re excited to design and support infrastructure not just for scale, but for speed, creativity, and discovery. You want to build the compute substrate that lets Odyssey’s world models imagine, act, and interact in real time.
What you’ll do
Develop and operate our low-latency model inference platform, ensuring high availability, scalability, and efficient resource utilization for Odyssey’s world models.
Engineer and scale our core data processing infrastructure (e.g., Flyte and Ray on Kubernetes) to handle petabyte-scale datasets.
Design, build, and maintain our large-scale, GPU-based training clusters for deep learning, focusing on usability, high throughput, and reliability.
Automate infrastructure provisioning, configuration, monitoring, and alerting using Infrastructure as Code (IaC) principles.
Drive performance tuning, cost optimization, and reliability improvements across the entire stack.
Collaborate closely with researchers and product developers to understand their requirements, optimize their workflows, and improve platform usability.
Who you are
Motivated by building for the frontier: you want to shape the compute and infrastructure foundation of a lab redefining how people create and interact with media.
Strong programming skills (e.g., Python, Go, or similar) and a solid understanding of software engineering best practices.
Deep, hands-on experience with containerization (e.g., Docker), container orchestration (Kubernetes), and Infrastructure as Code (Terraform).
Proven experience building and managing large-scale distributed systems that run GPU workloads (e.g., compute platforms, data pipelines, or high-availability services).
Experienced in designing infrastructure for ML workloads where performance, parallelism, and data movement are critical.
A collaborative mindset and excellent communication skills, with a passion for building developer-friendly platforms.