MLOps Engineer

TL;DR: We're looking for an MLOps Engineer to sit at the boundary between Research and Production. You'll own the infrastructure that takes a trained model and makes it production-safe: rollout pipelines, quality and latency gates, canary deployments, and the dashboards that decide whether a release ships or rolls back.

About us

White Circle is an AI Safety company building the safety, reliability, and optimization layer for AI systems. At the core of our platform are policies – simple natural-language rules that define what an AI model should and shouldn’t do. We automatically test, enforce, and continuously improve these policies at scale.

  • We’ve raised $11M from top funds, founders, and senior leaders at OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, Datadog, Sentry, and others

  • We process over 100M API calls every month

  • We fine-tune and train our own LLMs so they run faster and cheaper than any open or proprietary model

We’re a small, highly focused team. If you want to work deeply on hard problems, see your work ship to production quickly, and influence how AI safety is actually built – you’re the one we need.

You will:

  • Integrate new text and multimodal models into our serving paths and verify they behave correctly under production-like traffic.

  • Build and maintain rollout pipelines for frequent model releases.

  • Create smoke, quality, and performance gates for model promotion.

  • Operate local and cluster GPU deployments on Kubernetes.

  • Build dashboards for latency, throughput, queue depth, GPU usage, fallback rate, and quality drift.

  • Run A/B and canary rollouts for model, prompt, routing, and serving config changes.

  • Debug production issues across model config, tokenizer, serving API, router, queue, Kubernetes, GPU runtime, and CI jobs.

  • Optimize serving cost and reliability across mixed GPU capacity.

Who you are

  • Experience with an inference serving engine such as SGLang, vLLM, Dynamo, or TensorRT-LLM, and a working understanding of the request lifecycle through gateway, router, frontend, worker, queue, and model engine.

  • Solid Kubernetes GPU experience: NVIDIA device plugin, GPU scheduling, resource requests/limits, node affinity, taints, tolerations, and node pools.

  • Understanding of multi-node communication libraries and kernels, CUDA runtime, and container runtime compatibility, and the ability to debug across those layers.

  • Ability to design and implement CI/CD for model serving: image and config versioning, smoke tests, quality regression tests against benchmarks, latency/throughput gates, canary rollout, and rollback.

  • Strong observability instincts: you can define the dashboards and alerts that decide whether a model gets promoted or rolled back (p50/p95/p99 latency, TTFT, TPOT, queue depth, GPU utilization/memory, error/timeout/OOM rates, fallback rate, route distribution, canary vs. baseline, cost per successful request).

  • Production debugging across the whole stack, from Rust services to Kubernetes configs.

  • Clear communication of engineering tradeoffs.

Nice-to-haves

  • Rust backend experience.

  • NCCL, UCX, NVSHMEM, RDMA, InfiniBand, RoCE, or EFA.

  • ClickStack / Datadog.

  • Terraform for GPU infrastructure.

  • DCGM exporter, Prometheus, OpenTelemetry.

  • Experience with a high model rollout cadence (2–3 releases per week).

Why White Circle

  • Paid time off in line with your local regulations, no matter where you work from

  • Work from Paris (hybrid) with a relocation package available, or work from London (note: we are unable to provide relocation support for London-based roles)

  • Comprehensive medical insurance for our France-based team (please note that we are in the process of setting up our UK office and therefore cannot offer medical insurance for London-based roles yet)

  • All the hardware, tools, and services you need

  • Covered subscriptions for AI agents and IDEs

  • Team off-sites twice a year: we’ve recently been to the Alps and to Saint-Tropez

How we hire

  1. Introductory call with HR (25 min)

  2. Take-home test task

  3. Technical interview with Head of Applied Research (60 min)

  4. Final conversation with our CEO (45 min)