Research Scientist (AI Behaviours)

TLDR: We're looking for a research scientist to study how LLM agents fail in the wild: someone who can elicit deception, misalignment, and unsafe behaviour in concrete experiments, and build an understanding of how agents break or misbehave in realistic, user-related scenarios.

About us

White Circle is an AI Safety company building the safety, reliability, and optimization layer for AI systems. At the core of our platform are policies – simple natural-language rules that define what an AI model should and shouldn’t do. We automatically test, enforce, and continuously improve these policies at scale.

We’ve raised $11M from top funds, founders, and senior leaders at OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, Datadog, Sentry, and others.
We process over 100M API calls every month.
We fine-tune and train our own LLMs so they run faster and cheaper than any open or proprietary model

We’re a small, highly focused team. If you want to work deeply on hard problems, see your work ship to production quickly, and influence how AI safety is actually built – you’re the one we need.

About the team

White Circle's fundamental research team works on the science of how AI systems fail: where agents break, why misalignment and unsafe behaviours emerge, and how to catch them before they reach the real world. We build the evals, benchmarks, environments, and tooling needed to study the most pressing AI safety concerns empirically. Some of this work becomes the guardrails shipped in our products, and some becomes public writeups.

You will:

Own research projects end to end — from an unclear concern ("how do we even define sloppy research outputs?") to a falsifiable experiment, clean baselines, and a result you can defend.
Develop automated audit agents that discover and characterise suspect model behaviour at scale.
Study how misalignment and bias actually show up when real users interact with agents, and turn what you find into evals our products can ship.
Pressure-test frontier agents in realistic, high-stakes scenarios to find where they break before our customers do.
Run white-box and black-box investigations to understand how AI models fail.
Publish what you learn as public blog posts and conference papers, and feed the rest back into our internal guardrails.

You’ll fit right in if you:

Have a track record of empirical research in agent behaviour, model evaluation, alignment, or a closely adjacent area.
Have strong ML engineering skills: you can independently build a research MVP involving fine-tuning, agent inference, and evals, without waiting on a platform team.
Have demonstrated skill in experimental design under real conditions: isolating agent failure modes, calibrating judges and baselines, and distinguishing genuine signal from artifact.
Can take a vague behavioural question and define the experiment that answers it when there's no playbook, then run it fast and iterate.
Are an AI power user, fluent with frontier models and coding agents in your daily work.

A big plus:

Published research at A* venues (NeurIPS / ICML / ICLR / ACL and similar).
Interpretability depth: familiarity with modern interp tooling and concepts (NLAs, SAEs, persona vectors, etc.) and the ability to run white-box investigations on our internal and open-source models.
An MSc or PhD in machine learning, computer science, cognitive science, computational neuroscience, physics, or a related quantitative field.
An AI safety fellowship (MATS, ASTRA, Anthropic Fellows, etc.), or a comparable self-directed research record.

Why White Circle

Paid time off in line with your local regulations, no matter where you work from.
Work from Paris (hybrid) with a relocation package available, or work from London (note: we are currently unable to provide relocation support or medical insurance for London-based roles).
Comprehensive medical insurance for our France-based team.
All the hardware, tools, and services you need.
Covered subscriptions for AI agents and IDEs.
Team off-sites twice a year: we’ve recently been to the Alps and to Saint-Tropez.

How we hire

Introductory call with HR (25 min)
Take-home test task
Technical interview with Head of Fundamental Research (60 min)
Final conversation with our CEO (45 min)

Please submit your application in English.