Senior Machine Learning Engineer

You will define and uphold the quality bar for agentic AI systems across the organization. You will design evaluation frameworks and guide model selection to ensure systems meet standards for correctness, safety, latency, and user satisfaction. You will shape how agentic systems are built, evaluated, and improved across the company. You will collaborate with Product, Data Science, and Engineering to align on objectives and deliverables, and you will mentor engineers to raise the bar on reliability and production readiness.

Responsibilities

Lead the design and evolution of agentic AI systems that power intelligent customer experiences.
Define the technical direction for evaluating autonomous agents, including reasoning quality, planning, tool selection, memory, task completion, safety, latency, and user experience.
Design and build scalable evaluation frameworks for agentic systems using automated evals, benchmark datasets, LLM-as-a-Judge techniques, and human feedback to continuously improve agent performance.
Drive model selection and optimization across frontier foundation models, fine-tuned models, retrieval systems, and tool-using agents, balancing quality, latency, cost, and reliability.
Partner closely with Product, Data Science, and Engineering to establish launch criteria, quality standards, and measurable success metrics for production agentic systems.
Improve agent reliability by investigating production failures, identifying root causes across reasoning, planning, retrieval, and tool execution, and driving architectural improvements.
Mentor engineers and influence technical direction across teams while helping establish best practices for building reliable, production-ready agentic AI systems.

Requirements

Significant experience building and deploying production AI systems powered by large language models, autonomous agents, or multi-step reasoning workflows.
Deep understanding of modern agent architectures, including tool calling, planning, memory, retrieval-augmented generation (RAG), orchestration, and multi-agent systems.
Experience designing evaluation frameworks for agentic AI, including automated evals, benchmark datasets, LLM-as-a-Judge methodologies, human evaluation pipelines, and continuous quality measurement.
Strong understanding of the tradeoffs between prompting, fine-tuning, retrieval, and agent orchestration, and of when to apply each approach.
Experience evaluating frontier foundation models across quality, latency, safety, cost, robustness, and production readiness.
Proven ability to debug complex agent behaviors, identify failure modes, and improve reasoning reliability and overall system performance.
Strong software engineering skills with experience building scalable distributed systems and production ML infrastructure.
Demonstrated technical leadership through architecture design, mentorship, and influencing engineering direction across multiple teams.
Experience with agent frameworks, AI observability platforms, model evaluation tooling, or regulated AI applications is a strong plus.

Benefits

Challenging, high-impact work to grow your career
Performance-driven compensation with multipliers for outsized impact, bonus programs, equity ownership, and 401(k) matching
Best-in-class benefits to fuel your work, including 100% paid health insurance for employees and 90% coverage for dependents
Lifestyle wallet: a highly flexible benefits spending account for wellness, learning, and more
Employer-paid life and disability insurance, fertility benefits, and mental health benefits
Time off to recharge, including company holidays, paid time off, sick time, parental leave, and more
Exceptional office experience with catered meals, events, and comfortable workspaces
Base pay range, bonuses, equity, and benefits included