Senior AI Testing Engineer (Generative AI)

Location: India (Remote / Hybrid / In-office: [specify your actual working model here])

Experience: 5–8 years in software testing, QA engineering, or SDET roles, including 2–3 years of hands-on work with Generative AI systems, LLM applications, or AI quality engineering.

Role Overview

We are looking for a Senior AI Testing Engineer to own quality across our Generative AI products and platform.

This role is fundamentally about engineering quality into AI systems, not running test scripts. You'll design evaluation frameworks, build automated testing pipelines, and define what "good" looks like for LLM outputs, RAG systems, AI agents, and voice AI applications. You'll work directly with AI engineers and product teams to make sure our systems are reliable, safe, and measurably improving over time.

If you understand how LLMs fail, know how to catch hallucinations before users do, and want to build the quality infrastructure that underpins production AI at scale, this is the role for you.

Key Responsibilities

Evaluation Strategy & Frameworks

· Design and own comprehensive testing strategies for Generative AI products, including LLM applications, RAG pipelines, AI agents, voice AI systems, and workflow automation

· Define evaluation methodologies covering functional testing, response quality, hallucination detection, safety and guardrail testing, prompt-injection resistance, bias and toxicity, retrieval quality, latency benchmarking, and agent workflow validation

· Build reusable AI testing frameworks and automation pipelines for continuous evaluation

· Create datasets, benchmark suites, and golden test sets for GenAI evaluation

Automated Evaluation

· Develop automated evaluation pipelines using LLM-as-a-Judge and hybrid evaluation methods

· Integrate AI evaluation pipelines into CI/CD so that quality regressions are caught before release

· Drive observability and monitoring strategies for production AI systems

Quality Standards & Collaboration

· Define measurable quality KPIs for AI systems

· Establish testing standards, best practices, and governance processes for GenAI applications

· Work closely with AI engineers, product, and platform teams to embed quality throughout the development lifecycle

Required Skills & Experience

Testing & Engineering Experience

· 5–8 years in software testing, QA engineering, SDET, or test automation

· 2–3 years of hands-on experience testing or evaluating production-grade Generative AI or LLM-based systems

· Strong test automation skills in Python

· Experience designing scalable automated testing frameworks

· Familiarity with API testing, integration testing, and performance testing

Generative AI Knowledge

· Solid understanding of how LLM systems work, and how they fail

· Experience with RAG architectures, prompt engineering, AI agents, embedding models, and vector databases

· Understanding of LLM evaluation methodologies and AI system failure modes

GenAI Testing Frameworks

· Hands-on experience with at least one GenAI evaluation framework, such as DeepEval, Ragas, LangSmith, Promptfoo, TruLens, OpenAI Evals, or LangChain evaluation tools

Quality Engineering

· Expertise in test strategy, test planning, test automation architecture, defect lifecycle management, and quality metrics

· Ability to define and track measurable quality KPIs for AI systems

Preferred Qualifications

· Experience with cloud platforms (AWS, Azure, or GCP)

· Familiarity with MLOps / LLMOps workflows

· Experience with CI/CD pipelines and DevOps practices

· Exposure to monitoring and observability tooling for AI systems

· Understanding of security and compliance for GenAI products

· Experience with conversational AI or voice AI systems
