AI Testing

BCE Global Tech's Global Quality Engineering (GQE) function is building one of Canada's most ambitious AI quality programs, certifying every AI and agentic system deployed across Bell Canada before it reaches production. As a QA AI Specialist, you sit at the intersection of artificial intelligence, software engineering, and quality assurance: a hybrid role that does not yet have a textbook, because the discipline is being written in real time.

You will do two things simultaneously. First, you will bring AI into GQE's existing testing practice, embedding AI-powered capabilities into the test automation tooling, pipelines, and frameworks that 250 QA engineers already use every day. Second, you will build and operate the evaluation frameworks that test the AI systems being created by other Bell engineering teams: agents, orchestration pipelines, RAG applications, Salesforce AgentForce workflows, and ServiceNow Now Assist integrations.

Requirements
Key Responsibilities:

1. AI-Enhanced QA Tooling
Modernize GQE's QA stack by embedding AI to improve speed, coverage, and intelligence:
▸ Integrate AI-driven test generation into Selenium, Playwright, and Postman frameworks
▸ Use predictive models to prioritize tests based on code changes and defect history
▸ Enable self-healing automation for UI/API changes
▸ Automate defect triage and root-cause analysis using failure clustering
▸ Support natural-language test authoring (English/French) for non-technical QA staff
▸ Continuously pilot emerging AI testing tools via a technology radar

2. AI Evaluation & Quality Pipelines
Build scalable evaluation systems tailored to AI behavior rather than rule-based logic:
▸ Implement LLM-as-Judge pipelines on Vertex AI (Gemini) across key quality dimensions
▸ Generate large, diverse, and adversarial test corpora from seed intents
▸ Evaluate RAG systems using metrics such as faithfulness, relevance, and recall (RAGAS)
▸ Validate multi-step agent workflows, tool usage, and escalation behavior
▸ Embed AI evaluations into CI/CD as mandatory release gates

3. AI Safety & Adversarial Testing
Operate a dedicated AI red-teaming capability to uncover AI-specific risks:
▸ Execute prompt injection and poisoned-context attacks on RAG systems
▸ Run automated jailbreak and constraint-bypass probes (e.g., Garak)
▸ Systematically test hallucination, numerical accuracy, and domain knowledge
▸ Assess toxicity, bias, and fairness across English and French interactions
▸ Stress-test agentic systems for runaway actions and scope violations

4. Continuous Quality Evolution
Ensure the quality framework evolves as models and systems change:
▸ Monitor production AI outputs for quality drift and trigger re-certification
▸ Feed real production failures back into the test corpus
▸ Track model/version changes and generate quality delta reports
▸ Maintain a living benchmark of Bell-specific AI quality standards
▸ Continuously adopt new evaluation research and industry best practices
▸ Partner early with AI/ML teams to embed quality by design

5. AI Quality Certification Operations
Lead technical execution of the AIQC program:
▸ Own Tier 2 & 3 certification testing from corpus design to red-teaming
▸ Calibrate LLM-as-Judge rubrics using human-labeled golden datasets
▸ Produce clear AI Quality Certificates with scores, risks, and conditions
▸ Advise teams on AI testability, prompts, and evaluation instrumentation
▸ Contribute to AIQC playbooks, documentation, and knowledge sharing

Required
▸ 5+ years of software quality engineering experience, with at least 2 years working directly with AI/ML systems, LLMs, or AI-powered applications
▸ Hands-on experience building or evaluating LLM-based applications, including prompt engineering, RAG pipelines, or agentic workflows
▸ Proficiency in Python: test framework development, API integration, data processing, and evaluation scripting
▸ Experience with modern test automation frameworks (Playwright, Selenium, Pytest, RestAssured, Postman/Newman) and CI/CD platforms (GitHub Actions, Google Cloud Build, Jenkins)
▸ Working knowledge of at least one major AI/ML platform (Google Vertex AI, Azure OpenAI, or AWS Bedrock) with hands-on API usage
▸ Strong conceptual understanding of how LLMs work: tokenization, temperature and sampling, context windows, grounding, hallucination mechanics, and fine-tuning
▸ Demonstrated ability to design test strategies for non-deterministic systems, moving beyond assertion-based testing to probabilistic, rubric-based evaluation

Benefits
What We Offer:
▸ Competitive salaries and comprehensive health benefits
▸ Flexible work hours and remote work options
▸ Professional development and training opportunities
▸ A supportive and inclusive work environment
▸ Access to cutting-edge technology and tools
