Assessment Designer & Learning Analyst
About Mercor
Mercor's mission is to organize human intelligence to power the AI economy. We partner with leading AI labs and enterprises to provide the human intelligence essential to AI development. Our vast talent network trains frontier AI models in the same way teachers teach students: by sharing knowledge, experience, and context that can't be captured in code alone. Today, more than 30,000 experts in our network collectively earn over $3 million a day.
Mercor is creating a new category of work where expertise powers AI advancement. Achieving this requires an ambitious, fast-paced and deeply committed team. You’ll work alongside researchers, operators, and AI companies at the forefront of shaping the systems that are redefining society. Mercor is a profitable Series C company valued at $10 billion. We work in-person five days a week in our San Francisco, NYC, or London offices.
We're looking for an Assessment Designer & Learning Analyst who can build rigorous measurement systems and use data to understand what actually drives expert performance.
This is not an instructional design role. You won't be building courses or writing training materials. You will be designing the assessments and certification frameworks that measure whether our talent experts and internal teams are genuinely skilled — and then doing the analytical work to understand what those assessments reveal, what predicts expert effectiveness, and how our programs should evolve based on evidence. You will be working closely with the Learning & Development team to understand the relationship between materials and assessments, and making recommendations to the team based on your analysis.
If you come from an ed school background, have taught in a high-accountability environment, have completed quantitative projects or theses, and are energized by the measurement and data side of education, this role is for you.
What You'll Do
Assessment Design
Design and continuously improve assessments and certification frameworks that validly and reliably measure expert readiness for specific project types.
Build assessments and measurements of skills that are consistent, interpretable, and actually predictive of on-the-job performance — not just checklists.
Develop item banks, scoring guides, and inter-rater reliability protocols for evaluating complex human judgment tasks.
Run validity studies: do our assessments measure what we think they measure?
Learning Analytics & Impact Analysis
Analyze the relationship between instructional materials, assessments, and expert performance: identify what's working and what isn't, and make recommendations accordingly.
Analyze assessment data at the item level — difficulty, discrimination, reliability — and iterate based on findings.
Investigate the relationship between assessment performance and real-world expert effectiveness: who performs well on our assessments, and does that predict quality outcomes?
Build reports and dashboards that surface actionable insights to program and operations teams.
Design and analyze quasi-experimental and mixed-methods (quantitative and qualitative) studies to understand which interventions actually move the needle on expert quality.
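For candidates who want a concrete picture of the item-level analysis mentioned above, here is a minimal Python sketch under classical test theory. It is an illustration only, not Mercor's actual tooling: the function names and the tiny 0/1 response matrix are hypothetical, and discrimination is computed here as a corrected item-total correlation, which is one common choice among several.

```python
from statistics import mean, pvariance


def item_difficulty(responses):
    """Proportion of test-takers answering each item correctly (the p-value)."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def pearson(x, y):
    """Pearson correlation; returns NaN when either variable has no variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return float("nan")
    return cov / (ss_x * ss_y) ** 0.5


def item_discrimination(responses):
    """Corrected item-total correlation: item score vs. total on the other items."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


def cronbach_alpha(responses):
    """Internal-consistency reliability from item and total-score variances."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 test-takers (rows) x 4 items (columns), 1 = correct.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print("difficulty:", item_difficulty(responses))
print("discrimination:", item_discrimination(responses))
print("alpha:", cronbach_alpha(responses))
```

In practice, an item with very high or very low difficulty, or with discrimination near zero or negative, is a candidate for review or revision.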
Ongoing Measurement & Improvement
Track certification and assessment outcomes over time and flag when programs need revision
Partner with learning designers and project teams to translate your findings into program improvements
Bring a continuous improvement mindset — ship, measure, learn, iterate
What We're Looking For
Education
Master's degree in Learning Sciences, Educational Psychology, Educational Measurement, Psychometrics, or a closely related field — required
Coursework in quantitative research methods, psychometrics, and educational statistics — required
Familiarity with classical test theory (CTT) and ideally item response theory (IRT)
Quantitative Skills (Required)
This role requires genuine comfort with numbers. We're looking for someone who can do the following and show their work:
Run item-level analysis (difficulty index, discrimination index) and compute inter-rater reliability (Cohen's kappa, Krippendorff's alpha, ICC)
Assess and report on assessment validity and reliability — and know what to do when results look off
Analyze relationships between variables: correlation, regression, and basic predictive modeling
Work fluently in Excel or Google Sheets for data cleaning and summaries
Use Python, Stata, or R for deeper analysis (basic proficiency expected; we'll grow this with you)
Translate quantitative findings into plain-language recommendations for non-technical stakeholders
We will ask you to demonstrate this. Finalists will complete a short take-home exercise involving a real assessment dataset — you'll analyze item performance, identify problems, and recommend improvements.
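As an illustration of the inter-rater reliability skills listed above, the following is a minimal Python sketch of Cohen's kappa for two raters assigning categorical labels. The function name and the sample ratings are hypothetical and are not drawn from the take-home dataset; a real analysis might instead use an established statistics library.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters' marginal rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1:
        # Both raters used a single identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: 1 = "meets the bar", 0 = "does not meet the bar".
rater_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

print(round(cohens_kappa(rater_a, rater_b), 3))
```

Here the raters agree on 8 of 10 items, but half of that agreement would be expected by chance given their marginal rates, so kappa is noticeably lower than the raw agreement rate.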
Experience
1–2 years of experience in assessment design, educational research, learning analytics, or a related role
Teaching or similar experience in a high-accountability environment (Teach For America, urban education, or similar) is a strong plus; people who've lived with assessment data in the classroom understand it differently
Experience designing assessments with a clear theory of what you're measuring — not just writing questions
A portfolio or work samples showing both assessment design and quantitative analysis — we want to see how you think
Skills
Deep understanding of measurement: validity, reliability, and what makes an assessment actually good
Ability to move between data and meaning — you can run the analysis and explain what it means for the program
Strong writing — you can communicate complex findings clearly to non-technical audiences
Systems thinker — you see how individual assessments connect to broader operational quality and expert performance
Comfortable with ambiguity and rapid iteration — this is a fast-moving environment and you'll need to ship and improve continuously
Nice to Have
Experience with item response theory (IRT) or latent variable modeling
Familiarity with data annotation, labeling, or AI evaluation workflows
Experience in tech, AI/ML, or data operations environments
Background in competency-based or mastery learning frameworks
Experience building and analyzing assessments
Why This Role
The quality of AI systems depends on the quality of the humans who train them. Your job is to measure that quality rigorously, understand what drives it, and help Mercor build smarter systems for developing expert performance. It's a rare opportunity to apply serious measurement science at a company operating at the frontier of AI development — where the stakes for getting it right are unusually high.