Data Scientist - Survey Design, Data Annotation, and Machine Learning Evaluation
The Special Projects team at Apple is developing novel user-facing conversational features that
leverage the multimodal capabilities of state-of-the-art foundation models. As part of this
process, we generate real-world and simulated data, gather human data annotations, analyze
the results, and use them to build and evaluate Large Language Model judges. We are looking
for a skilled Data Scientist to join our Machine Learning Evaluations team. This person will
work closely with ML Engineers to manage and analyze our human and automated data
annotation processes, and to develop, test, and refine LLM judges for generative AI model
evaluation. A successful candidate has experience in survey design, data annotation, and LLM
prompt engineering and prompt optimization, as well as strong statistical analysis skills.
Minimum Qualifications
BA or Master’s degree in Data Science, Statistics, or a quantitative social science field
2+ years of hands-on experience working in survey design and human data annotation
Proficiency in Python
Excellent communication skills
Preferred Qualifications
PhD in Data Science, Statistics, or a quantitative social science field
Hands-on industry experience with product-focused statistical analysis
Experience working with large-scale multimodal data and data-annotation pipelines
Experience with LLM prompt engineering and prompt optimization
Experience with LLM auto-judges for generative AI model evaluation
A track record of publications or technical presentations in Data Science or a related field
Excellent cross-functional collaboration skills