Computer Vision & Machine Learning Engineer

You will work on cutting-edge computer vision and machine learning problems, developing algorithms and systems that enable natural human-computer interaction. This includes human perception, motion synthesis, biometric recognition, 3D vision, and performance-critical real-time systems.

You will be responsible for developing and optimizing computer vision and machine learning algorithms for human understanding, including pose estimation, gesture recognition, facial analysis, and behavioral modeling. You will build motion synthesis systems and algorithms for realistic human motion generation and animation, design and implement biometric algorithms for secure authentication and identification systems, and create real-time 3D perception and tracking systems for spatial computing and AR/VR applications.

As a member of a fast-paced team, you will have the unique and rewarding opportunity to shape upcoming products that will delight and inspire millions of people every day.

Minimum Qualifications

- Master's degree, or equivalent practical experience, in Computer Science, Computer Vision, Machine Learning, or a related technical field
- Experience in deep learning, with demonstrated work in at least one area of multimodal systems (e.g., vision, language, video)
- Proficiency in Python and in a modern deep learning framework such as PyTorch or JAX
- Experience with rapid prototyping, reproduction, and validation of research ideas
- Strong mathematical foundations in machine learning, computer vision, or related fields
- Experience with foundation model architectures and training methodologies
- Experience working effectively in a multi-functional, collaborative environment

Preferred Qualifications

- PhD, or equivalent practical experience, in Computer Science, Machine Learning, Computer Vision, or a related technical field
- Demonstrated expertise in deep learning, with either a publication record in relevant conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, COLM) or a strong track record of applying deep learning techniques to real-world products
- Experience with foundation models (language or multimodal), including training, fine-tuning, and deployment
- Experience applying foundation models to build autonomous or semi-autonomous agents, including planning, task decomposition, and multi-step reasoning
- Experience with multimodal pretraining, vision-language models, video-language models, and multimodal alignment
- Experience with large-scale distributed training and model parallelism
- Strong communication skills and the ability to present research findings to both technical and non-technical audiences
