Sr Software Engineer (Scala/Java and Python)
What's the role?
As a Senior Software Engineer – Multimodal AI Systems, you will lead the integration, evaluation, and testing of advanced Vision Foundation Models (VFMs) and Vision-Language Models (VLMs). You will play a key role in building scalable systems and evaluation frameworks for multimodal AI applications involving image, video, and semantic understanding.
- Integrate Vision Foundation Models (VFMs) and Vision-Language Models (VLMs) into scalable production systems, developing APIs, inference pipelines, and backend services for multimodal applications
- Collaborate with data scientists and ML engineers to deploy, optimize, and continuously improve AI workflows
- Design and implement automated evaluation frameworks, benchmarking pipelines, and testing strategies to assess model accuracy, robustness, latency, and overall performance across image, video, and multimodal tasks
- Build scalable, reliable, and maintainable infrastructure, including data pipelines for large-scale image, video, and multimodal datasets, while optimizing system performance and throughput
- Analyze model performance, identify failure cases, and contribute to continuous improvement initiatives for AI systems
- Support integration of retrieval-augmented generation (RAG) systems, working with embeddings, vector databases, and multimodal retrieval to enable semantic search and contextual AI workflows
Who are you?
You are an experienced software engineer with a strong foundation in AI/ML systems and a passion for building scalable, real-world applications. You bring a balance of system design expertise, hands-on coding, and a collaborative mindset.
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, AI, or a related field
- Proven experience in software engineering, preferably in AI/ML systems
- 4+ years of strong proficiency in Java or Scala, and Python
- Experience with backend architecture, REST APIs, and microservices
- Hands-on experience with PyTorch and GPU-based inference systems
- Familiarity with distributed systems and scalable data pipelines
- Understanding of computer vision, Vision Foundation Models (VFMs), and Vision-Language Models (VLMs)
- Experience working with multimodal AI systems and models such as CLIP, BLIP, or similar
- Experience in building evaluation frameworks, benchmarking pipelines, and performance testing
- Familiarity with large-scale image/video datasets and data processing techniques
- Exposure to embeddings, vector databases, or retrieval-based systems
- Collaborative, solution-oriented mindset with strong problem-solving skills
What we offer
HERE offers an opportunity to work in a cutting-edge technology environment with challenging problems to solve! You can make a direct impact on the delivery of the company's strategic goals and have the freedom to decide how to perform your work. We will support you in delivering your day-to-day tasks, achieving your personal goals, and developing your skills. Personal development is highly encouraged at HERE. You can take different courses and training at our online Learning Campus and join cross-functional team projects within our Talent Platform.
HERE is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, age, gender identity, sexual orientation, marital status, parental status, religion, sex, national origin, disability, veteran status, and other legally protected characteristics.
Who are we?
HERE Technologies is a location data and technology platform company. We empower our customers to achieve better outcomes – from helping a city manage its infrastructure or a business optimize its assets to guiding drivers to their destination safely.
At HERE we take it upon ourselves to be the change we wish to see. We create solutions that fuel innovation, provide opportunity and foster inclusion to improve people’s lives. If you are inspired by an open world and driven to create positive change, join us. Learn more about us on our YouTube Channel.
You will join a team focused on advancing multimodal AI capabilities, working at the intersection of computer vision, large-scale AI systems, and software engineering. The team collaborates closely with data scientists and ML engineers to build scalable, production-grade vision and vision-language solutions that power next-generation products.