Sr Software Engineer (Scala/Java and Python)

What's the role?

As a Senior Software Engineer – Multimodal AI Systems, you will lead the integration, evaluation, and testing of advanced Vision Foundation Models (VFMs) and Vision-Language Models (VLMs). You will play a key role in building scalable systems and evaluation frameworks for multimodal AI applications involving image, video, and semantic understanding.

Integrate Vision Foundation Models (VFMs) and Vision-Language Models (VLMs) into scalable production systems, developing APIs, inference pipelines, and backend services for multimodal applications
Collaborate with data scientists and ML engineers to deploy, optimize, and continuously improve AI workflows
Design and implement automated evaluation frameworks, benchmarking pipelines, and testing strategies to assess model accuracy, robustness, latency, and overall performance across image, video, and multimodal tasks
Build scalable, reliable, and maintainable infrastructure, including data pipelines for large-scale image, video, and multimodal datasets, while optimizing system performance and throughput
Analyze model performance, identify failure cases, and contribute to continuous improvement initiatives for AI systems
Support integration of retrieval-augmented generation (RAG) systems, working with embeddings, vector databases, and multimodal retrieval to enable semantic search and contextual AI workflows

Who are you?

You are an experienced software engineer with a strong foundation in AI/ML systems and a passion for building scalable, real-world applications. You bring a balance of system design expertise, hands-on coding, and a collaborative mindset.

Bachelor’s or Master’s degree in Computer Science, Software Engineering, AI, or a related field
Proven experience in software engineering, preferably in AI/ML systems
4+ years of strong proficiency in Java or Scala, and Python
Experience with backend architecture, REST APIs, and microservices
Hands-on experience with PyTorch and GPU-based inference systems
Familiarity with distributed systems and scalable data pipelines
Understanding of computer vision, Vision Foundation Models (VFMs), and Vision-Language Models (VLMs)
Experience working with multimodal AI systems and models such as CLIP, BLIP, or similar
Experience in building evaluation frameworks, benchmarking pipelines, and performance testing
Familiarity with large-scale image/video datasets and data processing techniques
Exposure to embeddings, vector databases, or retrieval-based systems
Collaborative, solution-oriented mindset with strong problem-solving skills

What we offer

HERE offers an opportunity to work in a cutting-edge technology environment with challenging problems to solve! You can make a direct impact on the delivery of the company's strategic goals and have the freedom to decide how to perform your work. We will support you in delivering your day-to-day tasks, achieving your personal goals, and developing your skills. Personal development is highly encouraged at HERE. You can take courses and training at our online Learning Campus and join cross-functional team projects within our Talent Platform.

HERE is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, age, gender identity, sexual orientation, marital status, parental status, religion, sex, national origin, disability, veteran status, and other legally protected characteristics.

Who are we?

HERE Technologies is a location data and technology platform company. We empower our customers to achieve better outcomes – from helping a city manage its infrastructure or a business optimize its assets to guiding drivers to their destination safely.

At HERE we take it upon ourselves to be the change we wish to see. We create solutions that fuel innovation, provide opportunity and foster inclusion to improve people’s lives. If you are inspired by an open world and driven to create positive change, join us. Learn more about us on our YouTube Channel.

You will join a team focused on advancing multimodal AI capabilities, working at the intersection of computer vision, large-scale AI systems, and software engineering. The team collaborates closely with data scientists and ML engineers to build scalable, production-grade vision and vision-language solutions that power next-generation products.