MVA Multi-Modality Interaction Developer

Key Responsibilities

Develop on top of current mainstream speech-system components, including SSPE, wake-up, VAD, ASR, NLU, DM, TTS, and LLMs.
Design and implement multimodal fusion of speech, DMS camera, OMS camera, dash camera, microphone, sensor, audio system state, voiceprint, and vehicle state data.
Normalize and structure multimodal inputs into system context representations suitable for LLM reasoning, supporting future LLM-based assistant use cases such as context-aware dialogue and assistant memory collection and application.
Design and maintain consistent multimodal data pipelines that handle time alignment, normalization, and state coherence as data flows from vehicle systems into LLM-ready context representations.
Consume vehicle system capabilities through service-oriented APIs, enabling intent-driven control of vehicle functions.
Integrate and abstract data from multiple vehicle ECUs (audio, cameras, sensors, body, ADAS, etc.), with the ability to independently explore and onboard new data sources.
Collaborate closely with EE, platform, AI, and UX teams, acting as a cross-team technical bridge.