Lead Data Software Engineer

We are seeking a Lead Data Software Engineer to come on board with our team. This position centers on developing and maintaining data infrastructure that fuels AI-powered products and intelligent agent systems. You'll get the chance to engage with state-of-the-art technologies and help shape scalable, dependable platforms within a cooperative setting.

Responsibilities

- Plan, build, and support data ingestion and processing pipelines that supply RAG systems, covering the management of unstructured data, images, videos, metadata, and permissions
- Oversee and fine-tune vector database infrastructure, such as Amazon Kendra alongside an active migration toward OpenSearch
- Build evaluation datasets and performance measurement frameworks tailored to agents
- Establish monitoring and observability pipelines for AI workloads, including dashboards for latency, quality, and cost
- Roll out data governance, privacy guardrails, and quality controls for AI training and inference data
- Back A/B testing and experimentation infrastructure used to evaluate agent iterations
- Work jointly with Backend AI engineers on data schemas and embedding approaches

Requirements

- At least 5 years of data engineering background, including direct work with AI/ML data infrastructure
- A minimum of one year guiding and managing development teams
- Solid Python expertise for crafting data pipelines, ETL workflows, and backend automation scripts
- Practical production experience with vector databases, covering schema design and index management for Amazon Kendra or OpenSearch
- Thorough grasp of search and retrieval concepts, including embedding models, chunking techniques, and retrieval optimization
- Working familiarity with AWS services like S3, Glue, Athena, and Kinesis (or equivalents), as well as Docker and distributed data environments
- Experience treating data quality practices such as monitoring, validation, and lineage tracking as operational standards
- Background in defining AI/ML evaluation metrics and setting up systematic tracking using evaluation frameworks
- English language proficiency in writing and speaking at B2+ level or higher

Nice to have

- Exposure to LangSmith, RAGAS, or custom-built evaluation framework approaches
- Experience with multi-modal data processing involving unstructured text, images, and videos, together with related governance
- Hands-on participation in LLM fine-tuning data preparation
- Familiarity with observability tools tightly integrated with AI calls, such as Langfuse or Arize
- Background in constructing streaming data pipelines with technologies like Kafka or Kinesis
