Senior Data Software Engineer
We are looking for a Senior Data Software Engineer to join our team. This role focuses on building and supporting data infrastructure that powers AI-driven products and intelligent agent systems. You'll have the opportunity to work with cutting-edge technologies and contribute to scalable, reliable platforms in a collaborative environment.

Responsibilities
Design, build, and maintain data ingestion and processing pipelines that feed RAG systems, including handling unstructured data, images, videos, metadata, and permissions
Administer and optimize vector database infrastructure, including Amazon Kendra with an ongoing migration to OpenSearch
Create evaluation datasets and performance measurement frameworks for agents
Develop monitoring and observability pipelines for AI workloads, covering latency, quality, and cost dashboards
Implement data governance, privacy safeguards, and quality controls for AI training and inference data
Support A/B testing and experimentation infrastructure for assessing agent iterations
Collaborate with Backend AI engineers on data schemas and embedding strategies

Requirements
A minimum of 3 years of data engineering experience, including direct exposure to AI/ML data infrastructure
Strong Python skills for building data pipelines, ETL processes, and backend automation scripting
Hands-on production experience with vector databases, including schema design and index management for Amazon Kendra or OpenSearch
Deep understanding of search and retrieval concepts, including embedding models, chunking strategies, and retrieval optimization
Practical knowledge of AWS services such as S3, Glue, Athena, and Kinesis (or equivalents), along with Docker and distributed data environments
Experience embedding data quality practices such as monitoring, validation, and lineage tracking as operational defaults
Background in designing AI/ML evaluation metrics and establishing systematic tracking through evaluation frameworks
English language proficiency (written and spoken) at B2+ level or higher

Nice to have
Experience with LangSmith, RAGAS, or custom evaluation framework solutions
Background in multi-modal data processing covering unstructured text, images, and videos, along with associated governance
Hands-on involvement with LLM fine-tuning data preparation
Familiarity with observability tooling deeply integrated with AI calls, such as Langfuse or Arize
Experience building streaming data pipelines using technologies such as Kafka or Kinesis