Agentic AI Data Engineer
We are seeking a highly skilled Agentic AI Data Engineer to design, build, and optimize intelligent, autonomous data systems that power next-generation AI applications. This role blends data engineering, machine learning infrastructure, and emerging agent-based AI frameworks to enable scalable, self-orchestrating pipelines and decision-making systems.
You will work at the intersection of data platforms, large language models (LLMs), and cloud-native architectures—building systems that can reason, act, and adapt autonomously.
Base Compensation Range: 110,000 - 160,000
The posted range is the hiring range for this role — a subset of the broader range available to employees over time — and reflects base salary across our national hiring scale. Final offers are based on several factors, including the candidate's skills and experience, internal pay equity, work location, market conditions for the role, and the specific scope and responsibilities of the position. The top of the range is reserved for candidates who notably exceed the requirements; the lower end applies to those with less experience or fewer preferred qualifications. For positions based in higher-cost zones (e.g., California, New York, New Jersey), actual compensation may exceed the posted range; your recruiter will share specifics during the process.
Key Responsibilities
- Design and implement agentic AI systems that autonomously orchestrate data workflows and decision pipelines
- Build scalable batch and real-time data pipelines for structured and unstructured data
- Develop and manage LLM-powered applications using retrieval-augmented generation (RAG), tool use, and multi-agent frameworks
- Integrate AWS AI/ML services into production-grade architectures
- Develop and optimize data lakes, warehouses, and lakehouse architectures
- Build APIs and microservices to expose AI/ML capabilities
- Ensure data quality, governance, and security across pipelines
- Collaborate with data scientists, ML engineers, and product teams to deploy AI solutions
- Implement monitoring, logging, and observability for AI agents and pipelines
- Optimize cost and performance of cloud-based AI workloads
Cloud & AWS Ecosystem
Strong experience with AWS services, including:
- Amazon S3, Glue, Lambda, Step Functions
- Amazon Redshift / Athena
- Amazon SageMaker (training, deployment, pipelines)
- Amazon Bedrock (foundation models, agents, knowledge bases)
AI/ML & Agentic Systems
- Experience with LLMs and generative AI systems
- Hands-on experience with agent frameworks and techniques (e.g., multi-agent orchestration, tool calling, planning systems)
- Familiarity with AgentCore / agent orchestration platforms
- Understanding of RAG architectures, embeddings, and vector databases
- Experience with model deployment, inference optimization, and prompt engineering
Data Engineering
- Strong proficiency in Python and SQL
- Experience with ETL/ELT tools and frameworks
- Distributed data processing (Spark, PySpark, or similar)
- Streaming technologies (Kafka, Kinesis, or similar)
- Data modeling and schema design
Data & AI Infrastructure
- Experience with vector databases (e.g., Pinecone, FAISS, OpenSearch)
- Knowledge of data lakehouse architectures (Delta Lake, Iceberg, Hudi)
- Containerization (Docker) and orchestration (Kubernetes)
- CI/CD for ML and data pipelines