Data Engineer
You will embed directly with clients to design, build, and operate data pipelines and foundations that power AI workflows. You will architect and deploy ETL/ELT pipelines, model data stores and schemas for AI use cases, and optimize performance and cost. You will implement data security, quality assurance, governance, and disaster recovery processes, and automate operations to maintain high reliability for production systems.
Responsibilities
- Architect and deploy ETL and ELT pipelines to ingest, transform, and store data from high-volume, disparate sources for real-time analysis
- Create and maintain a reliable single source of truth for enterprise intelligence
- Architect and optimize production-grade data foundations to support high-performance AI workflows and automated decision-making
- Establish and automate data security, quality assurance, and governance processes
- Design systems for high fault tolerance and rapid disaster recovery
- Design and model data stores for efficient queries and resource usage, and optimize workload scheduling and cost
Requirements
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
- 5+ years of data engineering experience in a cloud environment, with progression into architectural design
- Proficiency in SQL
- Strong programming skills in Python, Rust, or Java
- Experience building and maintaining data pipelines using processing or streaming frameworks such as Kafka, Flink, Beam, or Spark
- Experience with orchestration tools such as Airflow
- Experience architecting data stores and schemas for AI workflows such as RAG
- Active Google Cloud certifications, or willingness to obtain them within one month of joining
- US citizenship
- Preferred: deep expertise in Google Cloud Platform, including Dataflow (Apache Beam), Pub/Sub, BigQuery, and Cloud Composer
- Preferred: experience in data modeling and architecture across relational, NoSQL, and graph databases, including PostgreSQL, Firestore, MongoDB, and Neo4j
- Preferred: experience preparing unstructured data, vector databases, and RAG pipelines for data science or generative AI
- Familiarity with regulatory compliance frameworks such as FedRAMP and HIPAA
- Experience with modern data platforms like Snowflake or Databricks
Benefits
- Hybrid work environment (in person Monday, Wednesday, and Friday at our Reston office)