Senior Data Engineer (Platform)
We are looking for a Senior Data Engineer to build and evolve the data platform powering our global workforce management ecosystem. You will design, implement, and maintain scalable data pipelines that consolidate data from multiple operational systems, transform it into trusted analytical datasets, and make it available for reporting, product analytics, and business intelligence.
You should be comfortable working with modern cloud-native data architectures on AWS, building reliable ETL/ELT pipelines, and designing data models optimized for analytical workloads. This role requires a strong engineering mindset, balancing performance, scalability, data quality, and operational excellence while collaborating closely with software engineers, product teams, analysts, and data scientists.
What You Will Own
- Design, build, and maintain scalable batch and streaming data pipelines using AWS-native services and distributed processing frameworks
- Develop ETL/ELT workflows to ingest, consolidate, sanitize, enrich, and transform data from multiple internal and external systems
- Build and optimize AWS Data Lake solutions using Amazon S3, AWS Glue, Amazon Redshift, and Amazon Kinesis Firehose
- Design and implement distributed data processing jobs using Apache Spark, AWS Glue, Databricks, or equivalent technologies
- Develop orchestration workflows using Apache Airflow (MWAA), AWS Step Functions, or similar workflow orchestration platforms
- Design analytical data models, including dimensional models (star and snowflake schemas) and optimized reporting datasets
- Optimize Redshift performance through distribution strategies, sort keys, partitioning of external (Redshift Spectrum) tables, workload management tuning, and query optimization
- Build resilient pipelines supporting retries, idempotency, checkpointing, incremental processing, and partial failure recovery
- Implement automated data quality validation, schema evolution, lineage tracking, and governance controls
- Develop infrastructure and deployment automation using Infrastructure as Code and CI/CD pipelines
- Monitor, troubleshoot, and continuously improve the reliability, scalability, and performance of the data platform
- Collaborate with analysts, software engineers, data scientists, and product managers to translate business requirements into scalable data solutions
- Participate in architecture discussions and contribute technical documentation, standards, and best practices
What We Are Looking For
- 5+ years of professional experience building production data pipelines and cloud-based data platforms
- Strong experience with AWS data services including Amazon Redshift, AWS Glue, Amazon S3, and Amazon Kinesis Firehose
- Strong Python programming skills for ETL development, automation, event processing, and scripting
- Advanced SQL expertise, including query optimization, window functions, analytical queries, and warehouse tuning
- Experience designing scalable ETL/ELT pipelines for both batch and streaming workloads
- Experience with distributed data processing using Apache Spark, AWS Glue, Databricks, or similar frameworks
- Strong understanding of data warehousing concepts including dimensional modeling, star schemas, snowflake schemas, partitioning strategies, and analytical data structures
- Experience designing end-to-end data architectures including ingestion, transformation, orchestration, and consumption layers
- Experience implementing workflow orchestration using Apache Airflow (MWAA), AWS Step Functions, or equivalent orchestration tools
- Understanding of data governance, metadata management, security best practices, IAM, encryption, and regulatory compliance considerations
- Experience with Git-based collaborative development workflows, CI/CD pipelines, Infrastructure as Code, deployment approvals, versioned migrations, and safe rollback strategies
- Experience monitoring and maintaining production data infrastructure, ensuring high availability, observability, data quality, and operational reliability
- Strong communication skills with the ability to explain technical concepts to business stakeholders and collaborate effectively across engineering, analytics, and product teams
Nice to Have
- Experience with Apache Iceberg, Delta Lake, Apache Hudi, or other open table formats
- Experience with dbt or SQL-based transformation frameworks
- Familiarity with Kafka, Amazon MSK, or other streaming platforms
- Experience with Lakehouse architectures and modern analytical data platforms
- Knowledge of Terraform or AWS CloudFormation
- Experience with containerized data workloads using Docker and ECS/EKS
- Experience implementing DataOps practices and automated testing for data pipelines
- Familiarity with BI platforms such as Tableau, Power BI, Looker, or QuickSight
- Experience implementing data catalogs, lineage, and governance solutions
- Exposure to machine learning feature pipelines or data science infrastructure
Tech Stack
| Layer | Technology |
|---|---|
| Programming | Python, SQL, PySpark |
| Data Processing | Apache Spark, AWS Glue, Databricks |
| Data Storage | Amazon S3, Amazon Redshift, Parquet |
| Streaming | Amazon Kinesis Firehose, EventBridge |
| Orchestration | Apache Airflow (MWAA), AWS Step Functions |
| Data Modeling | Star Schema, Snowflake Schema, Dimensional Modeling |
| Infrastructure | AWS, IAM, CloudWatch |
| IaC/CI | Git, GitHub Actions, Terraform, CloudFormation |
| Observability | CloudWatch, Datadog (or equivalent observability platforms) |
| Governance | Data Catalog, Metadata Management, Data Lineage |