Senior Data Engineer (Platform)

We are looking for a Senior Data Engineer to build and evolve the data platform powering our global workforce management ecosystem. You will design, implement, and maintain scalable data pipelines that consolidate data from multiple operational systems, transform it into trusted analytical datasets, and make it available for reporting, product analytics, and business intelligence.

You should be comfortable working with modern cloud-native data architectures on AWS, building reliable ETL/ELT pipelines, and designing data models optimized for analytical workloads. This role calls for a strong engineering mindset: you will balance performance, scalability, data quality, and operational excellence while collaborating closely with software engineers, product teams, analysts, and data scientists.

What You Will Own

Design, build, and maintain scalable batch and streaming data pipelines using AWS-native services and distributed processing frameworks
Develop ETL/ELT workflows to ingest, consolidate, sanitize, enrich, and transform data from multiple internal and external systems
Build and optimize data lake solutions on AWS using Amazon S3, AWS Glue, Amazon Redshift, and Amazon Data Firehose (formerly Kinesis Data Firehose)
Design and implement distributed data processing jobs using Apache Spark, AWS Glue, Databricks, or equivalent technologies
Develop orchestration workflows using Apache Airflow (MWAA), AWS Step Functions, or similar workflow orchestration platforms
Design analytical data models, including dimensional models (star and snowflake schemas) and optimized reporting datasets
Optimize Redshift performance through distribution strategies, sort keys, partitioning, workload tuning, and query optimization
Build resilient pipelines supporting retries, idempotency, checkpointing, incremental processing, and partial failure recovery
Implement automated data quality validation, schema evolution, lineage tracking, and governance controls
Develop infrastructure and deployment automation using Infrastructure as Code and CI/CD pipelines
Monitor, troubleshoot, and continuously improve the reliability, scalability, and performance of the data platform
Collaborate with analysts, software engineers, data scientists, and product managers to translate business requirements into scalable data solutions
Participate in architecture discussions and contribute technical documentation, standards, and best practices

What We Are Looking For

5+ years of professional experience building production data pipelines and cloud-based data platforms
Strong experience with AWS data services, including Amazon Redshift, AWS Glue, Amazon S3, and Amazon Data Firehose
Strong Python programming skills for ETL development, automation, event processing, and scripting
Advanced SQL expertise, including query optimization, window functions, analytical queries, warehouse tuning, and versioned schema migrations with rollback strategies
Experience designing scalable ETL/ELT pipelines for both batch and streaming workloads
Experience with distributed compute and storage using Apache Spark, AWS Glue, Databricks, or similar frameworks
Strong understanding of data warehousing concepts, including dimensional modeling (star and snowflake schemas), partitioning strategies, and analytical data structures
Experience designing end-to-end data architectures including ingestion, transformation, orchestration, and consumption layers
Experience implementing workflow orchestration using Apache Airflow (MWAA), AWS Step Functions, or equivalent orchestration tools
Understanding of data governance, metadata management, security best practices (IAM, encryption), and regulatory compliance considerations
Experience with Git-based collaborative development workflows, CI/CD pipelines, Infrastructure as Code, deployment approvals, versioned migrations, and safe rollback strategies
Experience monitoring and maintaining production data infrastructure, ensuring high availability, observability, data quality, and operational reliability
Strong communication skills with the ability to explain technical concepts to business stakeholders and collaborate effectively across engineering, analytics, and product teams

Nice to Have

Experience with Apache Iceberg, Delta Lake, Apache Hudi, or other open table formats
Experience with dbt or other SQL-based transformation frameworks
Familiarity with Kafka, Amazon MSK, or other streaming platforms
Experience with lakehouse architectures and modern analytical data platforms
Knowledge of Terraform or AWS CloudFormation
Experience with containerized data workloads using Docker and ECS/EKS
Experience implementing DataOps practices and automated testing for data pipelines
Familiarity with BI platforms such as Tableau, Power BI, Looker, or QuickSight
Experience implementing data catalogs, lineage, and governance solutions
Exposure to machine learning feature pipelines or data science infrastructure

Tech Stack

Layer	Technology
Programming	Python, SQL, PySpark
Data Processing	Apache Spark, AWS Glue, Databricks
Data Storage	Amazon S3, Amazon Redshift, Parquet
Streaming	Amazon Data Firehose, Amazon EventBridge
Orchestration	Apache Airflow (MWAA), AWS Step Functions
Data Modeling	Star Schema, Snowflake Schema, Dimensional Modeling
Infrastructure	AWS, IAM, CloudWatch
IaC / CI/CD	Git, GitHub Actions, Terraform, AWS CloudFormation
Observability	CloudWatch, Datadog (or equivalent observability platforms)
Governance	Data Catalog, Metadata Management, Data Lineage