Sr. Data Engineer

Mandatory Technical Skills

Python – strong hands‑on experience in enterprise‑scale development
Apache Spark / PySpark – building large‑scale distributed data pipelines
Big Data concepts – partitioning, shuffling, performance tuning
SQL – complex queries, joins, and optimization
Exposure to Google Cloud Platform (GCP), including:
- BigQuery
- Cloud Storage (GCS)
- Dataflow (preferred)
- Pub/Sub (basic understanding)
Experience working with large datasets (terabyte scale)

Key Responsibilities

Design, develop, and maintain scalable data pipelines using Python and PySpark
Build and optimize batch and near‑real‑time data processing workflows
Work on GCP‑based data platforms, integrating multiple structured and semi‑structured data sources
Develop data transformations, validations, and enrichment logic aligned to business requirements
Optimize Spark jobs for performance, scalability, and cost efficiency
Collaborate with data architects, platform teams, and downstream consumers
Implement logging, monitoring, and error‑handling within data pipelines
Ensure adherence to enterprise data governance, security, and compliance standards
Support production deployments and provide L2/L3 support as needed

Good to Have / Preferred Skills
