Senior Data Engineer
Mandatory Technical Skills
- Python – strong hands‑on experience in enterprise‑scale development
- Apache Spark / PySpark – building large‑scale distributed data pipelines
- Big Data concepts – partitioning, shuffling, performance tuning
- SQL – complex queries, joins, and optimization
- Cloud exposure on GCP, including:
  - BigQuery
  - Cloud Storage (GCS)
  - Dataflow (preferred)
  - Pub/Sub (basic understanding)
- Experience working with large datasets (TB scale)
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using Python and PySpark
- Build and optimize batch and near‑real‑time data processing workflows
- Work on GCP‑based data platforms, integrating multiple structured and semi‑structured data sources
- Develop data transformations, validations, and enrichment logic aligned to business requirements
- Optimize Spark jobs for performance, scalability, and cost efficiency
- Collaborate with data architects, platform teams, and downstream consumers
- Implement logging, monitoring, and error handling within data pipelines
- Ensure adherence to enterprise data governance, security, and compliance standards
- Support production deployments and provide Level 2 / Level 3 (L2/L3) support as needed
Good to Have / Preferred Skills
- Knowledge of Kafka or other messaging systems
- Experience with Airflow / Cloud Composer
- Exposure to containerization and orchestration (Docker / Kubernetes)
- Exposure to CI/CD for data pipelines
- Familiarity with data warehousing and analytics use cases
- Prior experience in financial services / regulated environments