Sr. Data Engineer

Mandatory Technical Skills

  • Python – strong hands‑on experience in enterprise‑scale development
  • Apache Spark / PySpark – building large‑scale distributed data pipelines
  • Big Data concepts – partitioning, shuffling, performance tuning
  • SQL – complex queries, joins, and optimization
  • Cloud exposure on GCP, including:
    • BigQuery
    • Cloud Storage (GCS)
    • Dataflow (preferred)
    • Pub/Sub (basic understanding)
  • Experience working with large datasets (TB scale)

Key Responsibilities

  • Design, develop, and maintain scalable data pipelines using Python and PySpark
  • Build and optimize batch and near‑real‑time data processing workflows
  • Work on GCP‑based data platforms, integrating multiple structured and semi‑structured data sources
  • Develop data transformations, validations, and enrichment logic aligned to business requirements
  • Optimize Spark jobs for performance, scalability, and cost efficiency
  • Collaborate with data architects, platform teams, and downstream consumers
  • Implement logging, monitoring, and error handling within data pipelines
  • Ensure adherence to enterprise data governance, security, and compliance standards
  • Support production deployments and provide L2/L3 support as needed

Good to Have / Preferred Skills

  • Knowledge of Kafka or other messaging systems
  • Experience with Airflow / Cloud Composer
  • Containerization and orchestration exposure (Docker / Kubernetes)
  • CI/CD exposure for data pipelines
  • Familiarity with data warehousing and analytics use cases
  • Prior experience in financial services / regulated environments
