Python, PySpark, GCP Data Engineer
Key Responsibilities
- Pipeline Development: Design, develop, and maintain end-to-end ETL/ELT pipelines using Python and PySpark.
- Big Data Processing: Build large-scale data processing frameworks to handle structured and unstructured data, ensuring high performance and reliability.
- Cloud Infrastructure: Architect and manage data solutions within the GCP ecosystem, focusing on cost-efficiency and security.
- Data Modeling: Design and implement robust data warehouse models (Star/Snowflake schemas) and data lake architectures.
- Optimization: Identify, design, and implement internal process improvements, such as automating manual processes and optimizing data delivery for greater scalability.
- Collaboration: Work closely with stakeholders to understand data requirements and translate them into technical specifications.
Technical Qualifications
- Core Programming: Strong proficiency in Python, including experience with libraries such as Pandas and NumPy and with logging frameworks.
- Big Data: 3+ years of hands-on experience with Apache Spark (PySpark) for distributed data processing.
- GCP Ecosystem: Practical experience with Google Cloud services, specifically:
  - BigQuery (optimization, partitioning, clustering).
  - Dataproc or Dataflow.
  - Cloud Storage (GCS) and Cloud Functions.
  - Cloud Composer (Apache Airflow) for orchestration.
- Data Warehousing: Solid understanding of SQL and relational databases (PostgreSQL, MySQL), as well as NoSQL environments.
- DevOps & Tools: Experience with Git, Docker, and CI/CD pipelines. Familiarity with Terraform or other IaC tools is a significant plus.
Preferred Skills
- Experience with real-time data streaming (e.g., Google Cloud Pub/Sub or Kafka).
- Knowledge of data governance, security, and privacy compliance (GDPR/CCPA).
- Experience optimizing Spark jobs (shuffling, partitioning, and memory management).
- Google Cloud Professional Data Engineer certification.
Soft Skills
- Analytical Thinking: Ability to break down complex data problems into manageable technical tasks.
- Communication: Strong verbal and written skills to interact with both technical and non-technical teams.
- Adaptability: A self-starter who stays current with the evolving data engineering landscape.
- Mentorship: Willingness to provide guidance and conduct code reviews for more junior team members.