Data Engineer
Responsibilities:
- Build and optimise scalable data pipelines using PySpark and SQL
- Design and implement ETL/ELT processes for batch and streaming data
- Develop data solutions using Databricks Lakehouse and Delta Lake
- Ingest and integrate data from internal and external sources (e.g. Kafka, CDC)
- Optimise Spark jobs and data workflows for performance, scalability, and cost efficiency
- Manage infrastructure and environments using Terraform (IaC)
- Ensure data quality and reliability through monitoring
- Implement governance and access controls (e.g. Unity Catalog)
- Deliver clean, structured, and accessible data for analytics and business use
- Collaborate with cross-functional teams to support analytics, reporting, and AI/ML initiatives
Qualifications:
- Demonstrated experience in data engineering, including building scalable data solutions
- Strong proficiency in Python and SQL
- Hands-on experience with Apache Spark (including Structured Streaming)
- Experience with Databricks (Workflows, Delta Live Tables, Lakehouse architecture)
- Experience with cloud platforms (AWS, Azure, or GCP)
- Experience with Terraform or similar infrastructure-as-code tools
- Experience working with structured and semi-structured data (e.g. JSON)
- Familiarity with CI/CD, modular development, and code documentation
- Strong communication skills and ability to work independently with a high level of ownership
- Preferred: Databricks certification (Associate or Professional) and exposure to data tools such as Kafka, dbt, or similar technologies
- Advantageous: knowledge of Scala or other programming languages, and experience working in Agile development environments