Data Engineer
Role Description
- Lead technical architecture and solution design for large-scale data platform solutions
- Proactively interact with the customer’s data engineering team to provide opinionated guidance and technical product workshops on data and analytics platform architecture, best practices, solution design, and development
- Independently develop solutions: perform hands-on coding for the designed solution and hand over the developed solution to the customer’s technical team
- Perform pair programming with the customer’s data engineers
- Project-manage small to mid-sized projects, managing scope, priorities, deliverables, risks/issues, and timelines for successful outcomes
- Collaborate with the cross-disciplinary project team and act as workstream or overall project tech lead, depending on the size and scale of the project or program
Required Expertise
- In-depth knowledge and experience of GCP data and analytics technologies
- Designing data platform architecture for (a) migrating open source or other public cloud based data platforms to GCP technologies (not just lift-and-shift migration, but re-engineering to GCP services) and (b) building greenfield data platforms on cloud
- Define solution architecture and detailed design, and perform hands-on implementation, for:
  - Data pipelines for batch and event-driven ingestion and processing of data from a variety of sources such as on-prem files, on-prem databases, APIs, etc.
  - Data pipelines for real-time data ingestion and processing
  - Automatic transpilation of legacy code (Hive, Teradata, Python logic, etc.) to BigQuery SQL
  - BigQuery query performance optimisation
  - CI/CD pipelines for data workloads using Cloud Build, Artifact Registry, and Terraform
  - Data governance solutioning using GCP governance tooling (Dataplex, Data Catalog)
Must Have
- GCP services: Dataflow, Dataproc, Pub/Sub, Cloud Composer, Cloud Workflows, BigQuery, Cloud Run, Cloud Build
- Programming knowledge (Python, Java) and willingness to be hands-on
Good to Have
- Experience in computing infrastructure (e.g., servers, databases, firewalls, load balancers, Kubernetes) and in architecting, developing, or maintaining solutions in virtualized environments
- Experience with dbt or Dataform, and with Terraform, in the context of data pipelines
- Experience with the open source ecosystem and distributions such as Hadoop and Cloudera/Hortonworks, and with frameworks and technologies such as Spark, Oozie, Kafka, and HBase
- Understanding of and experience with NoSQL databases such as HBase and MongoDB
- Knowledge of cloud databases such as Spanner, Bigtable, and Cloud SQL, and of database migrations
Certification
- Good to have: GCP Data Engineer certification