Senior Data Engineer

Overview

BTVK Advisory is a leading advisory firm whose specialized professionals guide clients through an ever-changing business world, helping them win now and anticipate tomorrow. BTVK Advisory, and its affiliated entities, have operations in North America, South America, Europe, Asia, and Australia. BTVK Advisory’s ultimate parent entity, Baker Tilly US, LLP, is an independent member of Baker Tilly International, a worldwide network of independent accounting and business advisory firms in 141 territories, with 43,000 professionals and a combined worldwide revenue of $5.2 billion.

Baker Tilly is an equal opportunity/affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability or protected veteran status, gender identity, sexual orientation, or any other legally protected basis, in accordance with applicable federal, state or local law.

Any unsolicited resumes submitted through our website or to Baker Tilly Advisory Group, LP, employee e-mail accounts are considered property of Baker Tilly Advisory Group, LP, and are not subject to payment of agency fees. In order to be an authorized recruitment agency ("search firm") for Baker Tilly Advisory Group, LP, there must be a formal written agreement in place and the agency must be invited, by Baker Tilly's Talent Attraction team, to submit candidates for review via our applicant tracking system.

Job Description:

Responsibilities:

Design, build, and maintain robust, scalable ETL/ELT pipelines on Azure Databricks to ingest, transform, and curate data across the medallion architecture — from raw/bronze landing through silver and gold to platinum/consumption-ready layers — ensuring reliability, performance, and timeliness.
Ingest structured, semi-structured, and unstructured data from a wide range of source systems using custom streaming pipelines (Spark Structured Streaming, Auto Loader), native and partner connectors (Lakeflow Connect, JDBC/CDC), Databricks Marketplace, and Delta Sharing, in compliance with established data governance standards.
Build and manage Unity Catalog assets (catalogs, schemas, tables, and Volumes), and configure external locations and external Volumes for governed access to cloud storage such as ADLS Gen2.
Design and implement audit, control, and operational-metadata tables that capture pipeline run history, data lineage, record counts, data-quality outcomes, and exception handling, enabling end-to-end observability, reconciliation, and auditability.
Define and apply data governance strategies across the lakehouse — including access control, data classification, PII handling, lineage, retention, and data-quality frameworks — leveraging Unity Catalog and aligned to enterprise standards.
Develop reusable, performant Delta Lake data models optimized for downstream AI/ML, analytics, and reporting consumption, applying techniques such as partitioning, Z-ordering / liquid clustering, and OPTIMIZE.
Prepare and structure analytics- and AI-ready datasets with downstream AI/ML activities in mind, supporting feature engineering, feature stores, embeddings/vector data, and ML-ready data products for the organization’s AI initiatives.
Enable cross-cloud data movement and sharing — migrating or sharing data from other cloud platforms (e.g., AWS, GCP) and cloud data warehouses into the Azure Databricks lakehouse, including via Delta Sharing.
Troubleshoot and resolve data pipeline and data-quality issues to ensure consistent, dependable data delivery, and partner with BI, analytics, and business teams to deliver curated, analytics-ready datasets for Power BI and Tableau.
Orchestrate and automate workflows using Databricks Workflows/Jobs and Lakeflow Declarative Pipelines (DLT), with CI/CD via Git and Databricks Asset Bundles; contribute to data engineering standards, documentation, code reviews, testing, and continuous improvement.
Deliver projects using Agile or hybrid methodologies, manage tasks independently, provide accurate status updates, and proactively identify risks or dependencies.

Qualifications:

Bachelor’s degree required, preferably in Information Technology, Computer Science, Data Science, Analytics, or Statistics.
5–8 years of hands-on experience in data engineering, data integration, or analytics engineering, including substantial time delivering on Databricks.
Deep, hands-on experience with Azure Databricks and its critical components, including workspaces, clusters/SQL warehouses, Unity Catalog, and ADLS Gen2 integration.
Strong proficiency in Spark/PySpark and SQL, including the ability to develop complex queries and perform advanced data transformations.
Proven experience building medallion-architecture pipelines (bronze → silver → gold → platinum) on Delta Lake.
Hands-on experience ingesting structured, semi-structured, and unstructured data via custom streams (Structured Streaming / Auto Loader), connectors, Databricks Marketplace, and Delta Sharing.
Practical experience with Unity Catalog governance, including access control, lineage, data classification, and Volumes/external locations.
Experience designing audit/control and operational-metadata tables and data-quality frameworks for pipeline observability and reconciliation.
Experience migrating or sharing data across cloud platforms and modern data ecosystems (e.g., AWS, GCP, Snowflake, Redshift, BigQuery) into a Databricks lakehouse.
Familiarity with data orchestration (Databricks Workflows/Jobs, DLT) and CI/CD practices (Git, Databricks Asset Bundles) within data engineering environments.
Solid understanding of data governance, data-quality frameworks, and metric standardization, with experience supporting AI, advanced analytics, or automation initiatives.
Strong analytical and problem-solving skills.

Preferred / Bonus Qualifications:

Experience handling tax, assurance, and financial data, including familiarity with the sensitivity, confidentiality, and regulatory considerations associated with such data.
Exposure to AI/ML enablement on Databricks (e.g., MLflow, feature stores, Mosaic AI, Vector Search) and an understanding of how curated data feeds these workloads.
Hands-on experience with Power BI and/or Tableau data modeling and semantic-layer design for downstream consumption.
Relevant Databricks certification (e.g., Databricks Certified Data Engineer Professional).