CLOUD AMAZE - DW Architect (Practice)

Job Title: Data & AI Solution Architect – Databricks

Location: NY/NJ/PA

Work mode: Hybrid (3 days onsite required)

Duration: FTE

Job Description

This is a senior, hands-on role at the intersection of distributed data engineering, open lakehouse architecture, and production AI. You will own the Databricks Data Intelligence Platform strategy, designing unified lakehouse architectures across data engineering, analytics, ML, and agentic AI workloads. You will translate complex enterprise data challenges into scalable, governed, and cost-efficient Databricks solutions while influencing technical direction at the executive level and building trusted-advisor relationships with engineering and data science leadership.

Key Responsibilities:

  • Strategic planning and hands-on engineering of Snowflake/Big Data and cloud environments that support our clients’ advanced analytics and data science initiatives.

  • Provide support in defining the scope and sizing of work.

  • Work closely with enterprise architects, information security teams, and data management teams to ensure the architected solution meets the customer's needs from both a functionality perspective and an IT solution engineering perspective.

  • Lead the design of all aspects of our data solutions, including the creation of artifacts such as diagrams, playbooks, and other technical documents.

  • Translate business requirements into technology solutions.

  • Mentor and guide junior team members to deliver solutions on time.

  • Create architecture blueprints and work with the development team to deliver on that vision.

Skills & Experience:

  • 10-15 years of overall experience in solution architecture, data management, and data lake and lakehouse design and development.

  • Databricks (expert): Delta Lake, Unity Catalog, Lakeflow / Delta Live Tables, Databricks SQL, Photon, Serverless, Auto Loader, Databricks Apps, Vector Search

  • Apache Spark (expert): PySpark and Scala; internals — DAG execution, shuffle optimisation, memory tuning, adaptive query execution, Structured Streaming

  • AI / ML stack (advanced): MLflow (tracking, registry, serving, tracing), Feature Store, Model Serving, AutoML; production ML lifecycle end-to-end

  • GenAI & agents (proficient): RAG pipeline design, Databricks Agent Bricks and Agent Framework, Vector Search, LangChain, MLflow agent tracing, LLM integration (Claude, GPT)

  • Data engineering (advanced): dbt on Databricks, Lakeflow Jobs, Kafka / Structured Streaming, Fivetran, Airbyte — batch and real-time ingestion at enterprise scale

  • Cloud (advanced in one, working in others): AWS (S3, Glue, EMR, Step Functions), Azure (ADLS Gen2, ADF, Event Hubs), GCP (GCS, Dataflow, BigQuery)

  • Data modelling (advanced): Medallion architecture (Bronze / Silver / Gold), data vault 2.0, Kimball dimensional; open table formats (Delta Lake, Apache Iceberg, Apache Hudi)

  • Security & governance: Unity Catalog RBAC, column masking, row-level security, audit logs, private endpoints, SOX / GDPR / HIPAA compliance patterns

  • DevOps & IaC: Git, CI/CD for Databricks (Databricks Asset Bundles, GitHub Actions), Terraform Databricks provider, Databricks CLI

  • Orchestration: Lakeflow Jobs, Apache Airflow with Databricks operator, Prefect — dependency management, multi-task job design, retry and alerting patterns
