CLOUD AMAZE - DW Architect (Practice)
Job Title: Data & AI Solution Architect – Databricks
Location: NY/NJ/PA
Work mode: Hybrid (3 days onsite required)
Duration: FTE
Job Description
This is a senior, hands-on role at the intersection of distributed data engineering, open lakehouse architecture, and production AI. You will own the Databricks Data Intelligence Platform strategy, designing unified lakehouse architectures across data engineering, analytics, ML, and agentic AI workloads. You will translate complex enterprise data challenges into scalable, governed, and cost-efficient Databricks solutions while influencing technical direction at the executive level and building trusted-advisor relationships with engineering and data science leadership.
Key Responsibilities:
- Plan strategically and engineer hands-on the Snowflake/Big Data and cloud environments that support our clients’ advanced analytics and data science initiatives.
- Support defining the scope and sizing of work.
- Work closely with enterprise architects, information security teams, and the data management team to ensure the architected solution meets the customer's needs from both a functionality and an IT solution engineering perspective.
- Lead the design of all aspects of our data solution, including creating artifacts such as diagrams, playbooks, and other technical documents.
- Translate business requirements into technology solutions.
- Mentor and guide junior team members to deliver solutions on time.
- Create architecture blueprints and work with the development team to deliver the vision.
Skills & Experience:
- Overall, 10-15 years of experience in Solution Architecture, Data Management, Data Lake and Lakehouse design and development.
- Databricks (expert): Delta Lake, Unity Catalog, Lakeflow / Delta Live Tables, Databricks SQL, Photon, Serverless, Auto Loader, Databricks Apps, Vector Search
- Apache Spark (expert): PySpark and Scala; internals, including DAG execution, shuffle optimisation, memory tuning, adaptive query execution, Structured Streaming
- AI / ML stack (advanced): MLflow (tracking, registry, serving, tracing), Feature Store, Model Serving, AutoML; production ML lifecycle end-to-end
- GenAI & agents (proficient): RAG pipeline design, Databricks Agent Bricks and Agent Framework, Vector Search, LangChain, MLflow agent tracing, LLM integration (Claude, GPT)
- Data engineering (advanced): dbt on Databricks, Lakeflow Jobs, Kafka / Structured Streaming, Fivetran, Airbyte; batch and real-time ingestion at enterprise scale
- Cloud (advanced in one, working in others): AWS (S3, Glue, EMR, Step Functions), Azure (ADLS Gen2, ADF, Event Hubs), GCP (GCS, Dataflow, BigQuery)
- Data modelling (advanced): Medallion architecture (Bronze / Silver / Gold), data vault 2.0, Kimball dimensional; open table formats (Delta Lake, Apache Iceberg, Apache Hudi)
- Security & governance: Unity Catalog RBAC, column masking, row-level security, audit logs, private endpoints, SOX / GDPR / HIPAA compliance patterns
- DevOps & IaC: Git, CI/CD for Databricks (Databricks Asset Bundles, GitHub Actions), Terraform Databricks provider, Databricks CLI
- Orchestration: Lakeflow Jobs, Apache Airflow with Databricks operator, Prefect; dependency management, multi-task job design, retry and alerting patterns