Machine Learning Engineer I - Large Language Models - AI & Human Health Research

We are seeking a skilled LLM Engineer I to join our team in the SinAI Assurance Lab. The LLM Engineer will play a key role in designing, building, and deploying large language model (LLM) applications, including retrieval-augmented generation (RAG) systems, agentic platforms, and clinical chatbots. The role is also responsible for designing, maintaining, and optimizing the data infrastructure and model validation pipelines that ensure all AI systems, generative and non-generative, deployed across the Mount Sinai Health System (MSHS) are rigorously validated for compliance, performance, and patient safety.

You will work closely with AI product teams, clinical and technical stakeholders, DevOps engineers, and the AI Governance Committee to engineer scalable data flows that support model validation and real-time monitoring. As a Machine Learning Engineer I, you will be primarily responsible for contributing to the development and enhancement of machine learning applications and systems, working closely with other engineers and data scientists to design and implement scalable and efficient machine learning systems.

General Data Engineering

Build and maintain robust ETL pipelines for structured and unstructured clinical data from EHR, imaging, and text sources.
Design systems to automate data preparation, lineage tracking, and reproducibility for AI model inputs and outputs.
Develop data infrastructure for benchmarking and stress-testing models in clinical simulation environments.
Collaborate with DevOps and cloud teams to ensure deployment pipelines meet compliance and performance standards.
Set up and monitor model tracking infrastructure for evaluation metrics and drift detection.
Assist in the development of standards and procedures affecting data management, design, and maintenance, and document all standards and procedures.

AI Assurance & Governance

Engineer and maintain pipelines that support pre-deployment model validation and post-deployment monitoring.
Collaborate with Data Scientists and Clinical Product Owners to validate data integrity, reproducibility, and fairness in AI workflows.
Ensure compliance with HIPAA, ethical guidelines, and institutional governance policies on sensitive health data use.
Build dashboards and tools that provide observability across the ML lifecycle: data, models, outcomes.

LLM Engineering & Applied Generative AI

Design, build, and deploy LLM-powered applications including clinical chatbots, copilots, and decision-support tools for end-users across MSHS.
Develop retrieval-augmented generation (RAG) pipelines that integrate vector databases with clinical knowledge sources, EHR data, and institutional documents.
Build agentic platforms and multi-agent workflows using frameworks such as LangChain, LlamaIndex, LangGraph, CrewAI, or equivalent, including tool use, function calling, and orchestration logic.
Operationalize LLM deployment, including inference optimization, latency and cost tuning, model serving (e.g., Azure), and integration of safety guardrails.
Implement prompt engineering, prompt versioning, and structured prompt-evaluation workflows across model providers and versions.
Fine-tune and adapt foundation models (full fine-tuning, LoRA/PEFT, instruction tuning) to clinical and operational use cases where appropriate.
Build LLM evaluation harnesses covering accuracy, hallucination, safety, bias, sycophancy, and clinical appropriateness, with red-teaming and stress-testing of deployed systems.

Stakeholder Engagement & Others

Effectively communicate technical findings related to model and data integrity to governance teams, clinical stakeholders, and leadership.
Maintain clear and well-organized documentation of data workflows, platform architecture, and validation processes.
Help write internal reports on data infrastructure resilience, validation system status, and operational risk.
Stay informed on industry best practices in data engineering and healthcare-focused machine learning.
Maintain a flexible attitude and a willingness to work with multiple types of technologies and languages with an open mind and without technology bias.
Show continuous interest in updating skill sets and knowledge of trends in the big data technology space.
Work closely with cross-functional teams including data scientists, healthcare providers, and IT professionals to understand data requirements, develop solutions, and support data-driven decision-making.
Other duties as assigned

Requirements

Bachelor's degree in Computer Science, Statistics, Mathematics, or related field.
Knowledge of at least one programming language among Scala, Python, Java, C, or C++.
Knowledge of big data technologies (e.g., Hadoop, Spark)
Knowledge of Software Development Lifecycle.
Self-motivated with a demonstrated ability to work independently, and to exercise independent judgment in developing complex techniques or programs in a dynamic environment.
- Act as the major contributor in the development and operationalization of four different applications.
- Play a key technical role in maintaining deployed products
Understanding of supervised and unsupervised machine learning algorithms.
Familiarity with SQL or other database languages.

Preferred:

Master's degree in a quantitative discipline (e.g., Statistics, Operations Research, Bioinformatics, Economics, Computational Biology, Computer Science, Information Technology, Mathematics, Physics) or equivalent practical experience
2+ years of experience in data engineering, software engineering, or machine learning.
Proficiency in Python and SQL
Proficiency in at least one cloud computing platform (e.g., AWS, Azure, GCP)
Intermediate knowledge of machine learning
Familiarity with ML lifecycle management tools (e.g., MLflow, Kubeflow, Airflow)
Experience with deployment and operationalization of ML systems
Experience with monitoring tools for AI model tracking
Understanding of DevOps principles, CI/CD pipelines, and containerization (e.g., Docker, Kubernetes)
Experience with version control systems (e.g., Git)
Hands-on experience building and deploying LLM-based applications in production (chatbots, copilots, summarization, Q&A, or decision-support tools).
Experience designing and implementing retrieval-augmented generation (RAG) architectures, including chunking strategies, embedding models, and vector databases (e.g., Pinecone, Weaviate, FAISS, pgvector, Milvus).
Experience with agentic frameworks and orchestration libraries (e.g., LangChain, LlamaIndex, LangGraph, CrewAI, AutoGen, Semantic Kernel) including tool/function calling and multi-agent workflows.
Experience building conversational AI / chatbot systems, including dialog state management, memory, and integration with enterprise systems.
Familiarity with foundation model APIs and SDKs (e.g., OpenAI, Anthropic, Google, Azure OpenAI, AWS Bedrock) and open-weight model families (e.g., Llama, Mistral, Qwen, Gemma).
Working knowledge of prompt engineering, prompt evaluation, and LLM observability/evaluation tooling (e.g., LangSmith, Langfuse, Arize, Ragas, DeepEval).
Familiarity with fine-tuning and model adaptation techniques (e.g., supervised fine-tuning, LoRA/QLoRA, PEFT, instruction tuning, RLHF/DPO) and serving stacks (e.g., vLLM, TGI, Triton).
Awareness of LLM safety, guardrails, and evaluation practices (hallucination, bias, sycophancy, jailbreak resistance) — experience with healthcare-specific evaluation is a plus.
Strong problem-solving skills and ability to work in cross-functional teams