Translational Data Management, Automation, & AI Engineer

Career Category

Clinical

Job Description

Location: Amgen India office, Hyderabad
Employment type: Full-time
Department / Team: Computational Biology team, Precision Medicine

High-level role

We are seeking a hands-on, technically strong Translational Data Management, Automation, & AI Engineer to design, build, and operate robust biomarker and clinical data ingestion pipelines that feed our biomarker platform. You will work closely with computational biologists, translational scientists, data scientists, lab operations, and external vendors/contract research organizations (CROs) to ensure timely, accurate, and standardized ingestion of assay and clinical data for analysis, visualization, and machine-learning use cases supporting clinical trials.

Key responsibilities

Design, implement, test, deploy, and maintain end-to-end data ingestion pipelines that prepare biomarker and clinical data for downstream analytics, visualization, and ML models.

Implement automated data validation, quality control checks, error handling, and remediation workflows to ensure data quality and traceability.

Integrate Codex workflows, agentic automation, and generative AI to meet turnaround time (TAT) and efficiency goals.

Collaborate with internal biomarker labs and CROs/vendors to onboard new assays; author and maintain data transfer specifications, interface control documents, and acceptance criteria.

Build and maintain harmonization and mapping logic (units, controlled terminology, ontologies) and data models needed to standardize biomarker and clinical datasets.

Generate study-specific analysis bundles on request within defined timelines.

Produce and maintain clear documentation: software specification forms, data definition tables, runbooks, and onboarding guides.

Write clean, tested, maintainable Python code and contribute to CI/CD pipelines, automated testing, and release processes.

Required qualifications

Education & experience

Bachelor’s degree in Computational Biology, Bioinformatics, AI, Computer Science, Data Engineering, or a related field, with 8+ years of experience. A PhD is a plus.

3+ years of experience in data engineering or platform engineering roles; experience working with biomarker/biological/clinical data or in a clinical research environment is highly desirable.

Technical skills

Strong programming skills in Python and database design; experience with Databricks.

Experience with workflow/orchestration tools (e.g., Airflow, Nextflow, Snakemake).

Experience designing, developing, and deploying AI workflows, including agentic automation tools and Codex workflows.

Familiarity with high-performance computing (HPC), cloud platforms and storage (e.g., AWS), and best practices for secure data handling.

Experience with version control (Git), CI/CD, and containerization (Docker).

Knowledge of clinical data formats and standards (e.g., CDISC SDTM and ADaM).

Experience working with clinical labs and biomarker assays (immunoassay, flow cytometry, immunohistochemistry, proteomics, whole genome sequencing, exome sequencing, RNA-seq, methylation, metabolomics).

Familiarity with data standardization and harmonization frameworks and controlled vocabularies.

Experience building, testing and debugging R pipelines for production data processing.