Senior ML Engineer

Experience level: 5-8 years
Qualification: Postgraduate/Graduate
Location: Chennai/Pune/Bangalore
At Black buck Insights (BBI), we hire great minds who can embrace technology to innovate and build. We are always on the lookout for individuals who are thrilled by the idea of developing solutions, features, and services while managing ambiguity and fast-paced projects. If this is you, come chart your own path at BBI!

Position Summary

The Senior Machine Learning Software Engineer is a senior-level technical contributor responsible for leading the development of the software infrastructure, tools, and platforms that enable scalable and maintainable machine learning operations. This role is critical in bridging the gap between research and production by architecting reliable systems for training, testing, deploying, and monitoring machine learning models. The Senior Machine Learning Software Engineer ensures that AI capabilities are production-grade, reliable, and scalable, unlocking innovation across all AI-driven products.

In addition to making significant technical contributions, the Senior MLSE mentors junior engineers and fosters best practices in software quality, MLOps, and automation across the machine learning lifecycle.

Responsibilities:
Infrastructure Design & Development
● Architect, build, and maintain reusable components and tools to support model training, evaluation, and deployment at scale.
● Optimize model serving frameworks, feature stores, data pipelines, and CI/CD systems for ML workflows.
● Ensure reliability, observability, and performance across ML systems in production.

Technical Leadership & Execution
● Lead cross-functional engineering initiatives involving platform stability, experimentation infrastructure, or real-time inference systems.
● Review code, propose architectural improvements, and uphold software engineering best practices within the ML engineering team.
● Drive the design and implementation of MLOps pipelines, automation, and model governance workflows.

Collaboration with Research & Product Engineering
● Work closely with ML researchers to productionize experimental models, ensuring compatibility with existing infrastructure.
● Coordinate with data engineering to integrate pipelines, data validations, and model input/output schemas.
● Contribute to product engineering discussions when ML systems require edge optimization, user-facing API integrations, or UI-linked inference.

Mentorship & Knowledge Sharing
● Mentor ML Software Engineers I and II; a proven track record of helping at least one MLSE I advance to MLSE II is expected.
● Contribute to internal documentation, architecture reviews, and engineering learning resources.
● Set high standards for code quality, reproducibility, and maintainability across the ML engineering discipline.

Requirements:
● 3+ years building and operating production software systems (ML software/inference platform experience strongly preferred).
● Strong Python engineering skills and solid Linux/bash debugging skills.
● Hands-on experience with NVIDIA Triton Inference Server (or an equivalent model serving platform).
● Practical experience with model optimization and deployment pipelines (e.g., ONNX/TensorRT, performance/latency tuning, packaging for production).
● Proven experience deploying and operating services on AWS (including ECS, S3/ECR, and IAM/secrets management), along with Docker/container workflows and safe rollout/rollback practices.
● Experience with CI/CD and artifact/version management for ML software (DVC/MLflow-equivalent workflows are a plus).
● Production reliability mindset: monitoring, incident triage, and staged release safety.
● Strong ownership and communication skills, and a demonstrated ability to ramp up quickly on unfamiliar parts of the stack within a 3-6 month onboarding window.
