Tech Lead, ML Infrastructure

We are seeking an experienced Tech Lead to build and lead a team of MLOps engineers, data scientists, and annotation tooling developers who will build the backbone of all applied machine learning research at the organization. On the one hand, the team covers data annotation, validation, split management, curation, and database management for a web-hosted game development platform. On the other, it covers training orchestration, queue management, load balancing, failure recovery, and observability.

Key Responsibilities

  • Infrastructure Buildout: Steward the end-to-end planning, execution, and delivery of the data and training infrastructure for the organization.
  • Cross-Functional Coordination: Act as the primary point of contact for the team that serves infrastructure to multiple other machine learning teams.
  • Cost Management: Own the cloud spend and implement cost-tracking, resource allocation and lifecycle management.
  • Developer Experience: Bring a service mindset to making the day-to-day work of ML engineers smooth, balancing engineering rigor with ease of use.

Qualifications

  • Undergraduate degree in Computer Science.
  • Proven experience as a Tech Lead in an AI/ML infrastructure team.
  • Prior experience in industries with complex multi-disciplinary teams such as robotics, smart grids, precision agriculture, game development or aerospace.
  • Prior experience managing a team of around 7 direct reports, establishing ways of working, and developing the team to be high performing.
  • High attention to detail and conscientiousness.
  • Ability to translate customer requests into actionable technical requirements documents in collaboration with ML engineers.
  • Fluency with the entire machine learning lifecycle, including storage orchestration, data provenance, distributed training orchestration and deployment.
  • Familiarity with Python, Git and the Unix shell.
  • Familiarity with collaborative tools such as Jira/Confluence, Slack, a Git server, a data platform, and an observability dashboard.

Nice to Have

  • Graduate degree in Computer Science with a focus on distributed systems, operating systems and virtualization.
  • Experience with Adaptive or Lean methodologies and a DevOps culture.
  • Familiarity with AWS and ETL orchestration.

This position is eligible for company sponsored benefits, including medical, dental and vision insurance, 401(k), paid leave, tuition reimbursement, and a variety of other discounts and perks. Learn more about the benefits offered by NBCUniversal by visiting the Benefits page of the Careers website. Salary range: $245,000-$270,000 (bonus eligible).

As part of our selection process, external candidates may be required to attend an in-person interview with an NBCUniversal employee at one of our locations prior to a hiring decision. NBCUniversal's policy is to provide equal employment opportunities to all applicants and employees without regard to race, color, religion, creed, gender, gender identity or expression, age, national origin or ancestry, citizenship, disability, sexual orientation, marital status, pregnancy, veteran status, membership in the uniformed services, genetic information, or any other basis protected by applicable law.

If you are a qualified individual with a disability or a disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access nbcunicareers.com as a result of your disability. You can request reasonable accommodations by emailing AccessibilitySupport@nbcuni.com.

For LA County and City Residents Only: NBCUniversal will consider for employment qualified applicants with criminal histories, or arrest or conviction records, in a manner consistent with relevant legal requirements, including the City of Los Angeles' Fair Chance Initiative For Hiring Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, where applicable.