Member of Technical Staff - ML Systems & Inference

About us

Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them.

The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.

Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization.

We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.

About the role

At Gimlet, we believe every hire changes the company.

At a Series A company, talent density matters more than headcount. The engineers we hire today will shape the systems, culture, and standards that define Gimlet for years to come.

The future of AI infrastructure will not be built on a single hardware platform. It will be built on systems capable of intelligently orchestrating increasingly heterogeneous compute at unprecedented scale.

Inference sits at the center of that challenge.

This role is an opportunity to help build the systems that determine how modern AI workloads are executed in production.

You will work at the intersection of model architecture, runtime behavior, scheduling, memory management, and system performance to ensure inference is fast, predictable, and scalable.

This is not a traditional machine learning role.

We are not training models or tuning benchmarks in isolation.

We are building the systems that determine how AI workloads are served, optimized, and executed across the next generation of AI infrastructure.

What success looks like

In the first 12-18 months, you will help:

  • Build and optimize inference systems that improve latency, throughput, and efficiency for production AI workloads

  • Design execution strategies that intelligently balance batching, scheduling, concurrency, and resource utilization

  • Improve KV cache management, memory efficiency, and execution behavior across large-scale serving environments

  • Enable new model architectures and inference techniques to run efficiently in production

  • Partner with compiler, kernel, networking, and distributed systems engineers to drive end-to-end performance improvements

  • Influence the architecture of a platform that will help define how AI workloads are deployed over the next decade

You may be a good fit if you have

  • Strong software engineering fundamentals

  • Experience building or operating ML inference or model serving systems

  • Comfort reasoning about performance, memory usage, and system behavior under load

Strong candidates may also have

  • Experience with inference runtimes such as TensorRT-LLM, vLLM, or custom serving systems

  • Deep understanding of modern model architectures and attention mechanisms

  • Experience with batching, scheduling, and concurrency control in inference systems

  • Familiarity with KV cache management and memory placement strategies

  • Experience profiling and tuning latency- and throughput-critical systems

  • Software development experience in Python and C++

What makes Gimlet different

Most AI infrastructure companies are focused on deploying more compute.

We are focused on making increasingly diverse compute work together.

We are not building another cloud platform.

We are building the orchestration layer for the future of AI infrastructure.

We believe the next decade of AI will be defined not only by better models and better hardware, but by the systems that determine how those models are executed in production.

The inference, runtime, and orchestration systems we build today will help define how AI workloads are deployed for years to come.

As an early member of our team, you will have significant ownership, work alongside highly technical engineers, and help shape both the systems we build and how we scale the company.
