Member of Technical Staff - Kernels & GPU Performance

About Us

Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them.

The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.

Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization.

We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.

About the role

At Gimlet, we believe every hire changes the company.

At a Series A company, talent density matters more than headcount. The engineers we hire today will shape the systems, culture, and standards that define Gimlet for years to come.

The future of AI infrastructure will not be built on a single hardware platform. It will be built on software capable of extracting maximum performance from increasingly diverse compute architectures.

Kernel engineers sit at the center of that challenge.

This role is an opportunity to help build the execution layer that transforms theoretical hardware performance into production reality.

You will work close to accelerators and execution hardware, designing, optimizing, and validating kernels that power large-scale AI workloads across both established and emerging architectures.

This is not a traditional GPU optimization role.

We are building systems that must operate efficiently across heterogeneous hardware environments, where performance, efficiency, and correctness directly influence the economics of AI infrastructure.

What success looks like

In the first 12-18 months, you will help:

  • Build and optimize kernels that improve latency, throughput, and hardware utilization for production AI workloads

  • Develop execution strategies that unlock performance across both established and emerging accelerator architectures

  • Improve memory efficiency, scheduling behavior, and execution characteristics across the inference stack

  • Partner with compiler, runtime, and distributed systems engineers to ensure end-to-end performance optimization

  • Influence how heterogeneous hardware is deployed and utilized within the next generation of AI infrastructure

  • Help establish performance engineering standards that shape the future of Gimlet's execution platform

You may be a good fit if you have

  • Strong software engineering fundamentals

  • Experience working on performance-critical systems close to hardware

  • Comfort reasoning about low-level execution behavior, memory hierarchies, and performance tradeoffs

Strong candidates may also have

  • Experience with CUDA, Triton, CUTLASS, or other accelerator programming models

  • Deep understanding of GPU execution models (warps/wavefronts, blocks, grids)

  • Experience optimizing memory access patterns (coalescing, shared memory, cache behavior)

  • Familiarity with occupancy, latency hiding, and instruction-level parallelism

  • Experience using profiling and performance analysis tools

  • Familiarity with multi-GPU or distributed execution

What Makes Gimlet Different

Most AI infrastructure companies focus on acquiring more compute.

We are focused on making increasingly diverse compute work together.

We are not building another cloud platform.

We are building the orchestration layer for the future of AI infrastructure.

We believe the next decade of AI will be defined not only by new hardware architectures, but by the software systems that determine how effectively that hardware is utilized.

The kernels, runtimes, and execution systems we build today will influence how AI workloads run across datacenters for years to come.

As an early member of our team, you will have significant ownership, work alongside highly technical engineers, and help shape both the systems we build and how we scale the company.
