Accelerator Compiler Lead
You will own the compiler and model lowering stack for the AI accelerator, including graph import, operator lowering, IR, graph transformations, quantization integration, code generation, graph partitioning, and compiler diagnostics. You will implement architecture and performance features, define executable artifacts, and mentor a team of compiler and ML systems engineers.
Responsibilities
- Lead architecture and development of the AI accelerator compiler stack.
- Own model ingestion and graph lowering from frameworks and exchange formats such as PyTorch export flows, ONNX, TensorFlow Lite, or similar.
- Define operator coverage strategy, lowering rules, graph transformations, fusion, partitioning, and fallback behavior.
- Develop compiler optimization passes for tensor layout, tiling, memory movement, mixed precision, operator fusion, and hardware-specific scheduling.
- Work closely with accelerator runtime and driver teams to define executable artifact formats, metadata, memory planning requirements, profiling hooks, and runtime constraints.
- Partner with hardware architecture and NPU firmware teams on ISA, command streams, tensor layouts, data movement, hardware constraints, and compiler-visible performance features.
- Own quantization compiler integration, including calibration metadata, precision selection, scale handling, layout constraints, and accuracy/performance tradeoffs.
- Build compiler diagnostics that help customers understand unsupported operators, shape constraints, graph rewrites, quantization issues, and performance bottlenecks.
- Establish compiler verification and regression strategy for graph transformations, IR lowering, numerical behavior, model accuracy, and performance.
- Hire, mentor, and lead a team of compiler and ML systems engineers.
Requirements
- Deep experience with compiler development, ML graph compilers, or code generation for accelerators, GPUs, DSPs, or heterogeneous compute systems.
- Strong understanding of ML model formats, graph IRs, operator lowering, tensor layouts, quantization, and runtime/compiler interfaces.
- Strong C++ and Python programming skills and experience building production-quality compiler or systems software.
- Experience with compiler frameworks or technologies such as MLIR, LLVM, TVM, XLA, IREE, Glow, TensorRT-like systems, OpenVINO-like systems, or equivalent.
- Strong understanding of correctness risks in compiler optimizations, graph rewrites, mixed precision, operator fusion, and hardware-specific lowering.
- Ability to work closely with hardware architects, firmware engineers, runtime engineers, model-integration teams, and SQA.
- Experience leading technical teams or major architecture areas.
- Experience with NPU, GPU, DSP, or AI accelerator compiler stacks.
- Experience with quantization-aware compilation, mixed precision, sparsity, pruning, graph partitioning, or hardware-specific scheduling.
- Experience supporting ONNX, PyTorch export, TensorFlow Lite, JAX/XLA, TorchDynamo/TorchInductor, or other model import flows.
- Familiarity with robotics, computer vision, CNNs, transformers, detection, segmentation, depth, SLAM-adjacent perception, or edge AI workloads.
- Experience building customer-facing compiler diagnostics and model-porting tools.
- Experience with model-zoo release processes, accuracy validation, and reproducible benchmark artifacts.
- Open-source compiler contributions or experience working with external framework communities.
Benefits
- Medical, dental, and vision coverage
- Paid time off
- Flexible work arrangements
- Professional development opportunities
- Equity participation
- Other benefits designed to support the well-being and growth of our team