Asset Management - AI Systems Engineer – Associate/VP

Key Responsibilities

  • Inference Platform & Optimization: Build and optimize enterprise LLM serving platforms (e.g., vLLM, TensorRT-LLM) using techniques like PagedAttention, continuous batching, and quantization (AWQ/FP8) for high throughput and low latency.
  • GPU Pooling & AI Infra: Design GPU pooling, virtualization, and scheduling solutions on Kubernetes to maximize hardware utilization. Manage distributed training clusters and high-performance networking (RDMA/NCCL).
  • Model Deployment & MLOps: Streamline the CI/CD pipeline for AI models. Implement automated benchmarking, zero-downtime deployment, and comprehensive observability (time to first token (TTFT), tokens per second (TPS), and GPU metrics).

Qualifications

1. Education & Experience:

  • Bachelor’s, Master’s, or Ph.D. in Computer Science, Computer Engineering, or a related field.
  • 3+ years of experience in Backend Systems, Distributed Systems, or AI Infrastructure/MLOps, with at least 1-2 years specifically focused on LLM serving, GPU optimization, or ML Systems.

2. Core Engineering & Systems Skills:

  • Expert-level proficiency in Python and strong proficiency in C++ (essential for inference engines and CUDA integration).
  • Deep understanding of Linux internals, networking, and distributed systems architecture.
  • Hands-on experience with container orchestration (Kubernetes, Docker) and building custom K8s operators or controllers.

3. AI Infrastructure & Optimization Skills:

  • Deep familiarity with LLM inference engines (vLLM, TensorRT-LLM, TGI) and understanding of their underlying architectural designs; or
  • Solid understanding of GPU architecture (NVIDIA Ampere/Hopper), CUDA programming, and GPU memory management; or
  • Experience with distributed training frameworks (DeepSpeed, Megatron-LM, Ray) and high-performance networking (RDMA, RoCE, InfiniBand).

4. Mindset & Soft Skills:

  • A "hacker" mindset with a passion for squeezing every drop of performance out of hardware.
  • Ability to collaborate effectively with AI Researchers (to understand their models) and Backend Engineers (to integrate AI into business systems).

Preferred

  • Contributions to open-source AI Infra projects (e.g., vLLM, Ray, PyTorch).
  • Experience writing custom CUDA kernels or using Triton for operator fusion.
  • Financial industry (Asset Management/Quant) experience is a plus.
  • Language: Professional working proficiency in English to collaborate with global teams.
