GPU Kernel Engineer (AI Infrastructure)

$190K - $250K · San Francisco, CA, US · Mid Level · MLOps Engineer


Skills & Technologies

JAX · Multimodal · PyTorch · Python

About This Role

Overview

A well-funded AI infrastructure company is building next-generation multimodal foundation models and a highly optimized training and serving platform. The team is looking for a GPU Kernel Engineer to push the limits of performance on modern accelerators and help power large-scale AI systems.

This role sits at the intersection of GPU programming, systems engineering, and cutting-edge AI workloads. You’ll work across the hardware–software stack, from low-level kernel development to integrating optimized operations into production ML frameworks used for training and inference at scale.

What You’ll Do

  • Custom Kernel Development:
Design, implement, and optimize high-performance GPU kernels using C++, CUDA, ROCm, PTX, Triton, and/or JAX Pallas.
  • Performance Optimization:
Profile and optimize end-to-end ML workloads, with a focus on large-scale LLM training and inference.
  • Framework Integration:
Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and custom runtime systems.
  • Bottleneck Analysis:
Build performance models, identify compute and memory bottlenecks, and deliver kernel-level improvements that meaningfully accelerate AI workloads.
  • Cross-Functional Collaboration:
Work closely with ML researchers, distributed systems engineers, and model-serving teams to optimize performance across the stack.
  • Hardware-Aware Engineering:
Collaborate with hardware vendors and stay current with evolving GPU architectures, compilers, and toolchains.
  • Tooling & Reliability:
Contribute to benchmarking, testing, documentation, and tooling to ensure correctness, reproducibility, and sustained performance gains.
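For context on the bottleneck-analysis responsibility above: a common first step when deciding whether a kernel is compute- or memory-bound is a simple roofline-style check against the machine's balance point. The sketch below is purely illustrative; the peak-throughput and bandwidth figures are assumptions (roughly datacenter-GPU-class numbers), not specs from this posting.

```python
# Minimal roofline-style bottleneck check. The hardware numbers below are
# illustrative assumptions (roughly datacenter-GPU-class), not from the posting.
PEAK_FLOPS = 312e12   # assumed peak FP16 throughput, FLOP/s
PEAK_BW = 2.0e12      # assumed HBM bandwidth, bytes/s (~2 TB/s)

def bottleneck(flops: float, bytes_moved: float) -> str:
    """Classify a kernel as compute- or memory-bound.

    A kernel is memory-bound when its arithmetic intensity (FLOPs per byte)
    falls below the machine balance point PEAK_FLOPS / PEAK_BW.
    """
    intensity = flops / bytes_moved
    machine_balance = PEAK_FLOPS / PEAK_BW  # FLOPs per byte of memory traffic
    return "compute-bound" if intensity >= machine_balance else "memory-bound"

# Example: an elementwise add over n FP16 elements does n FLOPs but moves
# 6n bytes (read two inputs, write one output, 2 bytes each) -- far below
# the balance point, so it is firmly memory-bound.
n = 1 << 20
print(bottleneck(flops=n, bytes_moved=6 * n))  # prints "memory-bound"
```

In practice this back-of-envelope check tells you whether to chase occupancy and instruction mix (compute-bound) or fusion and data-movement reduction (memory-bound) before reaching for a profiler.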

Ideal Candidate Profile

  • 5+ years of experience in GPU kernel development, high-performance computing, or systems programming
  • Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related field
  • Strong programming skills in C++ and Python
  • Deep expertise in CUDA and/or ROCm, GPU memory models, and performance optimization
  • Hands-on experience with Triton and/or JAX Pallas for custom kernel development
  • Strong understanding of PTX, GPU assembly, and low-level execution models
  • Proven experience integrating custom kernels into PyTorch, JAX, or similar ML frameworks
  • Experience working with large-scale LLM workloads (training or inference)

Nice to Have

  • Experience optimizing for AMD GPUs and ROCm
  • Familiarity with JAX FFI and custom ML operator development
  • Experience with efficient inference or serving frameworks (e.g., vLLM, TensorRT)
  • Exposure to TPUs, XLA, or other accelerator programming environments
  • Contributions to open-source ML systems, compilers, or GPU kernel libraries

Benefits

  • Medical, dental, and vision insurance
  • 401(k) plan
  • Daily meals and snacks
  • Flexible/paid time off
  • Relocation assistance
  • Competitive compensation and meaningful equity (stock options)

Job Type: Full-time

Pay: $190,000 - $250,000 per year

Work Location: In person

Salary Context

The midpoint of this $190K-$250K range ($220K) is above the median for MLOps Engineer roles in our dataset ($201K across 79 roles with salary data).


Role Details

Company: Forecareer
Title: GPU Kernel Engineer (AI Infrastructure)
Location: San Francisco, CA, US
Category: MLOps Engineer
Experience: Mid Level
Salary: $190K - $250K
Remote: No
