Lead Software Engineer, Model Serving Platform (AI Infrastructure)

$230K - $300K · San Francisco, CA, US · Senior · MLOps Engineer

Skills & Technologies

Kubernetes · Multimodal · Python

About This Role

Overview

A well-funded AI infrastructure company is building a next-generation model serving platform to power real-time, multimodal foundation models at scale. The team is looking for a Lead Software Engineer to help architect, build, and lead the core systems that enable low-latency, high-throughput AI inference in production.

This is a senior, hands-on leadership role at the center of the AI stack, spanning GPU execution, distributed inference, scheduling, and developer-facing APIs. You’ll both build critical components yourself and guide other engineers, shaping technical direction, standards, and execution quality.

What You’ll Do

  • Technical Leadership:
Own the architecture and technical direction of the model serving platform, guiding design decisions and execution across the team.
  • Core Serving Systems:
Build high-performance inference systems including execution runtimes, batching, scheduling, and distributed serving.
  • Performance-Critical Engineering:
Develop optimized components in C++ and CUDA/HIP, including memory-efficient runtimes and custom GPU kernels where needed.
  • Research-to-Production:
Partner closely with ML researchers to productionize new multimodal models with strict latency, reliability, and scalability requirements.
  • APIs & Services:
Build Python APIs and backend services that expose model capabilities to downstream products and applications.
  • Mentorship & Quality:
Mentor engineers through code reviews, design discussions, and hands-on technical guidance while driving engineering best practices.
  • Observability & Reliability:
Lead profiling, benchmarking, monitoring, and troubleshooting across GPU, runtime, and service layers.
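
To illustrate the kind of batching and scheduling work this role involves, here is a minimal, hypothetical sketch of dynamic request batching for an inference server. All names (`DynamicBatcher`, `submit`, `next_batch`) are illustrative assumptions, not the company's actual system:

```python
from collections import deque


class DynamicBatcher:
    """Toy dynamic batcher: groups queued requests into batches of up to
    max_batch_size, as a serving runtime might before dispatching a batch
    of prompts to the GPU."""

    def __init__(self, max_batch_size=8):
        self.max_batch_size = max_batch_size
        self.queue = deque()

    def submit(self, request_id, prompt):
        # Enqueue an incoming request.
        self.queue.append((request_id, prompt))

    def next_batch(self):
        # Drain up to max_batch_size queued requests into one batch.
        batch = []
        while self.queue and len(batch) < self.max_batch_size:
            batch.append(self.queue.popleft())
        return batch


batcher = DynamicBatcher(max_batch_size=4)
for i in range(6):
    batcher.submit(i, f"prompt-{i}")

first = batcher.next_batch()   # first 4 requests
second = batcher.next_batch()  # remaining 2 requests
```

Real serving systems add timeouts, priorities, and continuous batching on top of this core queue-and-drain loop.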

Ideal Candidate Profile

  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience building scalable backend systems or distributed infrastructure
  • Strong understanding of LLM inference mechanics (prefill vs. decode, batching strategies, KV cache)
  • Experience with Kubernetes, Ray, and containerized systems
  • Strong proficiency in C++ and Python
  • Deep debugging, profiling, and performance optimization skills at the systems level
  • Ability to translate research or runtime requirements into production-grade systems
  • Strong communication skills and experience leading technical discussions and mentoring engineers
  • Comfortable working on-site in a fast-moving, high-ownership environment
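
The "LLM inference mechanics" bullet above refers to the two phases of autoregressive generation. A conceptual sketch, with toy data structures rather than real tensors: prefill processes the whole prompt in one pass and populates the KV cache; each decode step then attends over the cached keys/values and appends exactly one new entry.

```python
class ToyKVCache:
    """Illustrative stand-in for a per-sequence key/value cache."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)


def prefill(cache, prompt_tokens):
    # Prefill: one pass over all prompt tokens, filling the cache.
    for t in prompt_tokens:
        cache.append(("k", t), ("v", t))


def decode_step(cache, new_token):
    # Decode: attend over everything cached so far, then cache
    # the newly generated token. Returns the attended context length.
    context_len = len(cache)
    cache.append(("k", new_token), ("v", new_token))
    return context_len


cache = ToyKVCache()
prefill(cache, [1, 2, 3, 4])   # cache now holds 4 entries
seen = decode_step(cache, 5)   # this step attends over 4 cached entries
```

The asymmetry this sketch shows (prefill is compute-bound over many tokens; decode is memory-bound, one token at a time) is what drives the batching strategies and KV-cache management the role asks about.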

Nice to Have

  • Experience with ML systems engineering or distributed GPU scheduling
  • Familiarity with open-source inference engines (e.g., vLLM, SGLang, TRT-LLM)
  • Experience building large-scale ML or MLOps infrastructure
  • Proficiency with CUDA or ROCm and GPU profiling tools
  • Background at an AI/ML startup, research lab, or large-scale infrastructure team
  • Familiarity with multimodal models or efficient inference techniques
  • Contributions to open-source ML, systems, or HPC infrastructure

Job Type: Full-time

Pay: $230,000.00 - $300,000.00 per year

Benefits:

  • 401(k)
  • Dental insurance
  • Flexible schedule
  • Health insurance
  • Paid time off
  • Relocation assistance
  • Vision insurance

Work Location: In person

Salary Context

This $230K-$300K range is above the 75th percentile for MLOps Engineer roles in our dataset (median: $201K across 79 roles with salary data).

Role Details

Company: Forecareer
Title: Lead Software Engineer, Model Serving Platform (AI Infrastructure)
Location: San Francisco, CA, US
Category: MLOps Engineer
Experience: Senior
Salary: $230K - $300K
Remote: No
