Overview
A well-funded AI infrastructure company is building a next-generation model serving platform to power real-time, multimodal foundation models at scale. The team is looking for a Lead Software Engineer to help architect, build, and lead the core systems that enable low-latency, high-throughput AI inference in production.
This is a senior, hands-on leadership role at the center of the AI stack, spanning GPU execution, distributed inference, scheduling, and developer-facing APIs. You’ll both build critical components yourself and guide other engineers, shaping technical direction, standards, and execution quality.
What You’ll Do
- Technical Leadership: Own the architecture and technical direction of the model serving platform, guiding design decisions and execution across the team.
- Core Serving Systems: Build high-performance inference systems, including execution runtimes, batching, scheduling, and distributed serving.
- Performance-Critical Engineering: Develop optimized components in C++ and CUDA/HIP, including memory-efficient runtimes and custom GPU kernels where needed.
- Research-to-Production: Partner closely with ML researchers to productionize new multimodal models with strict latency, reliability, and scalability requirements.
- APIs & Services: Build Python APIs and backend services that expose model capabilities to downstream products and applications.
- Mentorship & Quality: Mentor engineers through code reviews, design discussions, and hands-on technical guidance while driving engineering best practices.
- Observability & Reliability: Lead profiling, benchmarking, monitoring, and troubleshooting across GPU, runtime, and service layers.
Ideal Candidate Profile
- Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
- 5+ years of experience building scalable backend systems or distributed infrastructure
- Strong understanding of LLM inference mechanics (prefill vs. decode, batching strategies, KV cache)
- Experience with Kubernetes, Ray, and containerized systems
- Strong proficiency in C++ and Python
- Deep debugging, profiling, and performance optimization skills at the systems level
- Ability to translate research or runtime requirements into production-grade systems
- Strong communication skills and experience leading technical discussions and mentoring engineers
- Comfortable working on-site in a fast-moving, high-ownership environment
Nice to Have
- Experience with ML systems engineering or distributed GPU scheduling
- Familiarity with open-source inference engines (e.g., vLLM, SGLang, TRT-LLM)
- Experience building large-scale ML or MLOps infrastructure
- Proficiency with CUDA or ROCm and GPU profiling tools
- Background at an AI/ML startup, research lab, or large-scale infrastructure team
- Familiarity with multimodal models or efficient inference techniques
- Contributions to open-source ML, systems, or HPC infrastructure
Job Type: Full-time
Pay: $230,000.00 - $300,000.00 per year
Benefits:
- 401(k)
- Dental insurance
- Flexible schedule
- Health insurance
- Paid time off
- Relocation assistance
- Vision insurance
Work Location: In person
Salary Context
This $230K-$300K range is above the 75th percentile for MLOps Engineer roles in our dataset (median: $201K across 79 roles with salary data).