Databricks: Staff Software Engineer, AI Runtime

Skills & Technologies

MLflow, PyTorch

About This Role

P-1930

At Databricks, we are passionate about enabling data teams to solve the world's toughest problems — from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business.

Training and customizing state-of-the-art AI models is one of the most demanding workloads in computing, and it sits at the heart of Databricks' Mosaic AI mission. AI Runtime (AIR) is our managed platform for large-scale GPU training and fine-tuning. It gives customers on-demand access to fleets of the latest accelerators and a serverless experience that hides the complexity of provisioning, scheduling, and orchestrating multi-node jobs, with the resilience to keep training running for days or weeks across thousands of GPUs. AIR powers the full spectrum of custom training, from fine-tuning open models to pre-training frontier-scale foundation models, for some of the most sophisticated AI teams in the world.

As a Staff Software Engineer for AI Runtime, you will play a critical role in building and scaling the systems that make large-scale training fast, reliable, and effortless. You will drive the architecture and evolution of the managed GPU training stack, spanning scheduling and capacity, distributed training performance, fault tolerance, and the developer experience of launching and operating jobs at scale. Beyond hands-on contributions to core systems, you will help define the long-term technical vision for AIR, mentor senior engineers, partner across product, research, and platform teams, and lead the initiatives that expand the technical and business impact of custom training at Databricks.

The impact you will have:

Drive the architecture and evolution of AIR's managed GPU training platform, delivering scalable, high-throughput, and resilient training across fleets that span thousands of accelerators.
Solve the hardest problems in large-scale training, including multi-node orchestration, distributed parallelism strategies, GPU scheduling and dynamic routing, high-throughput data loading, and checkpoint and restore for very long-running jobs.
Push GPU efficiency and training performance, raising utilization (such as model FLOPs utilization and end-to-end throughput) and lowering cost per training run across diverse model architectures and hardware generations.
Build the resilience and observability foundations that keep multi-node jobs healthy, detecting and recovering from hardware and software failures with minimal disruption to customers.
Partner with product, research, and platform teams to shape the APIs, CLI, and developer experience that make it easy to launch, monitor, and debug production training jobs.
Lead end-to-end engineering efforts, from design through production rollout, holding a high bar for performance, correctness, and reliability.
Make direct, high-impact contributions to the core systems behind AIR, and help bring up support for the latest accelerators and new regions as the fleet grows.
Champion engineering excellence, mentor other engineers through design reviews and technical discussions, and help shape Databricks' long-term technical direction in AI training infrastructure.

What we look for:

10+ years of experience building and operating large-scale distributed systems, with significant depth in GPU training infrastructure, high-performance computing, or ML systems.
Hands-on experience with distributed training frameworks (such as PyTorch, FSDP, DeepSpeed, or Megatron) and the parallelism strategies (data, tensor, pipeline, and sequence parallelism) used to train large models.
Strong understanding of training resilience patterns, including checkpointing, failure detection, and automatic recovery for long-running, multi-node jobs.
Solid grasp of GPU performance fundamentals, including accelerator architecture, high-speed interconnects (such as NVLink and InfiniBand or RoCE), collective communication, and the bottlenecks that govern training throughput and utilization.
Experience building and operating managed, multi-tenant platform products in the cloud, with clear SLAs and SLOs for availability, performance, and reliability.
Strong foundation in algorithms, data structures, and system design as applied to performance-sensitive, large-scale distributed systems.
Proven ability to deliver technically complex, high-impact initiatives that create clear customer or business value.
Strong communication skills and the ability to collaborate across product, research, and infrastructure teams in a fast-moving environment.
Strategic, product-oriented mindset with the ability to align technical execution to a long-term vision, and a passion for mentoring engineers and fostering technical excellence.
BS in Computer Science or a related field (MS or PhD preferred).

Pay Range Transparency

Databricks is committed to fair and equitable compensation practices. The pay range(s) for this role is listed below and represents the expected salary range for non-commissionable roles or on-target earnings for commissionable roles. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location. Based on the factors above, Databricks anticipates utilizing the full width of the range. The total compensation package for this position may also include eligibility for annual performance bonus, equity, and the benefits listed above.

Local Pay Range

$190,000-$265,000 USD

About Databricks

Databricks is the data and AI company. More than 10,000 organizations worldwide — including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.

Benefits

At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees.

Our Commitment to Diversity and Inclusion

At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.

Compliance

If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.

Salary Context

This $190K-$265K range is above the 75th percentile for AI Software Engineer roles in our dataset (median: $190K across 219 roles with salary data).

Role Details

Company: Databricks
Title: Staff Software Engineer, AI Runtime
Location: Mountain View, CA, US
Category: AI Software Engineer
Experience: Senior
Salary: $190K - $265K
Remote: No

About This Role

AI Software Engineers build the applications and systems that AI models run inside. They own the API layers, data pipelines, frontend integrations, and infrastructure that turn a model into a product users interact with. Every AI company needs engineers who can build the software around the AI.

The challenge is building reliable systems around inherently unreliable components. Models are probabilistic. They'll give different answers to the same question. They hallucinate. They're slow. They're expensive. Your job is to build an application layer that handles all of this gracefully while delivering a product that users trust and enjoy.

Across the 3,823 AI roles we're tracking, AI Software Engineer positions make up 7% of the market. At Databricks, this role fits into their broader AI and engineering organization.

AI Software Engineer roles are among the most numerous in the AI job market. Every company deploying AI needs software engineers who understand AI integration patterns. The demand is broad, spanning startups to enterprises, across every industry adopting AI capabilities.

What the Work Looks Like

A typical week includes: building API endpoints that serve model inference with caching and fallback logic, designing the data pipeline that feeds context to a RAG system, implementing streaming responses in the frontend, debugging a race condition in the async inference pipeline, and optimizing database queries for the vector search layer. It's full-stack engineering with AI at the center.

Skills Required

MLflow (4% of roles), PyTorch (16% of roles)

Full-stack engineering skills with AI integration experience. Python and TypeScript are the most common requirements. You'll need to understand API design, database architecture, and how to build reliable systems around probabilistic outputs. Experience with streaming, async processing, and caching patterns is increasingly important as real-time AI applications proliferate.

Knowledge of vector databases, embedding APIs, and LLM integration patterns (function calling, structured outputs, retry logic) differentiates AI software engineers from general software engineers. Understanding cost optimization (caching strategies, model routing, batched inference) is valuable since inference costs can dominate application economics.

Strong postings describe the product you'll be building, the AI integration patterns you'll work with, and the scale requirements. Look for companies that have existing AI features and need engineers to improve and expand them, not companies that are 'planning to add AI' someday.

Compensation Benchmarks

AI Software Engineer roles pay a median of $232,000 based on 797 positions with disclosed compensation. Senior-level AI roles across all categories have a median of $227,400. The disclosed range for this role is $190K to $265K.

Across all AI roles, the market median is $200,100. Top-quartile compensation starts at $253,500. The 90th percentile reaches $307,500. For comparison, the highest-paying categories include AI Engineering Manager ($275,000) and AI Safety ($274,200). By seniority level: Entry: $97,880; Mid: $165,000; Senior: $227,400; Director: $247,800; VP: $250,000.

Databricks AI Hiring

Databricks has 21 open AI roles right now. They're hiring across AI/ML Engineer, AI Software Engineer, Research Scientist, and AI Product Manager. Positions span MD, US; Mountain View, CA, US; and US. Compensation range: $225K - $360K.

Location Context

Across all AI roles, 15% (590 positions) offer remote work, while 3,217 require on-site attendance. Top AI hiring metros: New York (2,643 roles, $211,000 median); San Francisco (2,168 roles, $253,000 median); Los Angeles (1,792 roles, $191,580 median).

Career Path

Common paths into AI Software Engineer roles include Software Engineer, Full-Stack Developer, and Backend Engineer.

From here, career progression typically leads toward Staff Engineer, AI Architect, or Engineering Manager.

If you're a software engineer, you're already 80% there. Learn the AI integration patterns: RAG, streaming inference, function calling, structured outputs. Build a project that demonstrates you can wrap an AI model in a production-quality application with proper error handling, caching, and user experience. That's the portfolio piece that gets you hired.

What to Expect in Interviews

Technical screens look like standard software engineering interviews with an AI twist. Expect system design questions about building reliable applications around probabilistic models: handling streaming responses, implementing retry logic for API failures, and designing caching strategies for LLM outputs. Coding rounds test standard algorithms plus practical integration patterns like async processing and rate limiting.

AI Hiring Overview

The AI job market has 3,823 open positions tracked in our dataset. By seniority: 112 entry-level, 1,798 mid-level, 1,516 senior, and 397 leadership roles (Director, VP, C-Level). Remote roles make up 15% of the market (590 positions). The remaining 3,217 roles require on-site or hybrid attendance.

The market median for AI roles is $200,100. Top-quartile compensation starts at $253,500. The 90th percentile reaches $307,500. Highest-paying categories: AI Engineering Manager ($275,000 median, 41 roles); AI Safety ($274,200 median, 55 roles); Research Engineer ($260,000 median, 434 roles).

The AI Job Market Today

The AI job market spans 3,823 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (2,629), Data Scientist (322), AI Software Engineer (279). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.

The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (112) are outnumbered by mid-level (1,798) and senior (1,516) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 397 positions, representing the bottleneck between technical execution and organizational strategy.

Remote work availability sits at 15% of all AI roles (590 positions), with 3,217 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.

AI compensation is structured in clear tiers. The market median sits at $200,100. Top-quartile roles start at $253,500, and the 90th percentile reaches $307,500. These figures cover base salary for roles with disclosed compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.

Category matters for compensation. AI Engineering Manager roles lead at $275,000 median, while Prompt Engineer roles sit at $140,000. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.

The most in-demand skills across all AI postings: Python (1,979 postings), AWS (1,190 postings), Azure (899 postings), RAG (839 postings), GCP (726 postings), PyTorch (595 postings), Prompt Engineering (595 postings), Claude (540 postings). Python dominates, appearing in just over half of the 3,823 tracked postings regardless of category. Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. The newer entrants to the top skills list (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.

Frequently Asked Questions

Based on 797 roles with disclosed compensation, the median salary for AI Software Engineer positions is $232,000. Actual compensation varies by seniority, location, and company stage.

About 15% of the 3,823 AI roles we track offer remote work. Remote availability varies by company and seniority level, with senior and leadership roles more likely to offer location flexibility.

Databricks is among the companies actively hiring for AI and ML talent. Check our company profiles for detailed breakdowns of open roles, salary ranges, and hiring trends.

Common next steps from AI Software Engineer positions include Staff Engineer, AI Architect, and Engineering Manager. Progression depends on whether you lean toward technical depth, people management, or product strategy.
