About This Role
Overview
The CoreAI Workloads team builds the foundational inference engines and APIs that power large-scale AI inference across Azure - from cutting-edge startups to Fortune 500 enterprises and Microsoft Copilots and agents. Our mission is to deliver secure, reliable, and highly efficient GPU inference that enables multi-tenant AI systems at global scale while maximizing utilization, performance, and developer productivity. We own inference serving and performance for OpenAI and other state-of-the-art large language models (LLMs) and work directly with OpenAI, serving some of the largest workloads on the planet with trillions of inferences per day. Our converged AI fabric and engines deliver inference capabilities for all LLMs in the Microsoft catalog, including OpenAI, Anthropic, Mistral, Cohere, Llama, and more.
This role sits at the intersection of LLM inference fleets, serving efficiency, rapid experimentation, cloud infrastructure, and systems software, working closely with CoreAI data plane, compute, and partner teams to deliver end-to-end efficiencies and platform capabilities.
In this role, you will have the opportunity to work on multiple levels of the AI software stack, including the fundamental abstractions, programming models, OpenAI and open-source (OSS) engine runtimes, libraries, and application programming interfaces (APIs) that enable large-scale model inference.
You will drive production-grade inference serving improvements for OpenAI and open-source models across Azure, including benchmarking, performance measurement, and disciplined experimentation to improve latency, throughput, availability, and cost at scale. You will both (1) make hands-on engine changes and (2) contribute to the experimentation capabilities that make those changes measurable, safe to ship, and repeatable across teams.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Responsibilities
As a Principal Engineer on the team, your responsibilities include:
- Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
- Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
- Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
- Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets, turning findings into shipped engine improvements.
- Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization, validated via staged rollouts and production guardrails.
- Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
- Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
- Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA fabrics such as InfiniBand and RoCE) for distributed inference, without owning low-level kernel/driver enablement.
- Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
- Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.
Additional Responsibilities
- Work across multiple layers of the AI software stack (abstractions, programming models, engine runtimes, libraries, and APIs) to enable large-scale model inference.
- Benchmark OpenAI and other LLMs for performance across Azure OpenAI Service workload tiers and segments, and translate results into production improvements.
- Debug, profile, and optimize production inference performance across the stack (abstractions, runtime, scheduling, and serving pipelines) to improve latency, throughput, and utilization.
- Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint.
- Collaborate across engineering teams to deliver scalable, production-ready serving efficiency and availability improvements, using experimentation results to guide prioritization and rollout.
- Build durable engine interfaces that enable fast experimentation and safe shipping of new strategies for quality of service (QoS), replica load balancing, KV management (including offload/retrieval), quantization, and sampling (e.g., multi-token prediction and constrained sampling).
Out of Scope (This role does not focus on)
- Novel hardware bring-up or first-party silicon enablement (e.g., Microsoft chips) or expanded support for non-NVIDIA platforms (e.g., AMD).
- Low-level kernel, driver, or CUDA optimization as a primary responsibility.
- Model pre-training, fine-tuning, or model architecture customization.
Qualifications
- Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
Other Requirements:
- Proven ability to design and operate large\-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
- Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
- Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
- Demonstrated technical leadership, including mentoring engineers, driving cross\-team architectural alignment, and leveraging AI tools and AI\-assisted workflows to accelerate engineering velocity and quality.
- Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
- Strong collaboration and communication skills, with the ability to work across organizational boundaries.
Preferred Qualifications:
- Experience optimizing LLM inference in practice (e.g., PyTorch inference, serving runtimes, model execution, or inference orchestration) in production environments.
- Familiarity with high-performance networking and low-latency communication stacks.
- Familiarity with GPU-accelerated inference stacks (e.g., CUDA at the application/runtime level, device plugins, or runtime integration).
- Experience building or using experimentation systems (A/B, canarying, tiered rollout), including metric definition and comparability for performance and reliability.
- Familiarity with distributed inference stacks (e.g., NCCL-style collectives, model/tensor parallelism) and performance tradeoffs in large-scale serving.
Impact & Growth:
- Work on mission-critical infrastructure that directly powers large-scale AI systems.
- Influence the future of cloud GPU platforms used by internal and external customers.
- Collaborate with experts across OS, hardware, networking, and AI platform teams.
- Opportunity to grow as a technical leader, shaping long-term platform strategy.
Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
Software Engineering IC6 - The typical base pay range for this role across the U.S. is USD $163,000 - $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 - $331,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. Support is available if you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process.
Salary Context
This $139K-$331K range is above the 75th percentile for AI Software Engineer roles in our dataset (median: $189K across 518 roles with salary data).
Role Details
About This Role
AI Software Engineers build the applications and systems that AI models run inside. They own the API layers, data pipelines, frontend integrations, and infrastructure that turn a model into a product users interact with. Every AI company needs engineers who can build the software around the AI.
The challenge is building reliable systems around inherently unreliable components. Models are probabilistic. They'll give different answers to the same question. They hallucinate. They're slow. They're expensive. Your job is to build an application layer that handles all of this gracefully while delivering a product that users trust and enjoy.
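To make that concrete, here's a minimal sketch (not from the posting) of one such guardrail: validating a model's structured output before trusting it, with retries and a safe default. `call_model` is a hypothetical stand-in for whatever inference client you'd actually use; here it just returns a canned reply so the example runs.

```python
import json

def call_model(prompt: str) -> str:
    """Stand-in for a real inference client; returns raw model text."""
    return '{"intent": "refund_request", "confidence": 0.82}'  # Canned reply for the demo.

def extract_intent(user_message: str, max_attempts: int = 3) -> dict:
    """Ask the model for structured JSON, validate it, and degrade gracefully."""
    prompt = (
        "Return ONLY a JSON object with keys 'intent' (string) and "
        f"'confidence' (number between 0 and 1) for this message: {user_message!r}"
    )
    for _ in range(max_attempts):
        try:
            parsed = json.loads(call_model(prompt))
        except (json.JSONDecodeError, ConnectionError):
            continue  # Malformed or failed response: retry instead of crashing.
        intent, confidence = parsed.get("intent"), parsed.get("confidence")
        if isinstance(intent, str) and isinstance(confidence, (int, float)) and 0 <= confidence <= 1:
            return parsed  # Only trust output that passes validation.
    return {"intent": "unknown", "confidence": 0.0}  # Safe default the rest of the app can handle.

print(extract_intent("I want my money back for order 1234"))
```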
Across the 26,159 AI roles we're tracking, AI Software Engineer positions make up 2% of the market. At Microsoft, this role fits into their broader AI and engineering organization.
AI Software Engineer roles are among the most numerous in the AI job market. Every company deploying AI needs software engineers who understand AI integration patterns. The demand is broad, spanning startups to enterprises, across every industry adopting AI capabilities.
What the Work Looks Like
A typical week includes: building API endpoints that serve model inference with caching and fallback logic, designing the data pipeline that feeds context to a RAG system, implementing streaming responses in the frontend, debugging a race condition in the async inference pipeline, and optimizing database queries for the vector search layer. It's full-stack engineering with AI at the center.
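As a rough illustration of that first item, here's a hedged sketch of caching plus fallback around an inference call using only the Python standard library. `call_primary_model` and `call_fallback_model` are hypothetical stand-ins, and a real service would use a shared cache (e.g., Redis) rather than an in-process dict.

```python
import asyncio
import hashlib

_cache: dict[str, str] = {}  # In production this would likely be Redis or similar.

async def call_primary_model(prompt: str) -> str:
    """Stand-in for the main (larger, slower) model client."""
    await asyncio.sleep(0.2)  # Simulated inference latency.
    return f"[primary] answer to: {prompt}"

async def call_fallback_model(prompt: str) -> str:
    """Stand-in for a cheaper, faster fallback model client."""
    await asyncio.sleep(0.02)
    return f"[fallback] answer to: {prompt}"

async def serve_completion(prompt: str, timeout_s: float = 1.0) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:                                   # Cache hit: skip inference entirely.
        return _cache[key]
    try:
        result = await asyncio.wait_for(call_primary_model(prompt), timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        result = await call_fallback_model(prompt)      # Degrade instead of failing the request.
    _cache[key] = result
    return result

print(asyncio.run(serve_completion("What is RAG?")))
```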
Skills Required
Full-stack engineering skills with AI integration experience. Python and TypeScript are the most common requirements. You'll need to understand API design, database architecture, and how to build reliable systems around probabilistic outputs. Experience with streaming, async processing, and caching patterns is increasingly important as real-time AI applications proliferate.
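For example, streaming usually boils down to consuming an async token iterator and flushing each chunk to the client as it arrives rather than waiting for the full completion. This is a toy sketch with a simulated token source, not any particular SDK's API:

```python
import asyncio
from typing import AsyncIterator

async def token_stream(prompt: str) -> AsyncIterator[str]:
    """Stand-in for a real streaming inference client."""
    for token in ["Streaming", " keeps", " users", " engaged", "."]:
        await asyncio.sleep(0.05)          # Simulated per-token network latency.
        yield token

async def stream_to_client(prompt: str) -> str:
    """Forward tokens as they arrive instead of waiting for the full completion."""
    parts = []
    async for token in token_stream(prompt):
        print(token, end="", flush=True)   # In a web app: write to the response stream / SSE.
        parts.append(token)
    print()
    return "".join(parts)

asyncio.run(stream_to_client("hello"))
```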
Knowledge of vector databases, embedding APIs, and LLM integration patterns (function calling, structured outputs, retry logic) differentiates AI software engineers from general software engineers. Understanding cost optimization (caching strategies, model routing, batched inference) is valuable since inference costs can dominate application economics.
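A minimal, assumption-laden sketch of one of those cost levers, model routing: the heuristic and model names below are invented for illustration, but the pattern (score the request, send cheap traffic to a cheap model) is the core idea.

```python
def estimate_complexity(prompt: str) -> float:
    """Crude heuristic: longer, question-dense prompts route to the bigger model."""
    return min(1.0, len(prompt) / 2000 + 0.1 * prompt.count("?"))

def route_model(prompt: str, threshold: float = 0.5) -> str:
    """Send easy requests to a cheap model and hard ones to an expensive one."""
    return "small-model" if estimate_complexity(prompt) < threshold else "large-model"

print(route_model("What time is it?"))                        # small-model
print(route_model("Compare these two contracts clause by clause. " * 100))  # large-model
```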
Compensation Benchmarks
AI Software Engineer roles pay a median of $235,100 based on 665 positions with disclosed compensation. Senior-level AI roles across all categories have a median of $227,400. Disclosed range: $139K to $331K.
Across all AI roles, the market median is $184,000. Top-quartile compensation starts at $244,000. The 90th percentile reaches $309,400. For comparison, the highest-paying categories include AI Engineering Manager ($293,500) and AI Architect ($292,900). By seniority level: Entry: $76,880; Mid: $131,300; Senior: $227,400; Director: $244,288; VP: $234,620.
Microsoft AI Hiring
Microsoft has 49 open AI roles right now. They're hiring across AI/ML Engineer, AI Software Engineer, AI Product Manager, and Data Scientist roles. Positions span Redmond, WA; San Francisco, CA; and Washington, DC. Compensation range: $159K - $331K.
Location Context
Across all AI roles, 7% (1,863 positions) offer remote work, while 24,200 require on-site or hybrid attendance. Top AI hiring metros: Los Angeles (1,695 roles, $178,000 median); New York (1,670 roles, $200,000 median); San Francisco (1,059 roles, $244,000 median).
Career Path
Common paths into AI Software Engineer roles include Software Engineer, Full-Stack Developer, Backend Engineer.
From here, career progression typically leads toward Staff Engineer, AI Architect, Engineering Manager.
If you're a software engineer, you're already 80% there. Learn the AI integration patterns: RAG, streaming inference, function calling, structured outputs. Build a project that demonstrates you can wrap an AI model in a production-quality application with proper error handling, caching, and user experience. That's the portfolio piece that gets you hired.
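If you want a starting point for that portfolio piece, here is a deliberately toy sketch of the RAG retrieval step. A real project would call an embedding API and a vector database; the letter-frequency `embed` below is only a runnable placeholder for the shape of the pattern.

```python
import math

def embed(text: str) -> list[float]:
    """Stand-in for a real embedding API: a toy letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_rag_prompt(question: str, documents: list[str], top_k: int = 2) -> str:
    """Rank documents by similarity to the question and ground the prompt in the best ones."""
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q_vec), reverse=True)
    context = "\n---\n".join(ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["Refunds are accepted within 30 days.", "Standard shipping takes 5 business days."]
print(build_rag_prompt("How do refunds work?", docs, top_k=1))
```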
What to Expect in Interviews
Technical screens look like standard software engineering interviews with an AI twist. Expect system design questions about building reliable applications around probabilistic models: handling streaming responses, implementing retry logic for API failures, and designing caching strategies for LLM outputs. Coding rounds test standard algorithms plus practical integration patterns like async processing and rate limiting.
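Rate limiting is a good example of the kind of pattern those rounds probe. Here is a simple token-bucket limiter sketched with the standard library; the parameters are illustrative, not prescriptive.

```python
import time

class TokenBucket:
    """Allow roughly `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # Caller should queue, back off, or return HTTP 429.

limiter = TokenBucket(rate=5.0, capacity=10)
print(sum(limiter.allow() for _ in range(12)))  # At most a burst of `capacity` passes immediately.
```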
When evaluating opportunities: Strong postings describe the product you'll be building, the AI integration patterns you'll work with, and the scale requirements. Look for companies that have existing AI features and need engineers to improve and expand them, not companies that are 'planning to add AI' someday.
AI Hiring Overview
The AI job market has 26,159 open positions tracked in our dataset. By seniority: 2,416 entry-level, 16,247 mid-level, 5,153 senior, and 2,343 leadership roles (Director, VP, C-Level). Remote roles make up 7% of the market (1,863 positions). The remaining 24,200 roles require on-site or hybrid attendance.
The market median for AI roles is $184,000. Top-quartile compensation starts at $244,000. The 90th percentile reaches $309,400. Highest-paying categories: AI Engineering Manager ($293,500 median, 28 roles); AI Architect ($292,900 median, 108 roles); AI Safety ($274,200 median, 19 roles).
The AI Job Market Today
The AI job market spans 26,159 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (23,752), AI Software Engineer (598), AI Product Manager (594). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.
The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (2,416) are outnumbered by mid-level (16,247) and senior (5,153) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 2,343 positions, representing the bottleneck between technical execution and organizational strategy.
Remote work availability sits at 7% of all AI roles (1,863 positions), with 24,200 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.
AI compensation is structured in clear tiers. The market median sits at $184,000. Top-quartile roles start at $244,000, and the 90th percentile reaches $309,400. These figures include base salary with disclosed compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.
Category matters for compensation. AI Engineering Manager roles lead at $293,500 median, while Prompt Engineer roles sit at $122,200. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.
The most in-demand skills across all AI postings: RAG (16,749 postings), AWS (8,932 postings), Rust (7,660 postings), Python (3,815 postings), Azure (2,678 postings), GCP (2,247 postings), Prompt Engineering (1,469 postings), OpenAI (1,269 postings). Python dominates, appearing in the vast majority of role descriptions regardless of category. Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. The newer entrants to the top skills list (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.