About This Role
### About Stitch Fix, Inc.
Stitch Fix (NASDAQ: SFIX) is redefining retail by combining human creativity with advanced data science and Generative AI. As we build the future of personalized shopping, we’re equally committed to building yours. We believe in investing in our team as much as our technology. Join us to be a trendsetter in the industry and help us redefine what’s possible for our clients, while we help you reach your full potential.
About the Role
At Stitch Fix, data and AI are not supporting functions; they are the product. Every styling recommendation, merchandising decision, inventory bet, and client interaction is shaped by the platforms this role leads.
We are looking for a Director of Data & AI/ML Platform Engineering to lead the engineering organization responsible for three interconnected platform areas: the enterprise data platform that ingests, stores, and makes data queryable at scale; the machine learning platform that enables data scientists and engineers to build, train, and serve models in production; and the generative AI platform that provides the runtime, routing, and integration infrastructure for AI agents and LLM-powered applications across the company.
This is a product leadership role as much as it is an engineering leadership role. Your users span the full range of the company, from engineers and data scientists building models and AI applications to analysts and business partners across every function who run self-serve analytics, investigate data, and build AI-assisted workflows. Your job is to understand what each of these user groups needs, set a compelling product vision for each platform area, and drive execution all the way through, from roadmap to adoption.
You will make the consequential architectural decisions that shape how the company builds with data and AI for years. You will own the modernization agenda, manage the trade-offs between speed and stability, and communicate both the strategy and the stakes to stakeholders ranging from engineering peers to the executive team.
Why this role?
------------------
The platforms you would lead are not greenfield experiments. They are live production systems at a public company, with real complexity, real stakes, and a clear strategic mandate to modernize and extend them. You'll find a strong technical team, meaningful architectural challenges, and a company that has treated data and AI as a competitive differentiator since its founding.
- Meaningful scale: petabytes of data, thousands of daily pipelines, and a user base ranging from engineers and data scientists to business operators across every function
- Strategic mandate: the company's top strategic initiative is building the next generation of AI-powered personalization, and this team builds the platform it runs on
- Real ownership: you will make architectural decisions with real consequences, supported by a leadership team that trusts engineers to own their domain
Responsibilities:
What you’ll own
-------------------
- Data infrastructure at scale. The systems that ingest, store, and make data accessible across the company: petabyte-scale lakehouse, event streaming, workflow orchestration, data governance, and the self-service tools that make this infrastructure usable without platform team involvement at every step.
- Machine learning platform. The infrastructure that enables data scientists and engineers to build, experiment, and serve models in production at speed: feature stores, training pipelines, distributed model serving, and the MLOps practices that keep production models healthy, observable, and improving.
- Generative AI platform. The platform that enables teams across the company to build, deploy, and govern AI agents and GenAI-powered applications: runtime and routing infrastructure, self-service agent-building tools, context and retrieval management, observability and evaluation frameworks, and the cost and safety controls that keep AI reliable, governed, and improving in production.
- The next generation of personalization and decisioning. The foundational platform work behind the company's highest-priority strategic initiatives: partnering with Data Science, Algorithms, and Product to build the next generation of intelligence infrastructure that delivers deeper understanding of clients, products, and style, powered by real-time data, AI reasoning, and systems that continuously improve.
What you’ll do
------------------
- Set and own the product vision for each platform area. Treat internal platforms as products. Understand your users, define north star metrics for platform health and adoption, build a roadmap that earns trust, and communicate the vision in a way that rallies engineers and gains stakeholder buy-in.
- Own platform modernization decisions. Lead strategic architectural shifts (open table format migration, feature store re-foundation, model serving modernization, agentic AI infrastructure buildout) on behalf of users and stakeholders. Drive these from problem definition through adoption, not just implementation.
- Compress time from idea to production. Build the developer experience, self-service tooling, and golden paths that reduce friction for every type of user, from engineers and data scientists building pipelines and models, to analysts exploring data in BI tools, to business operators building and running AI-assisted workflows. Speed to insight and speed to production are both critical.
- Lead and grow the organization. Manage engineering managers and senior ICs across three platform areas. Create clarity, remove blockers, and develop people, while continuously evolving how the team works: applying the AI capabilities you build to accelerate your own org's velocity and shaping the skills and structure the team needs for an AI-first engineering model.
- Drive cross-functional alignment. Partner with Data Science, ML Engineering, Data Engineering, Product, and Business leaders to align platform investment with business priorities. Represent the platform in quarterly planning, architecture reviews, and executive forums.
- Communicate with authority at every level. Write crisp strategy documents. Present platform trade-offs to the C-suite. Sit with an engineer and whiteboard a system design. Fluency across these modes is a requirement, not a nice-to-have.
- Run the business. Own budget, headcount planning, vendor relationships, contractor management, and the long\-horizon platform strategy. Balance investment in new capabilities with operational excellence and the reduction of legacy
About You
Requirements:
- Experience:
  + 10+ years in software, data, or ML/AI platform engineering; 5+ years leading engineering managers or multi-team platform organizations
  + Track record of owning and evolving production-grade platform systems at scale: not just building them, but driving adoption, rationalizing legacy, and measurably improving developer and data science productivity over time
  + History of making and landing consequential architectural decisions in complex, high-availability environments; comfort with the full lifecycle from design through post-launch iteration
- Technical Depth:
  + Data infrastructure: hands-on experience with distributed compute and storage (Spark, Trino/Presto, Apache Iceberg or Delta Lake), event streaming (Kafka, Flink), workflow orchestration (Airflow), and data governance and quality systems
  + ML lifecycle: feature engineering and feature stores, model training pipelines, model deployment and serving (Ray Serve, Triton, or equivalent), monitoring and validation, and the operational practices of running ML in production (MLOps)
  + Generative AI and LLMOps: LLM orchestration frameworks, retrieval-augmented generation (RAG), agent architectures, evaluation frameworks, cost and latency governance, and the emerging standards around agentic AI (Model Context Protocol or equivalent)
  + Developer platforms: experience building internal developer platforms (IDPs), self-service tooling, and platform abstractions that reduce friction for engineering teams; familiarity with developer experience metrics and platform adoption patterns
  + Cloud-native architecture: distributed systems design, container orchestration (Kubernetes), and cloud infrastructure at scale (AWS preferred)
- Platform Product Leadership:
  + Product-led mindset. You approach internal platforms the same way a strong product leader approaches external products: segmented user personas, defined success metrics, a prioritized roadmap, and a bias toward adoption and impact over feature completeness.
  + 360-degree execution. You own the full loop: discovery and planning, iterative delivery, production quality, user enablement and evangelism, and the feedback loops that close on real-world impact.
  + Strategic communication and influence. You can make a compelling case for a multi-year platform investment to a CxO, write a technical design doc your engineers will actually follow, and give a data scientist a useful answer about why their job is slower than it should be. Each of these is a different skill; you have all three.
  + User advocacy. You represent users' needs inside the platform team. You hold the bar on developer experience, self-service reliability, and documentation quality. You treat user complaints as signal, not noise.
+ User advocacy. You represent users' needs inside the platform team. You hold the bar on developer experience, self\-service reliability, and documentation quality. You treat user complaints as signal, not noise.
### Why you'll love working at Stitch Fix...
- We are a group of bright, kind people who are motivated by challenge. We value integrity, innovation and trust. You’ll bring these characteristics to life in everything you do at Stitch Fix.
- We cultivate a community of diverse perspectives, where all voices are heard and valued.
- We are an innovative company and leverage our strengths in fashion and tech to disrupt the future of retail.
- We win as a team, commit to our work, and celebrate grit together because we value strong relationships.
- We boldly create the future while keeping equity and sustainability at the center of all that we do.
- We are the owners of our work and are energized by solving problems through a growth mindset lens. We think broadly and creatively through every situation to create meaningful impact.
- We offer comprehensive compensation packages and inclusive health and wellness benefits.
Compensation and Benefits
This role will receive a competitive salary, benefits, and equity. The salary for US-based employees hired into this role will be aligned with the range below, which includes our three geographic areas. A variety of factors are considered when determining someone’s compensation, including a candidate’s professional background, experience, location, and performance. This position is eligible for an annual bonus, and new-hire and ongoing grants of restricted stock units, depending on employee and company performance. In addition, the position is eligible for medical, dental, vision, and other benefits. Applicants should apply via our internal or external careers site.
Salary Range
$213,000 - $284,000 USD
*This link leads to the machine-readable files that are made available in response to the federal Transparency in Coverage Rule and include negotiated service rates and out-of-network allowed amounts between health plans and healthcare providers. The machine-readable files are formatted to allow researchers, regulators, and application developers to more easily access and analyze data.*
Please review Stitch Fix's US Applicant Privacy Policy and Notice at Collection here: https://stitchfix.com/careers/workforce-applicant-privacy-policy
Recruiting Fraud Alert:
To all candidates: your personal information and online safety are top of mind for us. At Stitch Fix, recruiters only direct candidates to apply through our official career pages at https://www.stitchfix.com/careers/jobs or https://web.fountain.com/c/stitch-fix.
Recruiters will never request payments or ask for financial account information or sensitive information such as Social Security numbers. If you are unsure whether a message is from Stitch Fix, please email [email protected].
You can read more about Recruiting Scam Awareness on our FAQ page here: https://support.stitchfix.com/hc/en-us/articles/1500007169402-Recruiting-Scam-Awareness
Salary Context
This $213K-$284K range is above the 75th percentile for MLOps Engineer roles in our dataset (median: $209K across 26 roles with salary data).
Role Details
About This Role
MLOps Engineers build the infrastructure that keeps ML models running in production. They own CI/CD pipelines for model deployment, monitoring for data drift and model degradation, and the tooling that lets data scientists ship faster. If ML Engineers build the models, MLOps Engineers build the roads those models travel on.
The job is fundamentally about reliability and velocity. Data scientists want to iterate fast. Product teams want stable predictions. Your job is to make both happen simultaneously. That means building deployment pipelines that catch regressions before they hit production, monitoring systems that alert on data drift before it degrades model performance, and self-service tooling that lets data scientists deploy without filing a ticket.
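Drift monitoring of the kind described above often comes down to comparing a feature's live distribution with its training-time distribution. A minimal sketch, assuming the Population Stability Index (PSI) as the drift metric, equal-width bins for brevity (quantile bins are also common), and a hypothetical alert threshold of 0.2 (a widely quoted rule of thumb, not a standard). `psi` and `drift_alert` are illustrative names, not part of any particular monitoring library:

```python
import math
from typing import List, Sequence

# Hypothetical alert threshold: 0.2 is a commonly quoted PSI rule of thumb, not a standard.
DRIFT_THRESHOLD = 0.2


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of a live sample against a reference sample.

    Equal-width bin edges come from the reference (training) sample; live values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(reference: Sequence[float], live: Sequence[float]) -> bool:
    """Return True when the live distribution has drifted past the threshold."""
    return psi(reference, live) > DRIFT_THRESHOLD


reference = [i / 100 for i in range(1000)]  # stand-in for training-time feature values
shifted = [v + 5 for v in reference]        # stand-in for a drifted production window
print(drift_alert(reference, reference), drift_alert(reference, shifted))  # False True
```

In a real pipeline the reference sample would be captured at training time and the check run per feature on each scoring window, with the alert routed to the same monitoring stack as infrastructure alerts.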
Across the 3,824 AI roles we're tracking, MLOps Engineer positions make up 1% of the market. At Stitch Fix, this role fits into their broader AI and engineering organization.
MLOps demand tracks closely with production ML adoption. As more companies move models from notebooks to production, the need for MLOps grows. The role is well-established at large tech companies and growing fast at mid-stage startups that are hitting the 'our models work in notebooks but break in production' phase.
What the Work Looks Like
A typical week involves: debugging a model deployment that's serving stale predictions, building a new monitoring dashboard for a feature team, writing Terraform for GPU-enabled inference clusters, reviewing pull requests for the ML platform's CI/CD pipeline, and meeting with data scientists to understand their pain points. You're the bridge between ML and infrastructure.
Skills Required
Kubernetes, Docker, and cloud infrastructure are baseline. Most roles want experience with ML-specific tooling: MLflow, Kubeflow, Weights & Biases, or similar. Strong DevOps fundamentals matter more than ML theory. You need to understand model serving (TorchServe, Triton, vLLM), monitoring (Prometheus, Grafana), and infrastructure-as-code (Terraform, Pulumi).
GPU infrastructure knowledge is increasingly valuable as LLM inference becomes a major cost center. Understanding GPU scheduling, multi-node training setups, and inference optimization (quantization, batching, caching) puts you in the top tier. Experience with model registries and feature stores rounds out the profile.
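Batching amortizes one model call across many requests, which is a large part of why it cuts inference cost on GPUs. A simplified, size-only sketch in Python; `batched`, `serve_batched`, and `fake_model` are hypothetical names, and production servers (Triton's dynamic batcher, for example) also cap how long a request may wait in the queue for its batch to fill:

```python
from typing import Callable, Iterable, Iterator, List, Sequence


def batched(items: Iterable[float], max_batch_size: int) -> Iterator[List[float]]:
    """Yield consecutive groups of at most max_batch_size items."""
    batch: List[float] = []
    for item in items:
        batch.append(item)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def serve_batched(
    requests: Sequence[float],
    predict_batch: Callable[[List[float]], List[float]],
    max_batch_size: int = 8,
) -> List[float]:
    """Make one model call per batch instead of one per request, preserving order."""
    outputs: List[float] = []
    for batch in batched(requests, max_batch_size):
        outputs.extend(predict_batch(batch))
    return outputs


calls: List[int] = []


def fake_model(batch: List[float]) -> List[float]:
    """Hypothetical stand-in for a batched GPU forward pass; records each batch size."""
    calls.append(len(batch))
    return [x * 2 for x in batch]


preds = serve_batched([float(i) for i in range(20)], fake_model, max_batch_size=8)
print(calls)  # [8, 8, 4]
```

Twenty requests become three model calls; the trade-off is that larger batches raise throughput at the cost of per-request latency, which is why real batchers bound the queue delay as well as the batch size.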
Good MLOps postings specify their ML stack, infrastructure scale, and the problems they're solving (deployment velocity, cost optimization, monitoring gaps). Red flag: companies that want MLOps but don't have any models in production yet. You'll end up doing general DevOps instead.
Compensation Benchmarks
MLOps Engineer roles pay a median of $217,200 based on 76 positions with disclosed compensation. Director-level AI roles across all categories have a median of $243,000. This role's midpoint ($248K) sits 14% above the category median. Disclosed range: $213K to $284K.
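The 14% figure follows directly from the disclosed range and the category median quoted in this section:

```python
low, high = 213_000, 284_000   # disclosed salary range (USD)
category_median = 217_200      # MLOps Engineer median from this section

midpoint = (low + high) / 2               # 248,500 (rounded to $248K above)
premium = midpoint / category_median - 1  # fraction above the category median
print(f"midpoint ${midpoint:,.0f}, {premium:.0%} above median")
# midpoint $248,500, 14% above median
```

Against the $209K median quoted under Salary Context (a 26-role sample), the same midpoint would sit about 19% above.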
Across all AI roles, the market median is $200,000. Top-quartile compensation starts at $253,000. The 90th percentile reaches $307,500. For comparison, the highest-paying categories include AI Engineering Manager ($293,500) and AI Safety ($274,200). By seniority level: Entry: $97,380; Mid: $160,000; Senior: $227,400; Director: $243,000; VP: $250,000.
Stitch Fix AI Hiring
Stitch Fix has 4 open AI roles right now. They're hiring for AI/ML Engineer, MLOps Engineer, and Data Scientist roles. Location: Remote, US. Compensation range: $144K - $284K.
Remote Work Context
Remote AI roles pay a median of $169,035 across 1,817 positions. About 16% of all AI roles offer remote work.
Career Path
Common paths into MLOps Engineer roles include DevOps Engineer, Platform Engineer, and Data Engineer.
From here, career progression typically leads toward ML Platform Lead, Infrastructure Architect, or Engineering Manager.
DevOps engineers with ML curiosity have the shortest path. You already understand deployment, monitoring, and infrastructure. Add ML-specific knowledge (model serving, data pipelines, experiment tracking) and you're competitive. The career ceiling is high: ML Platform Lead roles at top companies pay well because the infrastructure complexity is enormous.
What to Expect in Interviews
Interviews emphasize infrastructure and reliability. Expect questions about CI/CD for ML models, monitoring for data drift, and how you'd design a model serving platform that handles 10K requests per second. Coding rounds focus on Python and infrastructure-as-code (Terraform, Helm). Be ready to discuss tradeoffs between different model serving frameworks and how you'd handle rollback when a new model degrades performance.
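The rollback question can be framed as a comparison between the canary (new model) and the baseline over the same traffic window. A minimal sketch, assuming error rate and p95 latency as the guard metrics; `WindowStats`, `should_rollback`, and every threshold here are hypothetical, and a real policy would usually also watch model-quality metrics once labels arrive:

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregated metrics for one model version over an evaluation window."""
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def should_rollback(
    baseline: WindowStats,
    canary: WindowStats,
    max_error_increase: float = 0.01,  # hypothetical: +1 percentage point
    max_latency_ratio: float = 1.2,    # hypothetical: p95 more than 20% slower
    min_requests: int = 500,           # hypothetical: don't judge on tiny samples
) -> bool:
    """Return True if the canary looks worse than the baseline by more than the tolerances."""
    if canary.requests < min_requests:
        return False  # not enough canary traffic yet; keep waiting
    if canary.error_rate - baseline.error_rate > max_error_increase:
        return True
    return canary.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio


baseline = WindowStats(requests=10_000, errors=50, p95_latency_ms=120.0)
print(should_rollback(baseline, WindowStats(1_000, 6, 125.0)))   # False
print(should_rollback(baseline, WindowStats(1_000, 30, 120.0)))  # True: 3.0% vs 0.5% errors
```

Keeping the decision a pure function of aggregated metrics makes it easy to unit-test and to run from whatever deployment controller shifts traffic between versions.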
AI Hiring Overview
The AI job market has 3,824 open positions tracked in our dataset. By seniority: 119 entry-level, 1,813 mid-level, 1,472 senior, and 420 leadership roles (Director, VP, C-Level). Remote roles make up 16% of the market (613 positions), and 3,187 roles require on-site or hybrid attendance.
Highest-paying categories: AI Engineering Manager ($293,500 median, 31 roles); AI Safety ($274,200 median, 51 roles); Research Engineer ($260,000 median, 401 roles).
The AI Job Market Today
The AI job market spans 3,824 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (2,702), Data Scientist (281), AI Software Engineer (258). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.
The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (119) are outnumbered by mid-level (1,813) and senior (1,472) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 420 positions, representing the bottleneck between technical execution and organizational strategy.
Remote work availability sits at 16% of all AI roles (613 positions), with 3,187 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.
AI compensation is structured in clear tiers. The market median sits at $200,000. Top-quartile roles start at $253,000, and the 90th percentile reaches $307,500. These figures reflect base salary in postings that disclose compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.
Category matters for compensation. AI Engineering Manager roles lead at $293,500 median, while Prompt Engineer roles sit at $142,800. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.
The most in-demand skills across all AI postings: Python (1,968 postings), AWS (1,203 postings), Azure (882 postings), RAG (877 postings), GCP (735 postings), Prompt Engineering (587 postings), PyTorch (586 postings), Claude (554 postings). Python dominates, appearing in more postings than any other skill regardless of role category. Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. The newer entrants to the top skills list (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.