About This Role
Hims \& Hers is the leading health and wellness platform, on a mission to help the world feel great through the power of better health. We are redefining healthcare by putting the customer first and delivering access to care that is affordable, accessible, and personal, from diagnosis to treatment to delivery. No two people are the same, so we provide access to personalized care designed for results. By normalizing health \& wellness challenges and innovating on their solutions, we’re making better health outcomes easier to achieve.
Hims \& Hers is a public company, traded on the NYSE under the ticker symbol “HIMS.” To learn more about the brand and offerings, you can visit hims.com/about and hims.com/how\-it\-works. For information on the company’s outstanding benefits, culture, and its talent\-first flexible/remote work approach, see below and visit www.hims.com/careers\-professionals.
About the Role:
-------------------
We're hiring a Staff ML Systems Engineer to design, build, and operate the production infrastructure that powers AI across Hims \& Hers. This is a deeply technical, hands\-on infrastructure role focused on the systems underneath AI — the Kubernetes platform, CI/CD and GitOps pipelines, infrastructure\-as\-code, inference and model\-serving infrastructure, and the observability and tracing stack that keeps AI services reliable, debuggable, and compliant in production.
You won't just deploy models — you'll own the machinery that lets every AI team ship and operate safely. You'll own critical systems like our EKS clusters, deployment and autoscaling infrastructure, IAM and secrets management, LLM tracing/observability pipelines (Langfuse, Datadog, OpenTelemetry), and the developer platform that AI and product engineers rely on daily. You'll partner with ML engineers, product engineers, and clinical teams to ensure our AI systems are reliable, observable, secure, and trustworthy in a regulated healthcare environment.
This role is ideal for someone who thinks in systems and infrastructure, cares deeply about reliability, security, and cost, and wants to define how AI runs in production at a company where it directly impacts patient outcomes.
You Will:
-------------
### Own and scale the AI compute and deployment platform
- Own and evolve our containerized application deployment platform and related systems for AI workloads, encompassing general process and job orchestration (e.g. Kubernetes) — cluster operations, node lifecycle, autoscaling (Karpenter), storage (EBS CSI), and workload isolation across staging and production.
- Build and maintain GitOps\-based deployment pipelines (Helm/Kustomize overlays, environment promotion) that let teams ship AI services safely and repeatably.
- Design ephemeral/preview environments, feature\-branched deployments, and nightly release pipelines so teams can validate AI changes in production\-like conditions before release.
- Drive efficiency and cost management across compute, autoscaling, and inference infrastructure.
### Build inference and model\-serving infrastructure
- Operate and scale inference infrastructure and a multi\-provider LLM AI gateway (e.g. Bedrock, Vertex, and other providers) — including credentials, rate limits, and failover.
- Build reliable serving patterns for LLM\-powered workflows: routing, grounding, tool execution, and context assembly at the platform level.
- Create reusable infrastructure abstractions and contracts that standardize how AI services are deployed, configured, and consumed across the company.
### Own observability, tracing, and reliability
- Own the LLM/AI observability and tracing stack — provisioning and scaling systems like Langfuse, Datadog (dd\-trace), OpenTelemetry tracing (OTLP), and the underlying datastores (e.g. ClickHouse) — so AI behavior is auditable and debuggable in production.
- Build analytics and monitoring pipelines that surface latency, error, quality, and regression signals to engineering and clinical stakeholders.
- Define SLOs, alerting, on\-call runbooks, and incident response for AI infrastructure; lead troubleshooting and continuously raise platform reliability.
### Scale the AI developer platform and CI/CD
- Own and improve the monorepo build system and CI/CD pipelines for AI workloads — including eval workflows, Docker image builds, automated PR checks and convention enforcement, and cross\-platform test execution.
- Own shared infrastructure tooling, CLIs, and IaC modules (Terraform, Scalr) that AI and product engineers use daily.
- Identify and eliminate platform bottlenecks — reducing CI/CD cycle times, build latency, and deployment friction — to improve developer velocity across the Applied AI organization.
### Drive security, compliance, and governance at the systems level
- Build IAM, OIDC, and secrets management as first\-class infrastructure — scoped, least\-privilege roles, write\-only secret rotation, and cross\-account access audits.
- Encode security\-by\-default, scope boundaries, and access controls into the platform so AI services are HIPAA\-compliant and privacy\-first.
- Partner with clinical, legal, security, and data platform teams (including Databricks/Unity Catalog access governance) to enforce compliant, auditable data access.
### Set technical direction and raise the bar
- Drive multi\-quarter infrastructure initiatives, from cluster and deployment architecture to inference platform, GPU compute strategy, and observability evolution.
- Write and lead technical design documents and design reviews, define infrastructure standards and development\-workflow conventions, and contribute to technical governance across AI engineering.
- Mentor engineers on reliability engineering, infrastructure\-as\-code, and MLOps best practices, and bridge the gap between prototypes and production\-grade systems.
You Have:
-------------
- 8\+ years of professional experience in infrastructure, platform, DevOps, or SRE engineering — with at least 3 years focused on ML/AI systems in production.
- Deep, hands\-on experience with Kubernetes (ideally EKS) and the cloud\-native ecosystem — autoscaling, GitOps, Helm/Kustomize, operating clusters at scale, and general process/job orchestration.
- Strong infrastructure\-as\-code skills (Terraform) and experience designing secure cloud architectures: IAM, OIDC, secrets management, and least\-privilege access.
- Strong proficiency in Python, with experience building production infrastructure tooling, CLIs, and data/observability pipelines.
- 2\+ years of experience operating LLM\-based systems in production (LLMOps) — inference routing, serving, tracing, and the reliability patterns needed to run them at scale.
- Hands\-on experience with observability/tracing stacks (Datadog, OpenTelemetry, Langfuse, or equivalent) and metrics/log/trace pipelines.
- Experience designing and maintaining CI/CD pipelines, build systems, and developer tooling for fast\-moving engineering teams.
- A systems\-and\-operations mindset: you think about failure modes, SLOs, observability, security, and long\-term maintainability before shipping.
- Experience writing and leading technical design documents (TDDs/RFCs) for infrastructure\-scale initiatives.
- Strong collaboration skills across engineering, ML, product, security, and clinical teams.
- A deep appreciation for safety, privacy, and security — ideally with experience in a regulated domain such as healthcare, fintech, or life sciences.
Nice to Have:
-----------------
- Experience with AWS (EKS, Bedrock, S3, CloudFront, IAM) and multi\-cloud (GCP/Vertex AI) inference routing.
- Experience with Databricks (MLflow, Unity Catalog, Spark, Delta) and data platform access governance.
- Experience provisioning LLM observability infrastructure (Langfuse, ClickHouse, OpenTelemetry/OTLP tracing, LogFire) and LLM behavior monitoring.
- Experience with Karpenter, cluster autoscaling, and cost optimization for ML compute.
- Experience with monorepo build systems (Pants, Bazel) and large\-scale CI/CD.
- Experience building automated PR\-review / convention\-enforcement pipelines and developer\-workflow standards.
- Familiarity with Vertex AI Agent Builder, Vertex AI Model Registry, or GCP managed AI/ML services as a stretch growth area.
- Contributions to open\-source infrastructure, IaC modules, SDKs, or developer tooling projects.
Why Join Us
---------------
At Hims \& Hers, you'll be part of a small, high\-impact team defining how AI infrastructure runs in production for healthcare. The platform you build — compute, deployment, inference, observability, and security — is the foundation that every AI\-powered experience depends on. Reliability, security, and developer velocity aren't afterthoughts here; they're the product.
Join us in building the infrastructure that makes healthcare AI smarter, safer, and more trustworthy.
Our Benefits (there are more but here are some highlights):
---------------------------------------------------------------
- Competitive salary \& equity compensation for full\-time roles
- Unlimited PTO, company holidays, and quarterly mental health days
- Comprehensive health benefits including medical, dental \& vision, and parental leave
- Employee Stock Purchase Program (ESPP)
- 401k benefits with employer matching contribution
- Offsite team retreats
We are committed to building a workforce that reflects diverse perspectives and prioritizes ethics, wellness, and a strong sense of belonging. If you're excited about this role, we encourage you to apply—even if you're not sure if your background or experience is a perfect match.
Hims considers all qualified applicants for employment, including applicants with arrest or conviction records, in accordance with the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance, the California Fair Chance Act, and any similar state or local fair chance laws.
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
Hims \& Hers is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, please contact us at [email protected] and describe the needed accommodation. Your privacy is important to us, and any information you share will only be used for the legitimate purpose of considering your request for accommodation. Hims \& Hers gives consideration to all qualified applicants without regard to any protected status, including disability. Please do not send resumes to this email address.
To learn more about how we collect, use, retain, and disclose Personal Information, please visit our Global Candidate Privacy Statement.
Compensation Range: $210K \- $250K
Salary Context
This $210K-$250K range is above the median for MLOps Engineer roles in our dataset (median: $190K across 22 roles with salary data).
About This Role
MLOps Engineers build the infrastructure that keeps ML models running in production. They own CI/CD pipelines for model deployment, monitoring for data drift and model degradation, and the tooling that lets data scientists ship faster. If ML Engineers build the models, MLOps Engineers build the roads those models travel on.
The job is fundamentally about reliability and velocity. Data scientists want to iterate fast. Product teams want stable predictions. Your job is to make both happen simultaneously. That means building deployment pipelines that catch regressions before they hit production, monitoring systems that alert on data drift before it degrades model performance, and self-service tooling that lets data scientists deploy without filing a ticket.
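As a rough illustration of the drift monitoring described above, the sketch below computes a Population Stability Index (PSI) between a reference sample and a live sample of a single numeric feature. PSI is only one common drift metric and is not necessarily what any particular team uses; the function names, the bin count, and the 0.2 alert threshold are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    live: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between two samples of a single numeric feature.

    Bin edges are equal-width over the reference range; live values outside
    that range are clamped into the first or last bin.
    """
    if not reference or not live:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps log() defined when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    ref_frac = fractions(reference)
    live_frac = fractions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_frac, live_frac))


def has_drifted(
    reference: Sequence[float],
    live: Sequence[float],
    threshold: float = 0.2,  # assumed alert threshold, tune per feature
) -> bool:
    """Hypothetical alert rule: flag drift when PSI exceeds the threshold."""
    return population_stability_index(reference, live) > threshold
```

In practice a check like this would run on a schedule per feature, with the reference sample taken from training data and the live sample from recent production traffic.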
Across the 3,823 AI roles we're tracking, MLOps Engineer positions make up 1% of the market. At hims & hers, this role fits into their broader AI and engineering organization.
MLOps demand tracks closely with production ML adoption. As more companies move models from notebooks to production, the need for MLOps grows. The role is well-established at large tech companies and growing fast at mid-stage startups that are hitting the 'our models work in notebooks but break in production' phase.
What the Work Looks Like
A typical week involves: debugging a model deployment that's serving stale predictions, building a new monitoring dashboard for a feature team, writing Terraform for GPU-enabled inference clusters, reviewing pull requests for the ML platform's CI/CD pipeline, and meeting with data scientists to understand their pain points. You're the bridge between ML and infrastructure.
Skills Required
Kubernetes, Docker, and cloud infrastructure are baseline. Most roles want experience with ML-specific tooling: MLflow, Kubeflow, Weights & Biases, or similar. Strong DevOps fundamentals matter more than ML theory. You need to understand model serving (TorchServe, Triton, vLLM), monitoring (Prometheus, Grafana), and infrastructure-as-code (Terraform, Pulumi).
GPU infrastructure knowledge is increasingly valuable as LLM inference becomes a major cost center. Understanding GPU scheduling, multi-node training setups, and inference optimization (quantization, batching, caching) puts you in the top tier. Experience with model registries and feature stores rounds out the profile.
Good MLOps postings specify their ML stack, infrastructure scale, and the problems they're solving (deployment velocity, cost optimization, monitoring gaps). Red flag: companies that want MLOps but don't have any models in production yet. You'll end up doing general DevOps instead.
Compensation Benchmarks
MLOps Engineer roles pay a median of $217,200 based on 87 positions with disclosed compensation. Senior-level AI roles across all categories have a median of $227,400. This role's midpoint ($230K) sits 6% above the category median. Disclosed range: $210K to $250K.
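The midpoint comparison above is simple arithmetic. A minimal sketch, using only the figures quoted in this section (the helper name `pct_above` is hypothetical):

```python
def pct_above(value: float, baseline: float) -> float:
    """Percent by which value exceeds baseline."""
    return (value - baseline) / baseline * 100


range_low, range_high = 210_000, 250_000  # disclosed range for this role
category_median = 217_200                 # MLOps Engineer median quoted above

midpoint = (range_low + range_high) / 2
print(f"midpoint: ${midpoint:,.0f}")
# midpoint: $230,000
print(f"above category median: {pct_above(midpoint, category_median):.0f}%")
# above category median: 6%
```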
Across all AI roles, the market median is $200,100. Top-quartile compensation starts at $253,500. The 90th percentile reaches $307,500. For comparison, the highest-paying categories include AI Engineering Manager ($275,000) and AI Safety ($274,200). By seniority level: Entry: $97,880; Mid: $165,000; Senior: $227,400; Director: $247,800; VP: $250,000.
hims & hers AI Hiring
hims & hers has 2 open AI roles right now, spanning MLOps Engineer and AI/ML Engineer. The roles are based in the US. Compensation range: $205K - $250K.
Location Context
AI roles in Austin pay a median of $215,300 across 523 tracked positions. That's 8% above the national median.
Career Path
Common paths into MLOps Engineer roles include DevOps Engineer, Platform Engineer, and Data Engineer.
From here, career progression typically leads toward ML Platform Lead, Infrastructure Architect, or Engineering Manager.
DevOps engineers with ML curiosity have the shortest path. You already understand deployment, monitoring, and infrastructure. Add ML-specific knowledge (model serving, data pipelines, experiment tracking) and you're competitive. The career ceiling is high: ML Platform Lead roles at top companies pay well because the infrastructure complexity is enormous.
What to Expect in Interviews
Interviews emphasize infrastructure and reliability. Expect questions about CI/CD for ML models, monitoring for data drift, and how you'd design a model serving platform that handles 10K requests per second. Coding rounds focus on Python and infrastructure-as-code (Terraform, Helm). Be ready to discuss tradeoffs between different model serving frameworks and how you'd handle rollback when a new model degrades performance.
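One way to frame the rollback question is as an automated gate that compares a candidate model's error rate and latency against the current baseline during a canary phase. The sketch below is illustrative only: the `ModelStats` fields, the thresholds, and the `should_roll_back` function are assumptions, not a description of any specific serving framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Aggregated canary-window metrics for one model version (hypothetical)."""
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(
    baseline: ModelStats,
    candidate: ModelStats,
    min_requests: int = 500,                # assumed minimum canary sample
    max_error_rate_increase: float = 0.01,  # assumed absolute tolerance
    max_latency_ratio: float = 1.25,        # assumed p95 latency tolerance
) -> bool:
    """Return True when the candidate is measurably worse than the baseline.

    With too little canary traffic the gate returns False (keep observing)
    rather than deciding on noisy data.
    """
    if candidate.requests < min_requests:
        return False
    error_regression = (
        candidate.error_rate - baseline.error_rate > max_error_rate_increase
    )
    latency_regression = (
        candidate.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio
    )
    return error_regression or latency_regression
```

A gate like this is only the decision step; the tradeoff worth discussing in an interview is how much canary traffic and time to spend before trusting the comparison.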
AI Hiring Overview
The AI job market has 3,823 open positions tracked in our dataset. By seniority: 112 entry-level, 1,798 mid-level, 1,516 senior, and 397 leadership roles (Director, VP, C-Level). Remote roles make up 15% of the market (590 positions). The remaining 3,217 roles require on-site or hybrid attendance.
The market median for AI roles is $200,100. Top-quartile compensation starts at $253,500. The 90th percentile reaches $307,500. Highest-paying categories: AI Engineering Manager ($275,000 median, 41 roles); AI Safety ($274,200 median, 55 roles); Research Engineer ($260,000 median, 434 roles).
The AI Job Market Today
The AI job market spans 3,823 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (2,629), Data Scientist (322), AI Software Engineer (279). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.
The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (112) are outnumbered by mid-level (1,798) and senior (1,516) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 397 positions, representing the bottleneck between technical execution and organizational strategy.
Remote work availability sits at 15% of all AI roles (590 positions), with 3,217 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.
AI compensation is structured in clear tiers. The market median sits at $200,100. Top-quartile roles start at $253,500, and the 90th percentile reaches $307,500. These figures reflect base salary for roles with disclosed compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.
Category matters for compensation. AI Engineering Manager roles lead at $275,000 median, while Prompt Engineer roles sit at $140,000. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.
The most in-demand skills across all AI postings: Python (1,979 postings), AWS (1,190 postings), Azure (899 postings), RAG (839 postings), GCP (726 postings), PyTorch (595 postings), Prompt Engineering (595 postings), Claude (540 postings). Python dominates, appearing in the vast majority of role descriptions regardless of category. Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. The newer entrants to the top skills list (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.