About This Role
Make Your Mark:
The Principal AI/ML Operations Engineer leads the architecture, automation, and operationalization of both machine learning and AI systems at scale. This role defines the strategy and technical standards for MLOps and AIOps across the organization, ensuring models and agents are evaluated, deployed, governed, and monitored with reliability, efficiency, and compliance. The candidate will collaborate across AI, data, and product engineering teams to drive best practices for serving, observability, automated retraining, evaluation flywheels, and operational guardrails for AI systems in production.
You'll Get To:
Leadership and Strategy:
- Define enterprise-level standards and reference architectures for MLOps and AIOps systems.
- Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs).
- Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments.
- Lead incident response and reliability strategies for ML/AI systems.
AI System Deployment and Integration:
- Lead the deployment of AI models and systems in various environments.
- Collaborate with development teams to integrate AI solutions into existing workflows and applications.
- Ensure seamless integration with different platforms and technologies.
- Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance.
- Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows.
- Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics.
- Implement logging, metering, and auditing for agent behavior, function calls, and compliance alignment.
- Create scalable observability systems—tracking conversation outcomes, factual accuracy, latency, escalation patterns, and safety events.
- Architect end-to-end guardrails for AI agents, including prompt injection protection, identity-aware routing, and tool usage authorization.
- Collaborate cross-functionally to standardize authentication, authorization, and session governance for multi-agent runtimes.
Model Deployment and Integration:
- Architect and standardize model registries and feature stores to support version tracking, lineage, and reproducibility across environments.
- Lead the deployment of machine learning models into production environments, ensuring scalability, reliability, and efficiency.
- Collaborate with software engineers to integrate machine learning models into existing applications and systems.
- Implement and maintain APIs for model inference.
Infrastructure and Environment Management:
- Design and manage training infrastructure, including distributed training orchestration, GPU/TPU resource allocation, and automatic scaling.
- Implement CI/CD for model workflows using pipelines integrated with model validation, bias checks, and rollback automation.
- Build standardized experimentation frameworks for reproducible training, tuning, and deployment cycles (MLflow, W&B, Kubeflow).
- Manage and optimize the cloud infrastructure required for machine learning operations.
- Work closely with other teams to ensure the availability, security, and performance of machine learning systems.
Monitoring and Maintenance:
- Implement robust monitoring solutions for deployed machine learning models to detect issues and ensure performance.
- Collaborate with data scientists and engineers to address and resolve model performance and data quality issues.
- Conduct regular system maintenance, updates, and optimizations to ensure optimal performance of machine learning solutions.
Automation and Orchestration:
- Develop and maintain automation scripts and tools for managing machine learning workflows.
- Implement orchestration systems to streamline the end-to-end machine learning lifecycle, from data preparation to model deployment.
Collaboration with Data Science Teams:
- Collaborate with data scientists to understand model requirements and constraints for deployment.
- Facilitate the transition of machine learning models from research to production, ensuring scalability and efficiency.
Performance Optimization:
- Identify and implement optimizations to enhance the performance and efficiency of machine learning models in production.
- Conduct performance analysis and implement improvements based on resource utilization metrics.
Security and Compliance:
- Implement security measures to protect machine learning systems and data.
- Ensure compliance with regulatory requirements and industry standards related to machine learning and data privacy.
- Integrate audit controls, metadata storage, and lineage tracking across ML and AI workflows.
- Ensure complete monitoring and feedback loops including event logs, evaluations, and automated retraining triggers.
- Enforce secure deployment patterns with Infrastructure-as-Code and cloud-native secrets management.
- Define SLAs, error budgets, and compliance reporting mechanisms for ML and AI systems.
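To make the SLA and error-budget item above concrete, here is a minimal sketch of the standard error-budget arithmetic, assuming an availability SLO measured as the fraction of successful requests over a fixed window. The `ErrorBudget` class, its field names, and the freeze-deploys-when-exhausted policy are hypothetical illustrations, not BlackLine's tooling.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: error budget for an availability SLO over one window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int   # requests observed in the window
    failed_requests: int  # requests that violated the SLI (errors, SLA-breaching latency)

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 = budget untouched, 0.0 = exhausted, negative = SLO breached.
        if self.allowed_failures <= 0:
            # A 100% SLO has no budget: any failure is a breach.
            return float("-inf") if self.failed_requests else 1.0
        return 1.0 - self.failed_requests / self.allowed_failures

    def deploys_frozen(self) -> bool:
        # Assumed policy: pause risky rollouts once the budget is spent.
        return self.remaining_fraction <= 0.0


budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(round(budget.allowed_failures), round(budget.remaining_fraction, 2), budget.deploys_frozen())
# → 1000 0.6 False
```

With a 99.9% SLO over one million requests the window tolerates about 1,000 failures, so 400 failures leave 60% of the budget for risky changes such as new model rollouts.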
What You'll Bring:
- Education and Experience:
- Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field.
- 10+ years in ML infrastructure, DevOps, and software system architecture; 4+ years leading MLOps or AIOps platforms.
- Technical Skills:
- Strong programming skills in languages such as Python, Java, or Scala.
- Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow).
- Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure).
- Deep familiarity with LangChain, LangGraph, ADK, or similar agentic frameworks, including runtime management of agentic systems.
- Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation.
- Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking.
- Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads.
- Proficiency in containerization technologies (e.g., Docker, Kubernetes).
- Operations and Infrastructure:
- Proficient in scripting languages (e.g., Bash, Python) for automation.
- Experience with workflow orchestration tools (e.g., Apache Airflow).
- Expertise in managing and optimizing cloud-based infrastructure.
- Familiarity with DevOps practices and tools for automated deployment.
- Understanding of network configurations and security protocols.
- Problem-solving and Critical Thinking:
- Ability to define problems, collect and analyze data, and propose innovative solutions. Strong critical thinking skills to evaluate models, identify limitations, and
- Adaptability and Learning Agility:
- Comfortable working in a fast-paced, rapidly evolving environment. Proactive in staying up to date with the latest trends, techniques, and technologies in AI/data science.
Thrive at BlackLine Because You Are Joining:
- A technology-based company with a sense of adventure and a vision for the future. Every door at BlackLine is open. Just bring your brains and your problem-solving skills, and be part of a winning team at the world's most trusted name in Finance Automation!
- A culture that is kind, open, and accepting. It's a place where people can embrace what makes them unique, and the mix of cultural backgrounds and varying interests cultivates diverse thought and perspectives.
- A culture where BlackLiners' continued growth and learning are empowered. BlackLine offers a wide variety of professional development seminars and inclusive affinity groups to celebrate and support our diversity.
BlackLine is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity or expression, race, ethnicity, age, religious creed, national origin, physical or mental disability, ancestry, color, marital status, sexual orientation, military or veteran status, status as a victim of domestic violence, sexual assault or stalking, medical condition, genetic information, or any other protected class or category recognized by applicable equal employment opportunity or other similar laws.
BlackLine recognizes that the ways we work and the workplace itself have shifted. We innovate in a workplace that optimizes a combination of virtual and in-person interactions to maximize collaboration and nurture our culture. Candidates who live within a reasonable commute to one of our offices will work in the office at least 2 days a week.
Salary Range: USD $257,000.00/Yr. - USD $322,000.00/Yr.
Salary Context
This $257K-$322K range is above the 75th percentile for MLOps Engineer roles in our dataset (median: $209K across 26 roles with salary data).
Role Details
About This Role
MLOps Engineers build the infrastructure that keeps ML models running in production. They own CI/CD pipelines for model deployment, monitoring for data drift and model degradation, and the tooling that lets data scientists ship faster. If ML Engineers build the models, MLOps Engineers build the roads those models travel on.
The job is fundamentally about reliability and velocity. Data scientists want to iterate fast. Product teams want stable predictions. Your job is to make both happen simultaneously. That means building deployment pipelines that catch regressions before they hit production, monitoring systems that alert on data drift before it degrades model performance, and self-service tooling that lets data scientists deploy without filing a ticket.
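As an illustration of the drift alerting described above, here is a minimal pure-Python sketch of the Population Stability Index (PSI), one common way to compare a feature's live distribution against its training distribution. The function names and the 0.2 alert threshold are assumptions (0.1 and 0.25 are also widely quoted rules of thumb); a real monitoring system would run this per feature on a schedule and feed the result into alerting.

```python
import math
import random
from bisect import bisect_right
from typing import List, Sequence


def quantile_cut_points(reference: Sequence[float], bins: int) -> List[float]:
    # Interior cut points at the 1/bins, 2/bins, ... quantiles of the reference data,
    # so each bin holds roughly the same share of training values.
    ordered = sorted(reference)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def bin_fractions(values: Sequence[float], cuts: List[float], bins: int) -> List[float]:
    counts = [0] * bins
    for v in values:
        # Index 0..bins-1; values outside the training range land in the edge bins.
        counts[bisect_right(cuts, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(reference: Sequence[float], live: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    cuts = quantile_cut_points(reference, bins)
    psi = 0.0
    for r, c in zip(bin_fractions(reference, cuts, bins), bin_fractions(live, cuts, bins)):
        r, c = max(r, eps), max(c, eps)  # avoid log(0) for empty bins
        psi += (c - r) * math.log(c / r)
    return psi


def drift_alert(reference: Sequence[float], live: Sequence[float], threshold: float = 0.2) -> bool:
    # 0.2 is an assumed rule-of-thumb threshold; tune it per feature.
    return population_stability_index(reference, live) > threshold


rng = random.Random(0)
train = [rng.gauss(0, 1) for _ in range(5000)]
serving = [rng.gauss(1, 1) for _ in range(5000)]  # mean shifted by one standard deviation
print(drift_alert(train, serving))
# → True
```

Binning on reference quantiles keeps every training bin populated, so the index reacts to where live traffic moves rather than to sparse tails.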
Across the 3,824 AI roles we're tracking, MLOps Engineer positions make up 1% of the market. At BlackLine, this role fits into their broader AI and engineering organization.
MLOps demand tracks closely with production ML adoption. As more companies move models from notebooks to production, the need for MLOps grows. The role is well-established at large tech companies and growing fast at mid-stage startups that are hitting the 'our models work in notebooks but break in production' phase.
What the Work Looks Like
A typical week involves: debugging a model deployment that's serving stale predictions, building a new monitoring dashboard for a feature team, writing Terraform for GPU-enabled inference clusters, reviewing pull requests for the ML platform's CI/CD pipeline, and meeting with data scientists to understand their pain points. You're the bridge between ML and infrastructure.
Skills Required
Kubernetes, Docker, and cloud infrastructure are baseline. Most roles want experience with ML-specific tooling: MLflow, Kubeflow, Weights & Biases, or similar. Strong DevOps fundamentals matter more than ML theory. You need to understand model serving (TorchServe, Triton, vLLM), monitoring (Prometheus, Grafana), and infrastructure-as-code (Terraform, Pulumi).
GPU infrastructure knowledge is increasingly valuable as LLM inference becomes a major cost center. Understanding GPU scheduling, multi-node training setups, and inference optimization (quantization, batching, caching) puts you in the top tier. Experience with model registries and feature stores rounds out the profile.
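Here is a simplified, synchronous sketch of two of the inference optimizations named above: caching answers for repeated inputs and batching the remaining inputs into fewer model calls. `serve_with_cache_and_batching` and the `model_fn` callable are hypothetical. Production servers such as Triton (dynamic batching) or vLLM (continuous batching) batch across concurrent requests with queueing delays, which this sketch omits.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def serve_with_cache_and_batching(
    requests: Sequence[Hashable],
    model_fn: Callable[[List[Hashable]], List[object]],
    cache: Dict[Hashable, object],
    max_batch_size: int = 8,
) -> List[object]:
    """Hypothetical helper: answer repeats from a cache, batch the rest into few model calls."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    # Unique inputs (order preserved) that have not been answered before.
    misses = [r for r in dict.fromkeys(requests) if r not in cache]
    for start in range(0, len(misses), max_batch_size):
        batch = misses[start:start + max_batch_size]
        outputs = model_fn(batch)  # one forward pass per batch instead of per request
        if len(outputs) != len(batch):
            raise RuntimeError("model_fn must return one output per input")
        cache.update(zip(batch, outputs))
    return [cache[r] for r in requests]


calls: List[int] = []


def fake_model(batch: List[int]) -> List[int]:
    calls.append(len(batch))  # record batch sizes to show how many model calls happened
    return [x * 2 for x in batch]


cache: Dict[Hashable, object] = {}
print(serve_with_cache_and_batching([1, 2, 3, 1, 2, 4, 5], fake_model, cache, max_batch_size=2))
# → [2, 4, 6, 2, 4, 8, 10]
print(calls)
# → [2, 2, 1]
```

Seven requests cost three model calls here. The plain dict cache is unbounded and assumes deterministic outputs; a real service would cap it (for example with an LRU policy) and skip caching for sampled generations.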
Good MLOps postings specify their ML stack, infrastructure scale, and the problems they're solving (deployment velocity, cost optimization, monitoring gaps). Red flag: companies that want MLOps but don't have any models in production yet. You'll end up doing general DevOps instead.
Compensation Benchmarks
MLOps Engineer roles pay a median of $217,200 based on 76 positions with disclosed compensation. Senior-level AI roles across all categories have a median of $227,400. This role's midpoint ($289K) sits 33% above the category median. Disclosed range: $257K to $322K.
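The 33% figure can be checked from the disclosed range and the category median (values in USD):

```latex
\text{midpoint} = \frac{257{,}000 + 322{,}000}{2} = 289{,}500
\qquad
\frac{289{,}500 - 217{,}200}{217{,}200} = \frac{72{,}300}{217{,}200} \approx 0.333
```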
Across all AI roles, the market median is $200,000. Top-quartile compensation starts at $253,000. The 90th percentile reaches $307,500. For comparison, the highest-paying categories include AI Engineering Manager ($293,500) and AI Safety ($274,200). By seniority level: Entry: $97,380; Mid: $160,000; Senior: $227,400; Director: $243,000; VP: $250,000.
BlackLine AI Hiring
BlackLine has 3 open AI roles right now, hiring for AI Software Engineer, AI/ML Engineer, and MLOps Engineer positions. The company is based in Pleasanton, CA, US. Compensation range: $163K - $322K.
Location Context
Across all AI roles, 16% (613 positions) offer remote work, while 3,187 require on-site or hybrid attendance. Top AI hiring metros: New York (2,448 roles, $210,000 median); San Francisco (1,990 roles, $253,000 median); Los Angeles (1,686 roles, $189,000 median).
Career Path
Common paths into MLOps Engineer roles include DevOps Engineer, Platform Engineer, and Data Engineer.
From here, career progression typically leads toward ML Platform Lead, Infrastructure Architect, or Engineering Manager.
DevOps engineers with ML curiosity have the shortest path. You already understand deployment, monitoring, and infrastructure. Add ML-specific knowledge (model serving, data pipelines, experiment tracking) and you're competitive. The career ceiling is high: ML Platform Lead roles at top companies pay well because the infrastructure complexity is enormous.
What to Expect in Interviews
Interviews emphasize infrastructure and reliability. Expect questions about CI/CD for ML models, monitoring for data drift, and how you'd design a model serving platform that handles 10K requests per second. Coding rounds focus on Python and infrastructure-as-code (Terraform, Helm). Be ready to discuss tradeoffs between different model serving frameworks and how you'd handle rollback when a new model degrades performance.
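For the rollback question, one common answer is an automated canary gate: route a slice of traffic to the new model, compare it with the current version over the same window, and roll back if it regresses. Below is a minimal sketch that assumes error rate and p95 latency are the only signals. The `WindowMetrics` schema and every threshold are hypothetical, and a real gate would usually add model-quality metrics from labeled feedback.

```python
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    """Hypothetical schema: one model version's metrics over the same evaluation window."""

    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    baseline: WindowMetrics,
    candidate: WindowMetrics,
    max_error_rate_increase: float = 0.005,  # assumed: tolerate +0.5 percentage points
    max_latency_ratio: float = 1.2,          # assumed: tolerate a 20% slower p95
    min_requests: int = 1_000,               # assumed: minimum canary traffic before judging
) -> str:
    """Return 'wait', 'rollback', or 'promote' for the candidate model."""
    if candidate.requests < min_requests:
        return "wait"  # too little traffic for a meaningful comparison
    if candidate.error_rate - baseline.error_rate > max_error_rate_increase:
        return "rollback"
    if candidate.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio:
        return "rollback"
    return "promote"


baseline = WindowMetrics(requests=100_000, errors=200, p95_latency_ms=120.0)
print(canary_decision(baseline, WindowMetrics(requests=5_000, errors=50, p95_latency_ms=110.0)))
# → rollback
```

The candidate's 1.0% error rate exceeds the baseline's 0.2% by more than the assumed tolerance, so the gate rolls back even though latency improved.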
AI Hiring Overview
The AI job market has 3,824 open positions tracked in our dataset. By seniority: 119 entry-level, 1,813 mid-level, 1,472 senior, and 420 leadership roles (Director, VP, C-Level). Remote roles make up 16% of the market (613 positions). The remaining 3,187 roles require on-site or hybrid attendance.
The market median for AI roles is $200,000. Top-quartile compensation starts at $253,000. The 90th percentile reaches $307,500. Highest-paying categories: AI Engineering Manager ($293,500 median, 31 roles); AI Safety ($274,200 median, 51 roles); Research Engineer ($260,000 median, 401 roles).
The AI Job Market Today
The AI job market spans 3,824 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (2,702), Data Scientist (281), AI Software Engineer (258). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.
The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (119) are outnumbered by mid-level (1,813) and senior (1,472) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 420 positions, representing the bottleneck between technical execution and organizational strategy.
Remote work availability sits at 16% of all AI roles (613 positions), with 3,187 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.
AI compensation is structured in clear tiers. The market median sits at $200,000. Top-quartile roles start at $253,000, and the 90th percentile reaches $307,500. These figures reflect base salary for roles with disclosed compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.
Category matters for compensation. AI Engineering Manager roles lead at $293,500 median, while Prompt Engineer roles sit at $142,800. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.
The most in-demand skills across all AI postings: Python (1,968 postings), AWS (1,203 postings), Azure (882 postings), RAG (877 postings), GCP (735 postings), Prompt Engineering (587 postings), PyTorch (586 postings), Claude (554 postings). Python dominates, appearing in the vast majority of role descriptions regardless of category. Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. The newer entrants to the top skills list (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.