About This Role
Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by diversity and inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health equity on a global scale. Join us to start Caring. Connecting. Growing together.
The Enterprise Information Security (EIS) team is responsible for cybersecurity across our organization. We support our business and members by reducing risk, rapidly responding to threats, focusing on business resiliency and securing new acquisitions.
The Principal AI / Machine Learning Data Engineer role focuses on designing and building scalable data platforms that enable advanced analytics, machine learning, and AI-driven solutions. This role will support the development of intelligent systems that process large-scale event and operational data, enabling faster insights, automation, and decision-making across the organization.
This position sits at the intersection of data engineering, machine learning, and AI, with an emphasis on building modern data pipelines and enabling production-grade AI capabilities.
Ideal Candidate Profile:
- Demonstrated experience building and operating production data platforms and pipelines across batch and streaming workloads
- Solid hands-on engineering in Python and SQL; familiarity with JVM languages (Java/Scala) in Spark ecosystems is a plus
- Experience with distributed processing and lakehouse/warehouse patterns (e.g., Spark/PySpark, Databricks, Snowflake)
- Experience building ingestion frameworks for structured and unstructured data, including event/log and semi-structured formats
- Experience enabling Generative AI solutions in production (e.g., RAG-style architectures), including retrieval patterns and evaluation/monitoring practices
- Familiarity with knowledge-centric data approaches (e.g., metadata-driven systems, entity resolution, and/or graph concepts) to improve discoverability and downstream analytics
- Solid data quality, observability, and monitoring mindset (profiling, validation, alerting, and reliability improvements)
- Comfort with orchestration, CI/CD, containerization, and infrastructure-as-code (e.g., Airflow, GitHub Actions, Docker, Terraform, Kubernetes)
- Cloud experience (AWS, Azure, and/or GCP), including secure handling of sensitive data (PII/PHI) and collaboration with compliance partners
- Ability to lead through influence, mentor engineers, and translate ambiguous problems into scalable technical roadmaps
You'll enjoy the flexibility to work remotely* from anywhere within the U.S. as you take on some tough challenges. For all hires in the Minneapolis or Washington, D.C. area, you will be required to work in the office a minimum of four days per week.
Primary Responsibilities:
- Design, develop, and maintain scalable data pipelines and data platforms supporting analytics, machine learning, and AI use cases
- Build and optimize ingestion frameworks for large-scale structured and unstructured data, including streaming and event-driven sources
- Partner with cross-functional stakeholders to understand evolving data and AI needs and define long-term technical solutions
- Enable and support machine learning and AI workflows, including feature engineering, data preparation, and model deployment support
- Drive strategic initiatives around Generative AI, data quality, observability, lineage, and governance
- Develop and maintain frameworks that support rapid experimentation and deployment of AI/ML solutions
- Introduce and evolve best practices in data modeling, orchestration, testing, and monitoring
- Identify and champion opportunities for platform scalability, performance optimization, and cost efficiency
- Collaborate with product, analytics, and infrastructure teams to deliver high-impact data and AI solutions
- Build and maintain reusable parsing, enrichment, analytic, and service libraries to accelerate delivery across teams
- Work comfortably under time-sensitive conditions while ensuring thoroughness
- Maintain high ethical standards and the ability to remain objective and confidential
You'll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.
Required Qualifications:
- Bachelor's degree or equivalent experience
- 5+ years of experience designing, building, and operating production data pipelines and platforms
- 5+ years of hands-on development with Python (preferred) and/or Java, including code reviews, packaging, and deployment
- 5+ years of experience with Spark (PySpark) and Databricks (or similar distributed data processing platform)
- 2+ years of experience leveraging and deploying Generative AI use cases to production environments
- Solid SQL skills and experience working with data lakes and warehouses (e.g., Databricks, Snowflake)
- Experience building ingestion frameworks for structured and unstructured data (e.g., event/log, semi-structured JSON), including parsing and enrichment patterns
- Experience designing and scaling ELT/ETL frameworks with orchestration tools such as Airflow (or equivalent)
- Experience implementing data quality, observability, and monitoring practices (e.g., data quality checks, pipeline SLAs/SLOs, alerting)
- Experience with metadata, lineage, and governance concepts and tooling (e.g., data catalogs, lineage, access controls)
- Experience with data modeling best practices for analytics and ML use cases
- Experience with DevOps and CI/CD practices and tools (e.g., GitHub Actions), containerization, and infrastructure-as-code (e.g., Docker, Kubernetes, Terraform)
- Experience supporting ML/AI workflows (feature engineering, data preparation, and model deployment enablement); exposure to MLOps practices is a plus
- Demonstrated ability to partner with cross-functional stakeholders, translate requirements into technical solutions, and lead through influence
Preferred Qualifications:
- Experience with cloud platforms such as AWS, Azure, or Google Cloud, including managed data services
- Experience with streaming and event-driven architectures (e.g., Kafka, Kinesis, Event Hubs)
- Experience with data quality and validation frameworks (e.g., Great Expectations, Deequ) and/or data observability tooling
- Experience enabling MLOps practices (e.g., feature stores, model registries, experiment tracking, deployment automation)
- Experience with lakehouse architectures, Delta Lake, and advanced Spark optimization/performance tuning
- Experience with data visualization tools and libraries such as Plotly, seaborn, and Chart.js
- Experience with machine learning and predictive analytics
- Familiarity with security and privacy concepts for data platforms (e.g., least privilege, PII/PHI handling) and working with compliance partners
*All employees working remotely will be required to adhere to UnitedHealth Group's Telecommuter Policy
Pay is based on several factors including but not limited to local labor markets, education, work experience, certifications, etc. In addition to your salary, we offer benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with us, you'll find a far-reaching choice of benefits and incentives. The salary for this role will range from $112,700 to $193,200 annually based on full-time employment. We comply with all minimum wage laws as applicable.
Application Deadline: This will be posted for a minimum of 2 business days or until a sufficient candidate pool has been collected. Job posting may come down early due to volume of applicants.
*At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.*
*UnitedHealth Group is an Equal Employment Opportunity employer under applicable law and qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations.*
*UnitedHealth Group is a drug-free workplace. Candidates are required to pass a drug test before beginning employment.*
Salary Context
This $112K-$193K range is in the lower quartile for Data Engineer roles in our dataset (median: $160K across 195 roles with salary data).
Role Details
About This Role
Data Engineers build the pipelines that feed AI models. They design ETL workflows, manage data lakes, and ensure training and inference data is clean, timely, and accessible. Without good data engineering, AI projects fail. It's that simple.
The AI era has expanded the data engineer's scope far beyond batch ETL jobs. You're building real-time embedding pipelines for RAG systems, managing vector databases, ensuring training data quality at scale, and building the infrastructure that lets ML teams iterate on data as fast as they iterate on models. Data quality is the biggest predictor of model quality, and you're the person responsible for it.
Across the 26,159 AI roles we're tracking, Data Engineer positions make up 1% of the market. At UnitedHealth Group, this role fits into their broader AI and engineering organization.
Data Engineer demand in AI contexts is strong and growing. Every company building AI needs clean, reliable data pipelines. The shift toward real-time AI applications (chatbots, recommendation engines, agent systems) means data engineering is more critical than ever. Companies are willing to pay premium salaries for data engineers with AI/ML pipeline experience.
What the Work Looks Like
A typical week includes: debugging a data pipeline that's producing stale embeddings for the RAG system, optimizing a Spark job that processes training data, building a data quality monitoring dashboard, meeting with the ML team to understand their next data requirements, and writing dbt models that transform raw event data into ML-ready features. The work is deeply technical and high-impact.
Skills Required
SQL, Python, and distributed systems (Spark, Airflow, dbt) are core. Cloud data platforms (Snowflake, BigQuery, Redshift) are increasingly standard. Many AI-focused roles also want familiarity with vector databases and embedding pipelines. Understanding data modeling, pipeline orchestration, and data quality frameworks covers the essentials.
AI-specific data engineering skills include: building feature stores, managing training data versioning, implementing data lineage tracking, and building real-time embedding pipelines. Experience with streaming systems (Kafka, Flink) is valuable for real-time AI applications. Understanding ML data requirements (balanced datasets, data augmentation, evaluation set construction) makes you much more effective working with ML teams.
Strong postings specify the data stack, mention ML pipeline work, and describe the scale of data you'll be working with. Look for companies that understand the connection between data quality and model quality. Avoid roles that conflate data engineering with data analysis.
Compensation Benchmarks
Data Engineer roles pay a median of $208,300 based on 199 positions with disclosed compensation. Senior-level AI roles across all categories have a median of $227,400. This role's midpoint ($152K) sits 27% below the category median. Disclosed range: $112K to $193K.
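The 27% gap can be reproduced from the figures already quoted. The following is a minimal sketch of that arithmetic, assuming the unrounded range from the posting ($112,700 to $193,200) and the $208,300 category median; the function and variable names are illustrative only.

```python
def midpoint(low: float, high: float) -> float:
    """Midpoint of a disclosed salary range."""
    return (low + high) / 2


def pct_below(value: float, benchmark: float) -> float:
    """Percentage by which value falls below benchmark."""
    return (benchmark - value) / benchmark * 100


role_mid = midpoint(112_700, 193_200)  # disclosed range from the posting
category_median = 208_300              # Data Engineer median quoted above

print(f"midpoint: ${role_mid:,.0f}")   # prints "midpoint: $152,950"
print(f"below category median: {pct_below(role_mid, category_median):.0f}%")  # prints 27%
```

The exact midpoint is $152,950, which the text shows truncated to $152K in the same way the range is shown as $112K to $193K.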
Across all AI roles, the market median is $184,000. Top-quartile compensation starts at $244,000. The 90th percentile reaches $309,400. For comparison, the highest-paying categories include AI Engineering Manager ($293,500) and AI Architect ($292,900). By seniority level: Entry: $76,880; Mid: $131,300; Senior: $227,400; Director: $244,288; VP: $234,620.
UnitedHealth Group AI Hiring
UnitedHealth Group has 2 open AI roles right now, across the AI/ML Engineer and Data Engineer categories. The company is based in Eden Prairie, MN, US. Compensation range: $130K - $193K.
Remote Work Context
Remote AI roles pay a median of $156,000 across 1,221 positions. About 7% of all AI roles offer remote work.
Career Path
Common paths into Data Engineer roles include Backend Engineer, Database Administrator, and Analytics Engineer.
From here, career progression typically leads toward Senior Data Engineer, ML Engineer, or Data Platform Lead.
Master SQL and Python first. Then learn a distributed processing framework (Spark or its modern alternatives) and a pipeline orchestrator (Airflow, Dagster, Prefect). Build a portfolio project that demonstrates end-to-end pipeline construction: ingest, transform, validate, serve. If you want to specialize in AI data engineering, add vector databases and embedding pipelines to your skill set.
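As a rough illustration of the ingest, transform, validate, serve shape described above, here is a toy sketch in plain Python. The event records, field names, and validation rule are all hypothetical; a real portfolio project would read from files, an API, or a queue and would likely use a framework such as Spark plus an orchestrator.

```python
import json
from collections import defaultdict

# Hypothetical raw events (assumption: newline-delimited JSON strings).
RAW_EVENTS = [
    '{"user": "a", "event": "Click", "ms": 120}',
    '{"user": "b", "event": "view", "ms": 300}',
    '{"user": "a", "event": "view", "ms": -5}',  # invalid: negative duration
    'not json',                                    # invalid: unparseable
]


def ingest(lines):
    """Parse JSON lines, skipping records that fail to parse."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(records):
    """Lowercase event names and convert milliseconds to seconds."""
    for r in records:
        yield {
            "user": r["user"],
            "event": r["event"].lower(),
            "seconds": r["ms"] / 1000,
        }


def validate(records):
    """Keep only records with a non-negative duration."""
    return [r for r in records if r["seconds"] >= 0]


def serve(records):
    """Aggregate total seconds per user as the pipeline's served output."""
    totals = defaultdict(float)
    for r in records:
        totals[r["user"]] += r["seconds"]
    return dict(totals)


result = serve(validate(transform(ingest(RAW_EVENTS))))
print(result)
```

Keeping each stage as a separate function makes it straightforward to test stages in isolation and to swap a toy stage for a real one later.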
What to Expect in Interviews
Expect SQL deep-dives (query optimization, partitioning strategies, data modeling), Python coding focused on data pipeline patterns, and system design questions about building scalable ETL workflows. Companies with ML teams will ask about feature stores, embedding pipelines, and training data management. Be ready to discuss data quality monitoring, pipeline orchestration, and how you'd handle schema evolution in a production data lake.
AI Hiring Overview
The AI job market has 26,159 open positions tracked in our dataset. By seniority: 2,416 entry-level, 16,247 mid-level, 5,153 senior, and 2,343 leadership roles (Director, VP, C-Level). Remote roles make up 7% of the market (1,863 positions). The remaining 24,200 roles require on-site or hybrid attendance.
The market median for AI roles is $184,000. Top-quartile compensation starts at $244,000. The 90th percentile reaches $309,400. Highest-paying categories: AI Engineering Manager ($293,500 median, 28 roles); AI Architect ($292,900 median, 108 roles); AI Safety ($274,200 median, 19 roles).
The AI Job Market Today
The AI job market spans 26,159 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (23,752), AI Software Engineer (598), AI Product Manager (594). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.
The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (2,416) are outnumbered by mid-level (16,247) and senior (5,153) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 2,343 positions, representing the bottleneck between technical execution and organizational strategy.
Remote work availability sits at 7% of all AI roles (1,863 positions), with 24,200 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.
AI compensation is structured in clear tiers. The market median sits at $184,000. Top-quartile roles start at $244,000, and the 90th percentile reaches $309,400. These figures reflect base salary from postings with disclosed compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.
Category matters for compensation. AI Engineering Manager roles lead at $293,500 median, while Prompt Engineer roles sit at $122,200. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.
The most in-demand skills across all AI postings: RAG (16,749 postings), AWS (8,932 postings), Rust (7,660 postings), Python (3,815 postings), Azure (2,678 postings), GCP (2,247 postings), Prompt Engineering (1,469 postings), OpenAI (1,269 postings). Python dominates, appearing in the vast majority of role descriptions regardless of category. Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. The newer entrants to the top skills list (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.