Blue Cross Blue Shield: Data Engineer (AI/ML)

Skills & Technologies

AWS, Azure, Bedrock, Embeddings, Kubernetes, Prompt Engineering, Python, RAG, SageMaker

About This Role

Job Description Summary

The Data Engineer will design, build, and optimize scalable, secure data pipelines that power analytics and product platforms. For this role specifically, the focus will be on Machine Learning (ML) and Generative Artificial Intelligence (GenAI) workloads, while contributing to innovation and ensuring compliance with healthcare industry standards. This role is expected to provide strong hands-on technical expertise, collaborate across teams, and contribute to architecture decisions that align engineering practices with organizational goals.

Job Description

Design, build, and maintain reliable, high-performance data pipelines for large-scale structured and unstructured healthcare data.
Use PySpark and modern cloud-based tools (Databricks, AWS Glue, EMR, Snowflake) to transform and process data efficiently.
Support ingestion, transformation, and validation processes that ensure data consistency, integrity, and availability.
Partner with Data Architects, Data Scientists, and Analysts to translate business needs into scalable engineering solutions.
Collaborate with platform and DevOps teams to deploy, scale, and monitor data pipelines using Airflow and Kubernetes.
Participate in code reviews, documentation, and continuous improvement efforts across the engineering team.
Implement and maintain data validation frameworks to ensure pipeline accuracy and completeness.
Contribute to best practices in version control, metadata management, and reproducibility.
Stay current with emerging technologies in data engineering and cloud computing, recommending improvements to existing infrastructure.
Participate in performance tuning, cost optimization, and scaling strategies for cloud-based data systems.
Identify automation opportunities to streamline ETL/ELT processes and reduce operational overhead.
Share knowledge and mentor junior team members on tools, techniques, and best practices.
Promote a culture of collaboration, innovation, and continuous learning within the engineering organization.
Support compliance with SOC 2, HIPAA, and GDPR by adhering to established data privacy and security practices.

The posting range for this position is:

$100,800.00 - $138,600.00

Required Education, Certifications and Experience:

Education:

Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.

Experience:

5+ years of experience in data engineering, including building and managing pipelines in cloud-based environments.

Knowledge Skills and Abilities

Experience with building and operationalizing the data foundations that support machine learning and generative AI use cases, including feature pipelines, training/inference data preparation, and retrieval-ready datasets (e.g., embeddings and vector stores).
Familiarity with GenAI skills and adjacent tooling (foundation models, prompt engineering, RAG, embeddings/vector databases, and GenAI orchestration frameworks).
Hands-on experience with AWS AI/ML and data services, including Amazon Bedrock, Bedrock Agent Core, SageMaker, Glue, and EMR.
Experience designing and optimizing data architectures, including data foundations that support ML and GenAI workloads.
Hands-on experience with workflow orchestration (Airflow) and containerization (Kubernetes).
Hands-on technical expertise, cross-team collaboration, and contributing to architecture decisions.
Proficiency in Python, SQL, and distributed data frameworks (PySpark, Databricks, AWS Glue, EMR).
Working knowledge of cloud platforms (AWS or Azure) and data warehouses (Snowflake).
Familiarity with NoSQL and relational databases, as well as data modeling best practices.
Strong analytical, problem-solving, and communication skills.

Understanding of compliance frameworks (SOC 2, HIPAA) and secure data management principles.
Experience working with healthcare datasets or knowledge of healthcare standards (HIPAA, HL7, FHIR) preferred.

The posted salary range is the lowest to highest salary we, in good faith, believe we would pay for this role at the time of this posting. We may ultimately pay more or less than the hiring range and this hiring range may also be modified in the future. A candidate’s position within the hiring range may be based on several factors including, but not limited to, specific competencies, relevant education, qualifications, certifications, relevant experience, skills, seniority, performance, shift, travel requirements, and business or organizational needs. This job is also eligible for annual bonus incentive pay.

We offer a comprehensive package of benefits including paid time off, 11 holidays, medical/dental/vision insurance, generous 401(k) matching, lifestyle spending account and many other benefits to eligible employees.

Note: No amount of pay is considered to be wages or compensation until such amount is earned, vested, and determinable. The amount and availability of any bonus, commission, or any other form of compensation that are allocable to a particular employee remains in the Company's sole discretion unless and until paid and may be modified at the Company’s sole discretion, consistent with the law.

Salary Context

This $100K-$138K range is in the lower quartile for Data Engineer roles in our dataset (median: $168K across 38 roles with salary data).

Role Details

Company Blue Cross Blue Shield

Title Data Engineer (AI/ML)

Location Chicago, IL, US

Category Data Engineer

Experience Mid Level

Salary $100K - $138K

Remote No

About This Role

Data Engineers build the pipelines that feed AI models. They design ETL workflows, manage data lakes, and ensure training and inference data is clean, timely, and accessible. Without good data engineering, AI projects fail. It's that simple.

The AI era has expanded the data engineer's scope far beyond batch ETL jobs. You're building real-time embedding pipelines for RAG systems, managing vector databases, ensuring training data quality at scale, and building the infrastructure that lets ML teams iterate on data as fast as they iterate on models. Data quality is the biggest predictor of model quality, and you're the person responsible for it.

Across the 4,133 AI roles we're tracking, Data Engineer positions make up 1% of the market. At Blue Cross Blue Shield, this role fits into their broader AI and engineering organization.

Data Engineer demand in AI contexts is strong and growing. Every company building AI needs clean, reliable data pipelines. The shift toward real-time AI applications (chatbots, recommendation engines, agent systems) means data engineering is more critical than ever. Companies are willing to pay premium salaries for data engineers with AI/ML pipeline experience.

What the Work Looks Like

A typical week includes: debugging a data pipeline that's producing stale embeddings for the RAG system, optimizing a Spark job that processes training data, building a data quality monitoring dashboard, meeting with the ML team to understand their next data requirements, and writing dbt models that transform raw event data into ML-ready features. The work is deeply technical and high-impact.
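As a rough illustration of the first item, a stale-embedding check can compare when each source document last changed with when its embedding was last computed. The sketch below is a minimal setup under stated assumptions: the record fields (`doc_id`, `updated_at`, `embedded_at`) and the `find_stale_embeddings` helper are hypothetical names for illustration, not part of any particular vector database API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DocRecord:
    # Hypothetical metadata row; a real system would read this from its own catalog.
    doc_id: str
    updated_at: datetime              # when the source document last changed
    embedded_at: Optional[datetime]   # when its embedding was computed (None = never)


def find_stale_embeddings(records, max_lag=timedelta(0)):
    """Return IDs of documents whose embedding is missing or lags the source by more than max_lag."""
    stale = []
    for rec in records:
        if rec.embedded_at is None or rec.updated_at - rec.embedded_at > max_lag:
            stale.append(rec.doc_id)
    return stale


now = datetime(2024, 1, 10, 12, 0)
records = [
    DocRecord("a", now - timedelta(days=3), now - timedelta(days=1)),   # embedded after last edit
    DocRecord("b", now - timedelta(hours=2), now - timedelta(days=1)),  # edited after embedding
    DocRecord("c", now - timedelta(days=5), None),                      # never embedded
]
stale_ids = find_stale_embeddings(records)
print(stale_ids)  # ['b', 'c']
```

The `max_lag` parameter is one possible design choice: it lets a team tolerate a refresh delay (for example, a nightly re-embedding job) without flagging every recently edited document.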

Skills Required

AWS (32% of roles)
Azure (24% of roles)
Bedrock (5% of roles)
Embeddings (6% of roles)
Kubernetes (13% of roles)
Prompt Engineering (15% of roles)
Python (51% of roles)
RAG (22% of roles)
SageMaker (4% of roles)

SQL, Python, and distributed systems (Spark, Airflow, dbt) are core. Cloud data platforms (Snowflake, BigQuery, Redshift) are increasingly standard. Many AI-focused roles also want familiarity with vector databases and embedding pipelines. Understanding data modeling, pipeline orchestration, and data quality frameworks covers the essentials.

AI-specific data engineering skills include: building feature stores, managing training data versioning, implementing data lineage tracking, and building real-time embedding pipelines. Experience with streaming systems (Kafka, Flink) is valuable for real-time AI applications. Understanding ML data requirements (balanced datasets, data augmentation, evaluation set construction) makes you much more effective working with ML teams.

Strong postings specify the data stack, mention ML pipeline work, and describe the scale of data you'll be working with. Look for companies that understand the connection between data quality and model quality. Avoid roles that conflate data engineering with data analysis.

Compensation Benchmarks

Data Engineer roles pay a median of $208,300 based on 273 positions with disclosed compensation. Mid-level AI roles across all categories have a median of $165,778. This role's midpoint ($119K) sits 43% below the category median. Disclosed range: $100K to $138K.
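The 43% figure follows from simple arithmetic on the numbers quoted above. The short sketch below reproduces it, using the rounded $100K to $138K range and the $208,300 category median from the text; the `percent_below` helper name is purely illustrative.

```python
def percent_below(value, reference):
    """Percentage by which `value` falls short of `reference`."""
    return (reference - value) / reference * 100


range_low, range_high = 100_000, 138_000  # disclosed range, rounded as in the text
category_median = 208_300                 # Data Engineer median quoted in the text

midpoint = (range_low + range_high) / 2
gap = percent_below(midpoint, category_median)
print(f"midpoint=${midpoint:,.0f}, {gap:.0f}% below the category median")
# midpoint=$119,000, 43% below the category median
```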

Across all AI roles, the market median is $200,700. Top-quartile compensation starts at $254,000. The 90th percentile reaches $307,500. For comparison, the highest-paying categories include AI Safety ($274,200) and AI Engineering Manager ($268,700). By seniority level: Entry: $97,760; Mid: $165,778; Senior: $227,400; Director: $250,000; VP: $250,000.

Blue Cross Blue Shield AI Hiring

Blue Cross Blue Shield has 1 open AI role right now, in the Data Engineer category, based in Chicago, IL, US. Compensation range: $100K - $138K.

Location Context

AI roles in Chicago pay a median of $200,100 across 329 tracked positions.

Career Path

Common paths into Data Engineer roles include Backend Engineer, Database Administrator, and Analytics Engineer.

From here, career progression typically leads toward Senior Data Engineer, ML Engineer, or Data Platform Lead.

Master SQL and Python first. Then learn a distributed processing framework (Spark or its modern alternatives) and a pipeline orchestrator (Airflow, Dagster, Prefect). Build a portfolio project that demonstrates end-to-end pipeline construction: ingest, transform, validate, serve. If you want to specialize in AI data engineering, add vector databases and embedding pipelines to your skill set.
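For a sense of what the four stages look like in miniature, here is a small sketch using only the Python standard library. Everything in it is an assumption for illustration: the in-memory CSV stands in for a real source, and the function names simply mirror the ingest, transform, validate, serve steps. A portfolio project would swap in real storage, a processing framework, and an orchestrator.

```python
import csv
import io

# Hypothetical raw input; a real pipeline would read from files, a queue, or an API.
RAW_CSV = """user_id,event,amount
1,purchase,19.99
2,purchase,not_a_number
3,refund,5.00
"""


def ingest(text):
    """Read raw rows from a CSV source (an in-memory string here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Cast fields to typed values; unparseable amounts become None."""
    out = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            amount = None
        out.append({"user_id": int(row["user_id"]), "event": row["event"], "amount": amount})
    return out


def validate(rows):
    """Split rows into valid and rejected using a simple completeness rule."""
    valid = [r for r in rows if r["amount"] is not None]
    rejected = [r for r in rows if r["amount"] is None]
    return valid, rejected


def serve(rows):
    """Aggregate valid rows into a lookup table keyed by event type."""
    totals = {}
    for r in rows:
        totals[r["event"]] = totals.get(r["event"], 0.0) + r["amount"]
    return totals


valid_rows, rejected_rows = validate(transform(ingest(RAW_CSV)))
totals = serve(valid_rows)
print(totals, len(rejected_rows))  # {'purchase': 19.99, 'refund': 5.0} 1
```

Keeping rejected rows instead of dropping them is a deliberate choice in this sketch: it makes data quality problems visible rather than silently shrinking the output.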

What to Expect in Interviews

Expect SQL deep-dives (query optimization, partitioning strategies, data modeling), Python coding focused on data pipeline patterns, and system design questions about building scalable ETL workflows. Companies with ML teams will ask about feature stores, embedding pipelines, and training data management. Be ready to discuss data quality monitoring, pipeline orchestration, and how you'd handle schema evolution in a production data lake.
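Schema evolution questions often come down to how a reader copes with records written under older or newer versions of a schema. The sketch below shows one possible approach under stated assumptions: the `EXPECTED_SCHEMA` columns, the `DEFAULTS` mapping, and the `conform` helper are hypothetical, and production data lakes usually rely on table-format or schema-registry features rather than hand-rolled code like this.

```python
# Hypothetical target schema and defaults, for illustration only.
EXPECTED_SCHEMA = {"member_id": int, "claim_amount": float, "status": str}
DEFAULTS = {"status": "unknown"}


def conform(record, schema=EXPECTED_SCHEMA, defaults=DEFAULTS):
    """Coerce one record to the expected schema.

    Known columns are cast to the expected type, missing columns take a
    default (or None), and unexpected columns are returned separately
    instead of being silently dropped.
    """
    conformed = {}
    for name, typ in schema.items():
        if name in record and record[name] is not None:
            conformed[name] = typ(record[name])
        else:
            conformed[name] = defaults.get(name)
    extras = {k: v for k, v in record.items() if k not in schema}
    return conformed, extras


old_record = {"member_id": "42", "claim_amount": "120.50"}  # written before `status` existed
new_record = {"member_id": 43, "claim_amount": 80, "status": "paid", "region": "IL"}

old_conformed, old_extras = conform(old_record)
new_conformed, new_extras = conform(new_record)
print(old_conformed, old_extras)
print(new_conformed, new_extras)
```

Returning the unexpected columns, rather than discarding them, gives the pipeline a signal that an upstream producer has added a field, which is usually the first thing worth discussing in an interview answer.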

AI Hiring Overview

The AI job market has 4,133 open positions tracked in our dataset. By seniority: 106 entry-level, 1,901 mid-level, 1,663 senior, and 463 leadership roles (Director, VP, C-Level). Remote roles make up 14% of the market (583 positions). The remaining 3,532 roles require on-site or hybrid attendance.

The market median for AI roles is $200,700. Top-quartile compensation starts at $254,000. The 90th percentile reaches $307,500. Highest-paying categories: AI Safety ($274,200 median, 57 roles); AI Engineering Manager ($268,700 median, 42 roles); Research Engineer ($260,000 median, 442 roles).

The AI Job Market Today

The AI job market spans 4,133 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (2,865), Data Scientist (339), AI Software Engineer (313). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.

The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (106) are outnumbered by mid-level (1,901) and senior (1,663) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 463 positions, representing the bottleneck between technical execution and organizational strategy.

Remote work availability sits at 14% of all AI roles (583 positions), with 3,532 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.

AI compensation is structured in clear tiers. The market median sits at $200,700. Top-quartile roles start at $254,000, and the 90th percentile reaches $307,500. These figures reflect base salary for roles with disclosed compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.

Category matters for compensation. AI Safety roles lead at $274,200 median, while Prompt Engineer roles sit at $140,000. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.

The most in-demand skills across all AI postings: Python (2,128 postings), AWS (1,324 postings), Azure (1,003 postings), RAG (916 postings), GCP (817 postings), PyTorch (655 postings), Prompt Engineering (639 postings), Claude (571 postings). Python leads, appearing in about half of all postings (2,128 of 4,133). Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. The newer entrants to the top skills list (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.

Frequently Asked Questions

Based on 273 roles with disclosed compensation, the median salary for Data Engineer positions is $208,300. Actual compensation varies by seniority, location, and company stage.

About 14% of the 4,133 AI roles we track offer remote work. Remote availability varies by company and seniority level, with senior and leadership roles more likely to offer location flexibility.

Blue Cross Blue Shield is among the companies actively hiring for AI and ML talent. Check our company profiles for detailed breakdowns of open roles, salary ranges, and hiring trends.

Common next steps from Data Engineer positions include Senior Data Engineer, ML Engineer, Data Platform Lead. Progression depends on whether you lean toward technical depth, people management, or product strategy.
