Interested in this Data Engineer role at Regeneron?
About This Role
Build our future together:
Global Development is embarking on a Digital Transformation project incorporating AI, machine learning, and automation to help us reduce cycle times and improve quality, allowing us to focus on more meaningful work. We focus on developing and improving the data pipelines, infrastructure, architecture, and analytic tools that fuel our transformation. Working with a team of engineers, analysts, and scientists, you will contribute to modernizing our clinical data infrastructure. This highly visible role will be a technical and strategic liaison between the transformation projects and IT.
A key responsibility will be developing and optimizing schemas and data models for clinical data and ensuring that those products are standards-compliant and interoperable with other platforms and data sources. This role will also contribute to data governance strategy and to the development and management of tools that improve and monitor data quality. The data engineer will develop processes to automate routine workflows, provide technical leadership and mentorship, and stay current with innovations in data engineering so that they may be evaluated for implementation.
This position offers the opportunity to contribute to a fast-growing, science-driven organization making a meaningful difference to patients worldwide.
When & where:
This can be a remote position in the US or an on-site position at our Armonk, NY or Warren, NJ offices.
Discover your role:
- Work with a cross-functional team to refine and implement our data strategy, with a focus on digital transformation and the use of AI/ML.
- Design and document end-to-end data architectures that support diverse analytic, operational, and research needs.
- Facilitate the implementation of a modern data platform (e.g., Snowflake, Databricks).
- Identify opportunities and implement solutions to increase data interoperability and standardization among systems and across other business units.
- Develop and implement pipelines to monitor and improve both internal and external (i.e., from CRO partners) data quality.
- Work with informatics and AI engineers to optimize the utility of data for their respective pipelines.
- Monitor and optimize the performance of data architectures and platforms.
- Develop or implement critical metrics to measure the impact of the overall data strategy.
- Stay up to date with the latest advances in the field and, as appropriate, evaluate them for adoption.
This role requires:
An advanced degree in computer science, statistics, biomedical informatics, or a related field is preferred (PhD + 2 years of experience or an MS + 4 years of relevant experience). A minimum of 5 years’ experience developing and leading the implementation of data engineering solutions, including accountability for the success of the implementation, in life sciences or healthcare.
- Demonstrated expertise in designing and maintaining infrastructure and architecture for clinical or biomedical data in a healthcare or life sciences setting.
- Expertise in modern data platforms (e.g., Snowflake, Redshift, BigQuery, Databricks) and programming languages such as Python, SQL, and R.
- Experience maintaining and managing code repositories (e.g., Bitbucket), ensuring clean, well-documented code with proper version control.
- Proficiency in cloud architecture (e.g., AWS, Azure, GCP) and DevOps practices. Recognized certifications are a plus.
- Experience building, scaling, and maintaining pipelines for structured and unstructured data. Ability to integrate pipelines across the enterprise is essential.
- Deep understanding of regulatory frameworks (HIPAA, GDPR, 21 CFR Part 11) and clinical data standards (CDISC, HL7, FHIR).
- Knowledge of machine learning pipelines and integration with clinical data platforms.
- May require travel up to 20%.
Does this sound like you? Apply now to take your first step towards living the Regeneron Way! We have an inclusive culture that provides comprehensive benefits, which vary by location. In the U.S., benefits may include health and wellness programs (including medical, dental, vision, life, and disability insurance), fitness centers, 401(k) company match, family support benefits, equity awards, annual bonuses, paid time off, and paid leaves (e.g., military and parental leave) for eligible employees at all levels! For additional information about Regeneron benefits in the US, please visit https://careers.regeneron.com/en/working-at-regeneron/total-rewards/. For other countries’ specific benefits, please speak to your recruiter.
Please be advised that at Regeneron, we believe we are most successful and work best when we are together. For that reason, many of Regeneron’s roles are required to be performed on-site. Please speak with your recruiter and hiring manager for more information about Regeneron’s on-site policy and expectations for your role and your location.
Regeneron is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion or belief (or lack thereof), sex, nationality, national or ethnic origin, civil status, age, citizenship status, membership of the Traveler community, sexual orientation, disability, genetic information, familial status, marital or registered civil partnership status, pregnancy or parental status, gender identity, gender reassignment, military or veteran status, or any other protected characteristic in accordance with applicable laws and regulations. The Company will also provide reasonable accommodation to the known disabilities or chronic illnesses of an otherwise qualified applicant for employment, unless the accommodation would impose undue hardship on the operation of the Company's business.
For roles in which the hired candidate will be working in the U.S., the salary ranges provided are shown in accordance with U.S. law and apply to U.S.-based positions. For roles which will be based in Japan and/or Canada, the salary ranges are shown in accordance with the applicable local law and currency. If you are outside the U.S., Japan, or Canada, please speak with your recruiter about salaries and benefits in your location.
Please note that certain background checks will form part of the recruitment process. Background checks will be conducted in accordance with the law of the country where the position is based, including the type of background checks conducted. The purpose of carrying out such checks is for Regeneron to verify certain information regarding a candidate prior to the commencement of employment such as identity, right to work, educational qualifications etc.
Salary Range (annually)
$150,500.00 - $245,500.00
Salary Context
This $150K-$245K range is above the 75th percentile for Data Engineer roles in our dataset (median: $160K across 37 roles with salary data).
Role Details
About This Role
Data Engineers build the pipelines that feed AI models. They design ETL workflows, manage data lakes, and ensure training and inference data is clean, timely, and accessible. Without good data engineering, AI projects fail. It's that simple.
The AI era has expanded the data engineer's scope far beyond batch ETL jobs. You're building real-time embedding pipelines for RAG systems, managing vector databases, ensuring training data quality at scale, and building the infrastructure that lets ML teams iterate on data as fast as they iterate on models. Data quality is the biggest predictor of model quality, and you're the person responsible for it.
Across the 3,823 AI roles we're tracking, Data Engineer positions make up 1% of the market. At Regeneron, this role fits into their broader AI and engineering organization.
Data Engineer demand in AI contexts is strong and growing. Every company building AI needs clean, reliable data pipelines. The shift toward real-time AI applications (chatbots, recommendation engines, agent systems) means data engineering is more critical than ever. Companies are willing to pay premium salaries for data engineers with AI/ML pipeline experience.
What the Work Looks Like
A typical week includes: debugging a data pipeline that's producing stale embeddings for the RAG system, optimizing a Spark job that processes training data, building a data quality monitoring dashboard, meeting with the ML team to understand their next data requirements, and writing dbt models that transform raw event data into ML-ready features. The work is deeply technical and high-impact.
Skills Required
SQL, Python, and distributed systems (Spark, Airflow, dbt) are core. Cloud data platforms (Snowflake, BigQuery, Redshift) are increasingly standard. Many AI-focused roles also want familiarity with vector databases and embedding pipelines. Understanding data modeling, pipeline orchestration, and data quality frameworks covers the essentials.
AI-specific data engineering skills include: building feature stores, managing training data versioning, implementing data lineage tracking, and building real-time embedding pipelines. Experience with streaming systems (Kafka, Flink) is valuable for real-time AI applications. Understanding ML data requirements (balanced datasets, data augmentation, evaluation set construction) makes you much more effective working with ML teams.
Strong postings specify the data stack, mention ML pipeline work, and describe the scale of data you'll be working with. Look for companies that understand the connection between data quality and model quality. Avoid roles that conflate data engineering with data analysis.
Compensation Benchmarks
Data Engineer roles pay a median of $208,300 based on 266 positions with disclosed compensation. Senior-level AI roles across all categories have a median of $227,400. This role's midpoint ($198K) sits 5% below the category median. Disclosed range: $150K to $245K.
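The midpoint comparison above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in this posting; the variable names are illustrative and not part of any dataset schema.

```python
# Figures quoted in the posting; variable names are illustrative assumptions.
range_low = 150_500          # bottom of the disclosed salary range
range_high = 245_500         # top of the disclosed salary range
category_median = 208_300    # stated Data Engineer category median

# Midpoint of the disclosed range.
midpoint = (range_low + range_high) / 2

# Signed percentage gap between the midpoint and the category median.
gap_pct = (midpoint - category_median) / category_median * 100

print(f"midpoint: ${midpoint:,.0f}")              # midpoint: $198,000
print(f"vs. category median: {gap_pct:.1f}%")     # vs. category median: -4.9%
```

The gap of about -4.9% rounds to the "5% below" stated above.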
Across all AI roles, the market median is $200,100. Top-quartile compensation starts at $253,500. The 90th percentile reaches $307,500. For comparison, the highest-paying categories include AI Engineering Manager ($275,000) and AI Safety ($274,200). By seniority level: Entry: $97,880; Mid: $165,000; Senior: $227,400; Director: $247,800; VP: $250,000.
Regeneron AI Hiring
Regeneron has 2 open AI roles right now, across Data Engineer and AI Product Manager. The company is based in Tarrytown, NY, US. Compensation range: $179K - $245K.
Location Context
Across all AI roles, 15% (590 positions) offer remote work, while 3,217 require on-site attendance. Top AI hiring metros: New York (2,643 roles, $211,000 median); San Francisco (2,168 roles, $253,000 median); Los Angeles (1,792 roles, $191,580 median).
Career Path
Common paths into Data Engineer roles include Backend Engineer, Database Administrator, and Analytics Engineer.
From here, career progression typically leads toward Senior Data Engineer, ML Engineer, or Data Platform Lead.
Master SQL and Python first. Then learn a distributed processing framework (Spark or its modern alternatives) and a pipeline orchestrator (Airflow, Dagster, Prefect). Build a portfolio project that demonstrates end-to-end pipeline construction: ingest, transform, validate, serve. If you want to specialize in AI data engineering, add vector databases and embedding pipelines to your skill set.
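As a starting point for such a portfolio project, the four stages (ingest, transform, validate, serve) might be sketched as below. This is a toy illustration using only the Python standard library; the sample data, column names, validation rule, and table name are all hypothetical, and a real project would swap in an orchestrator and a proper warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the second row is missing a weight on purpose.
RAW_CSV = """subject_id,visit_date,weight_kg
S001,2024-01-15,70.5
S002,2024-01-16,
S003,2024-01-17,82.0
"""

def ingest(text):
    """Read raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Cast weight to float; empty strings become None."""
    out = []
    for r in rows:
        w = r["weight_kg"].strip()
        out.append({
            "subject_id": r["subject_id"],
            "visit_date": r["visit_date"],
            "weight_kg": float(w) if w else None,
        })
    return out

def validate(rows):
    """Split rows into (valid, rejected) using a required-field rule."""
    valid = [r for r in rows if r["weight_kg"] is not None]
    rejected = [r for r in rows if r["weight_kg"] is None]
    return valid, rejected

def serve(rows, conn):
    """Load valid rows into a queryable SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits "
        "(subject_id TEXT, visit_date TEXT, weight_kg REAL)"
    )
    conn.executemany(
        "INSERT INTO visits VALUES (:subject_id, :visit_date, :weight_kg)",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
valid, rejected = validate(transform(ingest(RAW_CSV)))
serve(valid, conn)
loaded = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(f"loaded={loaded} rejected={len(rejected)}")  # loaded=2 rejected=1
```

Keeping rejected rows in a separate list, rather than silently dropping them, makes it easy to report on data quality later.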
What to Expect in Interviews
Expect SQL deep-dives (query optimization, partitioning strategies, data modeling), Python coding focused on data pipeline patterns, and system design questions about building scalable ETL workflows. Companies with ML teams will ask about feature stores, embedding pipelines, and training data management. Be ready to discuss data quality monitoring, pipeline orchestration, and how you'd handle schema evolution in a production data lake.
AI Hiring Overview
The AI job market has 3,823 open positions tracked in our dataset. By seniority: 112 entry-level, 1,798 mid-level, 1,516 senior, and 397 leadership roles (Director, VP, C-Level). Remote roles make up 15% of the market (590 positions). The remaining 3,217 roles require on-site or hybrid attendance.
The market median for AI roles is $200,100. Top-quartile compensation starts at $253,500. The 90th percentile reaches $307,500. Highest-paying categories: AI Engineering Manager ($275,000 median, 41 roles); AI Safety ($274,200 median, 55 roles); Research Engineer ($260,000 median, 434 roles).
The AI Job Market Today
The AI job market spans 3,823 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (2,629), Data Scientist (322), AI Software Engineer (279). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.
The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (112) are outnumbered by mid-level (1,798) and senior (1,516) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 397 positions, representing the bottleneck between technical execution and organizational strategy.
Remote work availability sits at 15% of all AI roles (590 positions), with 3,217 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.
AI compensation is structured in clear tiers. The market median sits at $200,100. Top-quartile roles start at $253,500, and the 90th percentile reaches $307,500. These figures reflect base salary from postings with disclosed compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.
Category matters for compensation. AI Engineering Manager roles lead at $275,000 median, while Prompt Engineer roles sit at $140,000. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.
The most in-demand skills across all AI postings: Python (1,979 postings), AWS (1,190 postings), Azure (899 postings), RAG (839 postings), GCP (726 postings), PyTorch (595 postings), Prompt Engineering (595 postings), Claude (540 postings). Python dominates, appearing in the vast majority of role descriptions regardless of category. Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. The newer entrants to the top skills list (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.