About This Role
Overview:
Data Engineer - Veridix AI US Remote
Veridix AI is the technology, data, and AI arm of the Emmes Group, a leading full-service contract research organization (CRO) with over 47 years of experience in supporting clinical research across more than 70 countries. With industry-leading capabilities in cell and gene therapy, vaccines, infectious diseases, and ophthalmology, Emmes is one of the top clinical service providers to the U.S. government and is rapidly expanding its presence in biopharma.
Veridix AI develops advanced eClinical solutions, powering clinical trials through patient data collection, randomization, biospecimen tracking, and data quality monitoring. Our cutting-edge AI innovations, including Generative AI (GenAI) capabilities, are transforming clinical trial timelines by streamlining processes from document authoring to automating study builds.
Our “Character Achieves Results” culture is driven by five key values that guide our actions in the way we conduct research and distinguish us as an organization: Integrity, Agility, Passion for Excellence, Collaborative Partnerships and Intellectual Curiosity. If you share our motivations and passion in research, come join us! You will be joining a collaborative culture that empowers every Emmes employee, from entry level through top executive, to contribute to our clients’ success by sharing ideas openly and honestly.
Primary Purpose
The Data Engineer will have a strong background in data engineering and extensive experience with AWS cloud services. The Data Engineer is responsible for designing, building, and maintaining scalable data pipelines and infrastructure to support our data analytics and business intelligence initiatives.
Responsibilities:
- Design, develop, and maintain robust data pipelines and ETL processes to ingest, transform, and store data from various sources.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements, design data models, and deliver solutions that meet business needs.
- Automate data workflows and implement monitoring and logging to ensure the health and performance of the data infrastructure.
- Conduct data profiling, cleansing, and validation to ensure high data quality standards.
- Optimize data storage and retrieval performance, ensuring data quality and integrity.
- Implement and manage data architecture on AWS, ensuring scalability, reliability, and security.
- Stay up to date with the latest trends and best practices in data engineering and AWS cloud technologies.
Qualifications:
- Bachelor’s or master’s degree in Computer Science, Information Technology, or a related field.
- 3 or more years of related professional experience.
- Experience in data engineering with a strong focus on AWS cloud services.
- Proficiency in SQL and experience with relational databases (e.g., PostgreSQL, MySQL, Redshift).
- Experience with AWS services such as S3, Lambda, Glue, EMR, Kinesis, and Redshift.
- Strong programming skills in languages such as Python, Java, or Scala.
- Knowledge of data modeling, ETL concepts, and data warehousing.
- Familiarity with version control systems (e.g., Git) and CI/CD pipelines.
- Excellent problem-solving skills and attention to detail.
- Knowledge of machine learning frameworks and data science workflows.
- Familiarity with data visualization tools (e.g., QuickSight, Qlik).
- Familiarity with NoSQL databases (e.g., DynamoDB, MongoDB).
- Strong collaboration skills with cross-functional teams to establish best design and user flows for applications.
- Strong multitasking, problem solving, and organizational skills.
- Proven ability to work independently and in a team environment.
- Satisfactory background check required.
More about The Emmes Group
Emmes Group: Building a better future for us all.
Emmes Group is transforming the future of clinical research, bringing the promise of new medical discovery closer within reach for patients. Emmes Group was founded as Emmes more than 47 years ago, becoming one of the primary clinical research providers to the US government before expanding into public-private partnerships and commercial biopharma. Emmes has built industry-leading capabilities in cell and gene therapy, vaccines and infectious diseases, ophthalmology, rare diseases, and neuroscience.
We believe the work we do will have a direct impact on patients’ lives and act accordingly. We strive to build a collaborative culture at the intersection of being a performance- and people-driven company. We’re looking for talented professionals eager to help advance clinical research as we work to embed innovation into the fabric of our company. If you share our motivations and passion in research, come join us!
Why work at Emmes?
At Emmes, your actions and hard work will have a direct impact on public health initiatives, both globally and in our local communities, with opportunities for volunteerism through our Emmes Cares community engagement program. We offer a competitive benefits package focused on the health and needs of our growing workforce, including:
- Flexible Approved Time Off
- Tuition Reimbursement
- 401k Retirement Plan
- Work From Home Anywhere in the US
- Maternal/Paternal Leave
- Casual Dress Code & Work Environment
CONNECT WITH US!
Follow us on Twitter - @EmmesCRO
Find us on LinkedIn - Emmes
The Emmes Company, LLC is an equal opportunity employer and does not discriminate in its selection and employment practices. All qualified applicants will receive consideration for employment without regard to disability or protected veteran status.
Role Details
About This Role
Data Engineers build the pipelines that feed AI models. They design ETL workflows, manage data lakes, and ensure training and inference data is clean, timely, and accessible. Without good data engineering, AI projects fail. It's that simple.
The AI era has expanded the data engineer's scope far beyond batch ETL jobs. You're building real-time embedding pipelines for RAG systems, managing vector databases, ensuring training data quality at scale, and building the infrastructure that lets ML teams iterate on data as fast as they iterate on models. Data quality is the biggest predictor of model quality, and you're the person responsible for it.
Across the 3,823 AI roles we're tracking, Data Engineer positions make up 1% of the market. At The Emmes Company, LLC, this role fits into their broader AI and engineering organization.
Data Engineer demand in AI contexts is strong and growing. Every company building AI needs clean, reliable data pipelines. The shift toward real-time AI applications (chatbots, recommendation engines, agent systems) means data engineering is more critical than ever. Companies are willing to pay premium salaries for data engineers with AI/ML pipeline experience.
What the Work Looks Like
A typical week includes: debugging a data pipeline that's producing stale embeddings for the RAG system, optimizing a Spark job that processes training data, building a data quality monitoring dashboard, meeting with the ML team to understand their next data requirements, and writing dbt models that transform raw event data into ML-ready features. The work is deeply technical and high-impact.
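As an illustration of the first task in that list, here is a minimal sketch of how stale embeddings might be detected. It assumes each document has a source update timestamp and each stored embedding has a computation timestamp. The function name, field names, and sample data are all hypothetical and not tied to any particular vector database.

```python
from datetime import datetime, timezone


def find_stale_embeddings(source_updated_at, embedded_at):
    """Return sorted document IDs whose embedding is missing or older than the source."""
    stale = []
    for doc_id, updated in source_updated_at.items():
        embedded = embedded_at.get(doc_id)
        # A missing embedding counts as stale, as does one computed
        # before the most recent update to the source document.
        if embedded is None or embedded < updated:
            stale.append(doc_id)
    return sorted(stale)


def ts(day):
    """Build a UTC timestamp in January 2024 (hypothetical sample data)."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)


source_updated_at = {"doc-a": ts(1), "doc-b": ts(5), "doc-c": ts(3)}
embedded_at = {"doc-a": ts(2), "doc-b": ts(4)}

stale_ids = find_stale_embeddings(source_updated_at, embedded_at)
print(stale_ids)  # ['doc-b', 'doc-c']
```

In practice the two timestamp maps would come from the source system and the embedding store, and the stale IDs would be queued for re-embedding.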
Skills Required
SQL, Python, and distributed systems (Spark, Airflow, dbt) are core. Cloud data platforms (Snowflake, BigQuery, Redshift) are increasingly standard. Many AI-focused roles also want familiarity with vector databases and embedding pipelines. Understanding data modeling, pipeline orchestration, and data quality frameworks covers the essentials.
AI-specific data engineering skills include: building feature stores, managing training data versioning, implementing data lineage tracking, and building real-time embedding pipelines. Experience with streaming systems (Kafka, Flink) is valuable for real-time AI applications. Understanding ML data requirements (balanced datasets, data augmentation, evaluation set construction) makes you much more effective working with ML teams.
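One lightweight way to think about training data versioning is to fingerprint a dataset by its content, so that any change to the records produces a new version identifier. The sketch below is an assumption about how that could be done with the Python standard library. The `dataset_fingerprint` helper is hypothetical, and dedicated data versioning tools work differently.

```python
import hashlib
import json


def dataset_fingerprint(records):
    """Return an order-independent SHA-256 fingerprint of JSON-serializable records."""
    # Serialize each record with sorted keys so key order does not affect the hash,
    # then sort the serialized records so row order does not affect it either.
    canonical = sorted(
        json.dumps(record, sort_keys=True, separators=(",", ":"))
        for record in records
    )
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical training examples.
v1 = [{"text": "good product", "label": 1}, {"text": "arrived broken", "label": 0}]
v1_reordered = [{"label": 0, "text": "arrived broken"}, {"label": 1, "text": "good product"}]
v2 = [{"text": "good product", "label": 1}, {"text": "arrived broken", "label": 1}]

print(dataset_fingerprint(v1) == dataset_fingerprint(v1_reordered))  # True
print(dataset_fingerprint(v1) == dataset_fingerprint(v2))            # False
```

Recording the fingerprint alongside each trained model is one simple form of lineage: it shows which exact dataset a model was trained on.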
Strong postings specify the data stack, mention ML pipeline work, and describe the scale of data you'll be working with. Look for companies that understand the connection between data quality and model quality. Avoid roles that conflate data engineering with data analysis.
Compensation Benchmarks
Data Engineer roles pay a median of $208,300 based on 266 positions with disclosed compensation. Mid-level AI roles across all categories have a median of $165,000.
Across all AI roles, the market median is $200,100. Top-quartile compensation starts at $253,500. The 90th percentile reaches $307,500. For comparison, the highest-paying categories include AI Engineering Manager ($275,000) and AI Safety ($274,200). By seniority level: Entry: $97,880; Mid: $165,000; Senior: $227,400; Director: $247,800; VP: $250,000.
The Emmes Company, LLC AI Hiring
The Emmes Company, LLC has 1 open AI role right now, in the Data Engineer category. The company is based in Rockville, MD, US.
Location Context
Across all AI roles, 15% (590 positions) offer remote work, while 3,217 require on-site or hybrid attendance. Top AI hiring metros: New York (2,643 roles, $211,000 median); San Francisco (2,168 roles, $253,000 median); Los Angeles (1,792 roles, $191,580 median).
Career Path
Common paths into Data Engineer roles include Backend Engineer, Database Administrator, Analytics Engineer.
From here, career progression typically leads toward Senior Data Engineer, ML Engineer, Data Platform Lead.
Master SQL and Python first. Then learn a distributed processing framework (Spark or its modern alternatives) and a pipeline orchestrator (Airflow, Dagster, Prefect). Build a portfolio project that demonstrates end-to-end pipeline construction: ingest, transform, validate, serve. If you want to specialize in AI data engineering, add vector databases and embedding pipelines to your skill set.
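The four stages named above (ingest, transform, validate, serve) can be sketched in a few lines of standard-library Python. This is a toy illustration under stated assumptions, not a production design: the CSV data, column names, and validation rules are all hypothetical, and a real portfolio project would read from actual sources and run under an orchestrator.

```python
import csv
import io

# Hypothetical raw event data; the second row has a missing amount.
RAW_CSV = """user_id,event,amount
1,purchase,19.99
2,purchase,
1,refund,-5.00
3,purchase,42.50
"""


def ingest(text):
    """Parse raw CSV text into a list of string-valued row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing amount and cast fields to proper types."""
    cleaned = []
    for row in rows:
        if row["amount"] == "":
            continue
        cleaned.append(
            {
                "user_id": int(row["user_id"]),
                "event": row["event"],
                "amount": float(row["amount"]),
            }
        )
    return cleaned


def validate(rows):
    """Raise ValueError if a row breaks the (assumed) business rules."""
    for row in rows:
        if row["event"] not in {"purchase", "refund"}:
            raise ValueError(f"unknown event type: {row['event']}")
        if row["event"] == "refund" and row["amount"] > 0:
            raise ValueError("refund amounts must not be positive")
    return rows


def serve(rows):
    """Aggregate net spend per user, the shape a downstream consumer might read."""
    totals = {}
    for row in rows:
        user = row["user_id"]
        totals[user] = round(totals.get(user, 0.0) + row["amount"], 2)
    return totals


totals = serve(validate(transform(ingest(RAW_CSV))))
print(totals)  # {1: 14.99, 3: 42.5}
```

Keeping each stage as a separate function makes it easy to test stages in isolation and later map them onto tasks in an orchestrator.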
What to Expect in Interviews
Expect SQL deep-dives (query optimization, partitioning strategies, data modeling), Python coding focused on data pipeline patterns, and system design questions about building scalable ETL workflows. Companies with ML teams will ask about feature stores, embedding pipelines, and training data management. Be ready to discuss data quality monitoring, pipeline orchestration, and how you'd handle schema evolution in a production data lake.
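For the schema evolution question, one way to frame an answer is to compare the old and new schemas and classify the change. The sketch below assumes a schema is a plain mapping of column name to type string, and treats additive-only changes as compatible. That rule is a simplifying assumption: real table formats and serialization frameworks define their own compatibility rules. The `diff_schema` helper and the column names are hypothetical.

```python
def diff_schema(old, new):
    """Compare two {column: type} schemas and classify the change."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted(col for col in set(old) & set(new) if old[col] != new[col])
    return {
        "added": added,
        "removed": removed,
        "retyped": retyped,
        # Assumption: only new columns are safe for existing readers;
        # dropped or retyped columns are treated as breaking changes.
        "compatible": not removed and not retyped,
    }


old_schema = {"id": "bigint", "email": "string"}
additive = {"id": "bigint", "email": "string", "signup_ts": "timestamp"}
breaking = {"id": "string"}

additive_diff = diff_schema(old_schema, additive)
breaking_diff = diff_schema(old_schema, breaking)
print(additive_diff["compatible"], additive_diff["added"])
print(breaking_diff["compatible"], breaking_diff["removed"], breaking_diff["retyped"])
```

A check like this could run before a pipeline deploy, blocking breaking changes until downstream consumers have been updated.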
AI Hiring Overview
The AI job market has 3,823 open positions tracked in our dataset. By seniority: 112 entry-level, 1,798 mid-level, 1,516 senior, and 397 leadership roles (Director, VP, C-Level). Remote roles make up 15% of the market (590 positions). The remaining 3,217 roles require on-site or hybrid attendance.
The market median for AI roles is $200,100. Top-quartile compensation starts at $253,500. The 90th percentile reaches $307,500. Highest-paying categories: AI Engineering Manager ($275,000 median, 41 roles); AI Safety ($274,200 median, 55 roles); Research Engineer ($260,000 median, 434 roles).
The AI Job Market Today
The AI job market spans 3,823 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (2,629), Data Scientist (322), AI Software Engineer (279). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.
The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (112) are outnumbered by mid-level (1,798) and senior (1,516) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 397 positions, representing the bottleneck between technical execution and organizational strategy.
Remote work availability sits at 15% of all AI roles (590 positions), with 3,217 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.
AI compensation is structured in clear tiers. The market median sits at $200,100. Top-quartile roles start at $253,500, and the 90th percentile reaches $307,500. These figures reflect base salary from postings with disclosed compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.
Category matters for compensation. AI Engineering Manager roles lead at $275,000 median, while Prompt Engineer roles sit at $140,000. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.
The most in-demand skills across all AI postings: Python (1,979 postings), AWS (1,190 postings), Azure (899 postings), RAG (839 postings), GCP (726 postings), PyTorch (595 postings), Prompt Engineering (595 postings), Claude (540 postings). Python dominates, appearing in the vast majority of role descriptions regardless of category. Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. The newer entrants to the top skills list (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.