Lead Data Scientist at Smarsh

Skills & Technologies

AWS, Azure, Docker, Hugging Face, Keras, Kubernetes, LangChain, PyTorch, TensorFlow

About This Role

Who are we?

Smarsh empowers its customers to manage risk and unleash intelligence in their digital communications. Our growing community of over 6500 organizations in regulated industries counts on Smarsh every day to help them spot compliance, legal or reputational risks in 80+ communication channels before those risks become regulatory fines or headlines. Relentless innovation has fueled our journey to consistent leadership recognition from analysts like Gartner and Forrester, and our sustained, aggressive growth has landed Smarsh in the annual Inc. 5000 list of fastest-growing American companies since 2008.

Summary

As a Lead Data Scientist (NLP & Financial Compliance) at Smarsh, you will spearhead the development of state-of-the-art natural language processing (NLP) and large language model (LLM) solutions that power next-generation compliance and surveillance systems. You’ll work on highly specialized problems at the intersection of natural language processing, communications intelligence, financial supervision, and regulatory compliance, where unstructured data from emails, chats, voice transcripts, and trade communications hold the keys to uncovering misconduct and risk.

The role will involve working with other Senior Data Scientists and mentoring Associate Data Scientists in analyzing complex data, generating insights, and creating solutions as needed across a variety of tools and platforms. This role demands both technical excellence in NLP modeling and a deep understanding of financial domain behavior, including insider trading, market manipulation, off-channel communications, material nonpublic information (MNPI), bribery, and other supervisory risk areas. The ideal candidate will be able to perform both independent and team-based research and generate insights from large data sets, with a hands-on, can-do attitude toward servicing and managing day-to-day data requests and analysis.

This role also offers a unique opportunity to get exposure to many problems and solutions associated with taking machine learning and analytics research to production. On any given day, you will have the opportunity to interface with business leaders, machine learning researchers, data engineers, platform engineers, data scientists and many more, enabling you to level up in true end-to-end data science proficiency.

### How will you contribute?

Collect, analyze, and interpret small/large datasets to uncover meaningful insights to support the development of statistical methods / machine learning algorithms.
Lead the design, training, and deployment of NLP and transformer-based models for financial surveillance and supervisory use cases (e.g., misconduct detection, market abuse, trade manipulation, insider communication).
Development of machine learning models and other analytics following established workflows, while also looking for optimization and improvement opportunities
Data annotation and quality review
Exploratory data analysis and model fail state analysis
Contribute to model governance, documentation, and explainability frameworks aligned with internal and regulatory AI standards.
Client/prospect guidance in machine learning model and analytic fine-tuning/development processes
Provide guidance to junior team members on model development and EDA
Work with Product Manager(s) to intake project/product requirements and translate these to technical tasks within the team’s tooling, techniques, and procedures
Continued self-led personal development

### What will you bring?

Strong understanding of financial markets, compliance, surveillance, supervision, or regulatory technology
Experience with one or more data science and machine/deep learning frameworks and tooling, including scikit-learn, H2O, Keras, PyTorch, TensorFlow, pandas, NumPy, caret, tidyverse
Command of data science and statistics principles (regression, Bayes, time series, clustering, P/R, AUROC, exploratory data analysis, etc.)
Strong knowledge of key programming concepts (e.g. split-apply-combine, data structures, object-oriented programming)
Solid statistics knowledge (hypothesis testing, ANOVA, chi-square tests, etc.)
Knowledge of NLP transfer learning, including word embedding models (GloVe, fastText, word2vec) and transformer models (BERT, SBERT, Hugging Face, GPT-x, etc.)
Experience with natural language processing toolkits like NLTK, spaCy, Nvidia NeMo
Knowledge of microservices architecture and continuous delivery concepts in machine learning and related technologies such as Helm, Docker and Kubernetes
Familiarity with Deep Learning techniques for NLP.
Familiarity with LLMs, using Ollama and LangChain
Excellent verbal and written skills
Proven collaborator, thriving on teamwork

Preferred Qualifications

Master’s or Doctor of Philosophy degree in Computer Science, Applied Math, Statistics, or a scientific field
Familiarity with cloud computing platforms (AWS, GCP, Azure)
Experience with automated supervision/surveillance/compliance tools

The above salary range represents Smarsh's good faith and reasonable estimate of the range of possible base compensation at the time of posting. *Any applicable bonus programs will be discussed during the recruiting process.*

The salary for this role will be set based on a variety of factors, including but not limited to, internal equity, experience, education, location, specialty and training.

Local cost of living assessments are done for each new hire at the time of offer.

About our culture

Smarsh hires lifelong learners with a passion for innovating with purpose, humility and humor. Collaboration is at the heart of everything we do. We work closely with the most popular communications platforms and the world’s leading cloud infrastructure platforms. We use the latest in AI/ML technology to help our customers break new ground at scale. We are a global organization that values diversity, and we believe that providing opportunities for everyone to be their authentic self is key to our success. Smarsh leadership, culture, and commitment to developing our people have all garnered Comparably.com Best Places to Work Awards. Come join us and find out what the best work of your career looks like.

Salary Context

This $166K-$214K range is above the 75th percentile for Data Scientist roles in our dataset (median: $160K across 245 roles with salary data).

Role Details

Company Smarsh

Title Lead Data Scientist

Location Remote, US

Category Data Scientist

Experience Senior

Salary $166K - $214K

Remote Yes

About This Role

Data Scientists extract insights and build predictive models from data. In the AI era, many roles now include LLM-powered analytics, automated reporting, and integration with generative AI tools. The role has evolved from 'the person who runs SQL queries' to 'the person who builds AI-powered data products.'

Modern data science roles fall into two camps: analytics-focused (insights, dashboards, experimentation) and ML-focused (building predictive models, recommendation systems, NLP features). The best data scientists can operate in both modes. The AI shift means that even analytics-focused roles now involve building automated insight pipelines using LLMs, going well beyond one-off reports.

Across the 4,133 AI roles we're tracking, Data Scientist positions make up 8% of the market. At Smarsh, this role fits into their broader AI and engineering organization.

Data Scientist roles remain in high demand, though the definition keeps shifting. Companies increasingly want candidates who can bridge traditional statistics with modern ML and LLM capabilities. The 'pure insights' data scientist role is consolidating into analytics engineering, while the 'build models' data scientist role is merging with ML engineering.

What the Work Looks Like

A typical week includes: analyzing experiment results for a product feature launch, building a predictive model for customer churn, creating an automated reporting pipeline using LLM-powered summarization, presenting insights to stakeholders, and cleaning data (always cleaning data). The ratio of analysis to engineering varies by company, but expect both.

Skills Required

AWS (32% of roles), Azure (24% of roles), Docker (11% of roles), Hugging Face (4% of roles), Keras (1% of roles), Kubernetes (13% of roles), LangChain (11% of roles), PyTorch (16% of roles), TensorFlow (13% of roles)

Python, SQL, and statistical modeling are the foundation. Increasingly, roles want experience with LLMs for data analysis, automated insight generation, and building AI-powered data products. Familiarity with cloud data platforms (Snowflake, BigQuery, Databricks) and ML frameworks (scikit-learn, PyTorch) covers most job requirements.

Experimentation design and causal inference are underrated skills that separate strong candidates. Companies care about whether their product changes cause improvements, and they want candidates who can distinguish causation from correlation. A/B testing methodology, Bayesian statistics, and the ability to communicate uncertainty to non-technical stakeholders are high-value skills.

Good postings specify the data stack, the types of problems you'll work on, and the team structure. Look for companies that differentiate between analytics and ML data science. Vague 'data scientist' postings that list every skill under the sun usually mean the company doesn't know what they need.

Compensation Benchmarks

Data Scientist roles pay a median of $198,000 based on 868 positions with disclosed compensation. Senior-level AI roles across all categories have a median of $227,400. Disclosed range: $166K to $214K.

Across all AI roles, the market median is $200,700. Top-quartile compensation starts at $254,000. The 90th percentile reaches $307,500. For comparison, the highest-paying categories include AI Safety ($274,200) and AI Engineering Manager ($268,700). By seniority level: Entry: $97,760; Mid: $165,778; Senior: $227,400; Director: $250,000; VP: $250,000.

Smarsh AI Hiring

Smarsh has 1 open AI role right now, in the Data Scientist category. The role is remote (US). Compensation range: $166K - $214K.

Remote Work Context

Remote AI roles pay a median of $173,300 across 2,012 positions. About 14% of all AI roles offer remote work.

Career Path

Common paths into Data Scientist roles include Data Analyst, Statistician, and Quantitative Researcher.

From here, career progression typically leads toward Senior Data Scientist, ML Engineer, or AI Product Manager.

Start with statistics and SQL. Build a real analysis project on public data that demonstrates insight generation alongside model building. The market values data scientists who can communicate findings clearly to business stakeholders. If you want to move toward ML engineering, invest in software engineering fundamentals and production deployment skills.

What to Expect in Interviews

Interviews combine statistics, coding, and business acumen. SQL is almost always tested, often with complex joins and window functions. Expect a case study round where you're given a business problem and asked to design an analysis plan. Coding rounds focus on pandas, statistical modeling, and visualization. The strongest differentiator is how well you communicate insights to non-technical stakeholders during presentation rounds.

AI Hiring Overview

The AI job market has 4,133 open positions tracked in our dataset. By seniority: 106 entry-level, 1,901 mid-level, 1,663 senior, and 463 leadership roles (Director, VP, C-Level). Remote roles make up 14% of the market (583 positions). The remaining 3,532 roles require on-site or hybrid attendance.

The market median for AI roles is $200,700. Top-quartile compensation starts at $254,000. The 90th percentile reaches $307,500. Highest-paying categories: AI Safety ($274,200 median, 57 roles); AI Engineering Manager ($268,700 median, 42 roles); Research Engineer ($260,000 median, 442 roles).

The AI Job Market Today

The AI job market spans 4,133 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (2,865), Data Scientist (339), AI Software Engineer (313). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.

The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (106) are outnumbered by mid-level (1,901) and senior (1,663) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 463 positions, representing the bottleneck between technical execution and organizational strategy.

Remote work availability sits at 14% of all AI roles (583 positions), with 3,532 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.

AI compensation is structured in clear tiers. The market median sits at $200,700. Top-quartile roles start at $254,000, and the 90th percentile reaches $307,500. These figures reflect base salary for roles with disclosed compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.

Category matters for compensation. AI Safety roles lead at $274,200 median, while Prompt Engineer roles sit at $140,000. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.

The most in-demand skills across all AI postings: Python (2,128 postings), AWS (1,324 postings), Azure (1,003 postings), RAG (916 postings), GCP (817 postings), PyTorch (655 postings), Prompt Engineering (639 postings), Claude (571 postings). Python dominates, appearing in the vast majority of role descriptions regardless of category. Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. The newer entrants to the top skills list (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.

Frequently Asked Questions

Based on 868 roles with disclosed compensation, the median salary for Data Scientist positions is $198,000. Actual compensation varies by seniority, location, and company stage.

About 14% of the 4,133 AI roles we track offer remote work. Remote availability varies by company and seniority level, with senior and leadership roles more likely to offer location flexibility.

Smarsh is among the companies actively hiring for AI and ML talent. Check our company profiles for detailed breakdowns of open roles, salary ranges, and hiring trends.

Common next steps from Data Scientist positions include Senior Data Scientist, ML Engineer, and AI Product Manager. Progression depends on whether you lean toward technical depth, people management, or product strategy.
