Fine-tuning an LLM costs between $50 and $50,000 per run depending on model size and method. RAG costs $0.01-$0.10 per query with no upfront training investment. The decision between them isn't just about cost. It's about what kind of problem you're solving, how your data changes over time, and what quality bar you need to hit.
Most teams default to RAG because it's faster to implement. That's often the right call. But there are specific scenarios where fine-tuning produces meaningfully better results. Here's how to decide, and how to execute either approach well.
When to Fine-Tune
Fine-tuning is the right choice in a narrow set of high-value situations.
The Model Needs to Learn a Style or Format
Fine-tune when you need the model to consistently produce outputs in a specific format, tone, or structure that prompting alone can't achieve reliably. Examples:
- Medical documentation that follows specific clinical terminology conventions
- Legal briefs formatted according to court-specific requirements
- Code generation in a proprietary language or framework
- Customer communication that matches a brand's exact voice and terminology
You Have High-Quality, Consistent Training Data
Fine-tuning works best when you have 500+ carefully curated examples that demonstrate exactly what you want the model to do. The data should be:
- Consistent in quality (every example is correct and complete)
- Representative of the full range of inputs you'll see in production
- Free of contradictions (don't train the model to produce conflicting outputs)
- Vetted by domain experts (not auto-generated or scraped without review)
Latency Matters
RAG adds retrieval time to every query. A vector database lookup adds 10-50ms. Reranking adds another 20-100ms. For applications where total latency must be under 200ms, eliminating the retrieval step by fine-tuning knowledge directly into the model can be the right tradeoff.
This applies most to: real-time chat interfaces, voice assistants, inline code completion, and any application where users perceive delays.
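The latency figures above can be turned into a quick budget check. This is a sketch using the illustrative ranges from this section, not measured numbers; the 200ms budget is the example threshold named above.

```python
# Worst-case latency budgeting with the illustrative figures from this section.
RAG_OVERHEAD_MS = {"vector_lookup": (10, 50), "reranking": (20, 100)}

def rag_overhead_range():
    """Return (best, worst) milliseconds added by the RAG retrieval path."""
    best = sum(lo for lo, _ in RAG_OVERHEAD_MS.values())
    worst = sum(hi for _, hi in RAG_OVERHEAD_MS.values())
    return best, worst

def fits_budget(generation_ms, budget_ms=200, use_rag=True):
    """Does the request fit the latency budget in the worst case?"""
    _, worst = rag_overhead_range() if use_rag else (0, 0)
    return generation_ms + worst <= budget_ms

best, worst = rag_overhead_range()
print(f"RAG adds {best}-{worst} ms per query")        # 30-150 ms
print(fits_budget(generation_ms=120, use_rag=True))   # False: 120 + 150 > 200
print(fits_budget(generation_ms=120, use_rag=False))  # True
```

The point of the worst-case check: a model that generates in 120ms fits a 200ms budget on its own, but not once retrieval and reranking hit their upper bounds.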
You Need Domain Internalization
When the model needs to "think" in a domain's language rather than look up facts. A doctor doesn't consult a reference for every medical term. Similarly, a fine-tuned model can internalize domain vocabulary, relationships, and reasoning patterns in ways that RAG context injection can't replicate.
This works for: domain-specific reasoning, technical terminology usage, and tasks where the model needs to draw on specialized knowledge during multi-step reasoning.
When to Use RAG
RAG is the better choice for most production LLM applications.
Information Changes Frequently
If your knowledge base updates daily, weekly, or monthly, RAG wins. You update the document corpus without retraining. Fine-tuning requires a new training run every time the underlying information changes, which costs time and money.
Use RAG for: product catalogs, documentation that gets updated, news and current events, any source of truth that evolves.
You Need Citations
RAG naturally supports citations because the model generates answers based on specific retrieved documents. You can show users exactly which sources informed the answer. Fine-tuning bakes information into model weights, making it impossible to trace which training example influenced a specific output.
Use RAG for: any application where users need to verify answers, legal and medical applications, research tools, customer support where agents need to reference specific policies.
You Want to Avoid Hallucination on Factual Queries
RAG with proper retrieval reduces hallucination because the model generates answers grounded in retrieved documents rather than relying on memorized knowledge. Fine-tuning can reduce hallucination in the domain it was trained on but doesn't have the same grounding mechanism.
Your Budget Is Limited
A basic RAG system costs $100-$500 to set up (embedding generation, vector database, orchestration) and $0.01-$0.10 per query for ongoing costs. Fine-tuning a 7B model costs $50-$500 per run, and you'll run it multiple times as you iterate. For a 70B model, costs start at $5,000 per run.
If you're not sure whether fine-tuning will solve your problem, start with RAG. You can always fine-tune later if RAG doesn't meet your quality bar.
How to Fine-Tune Effectively
Step 1: Prepare Your Data
Data preparation takes 60-70% of the total fine-tuning effort. Don't rush this.
Format: JSON Lines (JSONL) with instruction/input/output triplets or a conversation format, depending on the model. Follow the base model's training format exactly.
Quality control:
- Review every example manually if you have fewer than 1,000
- Sample-review 10-20% if you have more
- Check for consistency: similar inputs should produce similar outputs
- Remove or fix contradictory examples
- Ensure coverage: your training data should represent the full distribution of production inputs
Dataset size guidelines by task type:
- Style/format adaptation: 100-500 examples
- Domain specialization: 1,000-10,000 examples
- Complex task learning: 10,000-100,000 examples
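Part of the quality-control pass above can be automated before any manual review. A minimal sketch, assuming the instruction/input/output triplet schema mentioned earlier (your field names may differ):

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}  # assumed triplet schema

def validate_jsonl(lines):
    """Return (valid_examples, errors) for JSONL training data."""
    valid, errors = [], []
    for n, line in enumerate(lines, start=1):
        try:
            ex = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {n}: not valid JSON")
            continue
        missing = REQUIRED_KEYS - ex.keys()
        if missing:
            errors.append(f"line {n}: missing {sorted(missing)}")
        elif not ex["output"].strip():
            errors.append(f"line {n}: empty output")
        else:
            valid.append(ex)
    return valid, errors

rows = [
    '{"instruction": "Summarize", "input": "long text", "output": "short text"}',
    '{"instruction": "Summarize", "input": "long text"}',  # missing output field
    'not json at all',
]
valid, errors = validate_jsonl(rows)
print(len(valid), len(errors))  # 1 2
```

Mechanical checks like these catch schema drift and truncated examples; consistency and correctness still need the expert review described above.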
Step 2: Choose Your Method
LoRA (Low-Rank Adaptation)
The default choice for most fine-tuning in 2026. LoRA trains small adapter layers instead of modifying all model weights.
Benefits:
- 60-80% reduction in GPU memory requirements
- 40-60% faster training time
- Can maintain multiple LoRA adapters and swap between them
- Minimal quality loss for most tasks
QLoRA (Quantized LoRA)
Combines LoRA with 4-bit quantization of the base model. Enables fine-tuning of 70B-parameter models on a single 48GB GPU.
Benefits:
- Can fine-tune much larger models on consumer hardware
- Additional cost savings over standard LoRA
- Quality is within 1-3% of full LoRA for most tasks
Full Fine-Tuning
Modifies all model weights. Rarely necessary and significantly more expensive.
Use full fine-tuning when: LoRA/QLoRA can't achieve the quality you need (rare), you're training on a very large dataset (100K+ examples), or you're fine-tuning a small model (under 3B parameters) where LoRA overhead isn't justified.
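The decision rules above can be condensed into a small heuristic. This is a sketch of the guidance in this section, not a hard rule; the thresholds mirror the numbers given above (3B, 100K examples, 48GB GPU):

```python
def choose_method(model_params_b, gpu_memory_gb, dataset_size, need_weights=True):
    """Heuristic fine-tuning method picker following this section's guidance."""
    if not need_weights:
        return "api"    # provider-hosted fine-tuning, no infrastructure to manage
    if model_params_b < 3 or dataset_size >= 100_000:
        return "full"   # small model or very large dataset: LoRA overhead not justified
    if model_params_b >= 30 and gpu_memory_gb <= 48:
        return "qlora"  # 4-bit base model fits a 70B model on a single 48GB GPU
    return "lora"       # the default choice for most fine-tuning

print(choose_method(7, 24, 5_000))    # lora
print(choose_method(70, 48, 5_000))   # qlora
print(choose_method(1, 24, 2_000))    # full
```

In practice, start from "lora" and only move off it when one of the other conditions clearly applies.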
API-Based Fine-Tuning
OpenAI, Google, and other providers offer fine-tuning through their APIs. You upload training data; they handle the training.
Benefits:
- No GPU infrastructure to manage
- Simple API interface
- Pay per token of training data
Tradeoffs:
- Limited control over training parameters
- Model weights aren't accessible (vendor lock-in)
- More expensive per run than self-hosted for large datasets
Step 3: Configure Training
Key hyperparameters:
- Learning rate: Start at 2e-5 for LoRA, 1e-5 for full fine-tuning. Too high causes forgetting; too low wastes compute.
- Epochs: 1-3 for LoRA, 1-2 for full fine-tuning. More epochs on small datasets lead to overfitting.
- LoRA rank (r): 8-32 for most tasks. Higher rank captures more complex patterns but uses more memory. Start at 16.
- LoRA alpha: Typically 2x the rank (r=16, alpha=32). Controls the scaling of LoRA updates.
- Batch size: As large as your GPU memory allows. Gradient accumulation can simulate larger batch sizes.
Step 4: Evaluate Rigorously
Task-specific metrics: Accuracy, F1, BLEU, ROUGE, or custom metrics depending on your task. Compare against the base model and against RAG on the same evaluation set.
General capability testing: Fine-tuning can cause catastrophic forgetting, where the model gets better at your task but worse at general tasks. Test on a general benchmark (MMLU, HellaSwag) before and after fine-tuning.
Human evaluation: Automated metrics don't capture everything. Have domain experts review 50-100 outputs from the fine-tuned model, scoring for accuracy, format compliance, and overall quality.
A/B testing in production: The final evaluation is real-user behavior. Deploy the fine-tuned model alongside the baseline and compare user satisfaction, task completion rates, and error rates.
Hybrid Architectures: Fine-Tuning + RAG
The best production systems often combine both approaches. Fine-tune for style, format, and base domain knowledge. Use RAG for specific facts, recent information, and citeable answers.
Pattern 1: Fine-Tuned Model + RAG Retrieval
Fine-tune a model to follow your output format and tone. At inference time, retrieve relevant context from a knowledge base and include it in the prompt. The model generates answers in the right style while grounding responses in retrieved documents.
This works well for: customer support systems (consistent tone + accurate product information), medical Q&A (clinical language + current treatment protocols), and legal research (proper citation format + relevant case law).
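Pattern 1 reduces, at inference time, to assembling a prompt that pairs the fine-tuned style with retrieved facts. A minimal sketch; the system string and document fields are hypothetical placeholders, and the style instruction assumes the model was fine-tuned on outputs in that format:

```python
def build_prompt(system_style, retrieved_docs, question):
    """Pattern 1: the fine-tuned model supplies style, retrieval supplies facts."""
    context = "\n\n".join(f"Source: {d['id']}\n{d['text']}" for d in retrieved_docs)
    return (
        f"{system_style}\n\n"
        f"Answer using only the sources below.\n\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "You are a support agent. Answer in our standard concise, friendly format.",
    [{"id": "kb-142", "text": "The Pro plan includes 5 seats."}],  # hypothetical doc
    "How many seats does Pro include?",
)
print("kb-142" in prompt)  # True: source IDs survive into the prompt for citation
```

Keeping source IDs in the prompt is what lets the generated answer cite specific documents, preserving the traceability advantage of RAG described earlier.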
Pattern 2: Fine-Tuned Retrieval + Base Model
Fine-tune a smaller model specifically for retrieval: given a query, identify the most relevant documents. Use a general-purpose LLM for generation with the retrieved context.
This works well when: your retrieval needs are domain-specific but your generation needs are general-purpose.
Pattern 3: Fine-Tuned Router + Specialized Models
Train a small model to route queries to the appropriate handler: RAG for factual questions, fine-tuned model for style-specific generation, base model for general queries, and deterministic code for structured tasks.
This is the most sophisticated pattern but also the most effective for complex production systems with diverse query types.
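The router's decision logic can be sketched without the trained model itself. Here keyword matching stands in for the small router model described above; the keywords and handler names are illustrative assumptions:

```python
def route(query):
    """Pattern 3 sketch: keyword rules stand in for the trained router model."""
    q = query.lower()
    if q.startswith(("create", "delete", "export")):
        return "deterministic"  # structured tasks -> plain code, no LLM
    if any(w in q for w in ("price", "spec", "policy", "when", "how many")):
        return "rag"            # factual questions -> retrieval-grounded answer
    if "draft" in q or "write" in q:
        return "finetuned"      # style-specific generation
    return "base"               # everything else -> general-purpose model

print(route("How many seats does the Pro plan include?"))  # rag
print(route("Draft a renewal reminder email"))             # finetuned
print(route("export my invoices"))                         # deterministic
```

In production the rules would be replaced by a small classifier fine-tuned on labeled queries, but the interface (query in, handler name out) stays the same.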
Cost Comparison
Fine-Tuning Costs (One-Time per Training Run)
- LoRA on 7B model (cloud GPU): $50-$500
- QLoRA on 70B model (single GPU): $500-$5,000
- Full fine-tuning on 7B model: $500-$5,000
- Full fine-tuning on 70B model: $5,000-$50,000+
- API fine-tuning (OpenAI, moderate dataset): $100-$2,000
RAG Costs (Ongoing per Query)
- Embedding generation: $0.0001-$0.001 per query
- Vector database query: $0.0001-$0.001 per query
- LLM generation with context: $0.01-$0.10 per query
- Total per query: $0.01-$0.10
Break-Even Analysis
A LoRA fine-tuning run costs ~$200. If fine-tuning eliminates the need for RAG retrieval and reduces prompt length by 1,000 tokens per query ($0.01-$0.03 savings per query at typical API rates), the break-even is 7,000-20,000 queries. For a system handling 10,000 queries per day, fine-tuning pays for itself in 1-2 days.
For systems with low query volume (under 1,000 queries/day), RAG is almost always more cost-effective. For high-volume systems, the math favors fine-tuning, especially when combined with self-hosted inference.
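The break-even arithmetic above is simple enough to encode directly, using the ~$200 run cost and the $0.01-$0.03 per-query savings range from this section:

```python
def breakeven_queries(training_cost, savings_per_query):
    """Queries needed before a fine-tuning run pays for itself."""
    return training_cost / savings_per_query

run_cost = 200.0          # approximate LoRA run cost from the figures above
low, high = 0.01, 0.03    # per-query savings from ~1,000 fewer prompt tokens

print(breakeven_queries(run_cost, high))  # ~6,667 queries (best case)
print(breakeven_queries(run_cost, low))   # 20,000 queries (worst case)
# At 10,000 queries/day, the run pays for itself in 1-2 days.
```

Plug in your own run cost and token savings; the crossover shifts quickly with query volume, which is why low-volume systems rarely justify fine-tuning on cost alone.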
Common Mistakes
Fine-Tuning on Bad Data
The most common mistake. Garbage in, garbage out. If your training examples contain errors, inconsistencies, or don't represent your production distribution, the fine-tuned model will reflect those problems. Spend twice as long on data preparation as you think you need.
Over-Fitting on Small Datasets
Training too many epochs on a small dataset causes the model to memorize training examples rather than learning patterns. Symptoms: perfect performance on training data, poor performance on held-out examples, and repetitive outputs. Solution: fewer epochs, more data, or LoRA with lower rank.
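The overfitting symptoms above show up in the loss curves before they show up in outputs. A minimal sketch of the check; the 0.3 gap threshold is an illustrative assumption, not a standard value:

```python
def overfit_signal(train_loss, eval_loss, gap_threshold=0.3):
    """Flag likely overfitting: train loss keeps falling while eval loss turns up."""
    gap = eval_loss[-1] - train_loss[-1]
    eval_rising = len(eval_loss) >= 2 and eval_loss[-1] > min(eval_loss)
    return gap > gap_threshold and eval_rising

# Healthy run: both losses fall together.
print(overfit_signal([1.2, 0.9, 0.7], [1.3, 1.0, 0.8]))  # False
# Overfit run: train loss keeps dropping, eval loss bottoms out and rises.
print(overfit_signal([1.2, 0.6, 0.2], [1.1, 0.9, 1.0]))  # True
```

Tracking this per epoch makes "fewer epochs" an early stop rather than a post-mortem fix.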
Ignoring Catastrophic Forgetting
Fine-tuning can degrade general model capabilities. If your fine-tuned model starts producing worse results on tasks outside your training domain, you've overtrained. Monitor general benchmarks alongside task-specific metrics.
Not Comparing Against RAG First
Always benchmark RAG performance before investing in fine-tuning. In many cases, a well-built RAG system with good retrieval achieves 90-95% of fine-tuning quality at a fraction of the cost and complexity. Only fine-tune if RAG demonstrably can't meet your quality requirements.
Fine-Tuning When Prompting Would Work
Sometimes the problem isn't that the model lacks knowledge. It's that the prompt isn't structured well. Before fine-tuning, try systematic prompt optimization: few-shot examples, chain-of-thought reasoning, structured output formatting. If prompting alone closes the quality gap, you've saved yourself significant time and money.
Tools and Frameworks
For Self-Hosted Fine-Tuning
- Hugging Face TRL: Most popular library for LoRA/QLoRA fine-tuning
- Axolotl: Higher-level wrapper around TRL with configuration-driven training
- LLaMA-Factory: Focused on LLaMA family models with a web UI
- Unsloth: Optimized for fast LoRA training (2x speedup claims)
For API Fine-Tuning
- OpenAI Fine-Tuning API: Supports GPT-4o-mini and GPT-4o
- Google Vertex AI: Supports Gemini model fine-tuning
- Together AI: Supports various open-source model fine-tuning
For Evaluation
- Ragas: RAG-specific evaluation framework
- DeepEval: General LLM evaluation framework
- Weights & Biases: Experiment tracking and comparison
- Promptfoo: Prompt and model comparison testing