Senior AI Agent & Evaluations Engineer

Remote Senior AI Agent Developer

Skills & Technologies

AWS, Claude, Prompt Engineering, Salesforce

About This Role

Join Vacatia and Help Build the Future of AI-Powered Vacation Ownership

Location: Portland, OR (Hybrid – Three Days In Office)

Remote considered for exceptional candidates.

About Vacatia

Vacatia is building the future of vacation ownership. We operate in a fragmented, operationally complex industry where AI has the potential to fundamentally transform how decisions are made, how customers are supported, and how businesses scale.

We're developing AI agents that sit at the center of critical business workflows—helping owners, supporting operations, surfacing insights, and automating decisions that historically required significant human effort. These agents interact with real customers and influence real business outcomes, making reliability, safety, and performance essential.

We're looking for a hands-on Senior AI Agent & Evals Engineer to own the intelligence layer behind these systems. You'll be responsible for designing agent behavior, building evaluation frameworks, creating guardrails, and continuously improving agent performance as our AI footprint expands across the organization.

If you're passionate about prompt engineering, agent reliability, and creating measurable AI systems that solve meaningful business problems, we'd love to meet you.

Why You'll Love Working at Vacatia

Build the Future of Applied AI

Design and improve AI agents that directly impact customer experiences, operational efficiency, and business outcomes across our organization.

Work on Problems That Matter

Your work will influence real-world decisions involving customer communications, mortgage outcomes, rental operations, and owner experiences.

Own the Intelligence Layer

Take full ownership of prompt design, agent behavior, evaluation systems, guardrails, and continuous performance improvement.

Measure What Matters

Build sophisticated evaluation frameworks, golden datasets, and automated scoring systems that ensure our agents continually improve.

Partner Across the Business

Collaborate closely with engineers, operators, and subject matter experts to transform business knowledge into scalable AI systems.

Join a Small Team with Outsized Impact

Work alongside experienced engineers and leaders who believe AI can create meaningful competitive advantages in a traditionally underserved industry.

Your Impact

  • Design, refine, and optimize prompts, tool definitions, routing logic, and decision-making behavior across Vacatia's AI agent ecosystem
  • Build and maintain evaluation frameworks, golden datasets, grading systems, and regression testing pipelines that measure agent quality and reliability
  • Develop guardrails and safe-failure mechanisms that ensure agents operate responsibly in customer-facing and financially sensitive workflows
  • Monitor production performance, investigate failures, identify edge cases, and continuously improve agent outcomes through data-driven iteration
  • Partner with business stakeholders to translate policies, operational requirements, and domain expertise into measurable agent behavior
  • Collaborate with engineering teams to define context requirements, tool contracts, and integration specifications that support agent success
  • Create scalable frameworks and reusable patterns for deploying AI agents across new business workflows and use cases
  • Establish best practices for prompt engineering, evaluation methodologies, observability, and agent operations
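
As a rough illustration of the golden-dataset and regression-testing work listed above, here is a minimal Python sketch. It assumes a hypothetical `agent` callable that maps an input message to a routing label and grades with simple exact match; `GoldenCase`, `run_regression`, the toy router, and the labels are all illustrative names, not Vacatia's tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    case_id: str
    input_text: str
    expected_label: str


def run_regression(agent: Callable[[str], str], cases: list[GoldenCase],
                   baseline_pass_rate: float, tolerance: float = 0.02) -> dict:
    """Score an agent on a golden dataset and flag a drop versus the baseline.

    Uses an exact-match grader: a case passes when the agent's output equals
    the expected label (case-insensitive; the agent output is whitespace-trimmed).
    """
    if not cases:
        raise ValueError("golden dataset is empty")
    failures = [
        c.case_id for c in cases
        if agent(c.input_text).strip().lower() != c.expected_label.lower()
    ]
    pass_rate = 1 - len(failures) / len(cases)
    return {
        "pass_rate": pass_rate,
        "failures": failures,
        "regressed": pass_rate < baseline_pass_rate - tolerance,
    }


# Toy usage: a keyword router standing in for a real agent (hypothetical labels).
def toy_router(text: str) -> str:
    return "billing" if "refund" in text.lower() else "general"


golden = [
    GoldenCase("c1", "I want a refund for my stay", "billing"),
    GoldenCase("c2", "Can I change my check-in date?", "reservations"),
    GoldenCase("c3", "Hello there", "general"),
]
report = run_regression(toy_router, golden, baseline_pass_rate=0.9)
print(report)
```

A run that falls more than `tolerance` below the stored baseline is flagged, which is the basic shape of a regression gate in CI. Production graders are usually richer than exact match (scoring rubrics, LLM-as-judge).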

What You Bring

  • Proven experience shipping and owning production AI agents or LLM-powered systems beyond proof-of-concept environments
  • Deep expertise in prompt engineering, including system prompts, tool usage, context management, output constraints, and agent behavior design
  • Hands\-on experience building evaluation frameworks using golden datasets, scoring rubrics, LLM\-as\-judge methodologies, and regression testing
  • Strong familiarity with modern AI development tools such as Claude Code, Codex, or similar coding agents
  • Experience with agent observability and evaluation platforms such as LangSmith, Langfuse, Arize, Galileo, or comparable solutions
  • Ability to distinguish prompt issues from data, tooling, model, or evaluation failures and systematically improve agent performance
  • Strong written and verbal communication skills with the ability to work effectively across engineering and business teams
  • Demonstrated ownership mindset with a passion for building reliable, measurable, and continuously improving AI systems
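
LLM-as-judge grading, mentioned in the list above, usually means sending a rubric plus the agent's output to a second model and parsing a structured score. A minimal sketch, assuming a `judge_llm` callable that wraps whatever model API is in use (stubbed here so the example runs offline); the rubric text, the JSON reply format, and `judge_response` are illustrative choices, not a specific platform's API.

```python
import json
import re
from typing import Callable

# Hypothetical rubric; real rubrics are written with domain experts.
RUBRIC = """Score the RESPONSE from 1 to 5:
5 = accurate, complete, and consistent with policy
3 = partially correct or missing a required detail
1 = incorrect, off-policy, or unsafe
Reply with JSON only: {"score": <int>, "reason": "<one sentence>"}"""


def judge_response(question: str, response: str,
                   judge_llm: Callable[[str], str]) -> dict:
    """Ask a judge model to grade one response; return {'score', 'reason'}.

    Unparseable verdicts get score None so they can be counted separately
    instead of silently passing.
    """
    prompt = f"{RUBRIC}\n\nQUESTION:\n{question}\n\nRESPONSE:\n{response}"
    raw = judge_llm(prompt)
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # judges often add prose around the JSON
    if match is None:
        return {"score": None, "reason": "no JSON in judge output"}
    try:
        verdict = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {"score": None, "reason": "invalid JSON from judge"}
    score = verdict.get("score") if isinstance(verdict, dict) else None
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 5:
        return {"score": None, "reason": "missing or out-of-range score"}
    return {"score": score, "reason": str(verdict.get("reason", ""))}


# Stub judge for illustration; a real one would call an LLM.
def stub_judge(prompt: str) -> str:
    return 'Verdict: {"score": 4, "reason": "Accurate but omits the fee."}'


print(judge_response("What is the cancellation fee?", "There is no fee.", stub_judge))
```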

Strongly Preferred

  • Experience building agents that process communication-based workflows, including emails, support tickets, chat interactions, or transcripts
  • Experience with multiple agent frameworks and a practical understanding of their tradeoffs
  • Familiarity with the evolving LLM landscape and model selection strategies
  • Experience designing and implementing end-to-end evaluation pipelines and agent operations workflows
  • Production experience with online evaluation systems and automated scoring of live traffic

Nice to Have

  • Experience integrating AI systems with Salesforce, AWS Connect, or customer engagement platforms
  • Background in customer-facing industries where accuracy, compliance, and communication quality are critical
  • Contributions to open-source projects, technical writing, or public thought leadership in AI, prompt engineering, or agent development

Join Us

Join us at the forefront of applied AI innovation. If you're excited about building intelligent systems that solve complex business problems, improving agent behavior through rigorous evaluation, and helping shape the future of vacation ownership, we'd love to hear from you.

At Vacatia, you'll have the opportunity to build AI solutions that matter, work alongside talented teammates, and create technology that drives real business impact.

Role Details

Company: Vacatia, Inc.
Title: Senior AI Agent & Evaluations Engineer
Location: Remote, US
Experience: Senior
Salary: Not disclosed
Remote: Yes

About AI Agent Developer Roles

AI Agent Developers build autonomous systems that can reason, plan, and take actions. They design multi-step workflows, tool-use frameworks, and orchestration layers that let LLMs interact with external systems. This is the frontier of applied AI engineering.

Agent development is where the most interesting (and hardest) problems in applied AI live right now. Making an LLM answer a question is straightforward. Making it reliably execute a 15-step workflow that involves calling APIs, reading databases, making decisions, and recovering from errors is an unsolved problem. You're building systems that have to work despite the fact that the underlying model is non-deterministic.
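
One common pattern for the error-recovery part of this problem is to retry each step a bounded number of times and, if it still fails, stop and escalate instead of letting later steps act on bad data. A minimal Python sketch, assuming each step is a plain function that takes and returns a state dict; `run_workflow`, the step names, and the simulated CRM timeout are hypothetical.

```python
import time
from typing import Callable

Step = Callable[[dict], dict]


def run_workflow(steps: list[tuple[str, Step]], state: dict,
                 max_attempts: int = 3, backoff_s: float = 0.5) -> dict:
    """Run named steps in order, retrying each one on failure.

    If a step still fails after max_attempts, stop and hand off to a human
    rather than letting later steps act on missing or bad data.
    """
    for name, step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                state = step(state)
                break
            except Exception as exc:  # production code would catch narrower errors
                state.setdefault("errors", []).append(f"{name} attempt {attempt}: {exc}")
                if attempt == max_attempts:
                    state["escalate_to_human"] = True
                    state["failed_step"] = name
                    return state
                time.sleep(backoff_s * attempt)  # linear backoff between retries
    state["completed"] = True
    return state


# Toy steps (hypothetical): the first lookup times out once, then succeeds.
calls = {"lookup": 0}


def lookup_owner(state: dict) -> dict:
    calls["lookup"] += 1
    if calls["lookup"] == 1:
        raise TimeoutError("CRM timeout")
    return {**state, "owner_id": 42}


def draft_reply(state: dict) -> dict:
    return {**state, "reply": f"Hello owner {state['owner_id']}"}


result = run_workflow([("lookup", lookup_owner), ("draft", draft_reply)], {}, backoff_s=0)
print(result)
```

Escalating on persistent failure is a deliberate design choice here: for customer-facing or financial workflows, a stalled task handed to a person is usually cheaper than a confident wrong action.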

Across the 4,133 AI roles we're tracking, AI Agent Developer positions make up 1% of the market. At Vacatia, Inc., this role fits into their broader AI and engineering organization.

AI Agent Developer is one of the newest and fastest-growing AI role categories. The market is early but accelerating as companies move beyond simple chatbots toward AI systems that can take real actions. Compensation is high because the skill set is rare and the business impact is potentially enormous.

What the Work Looks Like

A typical week includes: designing the action space and tool definitions for a new agent use case, debugging why the agent chose the wrong action sequence on a specific input, building evaluation frameworks that test agent reliability across hundreds of scenarios, optimizing the prompt chain for cost and latency, and implementing safety guardrails to prevent the agent from taking destructive actions. The work is equal parts engineering and empirical science.
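
Guardrails against destructive actions are often implemented as a policy check that sits between the model's proposed tool call and its execution. The sketch below is one illustrative version: read-only tools run, write tools need human approval except small refunds, and anything unrecognized is blocked (default deny). All tool names, the $200 limit, and `check_action` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical tool names and limit, for illustration only.
READ_ONLY_TOOLS = {"lookup_reservation", "get_account_balance", "search_policies"}
WRITE_TOOLS = {"issue_refund", "cancel_reservation", "update_payment_plan"}
MAX_AUTO_REFUND = 200.0


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def check_action(call: ToolCall) -> str:
    """Classify a proposed tool call as 'allow', 'needs_approval', or 'block'.

    Default deny: any tool not on a known list is blocked outright.
    """
    if call.name in READ_ONLY_TOOLS:
        return "allow"
    if call.name in WRITE_TOOLS:
        amount = call.args.get("amount")
        is_small_refund = (
            call.name == "issue_refund"
            and isinstance(amount, (int, float))
            and 0 < amount <= MAX_AUTO_REFUND
        )
        # Small refunds proceed automatically; every other write goes to a human.
        return "allow" if is_small_refund else "needs_approval"
    return "block"


for proposed in [ToolCall("lookup_reservation", {"id": "R1"}),
                 ToolCall("issue_refund", {"amount": 50}),
                 ToolCall("issue_refund", {"amount": 5000}),
                 ToolCall("delete_all_records")]:
    print(proposed.name, proposed.args, "->", check_action(proposed))
```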

Skills Required

AWS (32% of roles), Claude (14% of roles), Prompt Engineering (15% of roles), Salesforce (5% of roles)

Deep experience with LLM APIs and agent frameworks (LangChain, CrewAI, AutoGen). Strong understanding of prompt engineering, function calling, and error handling for non-deterministic systems. Python is standard. Experience with orchestration patterns, state management, and workflow engines adds significant value.
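
To make "function calling and error handling" concrete: the model proposes a tool call as JSON, and the application validates and executes it. The Python sketch below assumes a simplified call format (`{"name": ..., "arguments": {...}}`) and a hypothetical `get_reservation` tool; it is not the API of LangChain, CrewAI, AutoGen, or any specific model provider, and it checks only the schema's `required` list.

```python
import json
from typing import Any


def get_reservation(reservation_id: str) -> dict:
    """Hypothetical read-only tool; a real one would query a database or API."""
    return {"reservation_id": reservation_id, "status": "confirmed"}


# Tool registry. The schema follows the JSON Schema style that LLM
# function-calling APIs generally use, but only "required" is checked here.
TOOLS: dict[str, dict[str, Any]] = {
    "get_reservation": {
        "fn": get_reservation,
        "schema": {
            "type": "object",
            "properties": {"reservation_id": {"type": "string"}},
            "required": ["reservation_id"],
        },
    },
}


def dispatch(raw_call: str) -> dict:
    """Execute a model-proposed call shaped like {"name": ..., "arguments": {...}}.

    Every failure is returned as data, so the agent loop can feed the error
    back to the model for a corrected call instead of crashing.
    """
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"error": "tool call must be a JSON object"}
    name = call.get("name")
    tool = TOOLS.get(name) if isinstance(name, str) else None
    if tool is None:
        return {"error": f"unknown tool: {name!r}"}
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    missing = [p for p in tool["schema"]["required"] if p not in args]
    if missing:
        return {"error": f"missing required arguments: {missing}"}
    try:
        return {"result": tool["fn"](**args)}
    except Exception as exc:  # e.g. unexpected keyword arguments, tool bugs
        return {"error": f"{type(exc).__name__}: {exc}"}


print(dispatch('{"name": "get_reservation", "arguments": {"reservation_id": "R-1001"}}'))
print(dispatch('{"name": "get_reservation", "arguments": {}}'))
```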

The best agent developers think like systems engineers. They design for failure modes, build observability into every step, and understand that agent reliability is the product. Expertise in evaluation methodology for non-deterministic systems is the differentiator. Can you measure whether your agent works 'well enough'? Can you find the edge cases where it breaks?
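
One hedged way to make "well enough" concrete: treat the pass rate over N evaluation scenarios as an estimate, compute a confidence interval, and require its lower bound to clear a target. The sketch below uses the Wilson score interval, a standard binomial interval chosen here for illustration (the text does not prescribe a method); the 0.85 target and the pass counts are made-up numbers.

```python
import math


def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= passes <= n:
        raise ValueError("need 0 <= passes <= n and n > 0")
    p = passes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, center - half), min(1.0, center + half)


def good_enough(passes: int, n: int, target: float) -> bool:
    """Ship only if even the pessimistic end of the interval clears the target."""
    low, _ = wilson_interval(passes, n)
    return low >= target


# 90/100 and 450/500 share a 90% pass rate, but the larger eval set
# pins the true rate down much more tightly.
print(wilson_interval(90, 100))
print(wilson_interval(450, 500))
print(good_enough(90, 100, target=0.85), good_enough(450, 500, target=0.85))
```

With 100 scenarios, a 90% observed pass rate cannot rule out a true rate near 83%, so it fails an 85% bar; the same rate over 500 scenarios clears it. This is one reason larger golden datasets matter.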

Look for roles that describe specific agent use cases, mention evaluation methodology, and talk about production deployment. Early-stage companies exploring agents can be exciting, but be prepared for ambiguity. The most valuable roles are at companies that have already shipped a v1 and need to make it reliable.

Compensation Benchmarks

AI Agent Developer roles pay a median of $241,950 based on 112 positions with disclosed compensation. Senior-level AI roles across all categories have a median of $227,400.

Across all AI roles, the market median is $200,700. Top-quartile compensation starts at $254,000. The 90th percentile reaches $307,500. For comparison, the highest-paying categories include AI Safety ($274,200) and AI Engineering Manager ($268,700). By seniority level: Entry: $97,760; Mid: $165,778; Senior: $227,400; Director: $250,000; VP: $250,000.

Vacatia, Inc. AI Hiring

Vacatia, Inc. has 1 open AI role right now, in the AI Agent Developer category. The listing gives its location as Remote, US.

Remote Work Context

Remote AI roles pay a median of $173,300 across 2,012 positions. About 14% of all AI roles offer remote work.

Career Path

Common paths into AI Agent Developer roles include Software Engineer, LLM Engineer, and Prompt Engineer.

From here, career progression typically leads toward AI Architect, Principal Engineer, or Head of AI Engineering.

Build agents. That's the portfolio. Take an open-source agent framework, build something that completes a non-trivial multi-step task, evaluate it rigorously, and document what you learned about reliability, cost, and failure modes. The field is new enough that practical experience counts for more than credentials.

What to Expect in Interviews

Interviews focus on systems thinking and reliability engineering. Expect questions about agent architecture: how you'd design a multi-step workflow with error recovery, how you'd evaluate agent performance, and how you'd prevent agents from taking destructive actions. Coding exercises often involve building a simple agent with tool use and evaluating its behavior across different scenarios. Discussion of safety and guardrails is increasingly common.

AI Hiring Overview

The AI job market has 4,133 open positions tracked in our dataset. By seniority: 106 entry-level, 1,901 mid-level, 1,663 senior, and 463 leadership roles (Director, VP, C-Level). Remote roles make up 14% of the market (583 positions), while 3,532 roles require on-site or hybrid attendance.

The market median for AI roles is $200,700. Top-quartile compensation starts at $254,000. The 90th percentile reaches $307,500. Highest-paying categories: AI Safety ($274,200 median, 57 roles); AI Engineering Manager ($268,700 median, 42 roles); Research Engineer ($260,000 median, 442 roles).

The AI Job Market Today

The AI job market spans 4,133 open positions across 15 role categories. The largest categories by volume: AI/ML Engineer (2,865), Data Scientist (339), AI Software Engineer (313). These three account for the majority of open positions, though smaller categories often have higher per-role compensation because of specialized skill requirements.

The seniority mix tells a story about where AI teams are in their maturity. Entry-level roles (106) are outnumbered by mid-level (1,901) and senior (1,663) positions, reflecting that most companies are past the 'build a team from scratch' phase and need experienced engineers who can ship production systems. Leadership roles (Director, VP, C-Level) total 463 positions, representing the bottleneck between technical execution and organizational strategy.

Remote work availability sits at 14% of all AI roles (583 positions), with 3,532 requiring on-site or hybrid attendance. The remote share has stabilized after the post-pandemic correction. Senior and specialized roles (Research Scientist, ML Architect) are more likely to be remote-eligible than entry-level positions, partly because experienced hires have more negotiating power and partly because these roles require less hands-on mentorship.

AI compensation is structured in clear tiers. The market median sits at $200,700. Top-quartile roles start at $254,000, and the 90th percentile reaches $307,500. These figures reflect base salary for roles with disclosed compensation. Total compensation (including equity, bonuses, and sign-on) runs 20-40% higher at companies that offer those components.

Category matters for compensation. AI Safety roles lead at $274,200 median, while Prompt Engineer roles sit at $140,000. The spread between highest and lowest-paying categories reflects the premium on specialized technical skills versus broader analytical roles.

The most in-demand skills across all AI postings: Python (2,128 postings), AWS (1,324 postings), Azure (1,003 postings), RAG (916 postings), GCP (817 postings), PyTorch (655 postings), Prompt Engineering (639 postings), Claude (571 postings). Python dominates, appearing in just over half of all 4,133 postings regardless of category. Cloud platform experience (AWS, GCP, Azure) is the second most common requirement. Newer skills in demand (RAG, vector databases, LLM APIs) reflect the shift from traditional ML toward generative AI applications.

Frequently Asked Questions

What is the median salary for AI Agent Developer roles?
Based on 112 roles with disclosed compensation, the median salary for AI Agent Developer positions is $241,950. Actual compensation varies by seniority, location, and company stage.

What skills do AI Agent Developer roles require?
Deep experience with LLM APIs and agent frameworks (LangChain, CrewAI, AutoGen). Strong understanding of prompt engineering, function calling, and error handling for non-deterministic systems. Python is standard. Experience with orchestration patterns, state management, and workflow engines adds significant value.

How many AI roles offer remote work?
About 14% of the 4,133 AI roles we track offer remote work. Remote availability varies by company and seniority level, with senior and leadership roles more likely to offer location flexibility.

Is Vacatia, Inc. hiring for AI roles?
Vacatia, Inc. is among the companies actively hiring for AI and ML talent.

Where can an AI Agent Developer role lead?
Common next steps from AI Agent Developer positions include AI Architect, Principal Engineer, and Head of AI Engineering. Progression depends on whether you lean toward technical depth, people management, or product strategy.
