SDE 3, AI Infrastructure

Austin, TX, US · Senior · MLOps Engineer

Skills & Technologies

Docker, Kubernetes, RAG, Rust

About This Role

Here at OCI, we’re building the world’s largest AI clusters, and we’re the fastest at bringing them to market. The AI Infrastructure organization at OCI is leading this effort by creating a GPU-focused cloud with the latest hardware, providing the best performance, efficiency, reliability, and scalability. This is your chance to be part of the AI revolution by creating systems that allow customers to scale from tens to thousands of GPUs without compromising performance. You will have the opportunity to work with cutting-edge technologies and make a significant impact on our organization's success.

We are seeking Software Engineers who can bring fresh ideas and embrace challenges to scale and optimize AI infrastructure components, such as the GPU control plane and GPU data plane, that provide computing resources to customer AI workloads. In this role, you will ensure top performance for AI workloads scheduled on our platform. You will design and develop solutions that enhance our AI infrastructure to deliver an exceptional customer experience and peak performance.

Responsibilities

  • Design and develop large-scale distributed software services and solutions to manage OCI's AI infrastructure.
  • Write high-quality, maintainable code by leveraging design reviews, code reviews, unit tests, and integration tests.
  • Develop complete solutions by ensuring that the services and the components are well-defined and modularized, secure, reliable, diagnosable, actively monitored, compliant and reusable.
  • Focus on customer needs through a data-driven approach.
  • Collaborate with other team members working on the same project to meet customer requirements.
  • Troubleshoot and optimize automation for reliability, performance, and availability.

Qualifications & Skills

  • BS (or equivalent experience) in Computer Science, Engineering, or related field.
  • 3 years of experience in software development with programming languages including, but not limited to, C, C++, C#, Java, Go, Rust.
  • 1 year of experience designing and developing distributed systems and services.
  • Strong problem-solving and troubleshooting skills, with the ability to analyze complex systems and identify areas for improvement.
  • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.

Preferred Qualifications

  • Experience in managing cloud infrastructure with hundreds of thousands of servers.
  • Experience in containerization technologies such as Docker and Kubernetes.
  • Experience in scheduling high-performance workloads on Kubernetes or Slurm.

Role Details

Company: Oracle
Title: SDE 3, AI Infrastructure
Location: Austin, TX, US
Category: MLOps Engineer
Experience: Senior
Salary: Not disclosed
Remote: No
