RemoteOtter

Super Intelligence HPC Support Engineer - Remote

Posted 2 days ago
DevOps / Sysadmin
Full Time
USA

Overview

As a Super Intelligence HPC Support Engineer, you’ll be part of a specialized team dedicated to Lambda’s most strategic and complex customers — organizations operating hyperscale GPU clusters and pushing the boundaries of AI/ML at unprecedented scale.

In Short

  • Act as the primary technical point of escalation for Super Intelligence customers running hyperscale GPU clusters.
  • Lead incident response for complex issues, ensuring rapid triage, clear communication, and timely resolution.
  • Proactively identify risks in large environments and drive preventative improvements.
  • Partner closely with Lambda Engineering and Product teams.
  • Contribute to runbooks, best practices, and operational guides.
  • Train and mentor other support engineers.
  • Participate in a rotating on-call schedule.

Requirements

  • 7+ years of experience in HPC or cloud support engineering.
  • Proven experience managing large-scale Linux clusters.
  • Deep expertise in orchestration tools such as Kubernetes and/or Slurm.
  • Strong knowledge of GPU technologies.
  • Skilled in high-throughput networking and cluster storage solutions.
  • Familiarity with monitoring/logging platforms.
  • Experience leading incident management.
  • Ability to balance deep technical troubleshooting with clear communication.

Benefits

  • Generous cash & equity compensation.
  • Health, dental, and vision coverage for you and your dependents.
  • Wellness and commuter stipends for select roles.
  • 401(k) plan with 2% company match (USA employees).
  • Flexible Paid Time Off Plan.

Lambda

Founded in 2012, Lambda is a rapidly growing AI computing platform that originated from a team of AI engineers dedicated to advancing machine learning. The company focuses on providing engineers with robust tools for deploying AI solutions that are fast, secure, and scalable, whether through powerful on-site GPU hardware or flexible cloud-based options. Lambda's AI Cloud is trusted by leading companies and research institutions, aiming to make computation as accessible and essential as electricity. With a commitment to innovation and high demand for its systems, Lambda offers competitive compensation, comprehensive benefits, and a collaborative work environment.
