RemoteOtter

HPC Infrastructure Engineer - Remote

Posted 13 weeks ago
DevOps / Sysadmin
Full Time
USA

Overview

The High-Performance Computing (HPC) Infrastructure Engineer is primarily responsible for the overall health and maintenance of HPC infrastructure in our Managed Services customers' environments.

In Short

  • Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities.
  • Design, deploy, and manage Kubernetes clusters optimized for HPC workloads.
  • Optimize cluster performance, resource utilization, and cost-effectiveness.
  • Implement monitoring, logging, and alerting solutions for HPC Linux clusters.
  • Ensure the security of the Kubernetes infrastructure and HPC workloads.
  • Troubleshoot and resolve issues related to Kubernetes and DGX systems.
  • Stay up to date on the latest technologies and trends in Kubernetes and HPC.
  • Create and maintain detailed documentation.
  • Serve as a subject matter expert for HPC technologies.
  • Participate in on-call rotation.

Requirements

  • Bachelor’s degree or equivalent in Information Systems or related field.
  • 5+ years of expert-level experience managing infrastructure in high-performance computing environments.
  • Strong understanding of Kubernetes architecture and components.
  • Hands-on experience with deploying NVIDIA DGX systems preferred.
  • Experience with deploying and managing Kubernetes clusters in production environments.
  • Experience with HPC workloads and applications.
  • Experience with containerization technologies.
  • Strong scripting skills in Bash and Python.
  • Excellent problem-solving and troubleshooting skills.
  • Managed Services or consulting experience is required.

Benefits

  • Equal opportunity employer.
  • Culture of belonging and empowerment.
  • Diversity and inclusion initiatives.

AHEAD

AHEAD is a forward-thinking company that builds platforms for digital business, focusing on cloud infrastructure, automation, analytics, and software delivery to facilitate digital transformation for enterprises. The company fosters a culture of belonging, valuing diverse perspectives and empowering employees to contribute to a collaborative environment. AHEAD is committed to equal opportunity employment and actively seeks candidates who can enhance the diversity of ideas and perspectives within the organization. With a strong emphasis on professional development, AHEAD provides its employees with access to cutting-edge technologies and opportunities for continuous learning.
