RemoteOtter

Staff Site Reliability Engineer (SRE) - Machine Learning Infrastructure - Remote

Posted 4 days ago
DevOps / Sysadmin
Full Time
Worldwide
$129,347 - $200,824 USD/year

Overview

The Wikimedia Foundation is seeking a Staff Site Reliability Engineer (SRE) to focus on Machine Learning Infrastructure, working with a distributed team across various time zones.

In Short

  • Design and implement robust ML infrastructure for training and deploying models.
  • Improve reliability and scalability of ML systems.
  • Collaborate with ML engineers and product teams.
  • Monitor and optimize system performance.
  • Provide guidance and documentation for ML infrastructure usage.
  • Mentor team members in infrastructure management.

Requirements

  • 7+ years of experience in SRE or infrastructure engineering.
  • Expertise with on-premises ML infrastructure (Kubernetes, Docker).
  • Proficiency in infrastructure automation tools (Terraform, Ansible).
  • Experience with observability tools (Prometheus, Grafana).
  • Familiarity with Python-based ML frameworks (PyTorch, TensorFlow).
  • Strong English communication skills.

Benefits

  • Remote-first organization with a diverse workforce.
  • Competitive and equitable salary structure.
  • Support for open-source software and volunteer communities.
  • Opportunities for professional growth and mentorship.

Wikimedia Foundation

The Wikimedia Foundation is a nonprofit organization that operates Wikipedia and other Wikimedia free knowledge projects, with a vision of a world where everyone can freely share in the sum of all knowledge. The Foundation believes in the potential of individuals to contribute to shared knowledge and advocates for policies that support free access to information. It relies on donations from millions of individuals globally and is committed to maintaining an inclusive and equitable workplace. With a remote-first approach, the Wikimedia Foundation employs staff in more than 40 countries, fostering a diverse workforce dedicated to its mission.

Similar Jobs:

Staff Machine Learning & Infrastructure Engineer - Remote

Base.org Careers Page

15 weeks ago

Join Base Ads as a Staff Machine Learning & Infrastructure Engineer to build innovative machine learning solutions for a new onchain advertising product.

USA
Full-time
Software Development
$218,025 - $256,500 USD/year

Infrastructure Engineer / Site Reliability Engineer (SRE) - Remote

Clerk

6 weeks ago

Clerk is seeking an experienced Infrastructure Engineer / SRE to manage and optimize their technology infrastructure.

USA
Full-time
DevOps / Sysadmin

Machine Learning Infrastructure Engineer - Remote

Nextdoor

21 weeks ago

Join Nextdoor as a Machine Learning Infrastructure Engineer to build impactful ML systems in a collaborative environment.

CA, USA
Full-time
Software Development
$205,000 - $336,000/year

Machine Learning Infrastructure Engineer - Remote

Waymo

24 weeks ago

Waymo is seeking a Machine Learning Infrastructure Engineer to develop large-scale inference solutions for autonomous driving technology.

CA, USA
Full-time
Software Development
$158,000 - $200,000 USD/year

Machine Learning Infrastructure Engineer - Remote

Waymo

25 weeks ago

Waymo is seeking a Machine Learning Infrastructure Engineer to develop large-scale inference solutions for autonomous driving technology.

CA, USA
Full-time
Software Development
$192,000 - $243,000 USD/year