RemoteOtter

Site Reliability Engineer - Remote

Posted 1 week ago

Overview

The Site Reliability Engineering organization at Pinterest is accountable for ensuring overall Pinterest availability as well as enhancing Engineering teams’ capability to design, build and operate robust systems at scale. Pinterest’s applications and infrastructure handle billions of monthly page views and petabytes of data as Pinterest continues to grow and scale. As a Pinterest SRE, you will design and build systems, platforms, tools, frameworks and methodologies to assure the reliability of our large-scale distributed systems.

In Short

  • Develop software solutions to enable reliability and operability of large scale distributed systems handling petabytes of data and serving
  • Build a deep understanding of how Pinterest’s systems behave, scale, interact and fail, and use that insight to identify risks and opportunities for remediation
  • Build tools and automation to eliminate toil and reduce operational overhead. Create frameworks, processes and best practices to be used across Pinterest Engineering
  • Build meaningful, insightful and actionable SLIs
  • Automate critical portions of Pinterest’s engineering processes to minimize risk and maximize the speed of innovation
  • Manage capacity and performance to help scale our infrastructure on both public and private clouds around the world

Requirements

  • 5+ years of industry experience building and operating large-scale, high-performance distributed systems
  • Experience programming with Python or Go
  • Strong knowledge of Linux/Unix/BSD internals and experience working with open source software (e.g. MySQL, Hadoop, Envoy, HAProxy, Nginx)
  • Experience with technologies such as ElasticSearch, ZooKeeper, HBase, Hadoop, Memcache and Kafka with a focus on reliability, automation, operability and performance
  • Experience with infrastructure as code is a plus (e.g. Terraform, Puppet, Chef, Ansible, Salt, Fabric, Docker)
  • Bonus points if experienced with deploying web apps to cloud infrastructure (AWS, etc.) and working with distributed, service-oriented architecture
  • Bachelor’s degree in a relevant field such as Computer Science, or equivalent experience

Benefits

  • Flexible work model with in-office collaboration 1-2 times every 6 months
  • Opportunity to work on large-scale systems
  • Support for professional growth and development
  • Inclusive and diverse work environment
