RemoteOtter

Site Reliability Engineer - Remote

Posted 21 weeks ago
DevOps / Sysadmin
Full Time
USA
$120,000 - $160,000/year

Overview

iCapital is powering the world’s alternative investment marketplace. Our financial technology platform has transformed how advisors, wealth management firms, asset managers, and banks evaluate and recommend bespoke public and private market strategies for their high-net-worth clients. iCapital services approximately $214 billion in global client assets invested in 1,731 funds, as of December 2024.

In Short

  • Design, implement, and maintain service level objectives (SLOs) that align with business goals and customer expectations.
  • Develop observability strategies, focusing on meaningful metrics that drive actionable insights.
  • Architect and implement scalable infrastructure solutions using cloud-native technologies and infrastructure as code.
  • Drive automation initiatives to eliminate toil and improve system reliability.
  • Champion reliability best practices across development teams through consultation and tooling.
  • Design and operate a Kubernetes environment for container management and orchestration.
  • Lead incident response, conduct thorough postmortems, and drive systematic improvements.
  • Participate in on-call rotations with a focus on continuous service improvement.

Requirements

  • 5+ years of SRE or related experience, including 3+ years with AWS
  • Strong experience with container orchestration platforms like Kubernetes and related ecosystem tools
  • Working knowledge of databases such as MongoDB, Postgres, DynamoDB
  • Strong foundation in reliability engineering principles and distributed systems behavior
  • Experience defining and implementing SLOs/SLIs and using them to drive system improvements
  • Demonstrated ability to design and implement observability solutions that provide actionable insights while minimizing alert fatigue
  • Coding ability in at least one IaC language (Terraform strongly preferred) and one programming language such as Python, Ruby, or Java, with a focus on maintainable, tested code
  • Understanding of modern observability practices, with experience implementing and maintaining monitoring solutions such as Prometheus/Grafana, Splunk, New Relic, CloudWatch, and ELK in the cloud
  • Strong incident response skills with experience leading incident retrospectives and driving improvements
  • Excellent problem-solving abilities and experience debugging distributed systems
  • Track record of successfully automating operations and reducing toil
  • Strong communication skills, with the ability to explain complex technical concepts to diverse audiences

Benefits

  • The base salary range for this role is $120,000 to $160,000. iCapital offers a compensation package that includes salary, equity for all full-time employees, and an annual performance bonus.
  • Employees also receive a comprehensive benefits package that includes an employer-matched retirement plan; generously subsidized healthcare with 100% employer-paid dental, vision, telemedicine, and virtual mental health counseling; parental leave; and unlimited paid time off (PTO).
  • We believe the best ideas and innovation happen when we are together. Employees in this role will work in the office Monday-Thursday, with the flexibility to work remotely on Friday.

iCapital

iCapital is a leading financial technology platform that revolutionizes the alternative investment marketplace, enabling advisors, wealth management firms, asset managers, and banks to effectively evaluate and recommend tailored public and private market strategies for high-net-worth clients. With approximately $209 billion in global client assets invested across 1,690 funds as of November 2024, iCapital has earned recognition as a top fintech company, being named to the Forbes Fintech 50 for seven consecutive years and receiving multiple awards for its innovative solutions. The company is committed to providing exceptional client service and fostering inclusive workplace practices.
