As a Staff Site Reliability Engineer (Staff SRE) at SailPoint, you will be a key member of our Reliability Engineering team, driving reliability practices for the Identity Security Cloud platform. You are passionate about reliability practices and operational excellence.

In Short

Make it easy for everyone to create, consume, manage, and scale reliable cloud production services to achieve more
Keep up with industry trends to improve end-to-end reliability and maintainability for all services
Coach engineering teams on observability best practices such as setting up well-defined Service Level Objectives (SLOs)
Analyze performance of services and recommend infrastructure/code changes that will improve capacity and performance
Enable our engineering teams to scale our enterprise operations by providing guidance, best practices, and support as part of an SRE Center of Excellence
Manage cross-functional requirements by working with Engineering, Product, Services, and other departments
Mentor engineers on quality in design reviews, code, test cases, automation, observability, root cause analysis, and self-healing
Influence architectural design, implementation, consolidation, and simplification for global scale
Drive operational excellence to deliver frictionless operations, a happy on-call experience, and an optimal customer experience

Requirements

8+ years of experience in SRE or DevOps production operations supporting a highly available environment for a SaaS software or cloud service provider
Strong proficiency with one or more programming languages (Java, Python, Go, etc.)
Bachelor's degree in Computer Science or another technical discipline, or equivalent experience, is preferred but not required
Due to FedRAMP requirements, US citizenship is required to be considered for this role
Experience with cloud infrastructure environments, preferably AWS, and infrastructure as code, preferably Terraform
Strong proficiency with containerization technology and/or Kubernetes
In-depth experience with metrics, tracing, and logging observability tools such as Prometheus, Grafana, Honeycomb, and Kibana
Experience with incident management, including conducting incident reviews
Strong understanding of Linux, software development, systems, networking, and cloud concepts
A positive and collaborative demeanor, combined with the ability to coach, mentor, and delegate
Excellent communication skills
Life-long learner – you stay up to date with technology trends, spend time learning new technologies, and share your learnings with your team

Benefits

Health and wellness coverage: Medical, dental, and vision insurance
Disability coverage: Short-term and long-term disability
Life protection: Life insurance and Accidental Death & Dismemberment (AD&D)
Flexible spending accounts for health care and dependent care; limited-purpose flexible spending account
Financial security: 401(k) Savings and Investment Plan with company matching
Time off benefits: Flexible vacation policy
Holidays: 8 paid holidays annually
Sick leave
Parental support: Paid parental leave
Employee Assistance Program (EAP) and Care Counselors
Voluntary benefits: Legal Assistance, Critical Illness, Accident, Hospital Indemnity and Pet Insurance options
Health Savings Account (HSA) with employer contribution

SailPoint Technologies

SailPoint Technologies is a leading provider of identity security solutions, dedicated to helping organizations manage and secure their digital identities. With a focus on operational excellence and reliability, SailPoint's Identity Security Cloud platform empowers businesses to create, consume, manage, and scale reliable cloud production services. The company emphasizes the importance of observability, performance analysis, and cross-functional collaboration, ensuring that engineering teams are equipped with the best practices and support needed to drive innovation and maintain high availability in a rapidly evolving technological landscape.
