RemoteOtter

Site Reliability Engineer - Remote

Posted 23 weeks ago
DevOps / Sysadmin
Full Time
Malaysia

Overview

As a Site Reliability Engineer at Guidewire, you’ll join a passionate team dedicated to automating every process to ensure our systems run efficiently. Our Platform team is fully committed to developing and managing software that enhances the reliability of production systems—systems that serve hundreds of customers and support millions of transactions every day.

In Short

  • Drive Reliability & Automation: Take a dedicated SRE approach to managing shared multi-tenant infrastructure for resilient SaaS microservice-based systems and customer-centric applications.
  • Oversee and continuously enhance our team’s presence in AWS by automating deployment and operational tasks.
  • Innovate and Improve Core Systems: Contribute to the development of our core infrastructure systems—adding features, fixing bugs, and implementing reliability enhancements.
  • Engineer and maintain a complex single sign-on (SSO) authentication platform based on SAML/OAuth to ensure secure, seamless access for our users.
  • Enhance Observability & Incident Management: Build and maintain comprehensive observability tooling, metrics, and dashboards to support our global platform infrastructure.
  • Improve our incident management lifecycle by identifying, mitigating, and learning from reliability risks, while helping to create a self-healing environment.
  • Empower the Team: Develop system documentation and training materials to educate and empower your teammates.
  • Collaborate with various engineering teams, providing valuable feedback and contributing code when needed to enhance our products.

Requirements

  • Bachelor’s Degree in Computer Science or a related field.
  • Proven software engineering and automation skills using Bash, Python, and/or Go.
  • Well-versed in agile development methodologies (Scrum, Kanban, etc.), with a deep background in Linux systems.
  • Significant experience in automating and managing systems on Amazon Web Services (AWS) and supporting live production environments (Java/Apache/Tomcat).
  • Proficient with Infrastructure as Code (IaC) tools such as Terraform, Terragrunt, or Terraspace.
  • Experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity) for smooth code promotions.
  • Hands-on experience in containerization (Docker, Helm, Kubernetes/EKS, CNI, and Ingress networking).
  • Strong understanding of single sign-on (SSO), SAML, and OAuth (bonus if you’ve worked with Okta).
  • Experienced with observability tools (Datadog, CloudWatch, PagerDuty) and familiar with event store/stream-processing technologies like Kafka or AWS SQS.
  • Experience working with relational databases such as Aurora Postgres or Oracle RDS.
  • Advanced exposure to application development, web UI design, JSON, and overall application architecture.
  • Exposure to Open Application Model systems like KubeVela or Crossplane is a plus.

Benefits

  • Opportunity to make a direct impact by ensuring our cloud platform is both robust and customer-focused.
  • Exciting challenges of solving problems at scale with technologies like AWS, Kubernetes, and Aurora.

Guidewire Software (Malaysia) Sdn Bhd

Guidewire Software (Malaysia) Sdn Bhd is a leading provider of software solutions for the insurance industry, dedicated to enhancing the efficiency and reliability of its systems. The company focuses on automating processes to support its flagship cloud platform and InsuranceSuite products, which serve hundreds of customers and handle millions of transactions daily. With a commitment to innovation, Guidewire's Platform team collaborates closely with product developers to ensure high availability, performance, and observability of its services, while also engaging in continuous improvement of operational metrics and incident management.

Similar Jobs:


Site Reliability Engineer - Remote

Ledgebrook

23 weeks ago

Ledgebrook is seeking a Site Reliability Engineer to enhance the reliability and performance of its cloud-native infrastructure.

Worldwide
Full-time
DevOps / Sysadmin

Site Reliability Engineer - Remote

Libertex Group

23 weeks ago

Join Libertex Group as a Site Reliability Engineer to ensure the stability and performance of our infrastructure.

Worldwide
Full-time
DevOps / Sysadmin

Site Reliability Engineer - Remote

JatApp

24 weeks ago

Join JatApp as a Site Reliability Engineer to manage and optimize a cross-platform VPN service.

Worldwide
Full-time
DevOps / Sysadmin

Site Reliability Engineer - Remote

GoodLeap

24 weeks ago

GoodLeap is seeking a Site Reliability Engineer to ensure the reliability and performance of its applications and services.

USA
Full-time
DevOps / Sysadmin