RemoteOtter

Site Reliability Engineer (SRE) - Remote

Posted 4 days ago
DevOps / Sysadmin
Full-time
Brazil

Overview

We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform.

Responsibilities

  • Architect and maintain scalable, highly available infrastructure for our GenAI platform.
  • Design and implement robust monitoring, alerting, and observability solutions.
  • Automate deployment, scaling, and management of our cloud-native infrastructure.
  • Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
  • Participate in on-call rotations and provide rapid response to production incidents.
  • Collaborate closely with development teams to build reliable systems.
  • Lead incident response efforts and conduct thorough post-mortems.
  • Optimize infrastructure for performance and cost-effectiveness.
  • Implement and enforce security best practices across all systems.
  • Create and maintain comprehensive documentation.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 5+ years of experience in DevOps, SRE, or similar roles.
  • Strong experience with cloud platforms (AWS, GCP, Azure).
  • Proficiency in at least one programming/scripting language (Python, Go, Bash).
  • Hands-on experience with infrastructure-as-code (IaC) tools (Terraform, CloudFormation).
  • Solid background in containerization and orchestration technologies (Docker, Kubernetes).
  • Proven experience with monitoring and observability tools (Prometheus, Grafana).
  • Strong understanding of CI/CD pipelines and automation.
  • Exceptional troubleshooting and problem-solving skills.

Benefits

  • Opportunity to work on cutting-edge Generative AI technology.
  • Collaborative and innovative work environment.
  • Professional development and growth opportunities.
  • Flexible work arrangements.
  • Competitive compensation package.

Articul8

Articul8 AI is a forward-thinking company dedicated to building exceptional AI products that exceed customer expectations. With a strong focus on excellence, the team is committed to making a positive impact on the world through innovative solutions. The company emphasizes collaboration and creativity, fostering an environment that encourages personal and professional growth. By leveraging its expertise in AI and financial services, Articul8 AI aims to transform customer experiences and drive enterprise-level outcomes in the financial industry.

Similar Jobs:

Site Reliability Engineer (SRE) - Remote

OneImaging

7 days ago

Join our infrastructure team as a Site Reliability Engineer (SRE) responsible for the scalability, reliability, and performance of our cloud-based services.

USA
Full-time
DevOps / Sysadmin

Site Reliability Engineer (SRE) - Remote

Tempo

2 weeks ago

Join Tempo as a Site Reliability Engineer to build and maintain infrastructure for innovative time-management solutions.

Worldwide
Full-time
DevOps / Sysadmin

Site Reliability Engineer (SRE) - Remote

ZENVIA

2 weeks ago

Join Zenvia as a Site Reliability Engineer to ensure the reliability and scalability of critical solutions in a collaborative and innovative environment.

Worldwide
Full-time
DevOps / Sysadmin

Site Reliability Engineer (SRE) - Remote

Top Hat

9 weeks ago

Join our Core Platform team as a Site Reliability Engineer to enhance software delivery performance and mentor teams in DevOps practices.

Canada
Full-time
DevOps / Sysadmin

Site Reliability Engineer (SRE) - Remote

Gorilla Logic

10 weeks ago

Gorilla Logic is looking for a Site Reliability Engineer (SRE) to lead observability and monitoring initiatives using Dynatrace.

Colombia
Full-time
DevOps / Sysadmin