
Cloud Site Reliability Engineer (SRE) - Remote

Posted 3 days ago

Overview

BDR Solutions, LLC excels in delivering best-value services to U.S. Federal Civilian and Defense agencies, driving mission success with excellence and innovation. We specialize in modernizing government systems for health, social services, and disaster relief, enhancing veteran lives. As a service-disabled veteran-owned, 8(a), HUBZone small business, our mission is to provide unparalleled support to veterans in all sectors. We are committed to creating a future where every veteran's well-being is prioritized, combining IT expertise with compassionate care. At BDR, we are known for reliable outcomes tailored to our clients' missions, ensuring our services positively impact veterans and all our clients.

We are seeking a Cloud SRE to join our growing team! This team works across the company, and with multiple cloud partners, to make using Smile Digital Health products simple for our customers. As part of the hosting operations team, the Cloud SRE will support the building, operating and automating of infrastructure services to deliver SaaS-based solutions on Azure/AWS. This role creates a bridge between development and operations by applying a software engineering mindset to system administration topics. The incumbent will divide their time between operations/on-call duties and administering systems and software which help increase site reliability and performance.

In Short

  • Collaborate with Security Operations teams to help define and implement best practices around Cloud Service Provider configuration for AWS, Azure and other cloud providers.
  • Develop, implement and coordinate a multi-tenant approach to service offerings such as DB, container platform, authentication, certificates, and product registries.
  • Develop and maintain cost/utilization tracking and attribution processes for all Cloud Service Providers.
  • Create documentation around Cloud Service Provider offerings detailing use cases, best practices, and implementation details.
  • Develop and maintain technical relationships with our core Cloud Service Providers.
  • Implement and maintain a secure and scalable infrastructure platform for delivering Cloud Services applications.
  • Ensure that internal and external SLAs are met or exceeded, and that system-centric KPIs are continuously monitored and improved.
  • Create tools for automating deployment, monitoring and operations of the overall platform.
  • Participate in an on-call rotation to provide application support, incident management, and troubleshooting.
  • Provide ongoing maintenance and support for internal tools, and improve system health and reliability.

Requirements

  • Demonstrated expertise in cloud service providers and best practices for implementation and configuration, preferably managing Azure on behalf of multiple teams for a company that delivers SaaS products.
  • Experience with Kubernetes, OpenShift, Kafka, and the Elastic Stack.
  • Proven experience with security and compliance (SOC 2, HIPAA, ISO 27001) best practices and with implementing controls that support high-velocity software delivery teams.
  • Proficiency in Terraform, Ansible or Chef.
  • Expertise in troubleshooting, support escalation, on-call process optimization, and documenting knowledge.
  • Passion for infrastructure as code, automation, and developing solutions that help developers move quickly and safely.
  • Familiarity with infrastructure management and operations lifecycle concepts and ecosystem.
  • Experience operating and maintaining production systems in a Linux and public cloud environment.
  • Prior experience working with high-performance or distributed systems, although we strive to hire at a variety of experience levels.
  • Working knowledge of industry best practices with regard to information security.
  • Previous experience building or maintaining a large-scale cloud service.
  • Proven ability to prioritize and track multiple projects in parallel.
  • Proven ability to be highly responsive and customer-focused.

Benefits

  • Military Veterans encouraged to apply.
  • Equal Opportunity Employer.
  • Consideration for employment without regard to race, color, religion, sex, age, national origin, marital status, disability, veteran status, sexual orientation, or genetic information.
