Our company is where we transform vision into reality. It's where ideas become technologies, and cutting-edge technologies become solutions for animal care and management.
We support farmers by providing real-time, actionable information to help them manage their herds. We provide pet owners with smart devices and data that give them a better understanding of their pets' activity and health needs, enriching the bond between people and their pets. We help conservationists safeguard natural environments and wildlife.
We draw on decades of technology research and development experience across many markets, technologies, and species, together with established development environments and quality assurance procedures, and we are always inventing new ways to look after the health and well-being of animals. That experience keeps us ahead of the curve, applying advanced technology to everything from enhancing the bond between people and their pets to advancing animal healthcare and wildlife preservation.
We are looking for an exceptional Senior Site Reliability Engineer (SRE) to help establish and lead the technical practices of SRE within our CloudOps team. This is a hands-on role for an experienced professional who can implement SRE principles, build frameworks and tools to ensure system reliability, and mentor others in adopting these practices.
If you are passionate about operational excellence, love solving complex technical challenges, and thrive in highly collaborative environments, this is the role for you.
What You’ll Do:
Define and Build the SRE Function
· Help define and implement SRE principles and practices.
· Partner with development and DevOps teams to create Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) for critical services.
· Advocate for and implement system architectures that prioritize reliability, scalability, and fault tolerance.
Develop Automation and Resilience
· Build automation tools to reduce toil, streamline operations, and improve reliability using Infrastructure as Code (IaC) tools such as Terraform and Crossplane.
· Implement self-healing systems, automate incident detection and response, and integrate chaos engineering practices to test system resilience.
Drive Observability and Monitoring Excellence
· Create and maintain advanced observability systems with tools such as Datadog, Prometheus, and Grafana to ensure uptime and system health.
· Develop efficient alerting and monitoring strategies, including synthetic tests and automated anomaly detection.
· Bring strong, proven experience with AWS services and with IaC using Terraform.
· Analyze system logs and telemetry data to detect patterns, identify issues, and optimize system performance.
Incident Response and Problem Solving
· Take ownership of incident response processes, ensuring swift recovery of services and conducting thorough Root Cause Analysis (RCA) for long-term improvements.
· Document incident learnings and collaborate with teams to enhance on-call processes and system documentation.
Contribute to Continuous Improvement
· Improve deployment pipelines (CI/CD) using tools like GitHub Actions, Azure DevOps, or ArgoCD, ensuring smooth and reliable releases.
· Continuously evaluate and refine operational processes to reduce manual effort and increase efficiency.
MSD Animal Health Technology Labs is a pioneering company dedicated to transforming innovative ideas into advanced technologies that enhance animal care and management. By providing farmers with actionable insights for herd management and offering pet owners smart devices to monitor their pets' health, the company enriches the human-animal bond. With decades of experience in technological research and development, MSD Animal Health is committed to advancing animal healthcare and wildlife preservation through cutting-edge solutions and quality assurance practices.
Jobs from MSD Animal Health Technology Labs:
Global AI Lead
Junior System Engineer - Data Analysis and Machine Learning
Electronics Engineer
Data Engineer
Mobile Automation Engineer
Point Wild
Join Point Wild as a Site Reliability Engineer to maintain system reliability and performance in a dynamic engineering team.
Ensono is looking for an experienced Site Reliability Engineer (SRE) to enhance their infrastructure and service management.
Element is seeking a motivated Site Reliability Engineer (SRE) to enhance cloud migration and collaborate on Infrastructure as Code and CI/CD efforts.
CMG is seeking a Site Reliability Engineer to enhance the reliability and performance of their infrastructure and applications.
Join Ververica as a Site Reliability Engineer to design and maintain infrastructure for a Unified Streaming Data Platform.