TripleTen for Business empowers companies to achieve their business goals by bridging talent gaps in Data Science, AI for professionals, Python Development, and Management.
In Short
Develop infrastructure and write solutions that simplify operations.
Build processes to achieve and maintain 99.99% uptime, and improve our operational processes.
Develop automation and improve service reliability, plan resources, and reduce the operational workload on development teams.
Build infrastructure and monitoring, help developers solve infrastructure problems, teach developers to solve such problems independently, and improve infrastructure observability through better monitoring, graphs, and alerts.
Requirements
2+ years of Site Reliability Engineering experience.
Experience with Prometheus (must have).
Experience working with Kubernetes, GitLab CI, and Ansible.
Experience with Unix systems (we use Ubuntu) and the command line.
Understanding of TCP/IP networking basics, how web services work, REST APIs, and gRPC.
Experience performing diagnostics, including interpreting the output of ps, top, strace, perf, and tcpdump.
Understanding of how user applications interact with the operating system, including familiarity with system calls, processes, and threads.
Understanding of how to build high-load systems and a willingness to do so.
Understanding of fault tolerance and service scaling.
High emotional intelligence and the ability to find common ground with colleagues and work as part of a team.
Must be professionally fluent in English.
Benefits
Full-time remote collaboration with a convenient schedule. Professional freedom, where we trust your experience instead of wasting each other's time and effort micromanaging;
A diverse and tight-knit team. Our teammates are spread out across Serbia, the US, Israel, Georgia, Armenia, Latin America, and more. They've worked at big tech companies, ed-tech companies, design agencies, and cultural institutions;
Comfortable digital workspace. We use Miro, Notion, Google Workspace, Jira, and other tools to make working together seamless.