DataArt's SRE Center of Competence successfully develops and provides SRE expertise and solutions for our clients.
We are looking for a Senior SRE specialist who will join our team and provide consulting services to our clients and delivery teams.
The Senior SRE specialist will participate in sales/pre-sales and discovery, provide consultancy and architecture reviews, and supervise projects throughout all stages of development.
We offer an opportunity to grow professionally: lead initiatives, expand your SRE skills and technology stack, mentor colleagues, and participate in R&D projects and PoCs.
Responsibilities
• Collect and analyze metrics, traces, and logs from the environment and the application
• Take part in system design consulting, platform management, and capacity planning
• Analyze requirements and support them from an SRE perspective
• Help prioritize feature development and reliability improvements based on the current state of the system
• Partner with development teams to improve services through rigorous testing and release procedures
Requirements
• Programming skills in at least one modern programming language
• Experience with containerized environments (Docker, Kubernetes)
• Experience managing code, databases, and infrastructure (networking, operating systems, storage)
• Experience with monitoring frameworks (Grafana, Kibana, Prometheus)
• Experience with IaC (Infrastructure as Code) and related tools (e.g. Terraform, CloudFormation)
• Experience with modern CI/CD tools (e.g. GitHub Actions)
• Experience with a major cloud provider (e.g. AWS, GCP, Azure)
• SRE experience within a service development team, including support, troubleshooting, and log analysis to meet service availability and observability targets
• Experience maintaining Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
• Good spoken and written English and strong communication skills
• Teamwork experience
Nice to have
• Strong knowledge of a scripting language (e.g. Python, Bash)
• Experience with OpenStack
• Strong Linux or Windows system-level analysis capabilities
• Experience optimizing cloud costs and reducing system resource usage by setting clear efficiency and capacity requirements
• Experience with a variety of SaaS operations tools (e.g. uptime monitoring, Dynatrace, PagerDuty)
• Experience improving documentation of site reliability measures, in application documentation or in runbooks, explaining the issues encountered and the solutions implemented
• Experience negotiating within a team and across teams
What we offer
• Experienced colleagues who are ready to share knowledge;
• The ability to switch projects and technology stacks and to try out different roles;
• More than 150 workplaces for advanced training;
• English study and practice: courses and communication with colleagues and clients from different countries;
• Support for speakers presenting at conferences and technology community meetups;
• Health insurance;
• The ability to focus on your work: minimal bureaucracy, no micromanagement, and convenient corporate services;
• Friendly atmosphere, attention to employee comfort, and a modern office space;
• Flexible schedule (with mandatory core hours) and the option to work remotely by agreement with colleagues.