DevJobs

NOC Manager / SRE

Overview
Skills
  • Linux
  • AWS
  • Azure
  • Docker
  • Kubernetes
  • DataDog ꞏ 2y
  • Site24x7 ꞏ 2y
  • Automations
  • Database
  • DNS
  • firewalls
  • Helm
  • Cloud systems
  • HTTP servers
  • proxies
  • WebHosting
  • Windows systems
Job Description

As part of your role, you would improve existing and establish new monitoring, alerting, and observability for services, using a wide range of tools.

Additionally, you would handle critical alerts and incidents and work directly with DevOps teams to improve and optimize availability.

Responsibilities

  • Own the production infrastructure across public and private cloud, on-premises, and internal systems.
  • Research production workflows, identify issues and optimization opportunities, and improve monitoring.
  • Help identify root causes of incidents and prevent them from recurring, including publishing RCAs.
  • Improve and establish alerting for our infrastructure, services and business logic.
  • Communicate and escalate issues to senior management in R&D, DevOps, and Support.

Requirements:

  • At least 3 years of experience in DevOps, SRE, or Infra/Backend roles.
  • At least 2 years of experience with Alerting & Monitoring systems such as DataDog, Site24x7
  • Experience with running distributed systems deployed across multiple geographies
  • Solid knowledge of networking and internet technologies, e.g. HTTP servers, DNS, firewalls, and proxies
  • Experience working with Linux and Windows systems
  • Experience with Docker, Kubernetes and Helm
  • Cloud systems such as AWS / Azure
  • Familiarity with databases, web hosting, and automation
  • An innovative approach, with the ability to quickly learn technologies
  • Strong analytical and troubleshooting skills, with the ability to solve complex problems
  • Fast learner, able to take a project from POC to production while handling decision-making and communications
Cobwebs Technology