The role
We’re looking for a hands-on DevOps Engineer to own our cloud infrastructure and delivery pipelines at scale. You’ll design, build, and operate production-grade AWS/Kubernetes environments, standardize IaC, and level up our observability across a modern microservices stack.
What you’ll do
- Build and operate AWS infrastructure (VPC/IAM/EKS/EC2/ECR/ALB/RDS/Aurora/Neptune or equivalent) using Terraform (modules, workspaces, CI-driven plans/applies).
- Run Kubernetes (EKS) in production: multi-env clusters, Helm chart authoring/versioning, progressive delivery (canary/blue-green) with GitOps (Argo CD/Flux).
- Own observability end-to-end: OpenTelemetry (collectors/auto-instrumentation), metrics (Prometheus), Grafana dashboards/alerts, logs (Loki/ELK), distributed tracing, and SLOs.
- Harden reliability at scale: capacity planning, autoscaling, fault-tolerant designs, incident response/on-call, post-mortems, and performance tuning.
- Build secure, repeatable CI/CD (GitHub Actions/GitLab CI) for microservices and jobs; speed up developer workflows with golden paths and reusable templates.
- Operate and optimize databases: at least one graph DB (e.g., Neo4j/Amazon Neptune) and one SQL DB (e.g., PostgreSQL/MySQL/Aurora), including backups, migrations, and monitoring.
- Implement security best practices: least-privilege IAM, secrets management (KMS/SSM), image and IaC scanning, policy-as-code (OPA/Gatekeeper), network policies.
- Partner with backend teams on service design, cost/perf trade-offs, and production readiness checklists.
Requirements
What you bring (must-have)
- 7+ years of professional experience in DevOps/SRE/Platform roles.
- Production experience with AWS, Kubernetes (preferably EKS), Terraform, Helm, Grafana, OpenTelemetry, and microservices.
- Proven track record running high-scale systems (multi-service, high traffic, HA) and owning incidents/SLIs/SLOs.
- Hands-on experience with graph and relational databases in production (ops/observability/perf).
- Strong CI/CD fundamentals, container build pipelines (Docker), and Git workflows.
- Solid understanding of networking (VPCs, subnets, routing, TLS, ingress, DNS) and Linux.
Nice to have
- Prior experience in security/privacy-sensitive environments (e.g., DLP, EDR, IAM, compliance) or building controls for regulated customers.
- GitOps (Argo CD/Flux), Argo Rollouts, service mesh (Istio/Linkerd), Karpenter.
- Logging pipelines (Vector/Fluent Bit), cost optimization (FinOps), chaos testing.
- Scripting/coding for tooling/automation (Python/Go/Bash) and SDKs/CLIs for AWS.
- Experience with policy & compliance automation (CIS/FedRAMP/SOC 2) and SBOM/vuln scanning (Trivy/Grype).
How we work
- Pragmatic engineering, strong ownership, and blameless post-mortems.
- Shipping culture: automate what hurts, measure what matters.