About
Jedify is redefining data intelligence. Trusted by enterprise customers, we enable truly actionable insight through agentic interfaces (chat, research, MCP) built on top of our proprietary Semantic Fusion Model (SFM).
We're seeking an exceptional DevOps Engineer to own and evolve our multi-cloud infrastructure. You'll build and maintain the production platform that powers our AI-driven data intelligence system, serving enterprise customers in a multi-cloud setup (AWS, GCP, Azure).
This role combines cloud infrastructure, Kubernetes orchestration, and CI/CD automation — perfect for engineers who want to build the backbone of a cutting-edge AI product.
What You'll Build
- Multi-cloud infrastructure using Terraform and Terragrunt, managing Kubernetes (EKS) clusters, networking, databases, and security at scale.
- CI/CD pipelines and deployment automation for a microservices architecture with 14+ Kubernetes workloads deployed via Helm.
- Observability and reliability platform using Datadog (APM, logs, metrics, Network Performance Monitoring) to uphold production SLAs for enterprise customers.
- Authorization mechanisms for multiple interfaces (API, MCP, SDKs).
- Security-first infrastructure including secrets management (External Secrets Operator + AWS Secrets Manager), WAF policies, IAM/Pod Identity, and network segmentation.
Requirements
- Strong problem-solving mindset with the ability to tackle complex infrastructure challenges.
- Autonomous player who can take ownership and drive solutions independently.
- 5+ years overall in DevOps, infrastructure, or SRE roles, including startup experience.
- Strong Kubernetes experience: EKS/GKE cluster management, Helm charts, Gateway API, scaling strategies.
- Solid Terraform/Terragrunt experience: module design, state management, multi-environment configurations.
- AWS experience: EKS, VPC, ALB, RDS, ElastiCache, S3, ECR, IAM, Secrets Manager, WAF, Bedrock.
- GCP experience (an advantage): GKE, VPC, Cloud Armor, Cloud DNS, Cloud Load Balancing.
- CI/CD pipeline design and maintenance (GitHub Actions).
- Monitoring and observability: Datadog or equivalent (Prometheus, Grafana).
- Networking fundamentals: VPC design, security groups, DNS, TLS/cert management.
- Production system experience: incident response, capacity planning, disaster recovery.
- Super important: a get-shit-done attitude, curiosity, and a proactive mindset.
Nice to Have
- Experience supporting AI/ML workloads (LLM inference, GPU scheduling, model serving).
- MongoDB operations (Kubernetes Operator, backup/restore).
- WebSocket infrastructure (Soketi/Pusher).
- Cost optimization across multi-cloud environments.
- On-prem / hybrid deployment experience.
Tech Stack
Terraform, Terragrunt, AWS, GCP, Kubernetes (EKS/GKE), Helm, Docker, GitHub Actions, Datadog, PostgreSQL, MongoDB, Redis, Traefik, cert-manager, External Secrets Operator, Python
Dev Stack
Cursor, Claude Code, Warp