EasySend is a no-code platform for building and optimizing digital customer journeys in insurance, banking, and financial services. Our platform turns paperwork-based processes into revenue-generating customer engagements for some of the largest financial institutions in Israel, the US, Europe, and APAC.
Behind that no-code surface sits a non-trivial backend: a distributed workflow runtime that executes customer-defined business processes, integrates with dozens of third-party systems, and serves real production transactions across 7 regional clusters. We are looking for a Senior Backend Engineer to help us evolve that runtime.
Responsibilities:
- Own architectural decisions on the workflow runtime, single- vs. multi-tenancy boundaries, and integration patterns. Write the design doc, defend it in review, ship it.
- End-to-end delivery: requirements → design → implementation → deployment → monitoring → post-incident learning, with no throwing work over the wall.
- Partner with Product, DevOps, and the other engineering teams on cross-team initiatives (Celery → Temporal.io rollout, customer onboarding to new regions, breaking changes in the workflow runtime).
- Mentor mid-level engineers through code review, design review, and on-call shadowing.
- Continuously raise the bar on code quality, system stability, and test coverage.
Requirements:
- 7+ years building production backend systems, including at least 2 years owning a distributed/async runtime (a workflow engine, task queue, event-driven system, or stream processor, such as Temporal.io, Celery, Kafka Streams, AWS Step Functions, or equivalent).
- Deep hands-on Python (the player and workflow runtime are Python), plus working JavaScript. Java or Go is a nice-to-have.
- Production-scale experience and mindset. You've operated systems where throughput, tail latency, and blast radius are first-class design constraints, not afterthoughts.
- Production experience operating self-hosted infrastructure on Kubernetes (EKS preferred). Comfortable reading Helm charts, debugging pod resource issues, and reasoning about autoscaling.
- Comfortable with CI/CD pipelines, infra-as-code, and being on-call for systems you ship.
- AI fluency on both sides of the product. You use Claude Code or an equivalent agentic coding tool for real SDLC work (design docs, refactors, on-call triage) and have shipped LLM-backed features in production (agentic workflows, retrieval, tool use, structured output), with an eval loop and a real opinion on latency, cost, and safety.
Strongly preferred (any one of these moves you up the stack):
- Direct experience with Temporal.io in production: workflow versioning, replayer-based regression detection, signal/query/update patterns, codec authoring, per-namespace operational concerns.
- Strong grasp of distributed-systems failure modes: idempotency, retries, timeouts, partial failures, non-determinism, exactly-once vs. at-least-once semantics. You can debug a "stuck workflow" without a runbook.
- Performance optimization in Python at the deserialization / interpreter / JIT layer (Cython, mypyc, alternative JS engines, profiling tooling).
- Operating modern self-hosted infrastructure at scale. Hands-on with multi-cluster EKS, a metrics + logs stack (Prometheus / Grafana / centralized log aggregation), IaC, and GitOps workflows. You've owned the dashboards and alerts you're on-call against, not just consumed someone else's.