DevJobs

Artificial Intelligence Researcher

Overview
Skills
  • Python Python
  • Deep learning Deep learning
  • ML ML
  • Dataset Curation
  • LLM
  • Model Training
  • Open Source Libraries
  • Unsupervised Clustering

About Us:

Zenity is the first and only holistic platform built to secure and govern AI Agents from buildtime to runtime. We help organizations defend against security threats, meet compliance, and drive business productivity. Trusted by many of the world’s F500 companies, Zenity provides centralized visibility, vulnerability assessments, and governance by continuously scanning business-led development environments. We recently raised $38 million in a Series B funding, solidifying our position as a leader in the industry and enabling us to accelerate our mission of securing AI Agents everywhere.



About the Role:

This is a research‑first role focused on deeply understanding LLM internals to improve the security of AI agents.

You’ll design careful experiments on activations and interpretable features- e.g., probing, attribution & ablation/patching, representation‑geometry analyses-to uncover mechanisms behind jailbreak, indirect prompt injection, and other attacks.

Then translate those insights into signals that can be used for detection and analysis of a model response.

The field of LLM interpretability at scale is exploding, with several major publications in the last months, and major opportunities for innovation.



Responsibilities:

  • Investigate model internals, including activation/features analysis, unsupervised clustering, discovery of directions in latent space, etc. It may also require training specific model parts to improve interpretability metrics.
  • Design security‑grounded evaluations: curate datasets for different attack types, evaluate performance of different white box (model internals) methods compared to black box (input/output only) baselines.
  • Publish and share: produce Zenity Labs posts and open artifacts; when the work is strong, aim for tier‑1 ML venues (NeurIPS, ICML, etc.) and security forums. A publication of code and/or trained models in cases of community relevant novelty.
  • Build tools: Several open source libraries exist (like Anthropic’s attribution graphs infra), but the research in the field is very dynamic, which will require you to build and adapt tools to your own research directions. This also includes agents to automate research work and distill knowledge from designed experiments.

Zenity