Skip to main content

Roadmap

What we shipped, what we are building next, and what we plan to build.

Last Shipped

Test Set Versioning and New UI
1/20/2026
Evaluation
Track test set changes with versioning. Every edit creates a new version. Evaluations link to specific versions for reliable comparisons. Plus a rebuilt UI that scales to 100K+ rows with inline editing for chat messages and JSON.
Chat Sessions in Observability
1/9/2026
Observability
Track multi-turn conversations with session grouping. All traces with the same session ID are automatically grouped together, showing complete conversation flows with cost, latency, and token metrics per session.
PDF Support in the Playground
12/17/2025
PlaygroundEvaluationObservability
Attach PDF documents to chat messages in the playground. Upload files, provide URLs, or use file IDs from provider APIs. Works with OpenAI, Gemini, and Claude models. PDFs are supported in evaluations and observability traces.
Provider Built-in Tools in the Playground
12/11/2025
Playground
Use provider built-in tools like web search, code execution, and file search directly in the Playground. Supported providers include OpenAI, Anthropic, and Gemini. Tools are saved with prompts and automatically used via the LLM gateway.
Projects within Organizations
12/4/2025
Misc
Create projects within organizations to divide work between different AI products. Each project scopes its prompts, traces, and evaluations independently.
Jinja2 Template Support in the Playground
11/17/2025
Playground
Use Jinja2 templating in prompts to add conditional logic, filters, and template blocks. The template format is stored in the configuration schema, and the SDK handles rendering automatically.
Programmatic Evaluation through the SDK
11/11/2025
Evaluation
Run evaluations programmatically from code with full control over test data and evaluation logic. Evaluate agents built with any framework and view results in the Agenta dashboard.

In progress

Planned

Feature Requests

Upvote or comment on the features you care about or request a new feature.