TraceroAI

Debug RAG failures before they reach users.

TraceroAI traces and evaluates retrieval-augmented generation systems, diagnoses why they produce bad answers, and includes a recovery agent that retries the stage that failed.

Evaluation: Embedding + LLM-judge

Recovery: LangGraph self-healing

SDK: pip install traceroai

Quickstart

Send your first trace in a few lines.

TraceroAI is instrumentation, not a chat app. Drop the SDK into any RAG pipeline — LangChain, LlamaIndex, or your own — and every answer becomes a debuggable trace in the dashboard.

python
from traceroai import TraceroClient

client = TraceroClient(
    base_url="https://traceroai.onrender.com",
    api_key="your_project_key",
)

# user_question, retrieved_chunks, and answer come from your own pipeline
with client.trace(user_question) as t:
    t.log_retrieval(retrieved_chunks, strategy="hybrid")
    t.log_generation(answer, model="gpt-4o-mini")
# the context manager times the block and sends the trace on exit

Product

A debugger for the full RAG answer lifecycle.

Trace every RAG answer

Capture the question, retrieval step, selected context, prompt, model response, and latency in one timeline — via a drop-in Python SDK.
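To make the captured fields concrete, here is a minimal sketch of what one trace record could hold. `TraceRecord` and `run_traced` are hypothetical names used only for illustration; the SDK's real schema and internals are not shown in this page, so treat the field names and the prompt template as assumptions.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TraceRecord:
    """Hypothetical container for one RAG answer; not the SDK's real schema."""
    question: str
    retrieved_chunks: list[str] = field(default_factory=list)
    selected_context: list[str] = field(default_factory=list)
    prompt: str = ""
    response: str = ""
    latency_ms: float = 0.0


def run_traced(question, retrieve, generate, top_k=2):
    """Run a toy pipeline and record every stage in a single TraceRecord."""
    start = time.perf_counter()
    record = TraceRecord(question=question)
    record.retrieved_chunks = retrieve(question)
    # Assumed selection rule: keep the first top_k retrieved chunks as context.
    record.selected_context = record.retrieved_chunks[:top_k]
    # Assumed prompt template, for illustration only.
    context = "\n".join(record.selected_context)
    record.prompt = f"Context:\n{context}\n\nQuestion: {question}"
    record.response = generate(record.prompt)
    record.latency_ms = (time.perf_counter() - start) * 1000.0
    return record
```

Any retriever and generator callables can be passed in, so the same record shape works whether the pipeline is built on LangChain, LlamaIndex, or custom code.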

Two-tier evaluation

A fast embedding-cosine relevance check scores every trace; an LLM-as-judge evaluates claim-level groundedness asynchronously. Each answer is then reduced to a single diagnosis.
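The fast tier can be sketched as plain cosine similarity between a question embedding and the embeddings of the retrieved chunks. The function names below are hypothetical, and taking the maximum over chunks is an assumed aggregation; the page does not say which embedding model or aggregation TraceroAI uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(question_vec, chunk_vecs):
    """Assumed aggregation: best similarity between the question and any retrieved chunk."""
    if not chunk_vecs:
        return 0.0
    return max(cosine_similarity(question_vec, c) for c in chunk_vecs)
```

Because this tier is only arithmetic over precomputed vectors, it is cheap enough to run on every trace, which is what leaves the slower claim-level judge free to run asynchronously.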

Self-healing recovery

A LangGraph agent retries the stage that failed — re-retrieving on a retrieval miss, re-generating with a stricter prompt on an unsupported claim — until the answer is healthy.
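The retry logic can be illustrated with a plain Python loop standing in for the LangGraph agent. This is a sketch of the routing idea only, not TraceroAI's graph: `recover`, the diagnosis strings, the doubling of `top_k`, and the `strict` flag are all assumptions made for the example.

```python
def recover(question, retrieve, generate, diagnose, max_retries=3):
    """Retry only the stage that the diagnosis blames, up to max_retries times.

    retrieve(question, top_k) -> list of chunks
    generate(question, chunks, strict) -> answer string
    diagnose(question, chunks, answer) -> diagnosis string
    """
    top_k, strict = 3, False
    chunks = retrieve(question, top_k)
    answer = generate(question, chunks, strict)
    diagnosis = diagnose(question, chunks, answer)
    history = [diagnosis]
    retries = 0
    while diagnosis in ("retrieval_miss", "unsupported_claim") and retries < max_retries:
        retries += 1
        if diagnosis == "retrieval_miss":
            # Assumed repair: widen retrieval, then answer again from the new context.
            top_k *= 2
            chunks = retrieve(question, top_k)
        else:
            # Assumed repair: keep the context, regenerate under a stricter prompt.
            strict = True
        answer = generate(question, chunks, strict)
        diagnosis = diagnose(question, chunks, answer)
        history.append(diagnosis)
    return answer, history
```

The returned `history` ends with the diagnosis of the answer that was actually returned, so a caller can tell whether recovery succeeded or simply ran out of retries.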

Experiment harness

Replay a labeled dataset across pipeline configs (top_k, prompt, model), grade each with an LLM judge, and get a recommended winner. A/B testing for RAG.
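A replay harness of this kind can be sketched as a grid search over configs with a grading function. Everything here is hypothetical: `run_experiment`, the dataset keys `question` and `expected`, and the mean-grade ranking are assumptions, and the `judge` callable stands in for whatever LLM judge is used.

```python
from itertools import product


def run_experiment(dataset, grid, run_pipeline, judge):
    """Replay every labeled example under every config; rank configs by mean grade.

    dataset: list of {"question": ..., "expected": ...} dicts
    grid: dict mapping a config key (e.g. "top_k") to the values to try
    run_pipeline(question, config) -> answer
    judge(question, answer, expected) -> numeric grade
    """
    if not dataset:
        raise ValueError("dataset must contain at least one labeled example")
    keys = list(grid)
    results = []
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        grades = [
            judge(ex["question"], run_pipeline(ex["question"], config), ex["expected"])
            for ex in dataset
        ]
        results.append((config, sum(grades) / len(grades)))
    # Highest mean grade first; the sort is stable, so ties keep grid order.
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

The first entry of the returned list is the recommended winner under this ranking; a real harness would likely also report per-example grades so a close result can be inspected rather than trusted blindly.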

Diagnosis

Bad answers are symptoms. TraceroAI shows the cause.

A hallucinated answer is not always a model problem. Sometimes the retriever missed the right document. Sometimes the context was noisy. Sometimes the prompt let the model over-answer. TraceroAI is built to separate these failure modes.

Healthy answer

Correct refusal

Retrieval miss

Unsupported claim

Wrong answer

Needs review
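One way to picture how evaluator signals could collapse into the six labels above is a small rule table. The rules, thresholds, and the `diagnose` function are assumptions made for illustration; the page does not describe TraceroAI's actual decision logic.

```python
def diagnose(relevance, groundedness, refused, correct=None,
             relevance_floor=0.5, grounded_floor=0.8):
    """Map evaluator signals to one diagnosis label (assumed rules, not the product's).

    relevance: 0..1 retrieval relevance score
    groundedness: 0..1 share of claims supported by context, or None if the judge has not run
    refused: True if the answer declined to respond
    correct: True/False when a reference label exists, None otherwise
    """
    if relevance < relevance_floor:
        # Nothing useful was retrieved: declining is right, answering anyway is not.
        return "Correct refusal" if refused else "Retrieval miss"
    if refused or groundedness is None:
        # A refusal despite good context, or a missing judge score, needs a human look.
        return "Needs review"
    if groundedness < grounded_floor:
        return "Unsupported claim"
    if correct is False:
        return "Wrong answer"
    return "Healthy answer"
```

Checking retrieval first reflects the point made above: an answer that looks hallucinated may trace back to the retriever rather than the model, so the model is only blamed once the context is known to be relevant.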

See it

Every answer becomes a debuggable trace.

A wrong answer is a symptom. The trace view shows the per-stage evaluation — retrieval, grounding, relevance — that explains the cause.

Figure: Trace detail, showing the answer, diagnosis, and per-evaluator scores.
Figure: Reliability dashboard, showing healthy rate, latency, failure mix, and experiments.