# Evaluation

In Epsilla, AI agent evaluation is designed as a continuous performance assessment framework aimed at testing and improving agents' response quality over time. The evaluation system runs predefined scenarios that simulate real-world interactions, allowing AI agent builders and operations teams to monitor agent performance across a variety of situations. The evaluation process uses large language models (LLMs) to compare AI-generated responses against human-labeled answers, scoring them on metrics such as accuracy, relevance, and coverage.
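The loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not Epsilla's actual API: the `Scenario` class, the `judge` stand-in, and the metric names are assumptions drawn only from the description in this page, and the real system replaces `judge` with an LLM call.

```python
# Hypothetical sketch of a scenario-based evaluation loop.
# In Epsilla, `judge` would be an LLM comparing the agent's answer
# against the human-labeled answer; here it is a crude placeholder.
from dataclasses import dataclass
from statistics import mean

METRICS = ("accuracy", "relevance", "coverage")

@dataclass
class Scenario:
    question: str        # simulated real-world interaction
    labeled_answer: str  # human-labeled reference answer

def judge(agent_answer: str, labeled_answer: str) -> dict:
    """Placeholder judge: exact match scores 1.0 on every metric, else 0.0."""
    score = 1.0 if agent_answer.strip() == labeled_answer.strip() else 0.0
    return {m: score for m in METRICS}

def evaluate(agent, scenarios) -> dict:
    """Run every predefined scenario through the agent and average each metric."""
    per_scenario = [judge(agent(s.question), s.labeled_answer) for s in scenarios]
    return {m: mean(r[m] for r in per_scenario) for m in METRICS}
```

Running `evaluate` repeatedly as an agent is updated yields the time-series of scores that the Evaluations page tracks.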

This approach is conceptually similar to Continuous Integration/Continuous Delivery (CI/CD) practices, where the goal is to iteratively test and improve the system. It leverages human input and LLMs to provide ongoing feedback on the AI's performance, ensuring that the agents meet high-quality standards as they are updated and refined over time.

On the navigation bar, click on the **Evaluations** tab.

<figure><img src="/files/q9HOW3zXFTL1vSq30zI6" alt="" width="253"><figcaption></figcaption></figure>

This will lead you to the page where you can create and manage all your evaluations.

<figure><img src="/files/yAo4Ouu98rDHjXGJchEP" alt=""><figcaption></figcaption></figure>


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://epsilla-inc.gitbook.io/epsilladb/evaluation.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question, along with relevant excerpts and sources from the documentation.
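For example, the request above can be issued with only the Python standard library. This is an illustrative sketch, not an official client: the function names are made up here, and only the base URL and the `ask` parameter come from this page.

```python
# Sketch: build and send the documentation query described above.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://epsilla-inc.gitbook.io/epsilladb/evaluation.md"

def build_ask_url(question: str) -> str:
    """URL-encode the natural-language question into the ?ask= parameter."""
    return f"{BASE}?{urlencode({'ask': question})}"

def ask_docs(question: str) -> str:
    """Perform the GET request and return the response body as text.

    Requires network access; may fail offline.
    """
    with urlopen(build_ask_url(question)) as resp:
        return resp.read().decode("utf-8")
```

`urlencode` takes care of escaping spaces and punctuation, so the question can be written as plain natural language.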

Use this mechanism when:

- the answer is not explicitly present in the current page,
- you need clarification or additional context, or
- you want to retrieve related documentation sections.
