
Evaluation

In Epsilla, AI agent evaluation is designed as a continuous performance assessment framework for testing and improving an AI agent's response quality over time. The evaluation system runs predefined scenarios that simulate real-world interactions, allowing AI agent builders and operations teams to monitor agent performance across a variety of situations. The evaluation process uses large language models (LLMs) to compare AI-generated responses against human-labeled answers, scoring them on metrics such as accuracy, relevance, and coverage.
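
Conceptually, this scoring step follows the LLM-as-judge pattern: for each predefined scenario, a judge model compares the agent's answer with the human-labeled reference and returns per-metric scores. The sketch below illustrates the idea only; the judge prompt, metric names, model choice, and use of the OpenAI client are assumptions for illustration, not Epsilla's internal implementation.

```python
# Illustrative LLM-as-judge sketch (hypothetical prompt and metrics,
# not Epsilla's built-in evaluation code).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are grading an AI agent's answer against a human-labeled reference.
Question: {question}
Reference answer: {reference}
Agent answer: {answer}

Return a JSON object with integer scores from 1 to 5 for
"accuracy", "relevance", and "coverage".
"""

def score_response(question: str, reference: str, answer: str) -> dict:
    """Ask a judge LLM to compare the agent's answer with the labeled reference."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question, reference=reference, answer=answer
            ),
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(completion.choices[0].message.content)

# One hypothetical scenario: a question paired with a human-labeled answer.
scores = score_response(
    question="What data sources can I connect to a knowledge base?",
    reference="Local files, websites, Google Drive, S3, Notion, and other connectors.",
    answer="You can upload local files or connect sources such as websites and S3.",
)
print(scores)  # e.g. {"accuracy": 4, "relevance": 5, "coverage": 3}
```

Running a set of such scenarios on a schedule, and tracking the scores over time, is what turns a one-off check into the continuous assessment described above.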

This approach is conceptually similar to Continuous Integration/Continuous Delivery (CI/CD) practices, where the goal is to iteratively test and improve the system. It leverages human input and LLMs to provide ongoing feedback on the AI's performance, ensuring that the agents meet high-quality standards as they are updated and refined over time.

On the navigation bar, click on the Evaluations tab.

This takes you to the page where you can create and manage all of your evaluations.