Data Parsing


Last updated 6 months ago

Data Parsing in Epsilla involves selecting a method to process and extract content from different types of data sources, such as PDFs, CSVs, or JSONL files.

The Data Parsing option allows users to choose from various parsing modes, including Auto, which automatically detects the data format, or manual modes like PDF, CSV, and JSONL, depending on the source type. Additionally, advanced parsing options are available for processing tables and charts.

When to use Auto

In most cases, the default Auto mode is sufficient. It automatically detects the data file format and leverages different types of file loaders accordingly. If your data source contains multiple file types (such as PDF, DOC, TXT, JSON, CSV, HTML, etc.), Auto is your best choice.
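As a rough illustration of what format detection can look like, the sketch below picks a loader from the file extension. It is not Epsilla's implementation: the `LOADER_BY_EXTENSION` table, the loader names, and the `pick_loader` function are all hypothetical, and a real detector may also inspect file contents.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a loader name.
# These names are illustrative only, not Epsilla identifiers.
LOADER_BY_EXTENSION = {
    ".pdf": "pdf",
    ".csv": "csv",
    ".jsonl": "jsonl",
    ".json": "json",
    ".txt": "text",
    ".html": "html",
    ".doc": "doc",
}


def pick_loader(filename: str, default: str = "text") -> str:
    """Return the loader name for a file, based only on its extension."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    return LOADER_BY_EXTENSION.get(suffix, default)


if __name__ == "__main__":
    for name in ["report.PDF", "orders.csv", "events.jsonl", "notes.md"]:
        print(name, "->", pick_loader(name))
```

A mixed folder is the case where this kind of dispatch helps: each file gets its own loader, and unknown extensions fall back to a default rather than failing.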

When to use PDF, CSV, or JSONL

If you only have one type of file in your knowledge base, you can optionally use PDF, CSV, or JSONL as your parsing option.

When CSV or JSONL is the parsing option, Epsilla automatically detects the schema of the uploaded CSV and JSONL files and creates an additional metadata field for each CSV column or each field of a JSON object, so every data attribute is carried into the knowledge base. Users can also define custom semantic indices on these fields to tailor search and retrieval to their needs.
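To make the idea of schema detection concrete, here is a minimal sketch that derives candidate metadata field names from a CSV header row and from the keys of JSONL records. It uses only the Python standard library; the function names are hypothetical, and the sketch makes no claim about how Epsilla detects schemas internally.

```python
import csv
import io
import json


def csv_metadata_fields(text: str) -> list[str]:
    """Treat the first CSV row as the header; each column is one field."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])  # empty input yields no fields
    return [name.strip() for name in header]


def jsonl_metadata_fields(text: str) -> list[str]:
    """Collect top-level keys across all JSONL records, in first-seen order."""
    fields: list[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for key in record:
            if key not in fields:
                fields.append(key)
    return fields


if __name__ == "__main__":
    print(csv_metadata_fields("title,price\nWidget,9.5\n"))
    print(jsonl_metadata_fields('{"title": "Widget"}\n{"title": "Gadget", "price": 3}\n'))
```

In this sketch a field that appears in only some JSONL records still becomes a metadata field; whether a real system does the same is an assumption.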

When to use Advanced Parsing with Tables/Charts

This option is currently available only for the Enterprise tier. It provides superior data extraction accuracy using an industry-leading Large Vision Language Model (VLM) technique provided by CambioML. At present, we support only PDF files. This option can accurately extract text, nested tables, and charts from PDF files in any layout. Read more in our white paper.

Talk to us if you want to use this technology without an Enterprise tier to test it out. We'd love to enable it for you and support your use case!