
# Data Parsing

**Data Parsing** in Epsilla involves selecting a method to process and extract content from different types of data sources, such as PDFs, CSVs, or JSONL files.

The **Data Parsing** option lets you choose a parsing mode: **Auto**, which automatically detects the data format, or a manual mode (**PDF**, **CSV**, or **JSONL**) that matches the source type. An advanced parsing option is also available for processing tables and charts.

<figure><img src="/files/Dr8GkqDNphlh43EDWWTJ" alt="" width="563"><figcaption></figcaption></figure>

### When to use Auto

In most cases, the default Auto mode is sufficient. It automatically detects the format of each data file and uses the matching file loader. If your data source contains multiple file types (such as PDF, DOC, TXT, JSON, CSV, or HTML), Auto is your best choice.

### When to use PDF, CSV, or JSONL

If you only have one type of file in your knowledge base, you can optionally use PDF, CSV, or JSONL as your parsing option.
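The guidance above can be sketched as a small helper that looks at the files you plan to upload and suggests a mode. This is only an illustration of the decision rule, not part of Epsilla's product or API: the function name `suggest_parsing_mode` and the extension-to-mode mapping are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the manual parsing modes
# named on this page. It is an assumption for illustration only.
MANUAL_MODES = {".pdf": "PDF", ".csv": "CSV", ".jsonl": "JSONL"}


def suggest_parsing_mode(filenames):
    """Suggest a parsing mode for a collection of file names.

    Returns a manual mode only when every file shares one extension
    that has a matching manual mode; otherwise returns "Auto".
    """
    extensions = {Path(name).suffix.lower() for name in filenames}
    if len(extensions) == 1:
        # One file type only: a manual mode applies if one exists for it.
        return MANUAL_MODES.get(next(iter(extensions)), "Auto")
    # Mixed file types (or no files at all): fall back to Auto.
    return "Auto"
```

For example, a folder containing only `.csv` files would map to CSV, while a mix of `.pdf` and `.csv` files would map to Auto.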

When you use CSV or JSONL as the parsing option, Epsilla automatically detects the schema of the uploaded CSV and JSON files and creates an additional metadata field for each column in a CSV file or each object field in a JSON file. This way, every data attribute is carried into the knowledge base. You can also define custom semantic indices on these fields, which enables search and retrieval tailored to your specific needs.

<figure><img src="/files/0aaczZRMMJlZoC1MrLp7" alt="" width="563"><figcaption></figcaption></figure>
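To get a feel for which metadata fields a file would produce, you can preview its field names locally before uploading. The sketch below is not Epsilla's schema detection; it only assumes that a CSV file has a header row and that a JSONL file holds one JSON object per line. The function names `csv_fields` and `jsonl_fields` are hypothetical.

```python
import csv
import io
import json


def csv_fields(text):
    """Return the column names from the header row of CSV text."""
    reader = csv.reader(io.StringIO(text))
    # The first row is assumed to be the header; empty input yields [].
    return next(reader, [])


def jsonl_fields(text):
    """Return the top-level field names seen across all JSONL records,
    in order of first appearance."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # only JSON objects contribute fields
        for key in record:
            if key not in fields:
                fields.append(key)
    return fields
```

Each name returned here corresponds to one column or object field, which is the granularity at which the paragraph above says metadata fields are created.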

### When to use Advanced Parsing with Tables/Charts

This option is currently available only on the Enterprise tier. It provides superior data extraction accuracy using an industry-leading Large Vision Language Model (VLM) technique provided by [CambioML](https://www.cambioml.com/). At present, only PDF files are supported. This option can accurately extract text, nested tables, and charts from PDF files in any layout. Read more in our [white paper](https://epsilla.com/AnyParser_Epsilla_Whitepaper.pdf#zoom=100%).

[Talk to us](https://epsilla-ai.larksuite.com/scheduler/4aca8159d1224454) if you want to try this technology without being on the Enterprise tier. We'd love to enable it for you and support your use case!


