# Data Parsing

**Data Parsing** in Epsilla involves selecting a method to process and extract content from different types of data sources, such as PDFs, CSVs, or JSONL files.

The **Data Parsing** option allows users to choose from various parsing modes, including **Auto**, which automatically detects the data format, or manual modes like **PDF**, **CSV**, and **JSONL**, depending on the source type. Additionally, advanced parsing options are available for processing tables and charts.

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-fe802e15e1f127ff36a326f8b3770d9b386e3e28%2FScreenshot%202024-10-04%20at%204.12.37%20PM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

### When to use Auto

In most cases, the default Auto mode is sufficient. It automatically detects the format of each data file and applies the matching file loader. If your data source contains multiple file types (such as PDF, DOC, TXT, JSON, CSV, or HTML), Auto is your best choice.
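To illustrate the idea of format detection followed by loader selection, here is a minimal sketch. It assumes detection by file extension, which is only one possible approach; Epsilla's actual detection logic is not documented here and may inspect file contents instead. The `LOADERS` table, the loader names, and `pick_loader` are hypothetical and are not part of any Epsilla API.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a loader name.
# These names are illustrative only, not Epsilla identifiers.
LOADERS = {
    ".pdf": "pdf",
    ".csv": "csv",
    ".jsonl": "jsonl",
    ".json": "json",
    ".html": "html",
    ".txt": "text",
}


def pick_loader(filename: str) -> str:
    """Choose a loader name from the file extension, case-insensitively.

    Unknown extensions fall back to the plain-text loader.
    """
    suffix = Path(filename).suffix.lower()
    return LOADERS.get(suffix, "text")


for name in ["report.PDF", "catalog.csv", "events.jsonl", "notes.md"]:
    print(name, "->", pick_loader(name))
```

A mixed folder of files would be routed one file at a time in this way, which is why a single Auto setting can cover a data source with several formats.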

### When to use PDF, CSV, or JSONL

If your knowledge base contains only one type of file, you can optionally select PDF, CSV, or JSONL as the parsing mode.

When you use CSV or JSONL as the parsing mode, Epsilla automatically detects the schema of the uploaded files and creates an additional metadata field for each column in a CSV file or each object field in a JSONL file. This makes every data attribute available in the knowledge base. You can also define custom semantic indices on these fields to tailor search and retrieval to your needs.

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-ce21cade5ee256be64841a9d613ac5cdd16b4e1f%2FScreenshot%202024-12-01%20at%207.45.14%20PM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>
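The schema detection described above can be pictured with a short sketch that uses only the Python standard library. It is not Epsilla's implementation: the function names `infer_csv_fields` and `infer_jsonl_fields` are hypothetical, and the sketch assumes the CSV has a header row and that each JSONL line is a JSON object.

```python
import csv
import io
import json


def infer_csv_fields(text: str) -> list[str]:
    """Return the column names from the header row of CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader.fieldnames or [])


def infer_jsonl_fields(text: str) -> list[str]:
    """Return the union of top-level keys across all JSONL records,
    in first-seen order."""
    fields: dict[str, None] = {}  # dict used as an ordered set
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if isinstance(record, dict):
            for key in record:
                fields.setdefault(key, None)
    return list(fields)


csv_sample = "title,price\nWidget,9.99\n"
jsonl_sample = (
    '{"title": "Widget", "price": 9.99}\n'
    '{"title": "Gadget", "in_stock": true}\n'
)

print(infer_csv_fields(csv_sample))      # ['title', 'price']
print(infer_jsonl_fields(jsonl_sample))  # ['title', 'price', 'in_stock']
```

Each name returned here corresponds to what the text calls a metadata field: one per CSV column, or one per object field seen in the JSONL records.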

### When to use Advanced Parsing with Tables/Charts

This option is currently available only for the Enterprise tier. It provides superior data extraction accuracy using an industry-leading Large Vision Language Model (VLM) technique provided by [CambioML](https://www.cambioml.com/). At present, it supports only PDF files, from which it can accurately extract text, nested tables, and charts in any layout. Read more in our [white paper](https://epsilla.com/AnyParser_Epsilla_Whitepaper.pdf#zoom=100%).

[Talk to us](https://epsilla-ai.larksuite.com/scheduler/4aca8159d1224454) if you want to try this technology without an Enterprise tier subscription. We'd love to enable it for you and support your use case!
