Data Parsing
Data Parsing in Epsilla determines how content is processed and extracted from different types of data sources, such as PDF, CSV, or JSONL files.
The Data Parsing option offers several parsing modes: Auto, which automatically detects the data format, and the manual modes PDF, CSV, and JSONL, which you choose according to the source type. Advanced parsing options are also available for processing tables and charts.
In most cases, the default Auto mode is sufficient. It automatically detects the format of each data file and uses the corresponding file loader. If your data source contains multiple file types (such as PDF, DOC, TXT, JSON, CSV, or HTML), Auto is your best choice.
If your knowledge base contains only one type of file, you can optionally select the matching manual mode (PDF, CSV, or JSONL) as your parsing option.
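To make the difference between Auto and the manual modes concrete, here is a minimal Python sketch of format-based loader selection. It is an illustration only, not Epsilla's implementation: the names `LOADER_BY_EXTENSION`, `pick_loader`, `parse_csv`, and `parse_jsonl` are hypothetical, and detecting the format from the file extension (with a plain-text fallback) is an assumption made for the example.

```python
import csv
import io
import json
from pathlib import Path

# Hypothetical mapping from file extension to a loader name.
# The real product may detect formats differently (assumption).
LOADER_BY_EXTENSION = {
    ".pdf": "pdf",
    ".csv": "csv",
    ".jsonl": "jsonl",
    ".json": "json",
    ".txt": "text",
    ".html": "html",
    ".doc": "doc",
}


def pick_loader(filename: str, mode: str = "auto") -> str:
    """Return the loader name to use for a file.

    In a manual mode ("pdf", "csv", "jsonl") the chosen loader is always used.
    In "auto" mode the loader is inferred from the file extension.
    """
    if mode != "auto":
        return mode
    extension = Path(filename).suffix.lower()
    # Falling back to plain text for unknown extensions is an assumption.
    return LOADER_BY_EXTENSION.get(extension, "text")


def parse_csv(text: str) -> list[dict]:
    """Parse CSV text into one dict per row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

With this sketch, a mixed folder is handled per file in Auto mode (`pick_loader("report.PDF")` selects the PDF loader), while a manual mode such as `pick_loader("export.txt", mode="jsonl")` forces a single loader regardless of extension.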
The advanced parsing option is currently available only for the Enterprise tier. It provides superior data extraction accuracy using an industry-leading Large Vision Language Model (VLM) technique provided by CambioML. At present, it supports only PDF files, from which it can accurately extract text, nested tables, and charts in any layout. Read more in our white paper.
If you would like to try this technology without an Enterprise tier subscription, talk to us. We'd love to enable it for you and support your use case!