# Website

The **Website** data source in Epsilla imports content directly from webpages into a knowledge base. It is well suited to content that lives on websites, blogs, and other web-based platforms, including pages that are dynamic or continuously updated.

### **Select Knowledge Base Type**

To begin, choose **Website** as the data source to load content from webpages.

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-1fd6c0c4d252c3938925635f142341928902c23c%2FScreenshot%202024-10-13%20at%201.02.17%20AM.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

Click **Continue** to proceed.

### **Knowledge Base Name**

Provide a **Knowledge Base Name**. The name must begin with a letter or an underscore (`_`) and can contain only letters, digits, underscores, and whitespace. Enter the name in the input box, for example `MyWebsiteKnowledge`.

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-7038201e7ca55b14a8705369022eec033d38120f%2FScreenshot%202024-10-13%20at%201.02.49%20AM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>
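If you generate knowledge base names programmatically, it can help to check them against the naming rule before entering them. The following is a minimal sketch, not Epsilla's own validation: the function name `is_valid_kb_name` is hypothetical, and it assumes "letters" means ASCII letters and "whitespace" means the space character.

```python
import re

# Assumed interpretation of the documented rule: the first character is a
# letter or underscore; the remaining characters may be ASCII letters, digits,
# underscores, or spaces. Epsilla's actual check may differ.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_ ]*")


def is_valid_kb_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule as interpreted above."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["MyWebsiteKnowledge", "_docs 2024", "1stSite", "my-site"]:
        print(candidate, is_valid_kb_name(candidate))
```

Using `fullmatch` rather than `match` ensures the whole name is checked, so a trailing invalid character such as `-` is rejected.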

### **Add Webpage URLs**

In the **Webpage URLs** section, input the publicly accessible URLs you want to extract data from.

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-8be15bf2548da6a6735e0199ce5c4f3e3974aaea%2FScreenshot%202024-10-13%20at%201.03.02%20AM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

* The URLs must start with `http://` or `https://`.
* You can add a **Single Webpage** manually, **Crawl** a webpage for subpages, or add multiple webpages at once.

For example, you can manually enter:

```text
https://epsilla-inc.gitbook.io/epsilladb
```
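The URL prefix rule above can be checked before you paste a URL into the form. This is a rough sketch under stated assumptions: `is_acceptable_url` is a hypothetical helper, and the extra requirement that the URL contains a host name is an addition that is not stated in this guide.

```python
from urllib.parse import urlsplit


def is_acceptable_url(url: str) -> bool:
    """Check the documented prefix rule, plus an assumed non-empty host."""
    url = url.strip()
    # Documented rule: the URL must start with http:// or https://.
    if not url.startswith(("http://", "https://")):
        return False
    try:
        # Assumption (not from the guide): a usable URL also names a host.
        return bool(urlsplit(url).netloc)
    except ValueError:
        # urlsplit raises ValueError for malformed hosts such as "http://[".
        return False


if __name__ == "__main__":
    print(is_acceptable_url("https://epsilla-inc.gitbook.io/epsilladb"))
    print(is_acceptable_url("epsilla-inc.gitbook.io/epsilladb"))
```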

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-e3b5b820f3f722b02976deaf26270974aa0eca9d%2FScreenshot%202024-10-13%20at%201.03.32%20AM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

To **crawl a webpage**, click the **Crawl webpage** button. In the dialog that appears:

* Input the base URL, such as `https://epsilla-inc.gitbook.io/`.
* Set the **Max number of pages** to crawl (e.g., 100).

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-015857fedd4e48898be8271704deef89eaaff569%2FScreenshot%202024-10-13%20at%201.03.43%20AM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

* Click **Search** to locate subpages. A list of subpages will be displayed.

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-cb3e858943dd49c53d73f955329c5e8d14fe9ced%2FScreenshot%202024-10-13%20at%201.04.03%20AM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

* Select the pages you want to include in the knowledge base and click **Add**.

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-2f074aa504708657b566934666d0dd79aa66bb26%2FScreenshot%202024-10-13%20at%201.04.11%20AM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>
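To give a feel for what a capped crawl involves, here is an illustrative breadth-first sketch that starts at a base URL and stops after a maximum number of pages. It is not Epsilla's crawler: the names `discover_subpages`, `fetch_html`, and `LinkCollector` are hypothetical, and treating a "subpage" as any linked URL that begins with the base URL is an assumption.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)


def fetch_html(url: str) -> str:
    """Default fetcher: download a page and decode it as UTF-8."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def discover_subpages(
    base_url: str,
    max_pages: int = 100,
    fetch: Callable[[str], str] = fetch_html,
) -> list[str]:
    """Breadth-first discovery of pages under base_url, capped at max_pages."""
    queue = deque([base_url])
    seen = {base_url}
    found: list[str] = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to load
        found.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            link, _fragment = urldefrag(urljoin(url, href))
            # Assumption: only URLs that begin with base_url count as subpages.
            if link.startswith(base_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return found
```

The `fetch` parameter is injectable so the traversal can be exercised against canned HTML without network access. A real crawler would typically also honor `robots.txt` and rate limits, which this sketch omits.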

You can also use the **Add multiple webpages** option to input multiple URLs at once, separated by new lines.

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-cdf8e81e9620addc98ee1edbbbc50296cc0c0ddf%2FScreenshot%202024-10-13%20at%201.05.07%20AM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>
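When the URL list comes from a file or a script, a small helper can prepare the newline-separated text before you paste it in. The sketch below is illustrative only: `parse_url_list` is a hypothetical name, and skipping blank lines and duplicates is an assumption about what is convenient, not documented Epsilla behavior.

```python
def parse_url_list(raw: str) -> tuple[list[str], list[str]]:
    """Split newline-separated text into (accepted, rejected) URL lists."""
    accepted: list[str] = []
    rejected: list[str] = []
    seen: set[str] = set()
    for line in raw.splitlines():
        url = line.strip()
        # Assumption: blank lines and repeated entries are simply skipped.
        if not url or url in seen:
            continue
        seen.add(url)
        # Documented rule: URLs must start with http:// or https://.
        if url.startswith(("http://", "https://")):
            accepted.append(url)
        else:
            rejected.append(url)
    return accepted, rejected


if __name__ == "__main__":
    text = "https://epsilla-inc.gitbook.io/epsilladb\n\nwww.example.com\n"
    ok, bad = parse_url_list(text)
    print("\n".join(ok))  # text ready to paste into the dialog
    print("Rejected:", bad)
```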

### **Data Processing**

Once you've added the desired URLs, click **Create** to begin processing the data.

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-ddfa390f2eeaac9eb390b363b04710e987d7e30c%2FScreenshot%202024-10-13%20at%201.11.15%20AM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

Epsilla will automatically retrieve the content from the pages, chunk it into manageable pieces, and embed it into vectors. You can monitor the progress during this step.

<figure><img src="https://2532879721-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FM0ZX7fId7ifK45ldHWEp%2Fuploads%2Fgit-blob-278e94769be6f7045652d5e77c3c7cecc1204af1%2FScreenshot%202024-10-13%20at%201.11.38%20AM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>
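To illustrate the chunking step in general terms, the sketch below splits text into fixed-size character windows with a small overlap. This guide does not describe Epsilla's actual chunking strategy, so the function `chunk_text`, the `chunk_size` and `overlap` parameters, and their defaults are all assumptions for illustration. The embedding step is not shown.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "Epsilla retrieves page content and splits it into chunks. " * 20
    pieces = chunk_text(sample, chunk_size=200, overlap=20)
    print(len(pieces), "chunks; first chunk length:", len(pieces[0]))
```

Overlapping windows are a common way to keep a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated text.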

You can inspect the processed data (chunks) in the [**Data Storage**](https://epsilla-inc.gitbook.io/epsilladb/knowledge-base/data-storage) tab.
