Connect to a database

In Epsilla, data is organized into databases. A database consists of multiple tables.

There is a slight difference between Epsilla Docker and Epsilla Cloud when connecting to a database.

Connect to a database using Epsilla Docker

Step 0. Download the Docker image and start it

Epsilla vector database Docker images are available on Docker Hub.

Use the command below to pull the latest version of Epsilla vector DB:

docker pull epsilla/vectordb

You can also specify the version to pull:

docker pull epsilla/vectordb:0.3.1

Start the Docker container as the backend service:

docker run --pull=always -d -p 8888:8888 epsilla/vectordb

Use the EMBEDDING_MODELS environment variable to enable more built-in embedding models (learn more about embeddings):

docker run --pull=always -d -p 8888:8888 -e EMBEDDING_MODELS="BAAI/bge-small-zh-v1.5,BAAI/bge-base-en" epsilla/vectordb
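Before initializing a client, it can help to confirm that the container is actually accepting connections on the published port. The following is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper name, not part of pyepsilla, and it only checks that the TCP port answers, not that the database is fully ready.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` with no arguments checks `localhost:8888`, which matches the `-p 8888:8888` mapping used in the commands above.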

Step 1. Initialize Client

# Connect the client to localhost
from pyepsilla import vectordb
db = vectordb.Client()


# Connect the client to a remote server
from pyepsilla import vectordb
db = vectordb.Client(
    protocol='http',      # http or https. Default is http
    host='3.100.100.100', # The host machine for the vector db. Default localhost
    port='8888'           # The port for the vector db, default 8888
)
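If the server address is stored as a single URL (for example in a config file), it can be split into the three keyword arguments shown above. This is a sketch under that assumption; `client_kwargs_from_url` is a hypothetical helper, not a pyepsilla function.

```python
from urllib.parse import urlsplit


def client_kwargs_from_url(url: str) -> dict:
    """Split a URL such as 'http://3.100.100.100:8888' into Client keyword arguments."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"protocol must be http or https, got {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"no host found in {url!r}")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # The remote-server snippet passes the port as a string, so keep that
        # form and fall back to the documented default of 8888.
        "port": str(parts.port or 8888),
    }
```

The result can then be unpacked into the constructor, as in `vectordb.Client(**client_kwargs_from_url("http://3.100.100.100:8888"))`.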

To use OpenAI for embedding, pass your API key in the X-OpenAI-API-Key header:

db = vectordb.Client(
    ...
    headers={
        "X-OpenAI-API-Key": "<Your OpenAI API key here>"
    }
)
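To keep the key out of source code, the headers dictionary can be built from an environment variable. This is a minimal sketch: `openai_headers` is a hypothetical helper, and the variable name `OPENAI_API_KEY` is an assumption (any name works, as long as it is set before the client is created).

```python
import os


def openai_headers(env_var: str = "OPENAI_API_KEY") -> dict:
    """Build the headers dict for vectordb.Client from an environment variable."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty header.
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {"X-OpenAI-API-Key": api_key}
```

The returned dictionary can be passed as `headers=openai_headers()` when constructing the client.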

Step 2. Load database

Use the following command to load a database into memory. Epsilla can hold multiple databases in memory at the same time.

status_code, response = db.load_db(
    db_name="myDB",         # The name of the DB. Any valid name can be given
                            # when loading a DB from disk.
    db_path="/tmp/epsilla", # The path on disk where the DB is persisted.
                            # If the path doesn't exist, a new DB is created
                            # at this path.
    vector_scale=1000000,   # (Optional) The limit on the number of records in
                            # the table. Any positive number can be provided at
                            # load_db time. If not specified, the default value
                            # is 150000.
    wal_enabled=True        # (Optional) Whether to enable the write-ahead log.
                            # Default is True. For high-throughput,
                            # low-consistency cases, disable it to save disk I/O.
)
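Since `load_db` returns a status code and a response, it is worth checking them before moving on. The sketch below assumes that a status code of 200 signals success and that a failure response may carry a `message` field; both are assumptions about the server's replies, and `ensure_ok` is a hypothetical helper.

```python
def ensure_ok(status_code: int, response):
    """Return `response` if the call succeeded, otherwise raise with the details."""
    # Assumption: the client reports success as HTTP-style status 200.
    if status_code != 200:
        if isinstance(response, dict):
            detail = response.get("message", response)
        else:
            detail = response
        raise RuntimeError(f"Epsilla request failed ({status_code}): {detail}")
    return response
```

A typical call would be `ensure_ok(*db.load_db(db_name="myDB", db_path="/tmp/epsilla"))`, which raises immediately if the database could not be loaded.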

Step 3. Use database

Use the following command to switch between databases that are already loaded in memory. Subsequent operations then apply to the selected database.

db.use_db(db_name="myDB")
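Steps 2 and 3 can be combined when several databases need to be in memory at once. The sketch below loads each one and then selects the active database; `load_and_use` is a hypothetical helper, `db` is expected to be a client exposing the `load_db` and `use_db` calls shown above, and treating status 200 as success is an assumption.

```python
def load_and_use(db, databases: dict, active: str) -> None:
    """Load every database in `databases` (name -> disk path), then select `active`."""
    if active not in databases:
        raise KeyError(f"{active!r} is not among the databases to load")
    for name, path in databases.items():
        status_code, response = db.load_db(db_name=name, db_path=path)
        # Assumption: 200 means the database was loaded successfully.
        if status_code != 200:
            raise RuntimeError(f"failed to load {name!r}: {response}")
    # Subsequent operations on `db` target the selected database.
    db.use_db(db_name=active)
```

For example, `load_and_use(db, {"myDB": "/tmp/epsilla", "otherDB": "/tmp/epsilla_other"}, active="myDB")` loads both databases and leaves `myDB` selected; the second name and path are illustrative.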

Connect to a vector database on Epsilla Cloud

First, create a vector database in the Epsilla Cloud GUI.

We will support creating vector databases via Python/JavaScript client in the near future.

Then connect to the created database. Replace the Project ID, Database ID, and API Key with your own values.

from pyepsilla import cloud
client = cloud.Client(
  project_id="PROJECT-ID",     # Copied from the GUI code snippet
  api_key="YOUR-API-KEY"       # Replace with your API Key
)
db = client.vectordb(db_id="DB-ID") # Copied from the GUI code snippet
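A common mistake is to run the snippet with the placeholder values still in place. The sketch below rejects empty values and the literal placeholders from the snippet above before any connection is attempted; `check_cloud_settings` is a hypothetical helper, not part of pyepsilla.

```python
# Literal placeholder strings used in the connection snippet.
PLACEHOLDERS = {"PROJECT-ID", "YOUR-API-KEY", "DB-ID"}


def check_cloud_settings(project_id: str, api_key: str, db_id: str) -> dict:
    """Return the settings as a dict, or raise if any value is empty or a placeholder."""
    settings = {"project_id": project_id, "api_key": api_key, "db_id": db_id}
    bad = [name for name, value in settings.items() if not value or value in PLACEHOLDERS]
    if bad:
        raise ValueError("replace placeholder values for: " + ", ".join(bad))
    return settings
```

The validated `project_id` and `api_key` can then be passed to `cloud.Client`, and `db_id` to `client.vectordb`.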

The Project ID and Database ID can be copied from the database card under project resources.

The API Key can be created under project configurations.
