
Embeddings


Embeddings (vectors) are numerical representations of complex data like text or images in a format that machines can process. Embedding models convert text, images, audio, and video into vectors of real numbers. This process captures semantic meaning, allowing algorithms to understand content similarity and context. The technique is pivotal in applications ranging from Retrieval Augmented Generation (RAG) to recommendation systems to language translation, as it enables computers to 'understand' and work with human language.

The mathematical premise of embedding models is that the closer two vectors are in high-dimensional space, the more semantically similar they are. Vector databases leverage this characteristic for semantic similarity search, using algorithms like nearest neighbor search. These algorithms compute the distances between vectors, interpreting smaller distances as higher similarity. This approach enables applications to find closely related items (texts, images, audio, video) based on their embedded vector representations, so searches and analyses follow the meaning and context of the data rather than literal matches.
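To make the distance intuition concrete, here is a small, self-contained Python sketch (toy 3-dimensional vectors, purely illustrative; real embedding models produce hundreds or thousands of dimensions) that ranks candidates against a query by cosine similarity, one common measure of vector closeness:

import math

def cosine_similarity(a, b):
    # Dot product normalized by both vector lengths:
    # 1.0 = same direction (most similar), 0.0 = orthogonal (unrelated).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.3]
candidates = {
    "garden": [0.8, 0.15, 0.35],  # nearly the same direction as the query
    "city":   [0.1, 0.9, 0.2],    # a very different direction
}

# Rank candidates by similarity to the query, most similar first.
for name in sorted(candidates, key=lambda n: -cosine_similarity(query, candidates[n])):
    print(f"{name}: {cosine_similarity(query, candidates[name]):.3f}")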

Use Embeddings

Starting in v0.3, Epsilla can automatically embed your documents and questions within the vector database, which significantly simplifies the end-to-end semantic similarity search workflow.

When creating tables, you can define indices to let Epsilla automatically create embeddings for the STRING fields:

Python:

status_code, response = db.create_table(
    table_name="MyTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"}
    ],
    indices=[
        {"name": "Index", "field": "Doc", "model": "BAAI/bge-small-en-v1.5"}
    ]
)

JavaScript:

await db.createTable('MyTable',
  [
    {"name": "ID", "dataType": "INT", "primaryKey": true},
    {"name": "Doc", "dataType": "STRING"}
  ],
  [
    {"name": "Index", "field": "Doc", "model": "BAAI/bge-small-en-v1.5"}
  ]
);

You can omit the model when defining indices, and Epsilla uses BAAI/bge-small-en-v1.5 by default.
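For example, this variant of the create_table call above (a sketch) leaves out the model and falls back to the default:

status_code, response = db.create_table(
    table_name="MyTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"}
    ],
    indices=[
        {"name": "Index", "field": "Doc"}  # no "model": uses BAAI/bge-small-en-v1.5
    ]
)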

Then you can insert records in their raw format and let Epsilla handle the embedding:

Python:

status_code, response = db.insert(
  table_name="MyTable",
  records=[
    {"ID": 1, "Doc": "The garden was blooming with vibrant flowers, attracting butterflies and bees with their sweet nectar."},
    {"ID": 2, "Doc": "In the busy city streets, people rushed to and fro, hardly noticing the beauty of the day."},
    {"ID": 3, "Doc": "The library was a quiet haven, filled with the scent of old books and the soft rustling of pages."},
    {"ID": 4, "Doc": "High in the mountains, the air was crisp and clear, revealing breathtaking views of the valley below."},
    {"ID": 5, "Doc": "At the beach, children played joyfully in the sand, building castles and chasing the waves."}
  ]
)

JavaScript:

await db.insert('MyTable',
  [
    {"ID": 1, "Doc": "The garden was blooming with vibrant flowers, attracting butterflies and bees with their sweet nectar."},
    {"ID": 2, "Doc": "In the busy city streets, people rushed to and fro, hardly noticing the beauty of the day."},
    {"ID": 3, "Doc": "The library was a quiet haven, filled with the scent of old books and the soft rustling of pages."},
    {"ID": 4, "Doc": "High in the mountains, the air was crisp and clear, revealing breathtaking views of the valley below."},
    {"ID": 5, "Doc": "At the beach, children played joyfully in the sand, building castles and chasing the waves."}
  ]
);

After inserting records, you can query the table with natural language questions:

Python:

status_code, response = db.query(
  table_name="MyTable",
  query_text="Where can I find a serene environment, ideal for relaxation and introspection?",
  limit=2
)
print(response)

Output:

{
  'message': 'Query search successfully.',
  'result': [
    {'Doc': 'The library was a quiet haven, filled with the scent of old books and the soft rustling of pages.', 'ID': 3},
    {'Doc': 'High in the mountains, the air was crisp and clear, revealing breathtaking views of the valley below.', 'ID': 4}
  ],
  'statusCode': 200
}
JavaScript:

const query = await db.query(
  'MyTable',
  {
    query: "Where can I find a serene environment, ideal for relaxation and introspection?",
    limit: 2
  }
);
console.log(JSON.stringify(query));

Output:

{
  "statusCode":200,
  "message":"Query search successfully.",
  "result":[
    {"Doc": "The library was a quiet haven, filled with the scent of old books and the soft rustling of pages.", "ID": 3},
    {"Doc": "High in the mountains, the air was crisp and clear, revealing breathtaking views of the valley below.", "ID": 4}
  ]
}
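Putting the pieces together, here is a minimal end-to-end sketch using the Python client (pip install pyepsilla), assuming a local Docker instance on port 8888; the database name and storage path are illustrative:

from pyepsilla import vectordb

# Connect to a local Epsilla instance started via Docker.
db = vectordb.Client(host="localhost", port="8888")
db.load_db(db_name="MyDB", db_path="/tmp/epsilla")  # illustrative storage path
db.use_db(db_name="MyDB")

# Create a table whose "Doc" field is auto-embedded on insert.
db.create_table(
    table_name="MyTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"}
    ],
    indices=[
        {"name": "Index", "field": "Doc", "model": "BAAI/bge-small-en-v1.5"}
    ]
)

# Insert raw text; Epsilla computes the embeddings.
db.insert(
    table_name="MyTable",
    records=[{"ID": 1, "Doc": "The library was a quiet haven, filled with the scent of old books."}]
)

# Query in natural language; the question is embedded with the same model.
status_code, response = db.query(
    table_name="MyTable",
    query_text="Where can I find a serene environment?",
    limit=1
)
print(response)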

Built-in Embeddings

Here is the list of built-in embedding models Epsilla supports:

  • BAAI/bge-small-en

  • BAAI/bge-small-en-v1.5

  • BAAI/bge-small-zh-v1.5

  • BAAI/bge-base-en

  • BAAI/bge-base-en-v1.5

  • sentence-transformers/all-MiniLM-L6-v2

BAAI/bge-small-en-v1.5 and sentence-transformers/all-MiniLM-L6-v2 are enabled by default. You can enable additional models through the EMBEDDING_MODELS environment variable of the docker run command (a comma-separated string):

docker run --pull=always -d -p 8888:8888 -e EMBEDDING_MODELS="BAAI/bge-small-zh-v1.5,BAAI/bge-base-en" epsilla/vectordb

When you use these built-in embedding models, embeddings are computed locally on your machine with no outbound network traffic. They run on the CPU, so make sure you have enough CPU power and memory to handle the models before enabling them.
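Once an extra model is enabled, reference it by name when defining an index. A sketch (the table name here is illustrative):

status_code, response = db.create_table(
    table_name="MyChineseTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"}
    ],
    indices=[
        # Available because it was listed in EMBEDDING_MODELS above.
        {"name": "Index", "field": "Doc", "model": "BAAI/bge-small-zh-v1.5"}
    ]
)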

OpenAI Embedding

Epsilla supports these OpenAI embedding models:

Name                          | Dimensions | Supports Dimension Reduction
openai/text-embedding-3-large | 3072       | Yes
openai/text-embedding-3-small | 1536       | Yes
openai/text-embedding-ada-002 | 1536       | No
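For context on the "Supports Dimension Reduction" column: the text-embedding-3 models can return shortened vectors when asked. The sketch below calls OpenAI's own Python SDK directly (not the Epsilla client) to illustrate the feature; the 256-dimension value is arbitrary:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask for a 256-dimensional embedding instead of the full 3072.
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="The garden was blooming with vibrant flowers.",
    dimensions=256,
)
print(len(resp.data[0].embedding))  # 256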

When using OpenAI embeddings on Docker, make sure to provide the X-OpenAI-API-Key header when connecting to the vector database:

Python:

db = vectordb.Client(
    ...
    headers={
        "X-OpenAI-API-Key": "<Your OpenAI API key here>"
    }
)

JavaScript:

const db = new epsillajs.EpsillaDB({
    ...
    headers: {
        "X-OpenAI-API-Key": "<Your OpenAI API key here>"
    }
});

If you are using Epsilla Cloud, make sure to add the OpenAI integration instead of passing the header.

Then use the embedding model when defining the index:

Python:

status_code, response = db.create_table(
    ...
    indices=[
        {"name": "Index", "field": "Doc", "model": "openai/text-embedding-3-large"}
    ]
)

JavaScript:

await db.createTable(
  ...
  [
    {"name": "Index", "field": "Doc", "model": "openai/text-embedding-ada-002"}
  ]
);

Jina AI Embedding

Epsilla supports these Jina AI embedding models (learn more about Jina AI embedding at https://jina.ai/embeddings/):

Name                                | Dimensions
jinaai/jina-embeddings-v2-base-en   | 768
jinaai/jina-embeddings-v2-base-de   | 768
jinaai/jina-embeddings-v2-base-zh   | 768
jinaai/jina-embeddings-v2-base-code | 768
jinaai/jina-embeddings-v2-small-en  | 512

When using Jina AI embeddings on Docker, make sure to provide the X-JinaAI-API-Key header when connecting to the vector database:

Python:

db = vectordb.Client(
    ...
    headers={
        "X-JinaAI-API-Key": "<Your Jina AI API key here>"
    }
)

JavaScript:

const db = new epsillajs.EpsillaDB({
    ...
    headers: {
        "X-JinaAI-API-Key": "<Your Jina AI API key here>"
    }
});

If you are using Epsilla Cloud, make sure to add the JinaAI integration instead of passing the header.

Then use the embedding model when defining the index:

Python:

status_code, response = db.create_table(
    ...
    indices=[
        {"name": "Index", "field": "Doc", "model": "jinaai/jina-embeddings-v2-base-en"}
    ]
)

JavaScript:

await db.createTable(
  ...
  [
    {"name": "Index", "field": "Doc", "model": "jinaai/jina-embeddings-v2-base-en"}
  ]
);

Voyage AI Embedding

Epsilla supports these Voyage AI embedding models (learn more about Voyage AI embedding at https://www.voyageai.com/):

Name                             | Dimensions
voyageai/voyage-large-2          | 1536
voyageai/voyage-code-2           | 1536
voyageai/voyage-2                | 1024
voyageai/voyage-02               | 1024
voyageai/voyage-law-2            | 1024
voyageai/voyage-finance-2        | 1024
voyageai/voyage-multilingual-2   | 1024
voyageai/voyage-lite-02-instruct | 1024
voyageai/voyage-3-large          | 1024
voyageai/voyage-3                | 1024
voyageai/voyage-3-lite           | 512
voyageai/voyage-code-3           | 1024

When using Voyage AI embeddings on Docker, make sure to provide the X-VoyageAI-API-Key header when connecting to the vector database:

Python:

db = vectordb.Client(
    ...
    headers={
        "X-VoyageAI-API-Key": "<Your Voyage AI API key here>"
    }
)

JavaScript:

const db = new epsillajs.EpsillaDB({
    ...
    headers: {
        "X-VoyageAI-API-Key": "<Your Voyage AI API key here>"
    }
});

If you are using Epsilla Cloud, make sure to add the VoyageAI integration instead of passing the header.

Then use the embedding model when defining the index:

Python:

status_code, response = db.create_table(
    ...
    indices=[
        {"name": "Index", "field": "Doc", "model": "voyageai/voyage-02"}
    ]
)

JavaScript:

await db.createTable(
  ...
  [
    {"name": "Index", "field": "Doc", "model": "voyageai/voyage-02"}
  ]
);

Mixedbread AI Embedding

Epsilla supports these Mixedbread AI embedding models (learn more about Mixedbread AI embedding at https://www.mixedbread.ai/docs/models/embeddings#models):

Name                               | Dimensions
mixedbreadai/UAE-Large-V1          | 1024
mixedbreadai/bge-large-en-v1.5     | 1024
mixedbreadai/gte-large             | 1024
mixedbreadai/e5-large-v2           | 1024
mixedbreadai/multilingual-e5-large | 1024
mixedbreadai/multilingual-e5-base  | 768
mixedbreadai/gte-large-zh          | 1024

When using Mixedbread AI embeddings on Docker, make sure to provide the X-MixedbreadAI-API-Key header when connecting to the vector database:

Python:

db = vectordb.Client(
    ...
    headers={
        "X-MixedbreadAI-API-Key": "<Your Mixedbread AI API key here>"
    }
)

JavaScript:

const db = new epsillajs.EpsillaDB({
    ...
    headers: {
        "X-MixedbreadAI-API-Key": "<Your Mixedbread AI API key here>"
    }
});

If you are using Epsilla Cloud, make sure to add the Mixedbread AI integration instead of passing the header.

Then use the embedding model when defining the index:

Python:

status_code, response = db.create_table(
    ...
    indices=[
        {"name": "Index", "field": "Doc", "model": "mixedbreadai/UAE-Large-V1"}
    ]
)

JavaScript:

await db.createTable(
  ...
  [
    {"name": "Index", "field": "Doc", "model": "mixedbreadai/UAE-Large-V1"}
  ]
);

Nomic AI Embedding

Epsilla supports these Nomic AI embedding models (learn more about Nomic AI embedding at https://docs.nomic.ai/reference/endpoints/nomic-embed-text):

Name                          | Dimensions
nomicai/nomic-embed-text-v1.5 | 768
nomicai/nomic-embed-text-v1   | 768

When using Nomic AI embeddings on Docker, make sure to provide the X-NOMIC-API-Key header when connecting to the vector database:

Python:

db = vectordb.Client(
    ...
    headers={
        "X-NOMIC-API-Key": "<Your Nomic AI API key here>"
    }
)

JavaScript:

const db = new epsillajs.EpsillaDB({
    ...
    headers: {
        "X-NOMIC-API-Key": "<Your Nomic AI API key here>"
    }
});

If you are using Epsilla Cloud, make sure to add the Nomic AI integration instead of passing the header.

Then use the embedding model when defining the index:

Python:

status_code, response = db.create_table(
    ...
    indices=[
        {"name": "Index", "field": "Doc", "model": "nomicai/nomic-embed-text-v1"}
    ]
)

JavaScript:

await db.createTable(
  ...
  [
    {"name": "Index", "field": "Doc", "model": "nomicai/nomic-embed-text-v1"}
  ]
);

Mistral AI Embedding

Epsilla supports these Mistral AI embedding models (learn more about Mistral AI embedding at https://docs.mistral.ai/guides/embeddings/):

Name                    | Dimensions
mistralai/mistral-embed | 1024

When using Mistral AI embeddings on Docker, make sure to provide the X-MistralAI-API-Key header when connecting to the vector database:

Python:

db = vectordb.Client(
    ...
    headers={
        "X-MistralAI-API-Key": "<Your Mistral AI API key here>"
    }
)

JavaScript:

const db = new epsillajs.EpsillaDB({
    ...
    headers: {
        "X-MistralAI-API-Key": "<Your Mistral AI API key here>"
    }
});

If you are using Epsilla Cloud, make sure to add the Mistral AI integration instead of passing the header.

Then use the embedding model when defining the index:

Python:

status_code, response = db.create_table(
    ...
    indices=[
        {"name": "Index", "field": "Doc", "model": "mistralai/mistral-embed"}
    ]
)

JavaScript:

await db.createTable(
  ...
  [
    {"name": "Index", "field": "Doc", "model": "mistralai/mistral-embed"}
  ]
);
