Integration: mixedbread ai
Use mixedbread's models as well as top open-source models in seconds
Overview
mixedbread ai is an AI start-up that provides both open-source and in-house embedding models. You can choose from various foundation models to find the one best suited to your use case. More information can be found on the documentation page.
Installation
Install the mixedbread ai integration with a simple pip command:
pip install mixedbread-ai-haystack
Usage
This integration comes with two components: MixedbreadAiDocumentEmbedder for documents and MixedbreadAiTextEmbedder for queries. Once you've selected the component for your use case, initialize it with the model and the api_key. You can also set the environment variable MIXEDBREAD_API_KEY instead of passing the API key as an argument.
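The choice between passing the key explicitly and relying on MIXEDBREAD_API_KEY can be sketched as below. This is only an illustration of the fallback idea, not the integration's actual internals: resolve_api_key is a hypothetical helper name introduced here, and the precedence shown (argument first, then environment) is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None,
                    env_var: str = "MIXEDBREAD_API_KEY") -> str:
    """Hypothetical helper: prefer an explicit key, else read the environment."""
    # Assumption: an explicitly passed key takes precedence over the env variable.
    key = api_key or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key passed and {env_var} is not set.")
    return key
```

A value resolved this way could then be passed as the api_key argument of either embedder, which keeps the key out of source code.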
In a Pipeline
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from mixedbread_ai_haystack.embedders import MixedbreadAiDocumentEmbedder, MixedbreadAiTextEmbedder
# -------------------------------------
# Indexing Pipeline
# -------------------------------------
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
documents = [
    Document(content="China is the most populous country in the world."),
    Document(content="India is the second most populous country in the world."),
    Document(content="United States is the third most populous country in the world."),
]
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("doc_embedder", MixedbreadAiDocumentEmbedder(api_key="MIXEDBREAD_API_KEY", model="UAE-Large-V1"))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("doc_embedder", "writer")
indexing_pipeline.run({"doc_embedder": {"documents": documents}})
# -------------------------------------
# Query Pipeline
# -------------------------------------
text_embedder = MixedbreadAiTextEmbedder(model="UAE-Large-V1", api_key="MIXEDBREAD_API_KEY")
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", text_embedder)
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
results = query_pipeline.run({"text_embedder": {"text": "Which country has the biggest population?"}})
top_document = results["retriever"]["documents"][0].content
print(top_document)
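The document store above is configured with embedding_similarity_function="cosine", so the retriever ranks documents by the cosine similarity between the query embedding and each document embedding. A minimal, dependency-free sketch of that ranking idea follows. The two-dimensional vectors and the names cosine_similarity, rank_by_cosine, and toy_docs are made up for illustration; real embeddings come from the mixedbread ai models, and this is not Haystack's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query: Sequence[float],
                   docs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (illustrative values only).
toy_docs = {"doc_a": [1.0, 0.0], "doc_b": [0.6, 0.8], "doc_c": [0.0, 1.0]}
ranking = rank_by_cosine([1.0, 0.1], toy_docs)
print([name for name, _ in ranking])  # ['doc_a', 'doc_b', 'doc_c']
```

Because cosine similarity ignores vector length, only the direction of each embedding affects the order of the results.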