langchain.vectorstores.hippo.Hippo

class langchain.vectorstores.hippo.Hippo(embedding_function: Embeddings, table_name: str = 'test', database_name: str = 'default', number_of_shards: int = 1, number_of_replicas: int = 1, connection_args: Optional[Dict[str, Any]] = None, index_params: Optional[dict] = None, drop_old: Optional[bool] = False)[source]

Hippo vector store.

You need to install hippo-api and run Hippo.

Please visit our official website for how to run a Hippo instance: https://www.transwarp.cn/starwarp

Parameters
  • embedding_function (Embeddings) – Function used to embed the text.

  • table_name (str) – Which Hippo table to use. Defaults to “test”.

  • database_name (str) – Which Hippo database to use. Defaults to “default”.

  • number_of_shards (int) – The number of shards for the Hippo table. Defaults to 1.

  • number_of_replicas (int) – The number of replicas for the Hippo table. Defaults to 1.

  • connection_args (Optional[Dict[str, Any]]) – The connection arguments for this class, passed as a dict.

  • index_params (Optional[dict]) – Which index params to use. Defaults to IVF_FLAT.

  • drop_old (Optional[bool]) – Whether to drop the current collection. Defaults to False.

  • primary_field (str) – Name of the primary key field. Defaults to “pk”.

  • text_field (str) – Name of the text field. Defaults to “text”.

  • vector_field (str) – Name of the vector field. Defaults to “vector”.

The connection args used for this class come in the form of a dict. Here are a few of the options:

  • host (str) – The host of the Hippo instance. Defaults to “localhost”.

  • port (str/int) – The port of the Hippo instance. Defaults to 7788.

  • user (str) – The user to connect to the Hippo instance as. If user and password are provided, a related header is added to every RPC call.

  • password (str) – Required when user is provided. The password corresponding to the user.
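
As a rough illustration of the options above, the sketch below assembles a connection_args dict. The key names and default values are taken from the default shown in the from_texts signature on this page, which spells the user key 'username'. The helper build_connection_args is hypothetical and is not part of langchain or hippo-api.

```python
from typing import Any, Dict, Optional


def build_connection_args(
    host: str = "localhost",
    port: str = "7788",
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a dict to pass as `connection_args`."""
    if username is not None and password is None:
        # The parameter notes say a password is required when a user is given.
        raise ValueError("password is required when username is provided")
    args: Dict[str, Any] = {"host": host, "port": port}
    if username is not None:
        args["username"] = username
        args["password"] = password
    return args


# Same values as the from_texts default shown in its signature.
HIPPO_CONNECTION = build_connection_args(username="admin", password="admin")
```

A dict built this way could then be passed as connection_args to the constructor or to from_texts / from_documents.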

Example


from langchain.vectorstores import Hippo
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
# Connect to a Hippo instance on localhost.
# `docs` (a list of Document objects) and HIPPO_CONNECTION (a
# connection_args dict) must be defined beforehand.
vector_store = Hippo.from_documents(
    docs,
    embedding=embeddings,
    table_name="langchain_test",
    connection_args=HIPPO_CONNECTION,
)

Raises

ValueError – If the hippo-api python package is not installed.

Attributes

embeddings

Access the query embedding object if available.

Methods

__init__(embedding_function[, table_name, ...])

aadd_documents(documents, **kwargs)

Run more documents through the embeddings and add to the vectorstore.

aadd_texts(texts[, metadatas])

Run more texts through the embeddings and add to the vectorstore.

add_documents(documents, **kwargs)

Run more documents through the embeddings and add to the vectorstore.

add_texts(texts[, metadatas, timeout, ...])

Add text to the collection.

adelete([ids])

Delete by vector ID or other criteria.

afrom_documents(documents, embedding, **kwargs)

Return VectorStore initialized from documents and embeddings.

afrom_texts(texts, embedding[, metadatas])

Return VectorStore initialized from texts and embeddings.

amax_marginal_relevance_search(query[, k, ...])

Return docs selected using the maximal marginal relevance.

amax_marginal_relevance_search_by_vector(...)

Return docs selected using the maximal marginal relevance.

as_retriever(**kwargs)

Return VectorStoreRetriever initialized from this VectorStore.

asearch(query, search_type, **kwargs)

Return docs most similar to query using specified search type.

asimilarity_search(query[, k])

Return docs most similar to query.

asimilarity_search_by_vector(embedding[, k])

Return docs most similar to embedding vector.

asimilarity_search_with_relevance_scores(query)

Return docs and relevance scores in the range [0, 1], asynchronously.

asimilarity_search_with_score(*args, **kwargs)

Run similarity search with distance asynchronously.

delete([ids])

Delete by vector ID or other criteria.

from_documents(documents, embedding, **kwargs)

Return VectorStore initialized from documents and embeddings.

from_texts(texts, embedding[, metadatas, ...])

Creates an instance of the VST class from the given texts.

max_marginal_relevance_search(query[, k, ...])

Return docs selected using the maximal marginal relevance.

max_marginal_relevance_search_by_vector(...)

Return docs selected using the maximal marginal relevance.

search(query, search_type, **kwargs)

Return docs most similar to query using specified search type.

similarity_search(query[, k, param, expr, ...])

Perform a similarity search on the query string.

similarity_search_by_vector(embedding[, k])

Return docs most similar to embedding vector.

similarity_search_with_relevance_scores(query)

Return docs and relevance scores in the range [0, 1].

similarity_search_with_score(query[, k, ...])

Performs a search on the query string and returns results with scores.

similarity_search_with_score_by_vector(embedding)

Performs a search on the embedding vector and returns results with scores.

__init__(embedding_function: Embeddings, table_name: str = 'test', database_name: str = 'default', number_of_shards: int = 1, number_of_replicas: int = 1, connection_args: Optional[Dict[str, Any]] = None, index_params: Optional[dict] = None, drop_old: Optional[bool] = False)[source]
async aadd_documents(documents: List[Document], **kwargs: Any) List[str]

Run more documents through the embeddings and add to the vectorstore.

Parameters

documents (List[Document]) – Documents to add to the vectorstore.

Returns

List of IDs of the added texts.

Return type

List[str]

async aadd_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str]

Run more texts through the embeddings and add to the vectorstore.

add_documents(documents: List[Document], **kwargs: Any) List[str]

Run more documents through the embeddings and add to the vectorstore.

Parameters

documents (List[Document]) – Documents to add to the vectorstore.

Returns

List of IDs of the added texts.

Return type

List[str]

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, timeout: Optional[int] = None, batch_size: int = 1000, **kwargs: Any) List[str][source]

Add text to the collection.

Parameters
  • texts – An iterable that contains the text to be added.

  • metadatas – An optional list of dictionaries; each dictionary contains the metadata associated with a text.

  • timeout – Optional timeout, in seconds.

  • batch_size – The number of texts inserted in each batch, defaults to 1000.

  • **kwargs – Other optional parameters.

Returns

A list of strings, containing the unique identifiers of the inserted texts.

Note

If the collection has not yet been created, this method will create a new collection.
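
To make the batch_size parameter concrete, here is a minimal sketch of splitting an iterable of texts into batches of at most batch_size items. It is an illustration only and not the code Hippo runs internally; the function name batched is an assumption made for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(texts: Iterable[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield successive lists holding at most `batch_size` texts each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(texts)
    while True:
        # islice consumes up to batch_size items from the shared iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


batches = list(batched(["a", "b", "c", "d", "e"], batch_size=2))
```

With five texts and batch_size=2, the last batch holds the single leftover text.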

async adelete(ids: Optional[List[str]] = None, **kwargs: Any) Optional[bool]

Delete by vector ID or other criteria.

Parameters
  • ids – List of ids to delete.

  • **kwargs – Other keyword arguments that subclasses might use.

Returns

True if deletion is successful, False otherwise, None if not implemented.

Return type

Optional[bool]

async classmethod afrom_documents(documents: List[Document], embedding: Embeddings, **kwargs: Any) VST

Return VectorStore initialized from documents and embeddings.

async classmethod afrom_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) VST

Return VectorStore initialized from texts and embeddings.

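async amax_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document]

Return docs selected using the maximal marginal relevance.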

async amax_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document]

Return docs selected using the maximal marginal relevance.

as_retriever(**kwargs: Any) VectorStoreRetriever

Return VectorStoreRetriever initialized from this VectorStore.

Parameters
  • search_type (Optional[str]) – Defines the type of search that the Retriever should perform. Can be “similarity” (default), “mmr”, or “similarity_score_threshold”.

  • search_kwargs (Optional[Dict]) –

    Keyword arguments to pass to the search function. Can include things like:

    k: Amount of documents to return (Default: 4)

    score_threshold: Minimum relevance threshold for similarity_score_threshold

    fetch_k: Amount of documents to pass to MMR algorithm (Default: 20)

    lambda_mult: Diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum. (Default: 0.5)

    filter: Filter by document metadata

Returns

Retriever class for VectorStore.

Return type

VectorStoreRetriever

Examples:

# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
docsearch.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 6, 'lambda_mult': 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
docsearch.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 5, 'fetch_k': 50}
)

# Only retrieve documents that have a relevance score
# Above a certain threshold
docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': 0.8}
)

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={'k': 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={'filter': {'paper_title':'GPT-4 Technical Report'}}
)
async asearch(query: str, search_type: str, **kwargs: Any) List[Document]

Return docs most similar to query using specified search type.

async asimilarity_search(query: str, k: int = 4, **kwargs: Any) List[Document]

Return docs most similar to query.

async asimilarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[Document]

Return docs most similar to embedding vector.

async asimilarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[Document, float]]

Return docs and relevance scores in the range [0, 1], asynchronously.

0 is dissimilar, 1 is most similar.

Parameters
  • query – input text

  • k – Number of Documents to return. Defaults to 4.

  • **kwargs – kwargs to be passed to similarity search. Should include score_threshold: optional, a floating point value between 0 and 1 used to filter the resulting set of retrieved docs.

Returns

List of Tuples of (doc, similarity_score)

async asimilarity_search_with_score(*args: Any, **kwargs: Any) List[Tuple[Document, float]]

Run similarity search with distance asynchronously.

delete(ids: Optional[List[str]] = None, **kwargs: Any) Optional[bool]

Delete by vector ID or other criteria.

Parameters
  • ids – List of ids to delete.

  • **kwargs – Other keyword arguments that subclasses might use.

Returns

True if deletion is successful, False otherwise, None if not implemented.

Return type

Optional[bool]

classmethod from_documents(documents: List[Document], embedding: Embeddings, **kwargs: Any) VST

Return VectorStore initialized from documents and embeddings.

classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, table_name: str = 'test', database_name: str = 'default', connection_args: Dict[str, Any] = {'host': 'localhost', 'password': 'admin', 'port': '7788', 'username': 'admin'}, index_params: Optional[Dict[Any, Any]] = None, search_params: Optional[Dict[str, Any]] = None, drop_old: bool = False, **kwargs: Any) Hippo[source]

Creates an instance of the VST class from the given texts.

Parameters
  • texts (List[str]) – List of texts to be added.

  • embedding (Embeddings) – Embedding model for the texts.

  • metadatas (List[dict], optional) – List of metadata dictionaries for each text. Defaults to None.

  • table_name (str) – Name of the table. Defaults to “test”.

  • database_name (str) – Name of the database. Defaults to “default”.

  • connection_args (dict[str, Any]) – Connection parameters. Defaults to DEFAULT_HIPPO_CONNECTION.

  • index_params (dict) – Indexing parameters. Defaults to None.

  • search_params (dict) – Search parameters. Defaults to an empty dictionary.

  • drop_old (bool) – Whether to drop the old collection. Defaults to False.

  • kwargs – Other arguments.

Returns

An instance of the VST class.

Return type

Hippo

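max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document]

Return docs selected using the maximal marginal relevance.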

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.
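
The effect of lambda_mult can be sketched in plain Python. The code below is an independent illustration of greedy maximal marginal relevance over a small set of already-fetched candidate vectors (the role fetch_k plays); it is not the implementation LangChain or Hippo uses, and the names cosine and mmr are assumptions made for this example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick up to k candidate indices, trading relevance for diversity."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def mmr_score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        # Similarity to the closest already-selected candidate (0.0 if none yet).
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # index 1 nearly duplicates index 0
most_relevant = mmr(query, candidates, k=2, lambda_mult=1.0)
more_diverse = mmr(query, candidates, k=2, lambda_mult=0.25)
```

With lambda_mult=1.0 the near-duplicate at index 1 is kept because only relevance counts; with lambda_mult=0.25 the second pick moves to index 2, which is unlike the first pick.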

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document]

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

search(query: str, search_type: str, **kwargs: Any) List[Document]

Return docs most similar to query using specified search type.

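similarity_search(query: str, k: int = 4, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[Document]

Perform a similarity search on the query string.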

Parameters
  • query (str) – The text to search for.

  • k (int, optional) – The number of results to return. Default is 4.

  • param (dict, optional) – Specifies the search parameters for the index. Defaults to None.

  • expr (str, optional) – Filtering expression. Defaults to None.

  • timeout (int, optional) – Time to wait before a timeout error. Defaults to None.

  • kwargs – Keyword arguments for Collection.search().

Returns

The document results of the search.

Return type

List[Document]

similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[Document]

Return docs most similar to embedding vector.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query vector.
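
To show what "most similar to the embedding vector" means, the sketch below ranks a few stored vectors by cosine similarity using exhaustive search. This is an assumption-laden illustration: Hippo searches through an index with its own metric settings, and the names cosine_similarity and top_k_by_vector, along with the plain strings standing in for Document objects, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_vector(
    embedding: Sequence[float],
    stored: Sequence[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Return the k stored texts whose vectors are most similar to `embedding`."""
    scored = [(text, cosine_similarity(embedding, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:k]


stored = [
    ("north", [0.0, 1.0]),
    ("east", [1.0, 0.0]),
    ("north-east", [1.0, 1.0]),
]
hits = top_k_by_vector([1.0, 0.1], stored, k=2)
```

Here the query vector points almost due east, so "east" ranks first and "north-east" second.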

similarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[Document, float]]

Return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

Parameters
  • query – input text

  • k – Number of Documents to return. Defaults to 4.

  • **kwargs – kwargs to be passed to similarity search. Should include score_threshold: optional, a floating point value between 0 and 1 used to filter the resulting set of retrieved docs.

Returns

List of Tuples of (doc, similarity_score)
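
The role of score_threshold can be sketched as a filter over (doc, similarity_score) tuples. This is a minimal illustration under the assumption that scores are already normalized to [0, 1]; the function name filter_by_score_threshold is hypothetical, and plain strings stand in for Document objects.

```python
from typing import List, Tuple


def filter_by_score_threshold(
    results: List[Tuple[str, float]], score_threshold: float
) -> List[Tuple[str, float]]:
    """Keep only results whose relevance score is at least the threshold."""
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1")
    return [(doc, score) for doc, score in results if score >= score_threshold]


results = [
    ("doc about hippos", 0.92),
    ("doc about rivers", 0.81),
    ("doc about cars", 0.35),
]
kept = filter_by_score_threshold(results, score_threshold=0.8)
```

Only the two results scoring 0.8 or higher remain in kept, in their original order.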

similarity_search_with_score(query: str, k: int = 4, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[Tuple[Document, float]][source]

Performs a search on the query string and returns results with scores.

Parameters
  • query (str) – The text being searched.

  • k (int, optional) – The number of results to return. Default is 4.

  • param (dict) – Specifies the search parameters for the index. Default is None.

  • expr (str, optional) – Filtering expression. Default is None.

  • timeout (int, optional) – The waiting time before a timeout error. Default is None.

  • kwargs – Keyword arguments for Collection.search().

Return type

List[Tuple[Document, float]]

similarity_search_with_score_by_vector(embedding: List[float], k: int = 4, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[Tuple[Document, float]][source]

Performs a search on the embedding vector and returns results with scores.

Parameters
  • embedding (List[float]) – The embedding vector being searched.

  • k (int, optional) – The number of results to return. Default is 4.

  • param (dict) – Specifies the search parameters for the index. Default is None.

  • expr (str, optional) – Filtering expression. Default is None.

  • timeout (int, optional) – The waiting time before a timeout error. Default is None.

  • kwargs – Keyword arguments for Collection.search().

Returns

Resulting documents and scores.

Return type

List[Tuple[Document, float]]