`langchain_community.document_transformers.embeddings_redundant_filter`.EmbeddingsClusteringFilter

class langchain_community.document_transformers.embeddings_redundant_filter.EmbeddingsClusteringFilter

Bases: BaseDocumentTransformer, BaseModel

Perform K-means clustering on document vectors. Returns the num_closest documents nearest to each cluster center.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param embeddings: Embeddings [Required]: Embeddings to use for embedding document contents.

param num_closest: int = 1: The number of closest vectors to return for each cluster center.

param num_clusters: int = 5: Number of clusters, that is, groups of documents with similar meaning.

param random_state: int = 42: Controls the random number generator used to initialize the cluster centroids. If random_state is set to None, the KMeans initialization is not seeded with a fixed value, so the results may differ each time you run it.

param remove_duplicates: bool = False: By default, duplicated results are skipped and replaced by the next closest vector in the cluster. If remove_duplicates is True, no replacement is made, which can dramatically reduce the number of results when there is a lot of overlap between clusters.

param sorted: bool = False: By default, results are reordered so that they are grouped by cluster. If sorted is True, results are ordered by their original position from the retriever.
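
How num_closest, remove_duplicates, and sorted interact can be sketched in plain Python. The helper below is hypothetical: `pick_closest` and its `keep_original_order` flag are illustration names, not LangChain functions, and the cluster centers are hand-picked rather than fitted with K-means. It only aims to mirror the behavior described in the parameter notes above.

```python
import math
from typing import List, Sequence


def pick_closest(
    vectors: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
    num_closest: int = 1,
    remove_duplicates: bool = False,
    keep_original_order: bool = False,  # plays the role of the `sorted` parameter
) -> List[int]:
    """Hypothetical helper: indices of the vectors nearest to each center."""
    chosen: List[int] = []
    for center in centers:
        # Rank every vector by Euclidean distance to this center.
        ranked = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], center))
        if remove_duplicates:
            # Take the top num_closest, then drop any already chosen (no replacement).
            picks = [i for i in ranked[:num_closest] if i not in chosen]
        else:
            # Skip already-chosen vectors and fall through to the next closest ones.
            picks = [i for i in ranked if i not in chosen][:num_closest]
        chosen.extend(picks)
    # Grouped by cluster by default; original input order when requested.
    return sorted(chosen) if keep_original_order else chosen


# Six toy vectors and two hand-picked centers whose neighborhoods overlap at index 3.
vectors = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = [(10.4, 0.0), (3.4, 0.0)]

grouped = pick_closest(vectors, centers, num_closest=3)
no_replacement = pick_closest(vectors, centers, num_closest=3, remove_duplicates=True)
in_input_order = pick_closest(vectors, centers, num_closest=3, keep_original_order=True)
```

With this toy data, `grouped` holds six indices (three per center), while `no_replacement` holds only five, because the vector shared by both centers is not replaced the second time it is selected.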

async atransform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

Asynchronously transform a list of documents.

Parameters

documents (Sequence[Document]) – A sequence of Documents to be transformed.
kwargs (Any) –

Returns

A list of transformed Documents.

Return type

Sequence[Document]
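
One common way to offer an awaitable variant of a blocking transform is to run the synchronous method in a worker thread. The sketch below shows that pattern with a hypothetical stand-in class; `KeepFirstTransformer` is not a LangChain class, plain strings stand in for Document objects, and whether the inherited atransform_documents works this way in a given release is an assumption, not something this page states.

```python
import asyncio
from typing import List, Sequence


class KeepFirstTransformer:
    """Hypothetical stand-in for a document transformer (not a LangChain class)."""

    def transform_documents(self, documents: Sequence[str]) -> List[str]:
        # Placeholder for blocking work such as embedding and clustering.
        return list(documents[:1])

    async def atransform_documents(self, documents: Sequence[str]) -> List[str]:
        # Run the blocking method in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.transform_documents, documents)


async def main() -> List[str]:
    return await KeepFirstTransformer().atransform_documents(["a", "b", "c"])


result = asyncio.run(main())
```

`asyncio.to_thread` requires Python 3.9 or later.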

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set, since it adds all passed values.

Parameters

_fields_set (Optional[SetStr]) –
values (Any) –

Return type

Model

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) → Model

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters

include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) – fields to include in new model
exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) – fields to exclude from the new model; as with values, this takes precedence over include
update (Optional[DictStrAny]) – values to change or add in the new model. Note: the data is not validated before the new model is created, so you should trust this data
deep (bool) – set to True to make a deep copy of the model
self (Model) –

Returns

new model instance

Return type

Model

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters

include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
by_alias (bool) –
skip_defaults (Optional[bool]) –
exclude_unset (bool) –
exclude_defaults (bool) –
exclude_none (bool) –

Return type

DictStrAny

classmethod from_orm(obj: Any) → Model

Parameters: obj (Any) –
Return type: Model

json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) → str

Generate a JSON representation of the model. The include and exclude arguments behave as in dict().

encoder is an optional function to supply as default to json.dumps(); other arguments are passed as per json.dumps().

Parameters

include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
by_alias (bool) –
skip_defaults (Optional[bool]) –
exclude_unset (bool) –
exclude_defaults (bool) –
exclude_none (bool) –
encoder (Optional[Callable[[Any], Any]]) –
models_as_dict (bool) –
dumps_kwargs (Any) –

Return type

str

classmethod parse_file(path: Union[str, Path], *, content_type: str = None, encoding: str = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model

Parameters

path (Union[str, Path]) –
content_type (str) –
encoding (str) –
proto (Protocol) –
allow_pickle (bool) –

Return type

Model

classmethod parse_obj(obj: Any) → Model

Parameters: obj (Any) –
Return type: Model

classmethod parse_raw(b: Union[str, bytes], *, content_type: str = None, encoding: str = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model

Parameters

b (Union[str, bytes]) –
content_type (str) –
encoding (str) –
proto (Protocol) –
allow_pickle (bool) –

Return type

Model

classmethod schema(by_alias: bool = True, ref_template: str = '#/definitions/{model}') → DictStrAny

Parameters

by_alias (bool) –
ref_template (str) –

Return type

DictStrAny

classmethod schema_json(*, by_alias: bool = True, ref_template: str = '#/definitions/{model}', **dumps_kwargs: Any) → str

Parameters

by_alias (bool) –
ref_template (str) –
dumps_kwargs (Any) –

Return type

str

transform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

Filter down documents by clustering their embeddings and keeping those closest to each cluster center.

Parameters

documents (Sequence[Document]) –
kwargs (Any) –

Return type

Sequence[Document]
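
The filtering idea can be pictured in two stages: cluster the document vectors with K-means, then keep the document nearest to each center. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the library's implementation. The names `kmeans_centers` and `filter_by_cluster`, the `iters` parameter, and the toy 2-D vectors standing in for real embeddings are all hypothetical. The fixed `seed` plays the role that random_state plays for the class.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans_centers(vectors: Sequence[Vector], k: int, seed: int = 42, iters: int = 20) -> List[Vector]:
    """Toy Lloyd's algorithm (hypothetical helper); returns k cluster centers."""
    rng = random.Random(seed)  # fixed seed: same starting centers on every run
    centers: List[Vector] = rng.sample(list(vectors), k)
    for _ in range(iters):
        # Assignment step: put each vector in the group of its nearest center.
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: math.dist(v, centers[c]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


def filter_by_cluster(vectors: Sequence[Vector], k: int, seed: int = 42) -> List[int]:
    """Hypothetical helper: index of the vector nearest to each cluster center."""
    kept: List[int] = []
    for center in kmeans_centers(vectors, k, seed):
        ranked = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], center))
        # Take the closest vector that has not been kept already.
        kept.append(next(i for i in ranked if i not in kept))
    return kept


# Toy "embeddings": two well-separated groups of three vectors each.
toy_vectors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
kept_indices = filter_by_cluster(toy_vectors, k=2)
```

Here `kept_indices` contains one index from each group, so six inputs are filtered down to two representatives. Calling `kmeans_centers` twice with the same seed yields the same centers, which is the reproducibility that a fixed random_state is meant to provide.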

classmethod update_forward_refs(**localns: Any) → None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

Parameters: localns (Any) –
Return type: None

classmethod validate(value: Any) → Model

Parameters: value (Any) –
Return type: Model

Examples using EmbeddingsClusteringFilter

Get 3 diff embeddings.