langchain_community.document_compressors.llmlingua_filter.LLMLinguaCompressor

class langchain_community.document_compressors.llmlingua_filter.LLMLinguaCompressor

Bases: BaseDocumentCompressor

Compress documents using the LLMLingua project.

https://github.com/microsoft/LLMLingua

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param additional_compress_kwargs: dict = {'condition_compare': True, 'condition_in_question': 'after', 'context_budget': '+100', 'dynamic_context_compression_ratio': 0.4, 'reorder_context': 'sort'}: Extra compression arguments

param device_map: str = 'cuda': The device on which to run LLMLingua

param instruction: str = 'Given this documents, please answer the final question': The instruction for the LLM

param lingua: Any = None: The LLMLingua instance

param model_config: dict = {}: Custom configuration for the model

param model_name: str = 'NousResearch/Llama-2-7b-hf': The Hugging Face model to use

param open_api_config: dict = {}: The open_api configuration

param rank_method: str = 'longllmlingua': The ranking method to use

param target_token: int = 300: The target number of compressed tokens
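The parameters above can be collected into constructor arguments as in the following minimal sketch. It assumes that passing `additional_compress_kwargs` replaces the default dictionary as a whole rather than merging into it, so the defaults listed on this page are merged by hand. The helper names `compressor_settings` and `build_compressor` are hypothetical, and the `openai-community/gpt2` model and `cpu` device are illustrative substitutes for the documented defaults, chosen only to keep the example light.

```python
from typing import Any, Dict

# Defaults copied from the parameter list on this page.
DEFAULT_COMPRESS_KWARGS: Dict[str, Any] = {
    "condition_compare": True,
    "condition_in_question": "after",
    "context_budget": "+100",
    "dynamic_context_compression_ratio": 0.4,
    "reorder_context": "sort",
}


def compressor_settings(**overrides: Any) -> Dict[str, Any]:
    """Build constructor kwargs, layering overrides on top of the defaults.

    Hypothetical helper: the model and device below are assumptions,
    not the documented defaults.
    """
    return {
        "model_name": "openai-community/gpt2",  # assumption: a small model
        "device_map": "cpu",                    # assumption: no GPU available
        "target_token": 300,                    # documented default
        "additional_compress_kwargs": {**DEFAULT_COMPRESS_KWARGS, **overrides},
    }


def build_compressor(**overrides: Any) -> Any:
    """Instantiate the compressor; requires langchain_community and llmlingua."""
    # Imported lazily so the settings helper works without the optional packages.
    from langchain_community.document_compressors import LLMLinguaCompressor

    return LLMLinguaCompressor(**compressor_settings(**overrides))


settings = compressor_settings(dynamic_context_compression_ratio=0.3)
print(settings["additional_compress_kwargs"])
```

Calling `build_compressor()` loads the underlying model, so it is left uncalled here.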

async acompress_documents(documents: Sequence[Document], query: str, callbacks: Optional[Union[List[BaseCallbackHandler], BaseCallbackManager]] = None) → Sequence[Document]

Compress retrieved documents given the query context.

Parameters

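documents (Sequence[Document]) – A sequence of documents to compress.
query (str) – The query to use for compressing the documents.
callbacks (Optional[Union[List[BaseCallbackHandler], BaseCallbackManager]]) – Callbacks to run during the compression process.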

Return type

Sequence[Document]

compress_documents(documents: Sequence[Document], query: str, callbacks: Optional[Union[List[BaseCallbackHandler], BaseCallbackManager]] = None) → Sequence[Document]

Compress documents using LLMLingua.

Parameters

documents (Sequence[Document]) – A sequence of documents to compress.
query (str) – The query to use for compressing the documents.
callbacks (Optional[Union[List[BaseCallbackHandler], BaseCallbackManager]]) – Callbacks to run during the compression process.

Returns

A sequence of compressed documents.

Return type

Sequence[Document]

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model

Creates a new model, setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set, since it adds all passed values.

Parameters

_fields_set (Optional[SetStr]) –
values (Any) –

Return type

Model

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) → Model

Duplicate a model, optionally choosing which fields to include, exclude, and change.

Parameters

include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) – fields to include in the new model
exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) – fields to exclude from the new model; as with values, this takes precedence over include
update (Optional[DictStrAny]) – values to change or add in the new model. Note: the data is not validated before the new model is created, so it must be trusted
deep (bool) – set to True to make a deep copy of the model
self (Model) –

Returns

new model instance

Return type

Model

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters

include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
by_alias (bool) –
skip_defaults (Optional[bool]) –
exclude_unset (bool) –
exclude_defaults (bool) –
exclude_none (bool) –

Return type

DictStrAny

extract_ref_id_tuples_and_clean(contents: List[str]) → List[Tuple[str, int]]

Extracts reference IDs from the contents and cleans up the ref tags.

This function processes a list of strings, searching for reference ID tags at the beginning and end of each string. When a ref tag is found, it is removed from the string and its ID is recorded. If no ref ID is found, a generic ID of -1 is assigned.

The search for ref tags is performed only at the beginning and end of the string, on the assumption that there is at most one ref ID per string. Malformed ref tags are handled gracefully.

Parameters: contents (List[str]) – A list of contents to be processed.
Returns: A list of tuples, each holding a cleaned string and its associated ref ID.
Return type: List[Tuple[str, int]]

Examples

>>> strings_list = [
...     '<#ref0#> Example content <#ref0#>',
...     'Content with no ref ID.'
... ]
>>> extract_ref_id_tuples_and_clean(strings_list)
[('Example content', 0), ('Content with no ref ID.', -1)]
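The behaviour described above can be approximated with two anchored regular expressions. This is a simplified sketch, not the library's implementation: the function name `extract_refs` and both patterns are assumptions, and unlike the documented method it recognizes only well-formed `<#refN#>` tags, leaving malformed tags in place.

```python
import re
from typing import List, Tuple

# Assumed patterns: a well-formed tag such as <#ref12#>, matched only at
# the very start (\A) or the very end (\Z) of the stripped string.
_TAG_AT_START = re.compile(r"\A<#ref(\d+)#>")
_TAG_AT_END = re.compile(r"<#ref(\d+)#>\Z")


def extract_refs(contents: List[str]) -> List[Tuple[str, int]]:
    """Return (cleaned text, ref ID) pairs, using -1 when no tag is found."""
    results: List[Tuple[str, int]] = []
    for content in contents:
        text = content.strip()
        ref_id = -1
        for pattern in (_TAG_AT_START, _TAG_AT_END):
            match = pattern.search(text)
            if match:
                ref_id = int(match.group(1))
                # Remove the tag and any whitespace it leaves behind.
                text = pattern.sub("", text).strip()
        results.append((text, ref_id))
    return results


print(extract_refs(["<#ref0#> Example content <#ref0#>", "Content with no ref ID."]))
# [('Example content', 0), ('Content with no ref ID.', -1)]
```

Anchoring with `\A` and `\Z` keeps tags that appear in the middle of a string untouched, which matches the stated assumption that tags occur only at the edges.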

classmethod from_orm(obj: Any) → Model

Parameters: obj (Any) –
Return type: Model

json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) → unicode

Generate a JSON representation of the model. The include and exclude arguments behave as in dict().

encoder is an optional function to supply as default to json.dumps(); other arguments are as per json.dumps().

Parameters

include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
by_alias (bool) –
skip_defaults (Optional[bool]) –
exclude_unset (bool) –
exclude_defaults (bool) –
exclude_none (bool) –
encoder (Optional[Callable[[Any], Any]]) –
models_as_dict (bool) –
dumps_kwargs (Any) –

Return type

unicode

classmethod parse_file(path: Union[str, Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model

Parameters

path (Union[str, Path]) –
content_type (unicode) –
encoding (unicode) –
proto (Protocol) –
allow_pickle (bool) –

Return type

Model

classmethod parse_obj(obj: Any) → Model

Parameters: obj (Any) –
Return type: Model

classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model

Parameters

b (Union[str, bytes]) –
content_type (unicode) –
encoding (unicode) –
proto (Protocol) –
allow_pickle (bool) –

Return type

Model

classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') → DictStrAny

Parameters

by_alias (bool) –
ref_template (unicode) –

Return type

DictStrAny

classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) → unicode

Parameters

by_alias (bool) –
ref_template (unicode) –
dumps_kwargs (Any) –

Return type

unicode

classmethod update_forward_refs(**localns: Any) → None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

Parameters: localns (Any) –
Return type: None

classmethod validate(value: Any) → Model

Parameters: value (Any) –
Return type: Model

Examples using LLMLinguaCompressor

Helper function for printing docs