langchain.document_transformers.openai_functions.OpenAIMetadataTagger
- class langchain.document_transformers.openai_functions.OpenAIMetadataTagger
Bases: BaseDocumentTransformer, BaseModel
Extract metadata tags from document contents using OpenAI functions.
- Example:
    from langchain.chains import create_tagging_chain
    from langchain.chat_models import ChatOpenAI
    from langchain.document_transformers import OpenAIMetadataTagger
    from langchain_core.documents import Document

    # JSON-Schema-style description of the metadata fields to extract
    schema = {
        "properties": {
            "movie_title": {"type": "string"},
            "critic": {"type": "string"},
            "tone": {"type": "string", "enum": ["positive", "negative"]},
            "rating": {
                "type": "integer",
                "description": "The number of stars the critic rated the movie",
            },
        },
        "required": ["movie_title", "critic", "tone"],
    }

    # Must be an OpenAI model that supports functions
    llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
    tagging_chain = create_tagging_chain(schema, llm)
    document_transformer = OpenAIMetadataTagger(tagging_chain=tagging_chain)

    original_documents = [
        Document(
            page_content="Review of The Bee Movie\nBy Roger Ebert\n\n"
            "This is the greatest movie ever made. 4 out of 5 stars."
        ),
        Document(
            page_content="Review of The Godfather\nBy Anonymous\n\n"
            "This movie was super boring. 1 out of 5 stars.",
            metadata={"reliable": False},
        ),
    ]

    enhanced_documents = document_transformer.transform_documents(original_documents)
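The schema in the example constrains which tags are expected (`required`) and which values are allowed (`enum`). The following is a minimal, standard-library-only sketch of what those two constraints mean for an extracted tag dictionary. The helper `check_tags` is hypothetical and is not part of langchain; whether the tagger itself performs such a check is not stated here.

```python
from typing import Any, Dict, List

# Same schema as in the example above.
schema = {
    "properties": {
        "movie_title": {"type": "string"},
        "critic": {"type": "string"},
        "tone": {"type": "string", "enum": ["positive", "negative"]},
        "rating": {
            "type": "integer",
            "description": "The number of stars the critic rated the movie",
        },
    },
    "required": ["movie_title", "critic", "tone"],
}


def check_tags(tags: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list violations of `required` and `enum`."""
    problems = []
    # Every field named in "required" must be present.
    for name in schema.get("required", []):
        if name not in tags:
            problems.append(f"missing required field: {name}")
    # Fields with an "enum" must take one of the listed values.
    for name, value in tags.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue
        allowed = spec.get("enum")
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} not in {allowed}")
    return problems


good = {"movie_title": "The Godfather", "critic": "Anonymous",
        "tone": "negative", "rating": 1}
bad = {"movie_title": "The Godfather", "tone": "mixed"}

print(check_tags(good, schema))
print(check_tags(bad, schema))
```

Note that `rating` is absent from `required`, so a tag dictionary without it still passes this check.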
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param tagging_chain: langchain.chains.llm.LLMChain [Required]
The chain used to extract metadata from each document.
- async atransform_documents(documents: Sequence[Document], **kwargs: Any) -> Sequence[Document]
Asynchronously transform a list of documents.
- Parameters
documents – A sequence of Documents to be transformed.
- Returns
A list of transformed Documents.
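For callers that only have a blocking transform available, one common pattern is to run it in a worker thread from async code. The sketch below illustrates that general pattern with the standard library only. `sync_transform` and `atransform_in_thread` are hypothetical names, plain strings stand in for Documents, and nothing here describes how this class implements `atransform_documents`.

```python
import asyncio
from typing import Callable, List, Sequence


def sync_transform(documents: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for a blocking transform_documents call."""
    return [doc.upper() for doc in documents]


async def atransform_in_thread(
    transform: Callable[[Sequence[str]], List[str]],
    documents: Sequence[str],
) -> List[str]:
    """Run a blocking transform in the default thread pool executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, transform, documents)


result = asyncio.run(
    atransform_in_thread(sync_transform, ["a review", "another review"])
)
print(result)
```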
- classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) -> Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set, since it adds all passed values.
- copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) -> Model
Duplicate a model, optionally choose which fields to include, exclude and change.
- Parameters
include – fields to include in new model
exclude – fields to exclude from new model, as with values this takes precedence over include
update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data
deep – set to True to make a deep copy of the model
- Returns
new model instance
- dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) -> DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
- classmethod from_orm(obj: Any) -> Model
- json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) -> str
Generate a JSON representation of the model. The include and exclude arguments behave as in dict().
encoder is an optional function passed as the default argument to json.dumps(); all other arguments are forwarded to json.dumps().
- classmethod parse_file(path: Union[str, Path], *, content_type: str = None, encoding: str = 'utf8', proto: Protocol = None, allow_pickle: bool = False) -> Model
- classmethod parse_obj(obj: Any) -> Model
- classmethod parse_raw(b: Union[str, bytes], *, content_type: str = None, encoding: str = 'utf8', proto: Protocol = None, allow_pickle: bool = False) -> Model
- classmethod schema(by_alias: bool = True, ref_template: str = '#/definitions/{model}') -> DictStrAny
- classmethod schema_json(*, by_alias: bool = True, ref_template: str = '#/definitions/{model}', **dumps_kwargs: Any) -> str
- transform_documents(documents: Sequence[Document], **kwargs: Any) -> Sequence[Document]
Automatically extract and populate metadata for each document according to the provided schema.
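The idea of populating metadata per document can be sketched without any network calls. In the snippet below, `Doc`, `tag_documents`, and `fake_extract` are hypothetical stand-ins for `Document`, `transform_documents`, and the tagging chain. The merge order, where a document's existing metadata wins over extracted tags on key conflicts, is an assumption to verify against the library source.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for langchain's Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def tag_documents(
    documents: Sequence[Doc],
    extract: Callable[[str], Dict[str, Any]],
) -> List[Doc]:
    """Return new documents whose metadata includes the extracted tags."""
    tagged = []
    for doc in documents:
        extracted = extract(doc.page_content)
        # Assumption: existing metadata takes precedence on key conflicts.
        merged = {**extracted, **doc.metadata}
        tagged.append(Doc(page_content=doc.page_content, metadata=merged))
    return tagged


def fake_extract(text: str) -> Dict[str, Any]:
    """Hypothetical extractor; a real one would call the tagging chain."""
    return {"tone": "negative" if "boring" in text else "positive"}


docs = [
    Doc("This is the greatest movie ever made. 4 out of 5 stars."),
    Doc("This movie was super boring. 1 out of 5 stars.", {"reliable": False}),
]
tagged = tag_documents(docs, fake_extract)
for doc in tagged:
    print(doc.metadata)
```

The sketch builds new objects rather than mutating its inputs, so the original documents keep their original metadata.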
- classmethod update_forward_refs(**localns: Any) -> None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
- classmethod validate(value: Any) -> Model