langchain_community.document_loaders.parsers.generic.MimeTypeBasedParser

class langchain_community.document_loaders.parsers.generic.MimeTypeBasedParser(handlers: Mapping[str, BaseBlobParser], *, fallback_parser: Optional[BaseBlobParser] = None)

Parser that uses mime-types to parse a blob.

This parser is useful for simple pipelines where the mime-type is sufficient to determine how to parse a blob.

To use, configure handlers based on mime-types and pass them to the initializer.

Example


from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser

parser = MimeTypeBasedParser(
    handlers={
        "application/pdf": ...,
    },
    fallback_parser=...,
)

Define a parser that uses mime-types to determine how to parse a blob.

Parameters
  • handlers (Mapping[str, BaseBlobParser]) – A mapping from mime-types to parsers. Each parser takes a blob, parses it, and returns one or more documents.

  • fallback_parser (Optional[BaseBlobParser]) – A fallback parser to use if the mime-type is not found in the handlers. If provided, it parses every blob whose mime-type is not found in the handlers. If not provided, a ValueError is raised when the mime-type is not found in the handlers.
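
The lookup rule described by these parameters can be sketched in plain Python. This is a minimal model, not the library's implementation: FakeBlob, UpperCaseParser, PlainParser, and dispatch are hypothetical stand-ins, and plain strings take the place of Document objects.

```python
from dataclasses import dataclass
from typing import Iterator, List, Mapping, Optional


@dataclass
class FakeBlob:
    """Hypothetical stand-in for Blob: raw text plus a mime-type."""
    data: str
    mimetype: Optional[str] = None


class UpperCaseParser:
    """Hypothetical handler: yields the blob text upper-cased."""

    def lazy_parse(self, blob: FakeBlob) -> Iterator[str]:
        yield blob.data.upper()


class PlainParser:
    """Hypothetical fallback: yields the blob text unchanged."""

    def lazy_parse(self, blob: FakeBlob) -> Iterator[str]:
        yield blob.data


def dispatch(blob: FakeBlob, handlers: Mapping[str, object],
             fallback_parser: Optional[object] = None) -> List[str]:
    """Use the handler registered for the mime-type, else the fallback, else raise."""
    parser = handlers.get(blob.mimetype, fallback_parser)
    if parser is None:
        raise ValueError(f"Unsupported mime-type: {blob.mimetype}")
    return list(parser.lazy_parse(blob))
```

With handlers = {"text/x-shout": UpperCaseParser()}, a blob of that mime-type goes to the registered handler, any other mime-type goes to the fallback if one is given, and without a fallback the call raises ValueError.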

Methods

__init__(handlers, *[, fallback_parser])

Define a parser that uses mime-types to determine how to parse a blob.

lazy_parse(blob)

Load documents from a blob.

parse(blob)

Eagerly parse the blob into a document or documents.

__init__(handlers: Mapping[str, BaseBlobParser], *, fallback_parser: Optional[BaseBlobParser] = None) → None

Define a parser that uses mime-types to determine how to parse a blob.

Parameters
  • handlers (Mapping[str, BaseBlobParser]) – A mapping from mime-types to parsers. Each parser takes a blob, parses it, and returns one or more documents.

  • fallback_parser (Optional[BaseBlobParser]) – A fallback parser to use if the mime-type is not found in the handlers. If provided, it parses every blob whose mime-type is not found in the handlers. If not provided, a ValueError is raised when the mime-type is not found in the handlers.

Return type

None

lazy_parse(blob: Blob) → Iterator[Document]

Load documents from a blob.

Parameters

blob (Blob) – Blob instance

Return type

Iterator[Document]
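
Because the return type is an iterator, a caller can stop early without producing every document. The sketch below illustrates that general property with a hypothetical generator, lazy_parse_pages. It is not part of the library, and strings stand in for Document objects.

```python
from itertools import islice
from typing import Iterator, List

produced: List[int] = []  # records which pages were actually parsed


def lazy_parse_pages(pages: List[str]) -> Iterator[str]:
    """Hypothetical lazy parser: yields one document per page, on demand."""
    for index, page in enumerate(pages):
        produced.append(index)
        yield page.strip()


pages = [" page one ", " page two ", " page three "]

# Take only the first two documents; the third page is never parsed.
first_two = list(islice(lazy_parse_pages(pages), 2))
```

After this runs, first_two holds the two stripped pages and produced records only indices 0 and 1.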

parse(blob: Blob) → List[Document]

Eagerly parse the blob into a document or documents.

This is a convenience method for interactive development environments.

Production applications should favor the lazy_parse method instead.

Subclasses should generally not override this parse method.

Parameters

blob (Blob) – Blob instance

Returns

List of documents

Return type

List[Document]
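
One common way to offer an eager parse on top of a lazy one is to collect the iterator into a list. The sketch below assumes that relationship for illustration. SketchParser is a hypothetical class, it takes plain text rather than a Blob, and strings stand in for Document objects.

```python
from typing import Iterator, List


class SketchParser:
    """Hypothetical parser; strings stand in for Document objects."""

    def lazy_parse(self, text: str) -> Iterator[str]:
        # Yield one "document" per non-empty line, on demand.
        for line in text.splitlines():
            if line.strip():
                yield line.strip()

    def parse(self, text: str) -> List[str]:
        # Eager variant: materialize everything lazy_parse would yield.
        return list(self.lazy_parse(text))
```

Under this design a subclass only needs to implement lazy_parse, which fits the advice above that subclasses should generally leave parse alone.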

Examples using MimeTypeBasedParser