langchain.document_loaders.parsers.generic.MimeTypeBasedParser

class langchain.document_loaders.parsers.generic.MimeTypeBasedParser(handlers: Mapping[str, BaseBlobParser], *, fallback_parser: Optional[BaseBlobParser] = None)[source]

Parser that uses mime-types to parse a blob.

This parser is useful for simple pipelines where the mime-type is sufficient to determine how to parse a blob.

To use, configure handlers based on mime-types and pass them to the initializer.

Example


from langchain.document_loaders.parsers.generic import MimeTypeBasedParser

parser = MimeTypeBasedParser(
    handlers={
        "application/pdf": ...,
    },
    fallback_parser=...,
)

Define a parser that uses mime-types to determine how to parse a blob.

Parameters
  • handlers – A mapping from mime-types to the parsers (BaseBlobParser instances) used to parse blobs of that type into documents.

  • fallback_parser – A fallback parser to use for any blob whose mime-type is not found in the handlers. If not provided, a ValueError is raised when the mime-type is not found in the handlers.
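The dispatch rule described by these two parameters can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: SketchBlob, UppercaseParser, PassthroughParser and SketchMimeTypeParser are hypothetical stand-ins, "documents" are plain strings, and the error message text is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, Mapping, Optional


@dataclass
class SketchBlob:
    """Hypothetical stand-in for Blob: raw text plus a mime-type."""
    data: str
    mimetype: Optional[str] = None


class UppercaseParser:
    """Hypothetical handler: yields the blob text in upper case."""

    def lazy_parse(self, blob: SketchBlob) -> Iterator[str]:
        yield blob.data.upper()


class PassthroughParser:
    """Hypothetical fallback: yields the blob text unchanged."""

    def lazy_parse(self, blob: SketchBlob) -> Iterator[str]:
        yield blob.data


class SketchMimeTypeParser:
    """Assumed dispatch logic: handler if registered, else fallback, else error."""

    def __init__(self, handlers: Mapping[str, object], *, fallback_parser=None):
        self.handlers = handlers
        self.fallback_parser = fallback_parser

    def lazy_parse(self, blob: SketchBlob) -> Iterator[str]:
        # Look up the handler by mime-type; use the fallback when none matches.
        handler = self.handlers.get(blob.mimetype, self.fallback_parser)
        if handler is None:
            raise ValueError(f"Unsupported mime type: {blob.mimetype}")
        yield from handler.lazy_parse(blob)


parser = SketchMimeTypeParser(
    {"text/plain": UppercaseParser()},
    fallback_parser=PassthroughParser(),
)
handled = list(parser.lazy_parse(SketchBlob("hello", "text/plain")))
fallback = list(parser.lazy_parse(SketchBlob("raw", "application/octet-stream")))

# Without a fallback parser, an unregistered mime-type raises ValueError.
strict = SketchMimeTypeParser({"text/plain": UppercaseParser()})
try:
    list(strict.lazy_parse(SketchBlob("raw", "application/octet-stream")))
    error = None
except ValueError as exc:
    error = str(exc)
```

Because lazy_parse is a generator in this sketch, the ValueError surfaces only when the result is iterated, not when lazy_parse is called.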

Methods

__init__(handlers, *[, fallback_parser])

Define a parser that uses mime-types to determine how to parse a blob.

lazy_parse(blob)

Load documents from a blob.

parse(blob)

Eagerly parse the blob into a document or documents.

__init__(handlers: Mapping[str, BaseBlobParser], *, fallback_parser: Optional[BaseBlobParser] = None) → None[source]

Define a parser that uses mime-types to determine how to parse a blob.

Parameters
  • handlers – A mapping from mime-types to the parsers (BaseBlobParser instances) used to parse blobs of that type into documents.

  • fallback_parser – A fallback parser to use for any blob whose mime-type is not found in the handlers. If not provided, a ValueError is raised when the mime-type is not found in the handlers.

lazy_parse(blob: Blob) → Iterator[Document][source]

Load documents from a blob.

parse(blob: Blob) → List[Document]

Eagerly parse the blob into a document or documents.

This is a convenience method for interactive development environments.

Production applications should favor the lazy_parse method instead.

Subclasses should generally not override this parse method.

Parameters

blob – Blob instance

Returns

List of documents
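The difference between the lazy and eager methods can be sketched with a small standard-library example. LineParser is a hypothetical stand-in that treats each line of a string as one "document". Defining parse as list(lazy_parse(...)) is an assumption made for illustration, consistent with the advice that subclasses should not override parse.

```python
from typing import Iterator, List


class LineParser:
    """Hypothetical parser: one 'document' (a plain string) per input line."""

    def __init__(self) -> None:
        self.lines_parsed = 0  # counts how much work has actually been done

    def lazy_parse(self, text: str) -> Iterator[str]:
        # Generator: each line is processed only when the caller asks for it.
        for line in text.splitlines():
            self.lines_parsed += 1
            yield line

    def parse(self, text: str) -> List[str]:
        # Eager convenience wrapper (assumed): materialize every document.
        return list(self.lazy_parse(text))


text = "first\nsecond\nthird"

lazy = LineParser()
first_doc = next(lazy.lazy_parse(text))  # pulls a single document

eager = LineParser()
all_docs = eager.parse(text)  # processes the whole input up front
```

In this sketch the lazy parser has touched only one line after next(), while the eager call has processed all three. That is why a streaming pipeline would usually iterate over lazy_parse rather than hold the full list in memory.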