langchain.document_loaders.parsers.pdf.PyMuPDFParser
- class langchain.document_loaders.parsers.pdf.PyMuPDFParser(text_kwargs: Optional[Mapping[str, Any]] = None, extract_images: bool = False)
Parse PDF using PyMuPDF.
Initialize the parser.
- Parameters
text_kwargs – Keyword arguments to pass to fitz.Page.get_text().
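As an illustration of how text_kwargs might be used, the sketch below builds a parser that forwards one option to fitz.Page.get_text() and parses a local file. The helper name load_pdf_pages, the constant TEXT_KWARGS, and the choice of the sort option are assumptions made for this example and do not come from this page; it also assumes that langchain and PyMuPDF are installed and that Blob is importable from langchain.document_loaders.blob_loaders.

```python
from typing import Any, List, Mapping

# Options forwarded unchanged to fitz.Page.get_text().
# "sort" is a PyMuPDF option chosen here only as an example (assumption).
TEXT_KWARGS: Mapping[str, Any] = {"sort": True}


def load_pdf_pages(path: str) -> List[Any]:
    """Parse the PDF at `path` and return the resulting documents.

    Hypothetical helper for illustration; not part of langchain.
    """
    # Imported inside the function so this module can be loaded
    # even when langchain or PyMuPDF is not installed.
    from langchain.document_loaders.blob_loaders import Blob
    from langchain.document_loaders.parsers.pdf import PyMuPDFParser

    parser = PyMuPDFParser(text_kwargs=TEXT_KWARGS)
    blob = Blob.from_path(path)
    return parser.parse(blob)
```

Calling load_pdf_pages("example.pdf") (a placeholder path) should return a list of Document objects, assuming the file exists and is a readable PDF.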
Methods
__init__([text_kwargs, extract_images]) – Initialize the parser.
lazy_parse(blob) – Lazily parse the blob.
parse(blob) – Eagerly parse the blob into a document or documents.
- __init__(text_kwargs: Optional[Mapping[str, Any]] = None, extract_images: bool = False) → None
Initialize the parser.
- Parameters
text_kwargs – Keyword arguments to pass to fitz.Page.get_text().
- parse(blob: Blob) → List[Document]
Eagerly parse the blob into a document or documents.
This is a convenience method for interactive development environments.
Production applications should favor the lazy_parse method instead.
Subclasses should generally not override this parse method.
- Parameters
blob – Blob instance
- Returns
List of documents
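The difference between eager and lazy parsing can be sketched without any PDF library. ToyParser below is a hypothetical stand-in, not the langchain class: it treats a string as the blob and form-feed characters as page breaks. It assumes, as the description above suggests, that parse simply collects everything lazy_parse yields.

```python
from typing import Iterator, List


class ToyParser:
    """Hypothetical stand-in used only to contrast parse and lazy_parse."""

    def lazy_parse(self, blob: str) -> Iterator[str]:
        # Yield one "page" at a time; nothing is produced until iterated.
        for page in blob.split("\f"):
            yield page

    def parse(self, blob: str) -> List[str]:
        # Eager variant: materialize every page in memory at once.
        return list(self.lazy_parse(blob))


data = "one\ftwo\fthree"
pages = ToyParser().parse(data)             # all pages are held in a list
first = next(ToyParser().lazy_parse(data))  # only the first page is produced
```

With a large input, the lazy form lets a caller stop early or stream pages onward without holding the whole result in memory, which is one plausible reason for preferring lazy_parse in production code.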