langchain.document_loaders.generic.GenericLoader

class langchain.document_loaders.generic.GenericLoader(blob_loader: BlobLoader, blob_parser: BaseBlobParser)

Generic Document Loader.

A generic document loader that allows combining an arbitrary blob loader with a blob parser.

Examples

from langchain.document_loaders import GenericLoader

# Load non-hidden PDF files under the directory, showing a progress bar.
loader = GenericLoader.from_filesystem(
    path="path/to/directory",
    glob="**/[!.]*",
    suffixes=[".pdf"],
    show_progress=True,
)

# lazy_load() returns an iterator; next() yields the first parsed document.
docs = loader.lazy_load()
next(docs)

Example instantiations to change which files are loaded:

.. code-block:: python

    # Recursively load all text files in a directory.
    loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/*.txt")

    # Recursively load all non-hidden files in a directory.
    loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/[!.]*")

    # Load all files in a directory without recursion.
    loader = GenericLoader.from_filesystem("/path/to/dir", glob="*")

Example instantiations to change which parser is used:

.. code-block:: python

    from langchain.document_loaders.parsers.pdf import PyPDFParser

    # Recursively load all PDF files in a directory, parsing them with PyPDFParser.
    loader = GenericLoader.from_filesystem(
        "/path/to/dir",
        glob="**/*.pdf",
        parser=PyPDFParser()
    )

A generic document loader.

Parameters
  • blob_loader – A blob loader which knows how to yield blobs

  • blob_parser – A blob parser which knows how to parse blobs into documents
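The division of labour between the two parameters can be illustrated with a minimal, library-free sketch. Every class below (`Blob`, `Document`, `InMemoryBlobLoader`, `ParagraphParser`, `SketchLoader`) is a hypothetical stand-in written for this illustration, not the library's implementation; the sketch only assumes that a blob loader yields blobs and a blob parser turns each blob into documents, as the parameter descriptions above state.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class Blob:
    """Hypothetical stand-in: raw data plus where it came from."""
    data: str
    source: str


@dataclass
class Document:
    """Hypothetical stand-in: parsed text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class InMemoryBlobLoader:
    """Plays the blob loader role: knows how to yield blobs."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = files

    def yield_blobs(self) -> Iterable[Blob]:
        for name, text in self.files.items():
            yield Blob(data=text, source=name)


class ParagraphParser:
    """Plays the blob parser role: one document per non-empty paragraph."""

    def lazy_parse(self, blob: Blob) -> Iterator[Document]:
        for paragraph in blob.data.split("\n\n"):
            if paragraph.strip():
                yield Document(paragraph.strip(), {"source": blob.source})


class SketchLoader:
    """Combines any blob loader with any blob parser (illustrative only)."""

    def __init__(self, blob_loader, blob_parser) -> None:
        self.blob_loader = blob_loader
        self.blob_parser = blob_parser

    def lazy_load(self) -> Iterator[Document]:
        # Parse blobs one at a time, as they are yielded.
        for blob in self.blob_loader.yield_blobs():
            yield from self.blob_parser.lazy_parse(blob)

    def load(self) -> List[Document]:
        # Eager variant: materialize every document in a list.
        return list(self.lazy_load())


loader = SketchLoader(
    InMemoryBlobLoader({"a.txt": "first\n\nsecond", "b.txt": "third"}),
    ParagraphParser(),
)
docs = loader.load()
```

Because the loader depends only on the two roles, either side can be swapped independently: a different source of blobs with the same parser, or a different parser over the same blobs.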

Methods

__init__(blob_loader, blob_parser)

A generic document loader.

from_filesystem(path, *[, glob, exclude, ...])

Create a generic document loader using a filesystem blob loader.

lazy_load()

Load documents lazily.

load()

Load all documents.

load_and_split([text_splitter])

Load all documents and split them into sentences.

__init__(blob_loader: BlobLoader, blob_parser: BaseBlobParser) → None

A generic document loader.

Parameters
  • blob_loader – A blob loader which knows how to yield blobs

  • blob_parser – A blob parser which knows how to parse blobs into documents

classmethod from_filesystem(path: Union[str, Path], *, glob: str = '**/[!.]*', exclude: Sequence[str] = (), suffixes: Optional[Sequence[str]] = None, show_progress: bool = False, parser: Union[Literal['default'], BaseBlobParser] = 'default') → GenericLoader

Create a generic document loader using a filesystem blob loader.

Parameters
  • path – The path to the directory to load documents from.

  • glob – The glob pattern to use to find documents.

  • suffixes – The suffixes to use to filter documents. If None, all files matching the glob will be loaded.

  • exclude – A list of patterns to exclude from the loader.

  • show_progress – Whether to show a progress bar or not (requires tqdm). Proxies to the file system loader.

  • parser – A blob parser which knows how to parse blobs into documents, or the string 'default' (the default value) to use the loader's default parser.

Returns

A generic document loader.
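How glob, exclude, and suffixes might combine can be sketched with pathlib alone. The helper `select_paths` below is hypothetical and is not the library's code; it assumes that the glob picks candidates, that exclude patterns are checked with `Path.match`, and that suffixes filter the remaining files. The actual file system loader may differ in details.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Optional, Sequence


def select_paths(
    root: Path,
    glob: str = "**/[!.]*",
    exclude: Sequence[str] = (),
    suffixes: Optional[Sequence[str]] = None,
) -> Iterator[Path]:
    """Hypothetical helper: yield files under root that pass all three filters."""
    for path in root.glob(glob):
        # Drop anything matching an exclude pattern (assumed semantics).
        if any(path.match(pattern) for pattern in exclude):
            continue
        # The glob can also match directories; keep regular files only.
        if not path.is_file():
            continue
        # With suffixes=None, every file matching the glob is kept.
        if suffixes and path.suffix not in suffixes:
            continue
        yield path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ["a.txt", "b.pdf", ".hidden.pdf", "sub/c.pdf", "sub/skip.pdf"]:
        (root / name).write_text("x")

    selected = sorted(
        p.name for p in select_paths(root, exclude=["skip*"], suffixes=[".pdf"])
    )
```

In this sketch, `a.txt` is dropped by the suffix filter, `.hidden.pdf` never matches the default glob because its name starts with a dot, and `sub/skip.pdf` is removed by the exclude pattern.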

lazy_load() → Iterator[Document]

Load documents lazily. Use this when working at a large scale.
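The reason lazy loading suits large collections can be shown with a plain generator. This is a minimal sketch with a hypothetical `lazy_documents` function standing in for a lazy loader; it does not call the library, and it only counts how many sources are processed under lazy and eager consumption.

```python
from itertools import islice
from typing import Iterator, List

parsed: List[str] = []  # records which sources have actually been processed


def lazy_documents(sources: List[str]) -> Iterator[str]:
    """Hypothetical lazy loader: processes a source only when it is requested."""
    for source in sources:
        parsed.append(source)
        yield f"document from {source}"


sources = [f"file_{i}.txt" for i in range(1000)]

# Lazy consumption: take two documents and stop; the rest are never processed.
first_two = list(islice(lazy_documents(sources), 2))
parsed_after_lazy = len(parsed)

# Eager consumption: building the full list processes every source up front.
parsed.clear()
everything = list(lazy_documents(sources))
parsed_after_eager = len(parsed)
```

With an iterator, memory use and parsing work grow with the number of documents actually consumed rather than with the size of the collection.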

load() → List[Document]

Load all documents.

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load all documents and split them into sentences.
