langchain.document_loaders.generic.GenericLoader
- class langchain.document_loaders.generic.GenericLoader(blob_loader: BlobLoader, blob_parser: BaseBlobParser)
Generic Document Loader.
A generic document loader that allows combining an arbitrary blob loader with a blob parser.
Examples
.. code-block:: python

    from langchain.document_loaders import GenericLoader
    from langchain.document_loaders.blob_loaders import FileSystemBlobLoader

    loader = GenericLoader.from_filesystem(
        path="path/to/directory",
        glob="**/[!.]*",
        suffixes=[".pdf"],
        show_progress=True,
    )

    docs = loader.lazy_load()
    next(docs)

Example instantiations to change which files are loaded:

.. code-block:: python

    # Recursively load all text files in a directory.
    loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/*.txt")

    # Recursively load all non-hidden files in a directory.
    loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/[!.]*")

    # Load all files in a directory without recursion.
    loader = GenericLoader.from_filesystem("/path/to/dir", glob="*")

Example instantiations to change which parser is used:

.. code-block:: python

    from langchain.document_loaders.parsers.pdf import PyPDFParser

    # Recursively load all PDF files in a directory.
    loader = GenericLoader.from_filesystem(
        "/path/to/dir",
        glob="**/*.pdf",
        parser=PyPDFParser()
    )
A generic document loader.
- Parameters
blob_loader – A blob loader which knows how to yield blobs
blob_parser – A blob parser which knows how to parse blobs into documents
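The composition described above can be sketched without the library itself. The following is a minimal illustration using stand-in classes, not the library's implementation: `Blob`, `Document`, `InMemoryBlobLoader`, `ParagraphParser`, and `SketchLoader` are hypothetical names, and the method names `yield_blobs` and `lazy_parse` are assumptions about what a blob loader and a blob parser expose.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Blob:
    """Hypothetical stand-in: raw data plus where it came from."""
    data: str
    source: str


@dataclass
class Document:
    """Hypothetical stand-in for a parsed document."""
    page_content: str
    metadata: dict


class InMemoryBlobLoader:
    """Stand-in blob loader: knows how to yield blobs."""

    def __init__(self, blobs: List[Blob]) -> None:
        self.blobs = blobs

    def yield_blobs(self) -> Iterable[Blob]:
        yield from self.blobs


class ParagraphParser:
    """Stand-in blob parser: turns one blob into zero or more documents."""

    def lazy_parse(self, blob: Blob) -> Iterator[Document]:
        for paragraph in blob.data.split("\n\n"):
            if paragraph.strip():
                yield Document(paragraph.strip(), {"source": blob.source})


class SketchLoader:
    """Combines any blob loader with any blob parser (illustration only)."""

    def __init__(self, blob_loader, blob_parser) -> None:
        self.blob_loader = blob_loader
        self.blob_parser = blob_parser

    def lazy_load(self) -> Iterator[Document]:
        # Each blob is handed to the parser as soon as it is yielded.
        for blob in self.blob_loader.yield_blobs():
            yield from self.blob_parser.lazy_parse(blob)

    def load(self) -> List[Document]:
        return list(self.lazy_load())


loader = SketchLoader(
    InMemoryBlobLoader([Blob("first\n\nsecond", "a.txt"), Blob("third", "b.txt")]),
    ParagraphParser(),
)
docs = loader.load()
```

Because the loader only depends on the two methods, either side can be swapped independently: the same parser works with a different source of blobs, and the same blobs can be fed to a different parser.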
Methods
__init__(blob_loader, blob_parser) – A generic document loader.
from_filesystem(path, *[, glob, exclude, ...]) – Create a generic document loader using a filesystem blob loader.
lazy_load() – Load documents lazily.
load() – Load all documents.
load_and_split([text_splitter]) – Load all documents and split them into sentences.
- __init__(blob_loader: BlobLoader, blob_parser: BaseBlobParser) -> None
A generic document loader.
- Parameters
blob_loader – A blob loader which knows how to yield blobs
blob_parser – A blob parser which knows how to parse blobs into documents
- classmethod from_filesystem(path: Union[str, Path], *, glob: str = '**/[!.]*', exclude: Sequence[str] = (), suffixes: Optional[Sequence[str]] = None, show_progress: bool = False, parser: Union[Literal['default'], BaseBlobParser] = 'default') -> GenericLoader
Create a generic document loader using a filesystem blob loader.
- Parameters
path – The path to the directory to load documents from.
glob – The glob pattern to use to find documents.
suffixes – The suffixes to use to filter documents. If None, all files matching the glob will be loaded.
exclude – A list of patterns to exclude from the loader.
show_progress – Whether to show a progress bar or not (requires tqdm). Proxies to the file system loader.
parser – A blob parser which knows how to parse blobs into documents
- Returns
A generic document loader.
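To see how glob, exclude, and suffixes interact, the sketch below reproduces the filtering with the standard library only. It assumes the filesystem blob loader applies pathlib-style globbing and matching, which this page does not confirm; `select_paths` and `names` are hypothetical helpers, not part of the library.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, Optional, Sequence


def select_paths(
    root: Path,
    glob: str = "**/[!.]*",
    exclude: Sequence[str] = (),
    suffixes: Optional[Sequence[str]] = None,
) -> Iterator[Path]:
    """Yield files under root that pass the glob, exclude, and suffix filters."""
    for path in root.glob(glob):
        # Drop anything matching an exclude pattern (assumed pathlib semantics).
        if any(path.match(pattern) for pattern in exclude):
            continue
        # Directories can match the glob too; only files are loaded.
        if not path.is_file():
            continue
        # With suffixes=None, every file matching the glob is kept.
        if suffixes is not None and path.suffix not in suffixes:
            continue
        yield path


def names(root: Path, **filters) -> List[str]:
    """Return the selected files as sorted paths relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in select_paths(root, **filters))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ["a.txt", "b.pdf", ".hidden", "sub/c.txt", "sub/d.pdf"]:
        (root / name).write_text("x")

    all_text = names(root, glob="**/*.txt")    # recursive, text files only
    non_hidden = names(root)                   # default glob skips dotfiles
    top_level = names(root, glob="*")          # no recursion
    only_pdf = names(root, suffixes=[".pdf"])  # suffix filter on top of the glob
    no_pdf = names(root, exclude=["*.pdf"])    # exclude pattern on top of the glob
```

Under these assumptions, the glob decides which paths are visited at all, while exclude and suffixes only narrow that set further.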
- lazy_load() -> Iterator[Document]
Load documents lazily. Use this when working at a large scale.
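One way to take advantage of lazy loading at scale is to consume the iterator in fixed-size batches, so that only one batch is held in memory at a time. The sketch below uses a hypothetical `in_batches` helper and a stand-in generator, `fake_lazy_load`, in place of a real loader's `lazy_load()`.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group any iterable into lists of at most `size` items, lazily."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


produced = []  # records which items the generator has produced so far


def fake_lazy_load() -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load()."""
    for i in range(5):
        produced.append(i)
        yield f"doc-{i}"


batches = in_batches(fake_lazy_load(), size=2)
first_batch = next(batches)
produced_after_first = list(produced)  # only the first batch was produced
remaining = list(batches)
```

Nothing beyond the requested batch is produced until the consumer asks for more, which is the property that makes lazy loading suitable for large collections.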
- load_and_split(text_splitter: Optional[TextSplitter] = None) -> List[Document]
Load all documents and split them into sentences.