langchain_community.document_loaders.docugami.DocugamiLoader

class langchain_community.document_loaders.docugami.DocugamiLoader

Bases: BaseLoader, BaseModel

Deprecated since version 0.0.24: Use docugami_langchain.DocugamiLoader instead.

Load from Docugami.

To use this loader, install the dgml-utils Python package.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
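As a rough illustration of how the keyword arguments might be assembled, the sketch below builds the arguments for a docset-backed loader and defers the import until a loader is actually needed. The helper names docugami_loader_kwargs and make_loader are hypothetical, and reading the token from an environment variable called DOCUGAMI_API_KEY is an assumption of this sketch. The top-level import from docugami_langchain follows the name given in the deprecation notice and falls back to this class if that import is unavailable.

```python
import os
from typing import Any, Dict, Optional, Sequence


def docugami_loader_kwargs(
    docset_id: str,
    document_ids: Optional[Sequence[str]] = None,
    access_token: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for a remote (API-backed) loader."""
    kwargs: Dict[str, Any] = {
        "docset_id": docset_id,
        # Assumption of this sketch: the token is stored in DOCUGAMI_API_KEY
        # when it is not passed explicitly.
        "access_token": access_token or os.environ.get("DOCUGAMI_API_KEY"),
    }
    if document_ids is not None:
        kwargs["document_ids"] = list(document_ids)
    return kwargs


def make_loader(**kwargs: Any) -> Any:
    """Instantiate a loader, preferring the replacement package if importable."""
    try:
        # Import path assumed from the deprecation notice.
        from docugami_langchain import DocugamiLoader
    except ImportError:
        from langchain_community.document_loaders.docugami import DocugamiLoader
    return DocugamiLoader(**kwargs)
```

With a real docset ID and token, make_loader(**docugami_loader_kwargs("<docset id>")).load() would then return the documents. Invalid argument combinations surface as a ValidationError at construction time.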

param access_token: Optional[str] = None

The Docugami API access token to use.

param api: str = 'https://api.docugami.com/v1preview1'

The Docugami API endpoint to use.

param docset_id: Optional[str] = None

The Docugami API docset ID to use.

param document_ids: Optional[Sequence[str]] = None

The Docugami API document IDs to use.

param file_paths: Optional[Sequence[Union[Path, str]]] = None

The local file paths to use.

param include_project_metadata_in_doc_metadata: bool = True

Set to True to include the project metadata in the document metadata.

param include_xml_tags: bool = False

Set to True to include XML tags in the chunk output text.

param max_metadata_length: int = 512

Max length of metadata text returned.

param max_text_length: int = 4096

Max length of chunk text returned.

param min_text_length: int = 32

Threshold below which a chunk is appended to the next chunk, to avoid over-chunking.

param parent_hierarchy_levels: int = 0

Set appropriately to get parent chunks using the chunk hierarchy.

param parent_id_key: str = 'doc_id'

Metadata key for parent doc ID.

param sub_chunk_tables: bool = False

Set to True to return sub-chunks within tables.

param whitespace_normalize_text: bool = True

Set to False to keep the full whitespace formatting of the original XML doc, including indentation.
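To make the effect of min_text_length more concrete, here is a small standalone sketch of the idea: a chunk shorter than the threshold is carried forward and joined onto the chunk that follows it. It is an illustration only, not the loader's actual implementation (the real chunking is delegated to dgml-utils), and the function name merge_short_chunks and the single-space join are assumptions of this sketch.

```python
from typing import List


def merge_short_chunks(chunks: List[str], min_text_length: int = 32) -> List[str]:
    """Append chunks shorter than min_text_length to the chunk that follows."""
    merged: List[str] = []
    carry = ""  # text held back because it is still below the threshold
    for chunk in chunks:
        text = f"{carry} {chunk}" if carry else chunk
        if len(text) < min_text_length:
            carry = text  # still too short, keep accumulating
        else:
            merged.append(text)
            carry = ""
    if carry:
        # Nothing left to append to, so emit the remainder as its own chunk.
        merged.append(carry)
    return merged
```

In this sketch a short heading such as "Title" ends up in the same chunk as the paragraph after it, rather than becoming a tiny chunk of its own.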

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type

AsyncIterator[Document]

async aload() → List[Document]

Load data into Document objects.

Return type

List[Document]
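The async methods can be consumed from a coroutine. The sketch below gathers a bounded number of documents from alazy_load() with async for; the helper name collect_async and its limit parameter are hypothetical. It relies only on the alazy_load() signature documented above, so it works with any object that exposes the same method.

```python
import asyncio
from typing import Any, List


async def collect_async(loader: Any, limit: int) -> List[Any]:
    """Collect at most `limit` documents from loader.alazy_load()."""
    docs: List[Any] = []
    if limit <= 0:
        return docs
    async for doc in loader.alazy_load():
        docs.append(doc)
        if len(docs) >= limit:
            break  # stop iterating once enough documents were gathered
    return docs
```

From synchronous code this could be driven with asyncio.run(collect_async(loader, 10)). When the full list is wanted, awaiting loader.aload() is the more direct choice.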

lazy_load() → Iterator[Document]

A lazy loader for Documents.

Return type

Iterator[Document]
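Because lazy_load() returns an iterator, a caller can stop early instead of building the whole list. A minimal sketch follows, with take_documents as a hypothetical helper name. Whether this actually avoids fetching everything depends on the loader: if lazy_load() simply delegates to load(), all documents are still retrieved before the first one is yielded.

```python
from itertools import islice
from typing import Any, List


def take_documents(loader: Any, n: int) -> List[Any]:
    """Return at most n documents, pulling no more items than needed."""
    # islice stops requesting items from the iterator after n of them.
    return list(islice(loader.lazy_load(), n))
```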

load() → List[Document]

Load documents.

Return type

List[Document]

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method; it should be considered deprecated.

Parameters

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns

List of Documents.

Return type

List[Document]
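Since load_and_split is flagged as effectively deprecated, one alternative is to call load() and apply a splitter explicitly. The sketch below assumes the splitter exposes split_documents(), as LangChain TextSplitter instances do; the helper name load_then_split is hypothetical.

```python
from typing import Any, List


def load_then_split(loader: Any, splitter: Any) -> List[Any]:
    """Load all documents, then split them into chunks in a separate step."""
    documents = loader.load()
    # Each returned chunk is itself a Document.
    return splitter.split_documents(documents)
```

Keeping the two steps separate makes the choice of splitter explicit at the call site instead of relying on the RecursiveCharacterTextSplitter default.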