langchain_community.document_loaders.parsers.vsdx.VsdxParser

class langchain_community.document_loaders.parsers.vsdx.VsdxParser

Parser for Microsoft Visio .vsdx files.

Methods

__init__()

get_pages_content(zfile, source) – Get the content of the pages of a vsdx file.

get_relationships(page, zfile, filelist, ...) – Recursively get the relationships of a page, the relationships of those relationships, and so on.

lazy_parse(blob) – Retrieve the contents of pages from a .vsdx file and insert them into documents, one document per page.

parse(blob) – Parse a vsdx file.

__init__()

get_pages_content(zfile: ZipFile, source: str) → List[Tuple[int, str, str]]

Get the content of the pages of a vsdx file.

Parameters
  • zfile (zipfile.ZipFile) – The vsdx file, opened as a zip archive.

  • source (str) – The path of the vsdx file.

Returns

A list of tuples, one per page of the vsdx file, each containing the page number, the page name, and the page content.

Return type

list[tuple[int, str, str]]
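The shape of the return value can be illustrated with a minimal, standard-library-only sketch. This is not the library's implementation: the helper names make_sample_archive and sketch_pages_content are hypothetical, the archive is a tiny in-memory stand-in for a real vsdx file, the file name is used in place of the real page name, and numbering pages from 0 is an arbitrary choice for the sketch.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Tuple

# XML namespace used by Visio page parts.
VISIO_NS = "{http://schemas.microsoft.com/office/visio/2012/main}"


def make_sample_archive() -> zipfile.ZipFile:
    """Build a tiny in-memory zip that mimics the page layout of a vsdx file."""
    page_xml = (
        '<PageContents xmlns="http://schemas.microsoft.com/office/visio/2012/main">'
        "<Shapes>"
        "<Shape><Text>Start</Text></Shape>"
        "<Shape><Text>End</Text></Shape>"
        "</Shapes>"
        "</PageContents>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("visio/pages/page1.xml", page_xml)
    buffer.seek(0)
    return zipfile.ZipFile(buffer, "r")


def sketch_pages_content(zfile: zipfile.ZipFile) -> List[Tuple[int, str, str]]:
    """Return (page number, page name, page text) for each page part (hypothetical helper)."""
    page_files = sorted(
        name
        for name in zfile.namelist()
        if re.fullmatch(r"visio/pages/page\d+\.xml", name)
    )
    results = []
    for number, name in enumerate(page_files):
        root = ET.fromstring(zfile.read(name))
        # Collect the text of every <Text> element, skipping empty ones.
        texts = []
        for element in root.iter(VISIO_NS + "Text"):
            text = "".join(element.itertext()).strip()
            if text:
                texts.append(text)
        results.append((number, name, "\n".join(texts)))
    return results


pages = sketch_pages_content(make_sample_archive())
print(pages)  # [(0, 'visio/pages/page1.xml', 'Start\nEnd')]
```

Each tuple has the same three fields that get_pages_content is documented to return, which makes it easy to unpack in a loop such as `for number, name, content in pages`.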

get_relationships(page: str, zfile: ZipFile, filelist: List[str], pagexml_rels: List[dict]) → Set[str]

Recursively get the relationships of a page, the relationships of those relationships, and so on. Pages can be based on other pages (for example, a background page), so every relationship must be followed to collect the full content of a single page.

Parameters
  • page (str) –

  • zfile (ZipFile) –

  • filelist (List[str]) –

  • pagexml_rels (List[dict]) –

Return type

Set[str]
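The recursive idea described above can be sketched independently of the vsdx format. The example below is a conceptual illustration, not the library's code: collect_related and the direct_rels mapping are hypothetical, and the part names are made up. It follows relationships transitively and guards against cycles, such as two pages that reference each other.

```python
from typing import Dict, List, Set


def collect_related(part: str, direct_rels: Dict[str, List[str]]) -> Set[str]:
    """Return every part reachable from `part` through relationships (hypothetical helper)."""
    seen = {part}  # includes the starting part so cycles back to it are ignored

    def visit(current: str) -> None:
        for target in direct_rels.get(current, []):
            if target not in seen:
                seen.add(target)
                visit(target)  # follow the relationships of the relationship

    visit(part)
    return seen - {part}


# A foreground page based on a background page, which in turn uses a master
# and points back at the foreground page.
direct_rels = {
    "page1.xml": ["page2.xml"],
    "page2.xml": ["master1.xml", "page1.xml"],
}
related = collect_related("page1.xml", direct_rels)
print(sorted(related))  # ['master1.xml', 'page2.xml']
```

Returning a set mirrors the documented Set[str] return type and means a part reached through several paths is only reported once.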

lazy_parse(blob: Blob) → Iterator[Document]

Retrieve the contents of pages from a .vsdx file and insert them into documents, one document per page.

Parameters

blob (Blob) –

Return type

Iterator[Document]
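A minimal usage sketch follows, assuming langchain_community is installed and that Blob can be imported from langchain_community.document_loaders.blob_loaders. The wrapper load_vsdx_pages and the path "diagram.vsdx" are hypothetical, and depending on the installed version the parser may need optional dependencies.

```python
from typing import List


def load_vsdx_pages(path: str) -> List[dict]:
    """Parse a .vsdx file and return one summary dict per page document (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require
    # langchain_community to be installed.
    from langchain_community.document_loaders.blob_loaders import Blob
    from langchain_community.document_loaders.parsers.vsdx import VsdxParser

    blob = Blob.from_path(path)  # the file is only read when parsing starts
    parser = VsdxParser()
    # lazy_parse yields one Document per page of the vsdx file.
    return [
        {"metadata": doc.metadata, "text": doc.page_content}
        for doc in parser.lazy_parse(blob)
    ]
```

Calling `load_vsdx_pages("diagram.vsdx")` on a real file would return one entry per page. Because lazy_parse returns an iterator, the documents can also be consumed one at a time in a for loop instead of being collected into a list.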

parse(blob: Blob) → Iterator[Document]

Parse a vsdx file.

Parameters

blob (Blob) –

Return type

Iterator[Document]