langchain_community.document_loaders.pebblo.PebbloSafeLoader

class langchain_community.document_loaders.pebblo.PebbloSafeLoader(langchain_loader: BaseLoader, name: str, owner: str = '', description: str = '', api_key: Optional[str] = None, load_semantic: bool = False, classifier_url: Optional[str] = None)

PebbloSafeLoader wraps a document loader so that the data it loads can be scrutinized.

Methods

__init__(langchain_loader, name[, owner, ...])

alazy_load()

A lazy loader for Documents.

aload()

Load data into Document objects.

calculate_content_size(page_content)

Calculate the content size in bytes by encoding the string to bytes (e.g., as UTF-8) and taking the length of the encoded bytes.

get_file_owner_from_path(file_path)

Fetch owner of local file path.

get_source_size(source_path)

Fetch size of source path.

lazy_load()

Load documents in lazy fashion.

load()

Load Documents.

load_and_split([text_splitter])

Load Documents and split into chunks.

set_discover_sent()

set_loader_sent()

Parameters
  • langchain_loader (BaseLoader) –

  • name (str) –

  • owner (str) –

  • description (str) –

  • api_key (Optional[str]) –

  • load_semantic (bool) –

  • classifier_url (Optional[str]) –

__init__(langchain_loader: BaseLoader, name: str, owner: str = '', description: str = '', api_key: Optional[str] = None, load_semantic: bool = False, classifier_url: Optional[str] = None)
Parameters
  • langchain_loader (BaseLoader) –

  • name (str) –

  • owner (str) –

  • description (str) –

  • api_key (Optional[str]) –

  • load_semantic (bool) –

  • classifier_url (Optional[str]) –
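
The constructor arguments above can be illustrated with a minimal sketch. It assumes `langchain_community` is installed and that `CSVLoader` is the loader being wrapped; the helper name `build_safe_loader` and all argument values are illustrative and are not part of the library.

```python
def build_safe_loader(csv_path: str):
    """Wrap a CSVLoader in PebbloSafeLoader (requires langchain_community)."""
    # Imported lazily so that defining this helper does not need the package.
    from langchain_community.document_loaders import CSVLoader
    from langchain_community.document_loaders.pebblo import PebbloSafeLoader

    return PebbloSafeLoader(
        CSVLoader(csv_path),  # langchain_loader: the loader being wrapped
        name="example-rag-app",  # illustrative value
        owner="Jane Doe",  # illustrative value
        description="Example app that loads a CSV file",  # illustrative value
    )
```

Calling `build_safe_loader("data.csv").load()` would then return the documents produced by the wrapped loader's load method. The optional `api_key`, `load_semantic`, and `classifier_url` arguments are left at their defaults here.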

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type

AsyncIterator[Document]

async aload() → List[Document]

Load data into Document objects.

Return type

List[Document]

static calculate_content_size(page_content: str) → int

Calculate the content size in bytes by encoding the string to bytes (e.g., as UTF-8) and taking the length of the encoded bytes.

Parameters

page_content (str) – Data string.

Returns

Size of string in bytes.

Return type

int
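
The two steps described above (encode, then measure) can be sketched as a standalone function. The name `content_size_in_bytes` and the `encoding` parameter are illustrative assumptions; the description only says "a specific encoding (e.g., UTF-8)".

```python
def content_size_in_bytes(page_content: str, encoding: str = "utf-8") -> int:
    """Return the number of bytes the string occupies once encoded."""
    return len(page_content.encode(encoding))


# Five ASCII characters take five bytes in UTF-8.
ascii_size = content_size_in_bytes("hello")
# "\u00e9" (e with acute accent) takes two bytes in UTF-8, so five
# characters become six bytes.
accented_size = content_size_in_bytes("h\u00e9llo")
```

The byte count differs from `len(page_content)` whenever the text contains non-ASCII characters, which is why the size is measured after encoding.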

static get_file_owner_from_path(file_path: str) → str

Fetch owner of local file path.

Parameters

file_path (str) – Local file path.

Returns

Name of owner.

Return type

str
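
One way to resolve a local file's owner is sketched below using only the standard library. This is an assumption about the general approach, not the library's confirmed implementation; the function name `file_owner` and the `"unknown"` fallback are illustrative.

```python
import os
import tempfile

try:
    import pwd  # Unix-only module that maps user IDs to account names
except ImportError:  # for example on Windows
    pwd = None


def file_owner(file_path: str) -> str:
    """Return the login name of the file's owner, or "unknown" if unresolved."""
    if pwd is None:
        return "unknown"
    try:
        uid = os.stat(file_path).st_uid
        return pwd.getpwuid(uid).pw_name
    except (OSError, KeyError):
        # OSError: the path cannot be stat'ed; KeyError: no account for the UID.
        return "unknown"


with tempfile.NamedTemporaryFile() as tmp:
    owner = file_owner(tmp.name)

missing_owner = file_owner("/no/such/path/hopefully")
```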

get_source_size(source_path: str) → int

Fetch size of source path. Source can be a directory or a file.

Parameters

source_path (str) – Local path of data source.

Returns

Source size in bytes.

Return type

int
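
Because the source can be either a file or a directory, the size calculation has two cases. The sketch below shows one plausible approach with the standard library; the name `source_size` and the choice to skip symbolic links are assumptions rather than documented behavior.

```python
import os
import tempfile


def source_size(source_path: str) -> int:
    """Return the size in bytes of a file, or the total of all files in a directory."""
    if os.path.isfile(source_path):
        return os.path.getsize(source_path)
    total = 0
    if os.path.isdir(source_path):
        for dirpath, _dirnames, filenames in os.walk(source_path):
            for filename in filenames:
                file_path = os.path.join(dirpath, filename)
                # Skip symbolic links so that link targets are not counted.
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
    return total


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "nested"))
    with open(os.path.join(root, "a.txt"), "wb") as f:
        f.write(b"12345")
    with open(os.path.join(root, "nested", "b.txt"), "wb") as f:
        f.write(b"1234567")
    single_file_size = source_size(os.path.join(root, "a.txt"))
    directory_size = source_size(root)
```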

lazy_load() → Iterator[Document]

Load documents in lazy fashion.

Raises
  • NotImplementedError – Raised when lazy_load is not implemented within the wrapped loader.

Yields

Document – Documents from the wrapped loader’s lazy loading.

Return type

Iterator[Document]
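
The delegation and error behavior described above can be sketched with stand-in classes. Everything here is hypothetical: `EagerOnlyLoader`, `LazyLoader`, and `SafeWrapperSketch` are not library classes, plain strings stand in for Document objects, and detecting a missing method via `AttributeError` is a simplification of whatever check the real wrapper performs.

```python
from typing import Iterator, List


class EagerOnlyLoader:
    """Stand-in loader that implements load() but not lazy_load()."""

    def load(self) -> List[str]:
        return ["doc-1", "doc-2"]


class LazyLoader(EagerOnlyLoader):
    """Stand-in loader that also supports lazy loading."""

    def lazy_load(self) -> Iterator[str]:
        yield from self.load()


class SafeWrapperSketch:
    """Hypothetical wrapper that forwards lazy loading to the wrapped loader."""

    def __init__(self, wrapped) -> None:
        self.wrapped = wrapped

    def lazy_load(self) -> Iterator[str]:
        try:
            doc_iterator = self.wrapped.lazy_load()
        except AttributeError as exc:
            raise NotImplementedError(
                f"{type(self.wrapped).__name__} does not implement lazy_load()"
            ) from exc
        for doc in doc_iterator:
            # A real wrapper would inspect or report each document here.
            yield doc


lazy_docs = list(SafeWrapperSketch(LazyLoader()).lazy_load())

try:
    list(SafeWrapperSketch(EagerOnlyLoader()).lazy_load())
    raised = False
except NotImplementedError:
    raised = True
```

Because `lazy_load` is a generator, the `NotImplementedError` in this sketch surfaces when iteration starts, not when the method is called.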

load() → List[Document]

Load Documents.

Returns

Documents fetched from load method of the wrapped loader.

Return type

List[Document]

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated.

Parameters

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns

List of Documents.

Return type

List[Document]

classmethod set_discover_sent() → None
Return type

None

classmethod set_loader_sent() → None
Return type

None

Examples using PebbloSafeLoader