langchain.document_loaders.chromium.AsyncChromiumLoader

class langchain.document_loaders.chromium.AsyncChromiumLoader(urls: List[str])[source]

Scrape HTML pages from URLs using a headless instance of Chromium.

Initialize the loader with a list of URL paths.

Parameters

urls (List[str]) – A list of URLs to scrape content from.

Raises

ImportError – If the required ‘playwright’ package is not installed.

Methods

__init__(urls)

Initialize the loader with a list of URL paths.

ascrape_playwright(url)

Asynchronously scrape the content of a given URL using Playwright's async API.

lazy_load()

Lazily load text content from the provided URLs.

load()

Load and return all Documents from the provided URLs.

load_and_split([text_splitter])

Load Documents and split into chunks.

__init__(urls: List[str])[source]

Initialize the loader with a list of URL paths.

Parameters

urls (List[str]) – A list of URLs to scrape content from.

Raises

ImportError – If the required ‘playwright’ package is not installed.
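A minimal construction sketch, assuming the import path shown in the class signature above. Because the constructor raises ImportError when playwright is missing, the helper make_loader (a hypothetical name, not part of the library) catches that case and returns None instead of failing. The URL is a placeholder.

```python
from typing import List, Optional


def make_loader(urls: List[str]) -> Optional[object]:
    """Return an AsyncChromiumLoader, or None if a dependency is missing."""
    try:
        # Import lazily so a missing langchain install is also reported here.
        from langchain.document_loaders.chromium import AsyncChromiumLoader

        # The constructor itself raises ImportError when playwright is absent.
        return AsyncChromiumLoader(urls)
    except ImportError as exc:
        print(f"AsyncChromiumLoader is unavailable: {exc}")
        return None


loader = make_loader(["https://example.com"])  # placeholder URL
if loader is not None:
    print(f"Loader ready: {type(loader).__name__}")
```

Installing the Python package alone may not be enough: Playwright typically also needs its browser binaries, which the command playwright install chromium downloads.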

async ascrape_playwright(url: str) → str[source]

Asynchronously scrape the content of a given URL using Playwright’s async API.

Parameters

url (str) – The URL to scrape.

Returns

The scraped HTML content or an error message if an exception occurs.

Return type

str
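Since ascrape_playwright is a coroutine that returns a string in both the success and the failure case, callers may want to await several calls together and then check each result. The sketch below shows one way to do that. The names scrape_all and looks_like_error are hypothetical helpers, and the "Error:" prefix is an assumption about the format of the error message, which this page does not specify.

```python
import asyncio
from typing import Dict, List


async def scrape_all(loader, urls: List[str]) -> Dict[str, str]:
    """Scrape several URLs concurrently and map each URL to its result."""
    # ascrape_playwright is a coroutine, so the calls can be awaited together.
    results = await asyncio.gather(
        *(loader.ascrape_playwright(url) for url in urls)
    )
    return dict(zip(urls, results))


def looks_like_error(content: str) -> bool:
    """Heuristic check for a failed scrape.

    Assumption: error messages start with "Error:". Adjust the prefix if
    the installed version formats its messages differently.
    """
    return content.startswith("Error:")
```

From synchronous code this could be driven with asyncio.run(scrape_all(loader, urls)), after which looks_like_error can be applied to each value to separate failed URLs from scraped HTML.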

lazy_load() → Iterator[Document][source]

Lazily load text content from the provided URLs.

This method yields Documents one at a time as they’re scraped, instead of waiting to scrape all URLs before returning.

Yields

Document – The scraped content encapsulated within a Document object.
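Because lazy_load returns an iterator, a caller can stop early and avoid scraping URLs it does not need. The sketch below illustrates this with itertools.islice. The helper name take_documents is hypothetical, and it accepts any object that exposes a lazy_load() method.

```python
from itertools import islice
from typing import List


def take_documents(loader, limit: int) -> List[object]:
    """Collect at most `limit` Documents, pulling no more than needed."""
    # islice stops consuming the iterator once `limit` items were yielded,
    # so later URLs are never requested from lazy_load().
    return list(islice(loader.lazy_load(), limit))
```

Calling take_documents(loader, 2) on a loader built with ten URLs would, under this sketch, consume only the first two Documents from the iterator.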

load() → List[Document][source]

Load and return all Documents from the provided URLs.

Returns

A list of Document objects containing the scraped content from each URL.

Return type

List[Document]
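The returned Documents carry the scraped HTML in page_content. The sketch below summarizes a list of Documents by source and content length. The helper name summarize is hypothetical, and the "source" metadata key is an assumption about what the loader records, so the code falls back to a placeholder when the key is absent.

```python
from typing import Iterable, List, Tuple


def summarize(docs: Iterable) -> List[Tuple[str, int]]:
    """Return (source, HTML length) for each Document."""
    summary = []
    for doc in docs:
        # Assumption: the loader stores the URL under metadata["source"].
        source = doc.metadata.get("source", "<unknown>")
        summary.append((source, len(doc.page_content)))
    return summary
```

A typical call would be summarize(loader.load()), which gives a quick view of how much HTML each URL produced before any further processing.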

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Parameters

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns

List of Documents.

Examples using AsyncChromiumLoader