langchain.document_loaders.tensorflow_datasets.TensorflowDatasetLoader

class langchain.document_loaders.tensorflow_datasets.TensorflowDatasetLoader(dataset_name: str, split_name: str, load_max_docs: Optional[int] = 100, sample_to_document_function: Optional[Callable[[Dict], Document]] = None)[source]

Load documents from a TensorFlow Dataset.

dataset_name

The name of the dataset to load.

split_name

The name of the split to load.

load_max_docs

The maximum number of documents to load. Defaults to 100.

sample_to_document_function

A function that converts a dataset sample into a Document.

Example

import tensorflow as tf
from langchain.document_loaders import TensorflowDatasetLoader
from langchain.schema import Document

def decode_to_str(item: tf.Tensor) -> str:
    # TFDS yields string features as byte tensors; decode them to Python str.
    return item.numpy().decode("utf-8")

def mlqaen_example_to_document(example: dict) -> Document:
    return Document(
        page_content=decode_to_str(example["context"]),
        metadata={
            "id": decode_to_str(example["id"]),
            "title": decode_to_str(example["title"]),
            "question": decode_to_str(example["question"]),
            "answer": decode_to_str(example["answers"]["text"][0]),
        },
    )

tsds_client = TensorflowDatasetLoader(
    dataset_name="mlqa/en",
    split_name="test",
    load_max_docs=100,
    sample_to_document_function=mlqaen_example_to_document,
)
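The load_max_docs cap simply truncates iteration over the split. A minimal pure-Python sketch of that behavior (no TensorFlow required; the sample stream and converter here are hypothetical stand-ins, not the loader's actual internals):

```python
from itertools import islice
from typing import Callable, Dict, Iterator

def fake_split() -> Iterator[Dict[str, bytes]]:
    # Hypothetical stand-in for a TFDS split: a stream of raw samples.
    for i in range(1000):
        yield {"context": f"passage {i}".encode("utf-8")}

def capped_docs(
    samples: Iterator[Dict[str, bytes]],
    load_max_docs: int,
    to_doc: Callable[[Dict[str, bytes]], str],
) -> Iterator[str]:
    # Mirrors the cap: stop after load_max_docs samples, convert each one.
    for sample in islice(samples, load_max_docs):
        yield to_doc(sample)

docs = list(capped_docs(fake_split(), 3, lambda s: s["context"].decode("utf-8")))
# docs is ["passage 0", "passage 1", "passage 2"]
```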

Initialize the TensorflowDatasetLoader.

Parameters
  • dataset_name – the name of the dataset to load.

  • split_name – the name of the split to load.

  • load_max_docs – a limit to the number of loaded documents. Defaults to 100.

  • sample_to_document_function – a function that converts a dataset sample into a Document.

Attributes

load_max_docs

The maximum number of documents to load.

sample_to_document_function

Custom function that transforms a dataset sample into a Document.

Methods

__init__(dataset_name, split_name[, ...])

Initialize the TensorflowDatasetLoader.

lazy_load()

A lazy loader for Documents.

load()

Load data into Document objects.

load_and_split([text_splitter])

Load Documents and split into chunks.

__init__(dataset_name: str, split_name: str, load_max_docs: Optional[int] = 100, sample_to_document_function: Optional[Callable[[Dict], Document]] = None)[source]

Initialize the TensorflowDatasetLoader.

Parameters
  • dataset_name – the name of the dataset to load.

  • split_name – the name of the split to load.

  • load_max_docs – a limit to the number of loaded documents. Defaults to 100.

  • sample_to_document_function – a function that converts a dataset sample into a Document.

lazy_load() → Iterator[Document][source]

A lazy loader for Documents.

load() → List[Document][source]

Load data into Document objects.
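In LangChain's loader convention, load() is the eager counterpart of lazy_load(): it materializes the lazy iterator into a list. A simplified sketch of that relationship (with str standing in for Document; this is an illustration, not the library's source):

```python
from typing import Iterator, List

class SketchLoader:
    # Simplified stand-in for a document loader.
    def __init__(self, samples: List[str]) -> None:
        self.samples = samples

    def lazy_load(self) -> Iterator[str]:
        # Yield one document at a time; nothing is materialized up front.
        for s in self.samples:
            yield s

    def load(self) -> List[str]:
        # Eager variant: drain the lazy iterator into a list.
        return list(self.lazy_load())

loader = SketchLoader(["doc a", "doc b"])
# loader.load() returns ["doc a", "doc b"]
```

Prefer lazy_load() when the split is large, since it avoids holding every Document in memory at once.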

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Parameters

text_splitter – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns

List of Documents.
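load_and_split runs the loaded documents through a TextSplitter, and each chunk comes back as its own Document. The chunking idea can be sketched with a naive fixed-width character splitter (an illustration only, not RecursiveCharacterTextSplitter's actual algorithm, which prefers to break on separators like paragraphs and sentences):

```python
from typing import List

def split_text(text: str, chunk_size: int) -> List[str]:
    # Naive fixed-width splitter: cut the text every chunk_size characters.
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = split_text("abcdefgh", 3)
# chunks is ["abc", "def", "gh"]
```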

Examples using TensorflowDatasetLoader