langchain_experimental.tabular_synthetic_data.base.SyntheticDataGenerator

class langchain_experimental.tabular_synthetic_data.base.SyntheticDataGenerator

Bases: BaseModel

Generate synthetic data by prompting the provided LLM with a few-shot prompt template.

Attributes
  • template (FewShotPromptTemplate) – Template for few-shot prompting.

  • llm (Optional[BaseLanguageModel]) – Large Language Model to use for generation.

  • llm_chain (Optional[Chain]) – LLM chain with the LLM and few-shot template.

  • example_input_key (str) – Key to use for storing example inputs.

Usage Example:
>>> template = FewShotPromptTemplate(...)
>>> llm = BaseLanguageModel(...)  # placeholder: use a concrete subclass, the base class is abstract
>>> generator = SyntheticDataGenerator(template=template, llm=llm)
>>> results = generator.generate(subject="climate change", runs=5)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param example_input_key: str = 'example'
param llm: Optional[BaseLanguageModel] = None
param llm_chain: Optional[Chain] = None
param results: list = []
param template: FewShotPromptTemplate [Required]
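
The following plain-Python sketch approximates what a few-shot prompt built from example dictionaries might look like. It does not use FewShotPromptTemplate itself. The helper name `render_few_shot`, the header text, and the sample rows are hypothetical; only the default key 'example' comes from the parameter list above.

```python
from typing import Dict, List

# Documented default value of example_input_key.
EXAMPLE_INPUT_KEY = "example"


def render_few_shot(examples: List[Dict[str, str]], subject: str, extra: str = "") -> str:
    """Hypothetical stand-in for few-shot prompt rendering (not the library's code)."""
    header = "Generate synthetic data in the style of these examples."
    # Each example is a dict whose text is stored under the example input key.
    shots = "\n".join(f"- {ex[EXAMPLE_INPUT_KEY]}" for ex in examples)
    request = f"Subject: {subject}"
    if extra:
        request += f"\nExtra instructions: {extra}"
    return f"{header}\n{shots}\n{request}"


examples = [
    {"example": "name: Ada, age: 36"},
    {"example": "name: Lin, age: 41"},
]
prompt = render_few_shot(examples, subject="patient records")
print(prompt)
```

In real use the template object handles this rendering; the sketch only shows the shape of the data that the example key refers to.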
async agenerate(subject: str, runs: int, extra: str = '', *args: Any, **kwargs: Any) → List[str]

Generate synthetic data using the given subject asynchronously.

Note: Because the LLM calls run concurrently, duplicate results are more likely. Adding specific instructions through the “extra” keyword argument may reduce them.

Parameters
  • subject (str) – The subject the synthetic data will be about.

  • runs (int) – Number of times to generate the data asynchronously.

  • extra (str) – Extra instructions for steerability in data generation.

  • args (Any) – Additional positional arguments.

  • kwargs (Any) – Additional keyword arguments.

Returns

List of generated synthetic data for the given subject.

Return type

List[str]

Usage Example:
>>> results = await generator.agenerate(subject="climate change", runs=5,
...                                     extra="Focus on env impacts.")
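
To illustrate the note about concurrent calls, here is a minimal standard-library sketch. It does not call the library: `fake_generate_once`, `run_concurrently`, and `drop_duplicates` are hypothetical names, and the stand-in coroutine returns a fixed string in place of an LLM response. It shows that calls launched together cannot react to each other's output, and one simple way to deduplicate the returned list afterwards.

```python
import asyncio
from typing import List


async def fake_generate_once(subject: str, extra: str = "") -> str:
    """Hypothetical stand-in for a single asynchronous LLM call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    suffix = f" ({extra})" if extra else ""
    return f"synthetic record about {subject}{suffix}"


async def run_concurrently(subject: str, runs: int, extra: str = "") -> List[str]:
    """Launch all calls at once; gather returns results in launch order."""
    tasks = [fake_generate_once(subject, extra) for _ in range(runs)]
    return list(await asyncio.gather(*tasks))


def drop_duplicates(results: List[str]) -> List[str]:
    """Remove repeated entries while keeping first-seen order."""
    return list(dict.fromkeys(results))


results = asyncio.run(run_concurrently("climate change", runs=3))
unique = drop_duplicates(results)
print(len(results), len(unique))
```

Exact-match deduplication is only a starting point; near-duplicates from a real model would need a looser comparison.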
generate(subject: str, runs: int, *args: Any, **kwargs: Any) → List[str]

Generate synthetic data using the given subject string.

Parameters
  • subject (str) – The subject the synthetic data will be about.

  • runs (int) – Number of times to generate the data.

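  • extra (str) – Extra instructions for steerability in data generation. It is not a named parameter in this signature, so pass it as a keyword argument; it is received through kwargs.

  • args (Any) – Additional positional arguments.

  • kwargs (Any) – Additional keyword arguments, including extra.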

Returns

List of generated synthetic data.

Return type

List[str]

Usage Example:
>>> results = generator.generate(subject="climate change", runs=5,
...                              extra="Focus on environmental impacts.")
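
The signature above lists no `extra` parameter, yet the usage example passes one. The sketch below shows how a function with the same signature shape receives such a value through `**kwargs`. The names `fake_chain_run` and `generate_sketch` are hypothetical, and how the library forwards these arguments internally is an assumption, not something this page confirms.

```python
from typing import Any, List


def fake_chain_run(subject: str, extra: str = "") -> str:
    """Hypothetical stand-in for one chain invocation."""
    return f"{subject} | {extra}" if extra else subject


def generate_sketch(subject: str, runs: int, *args: Any, **kwargs: Any) -> List[str]:
    """Same signature shape as generate: `extra` is not named here,
    so it arrives inside kwargs and is forwarded unchanged (assumed behavior)."""
    return [fake_chain_run(subject, *args, **kwargs) for _ in range(runs)]


results = generate_sketch(
    "climate change", runs=2, extra="Focus on environmental impacts."
)
print(results)
```

Omitting `extra` entirely also works in this sketch, since the stand-in gives it an empty default.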