langchain_experimental.tabular_synthetic_data.base.SyntheticDataGenerator
- class langchain_experimental.tabular_synthetic_data.base.SyntheticDataGenerator
Bases: BaseModel
Generate synthetic data by prompting the given LLM with a few-shot prompt template.
- template
Template for few-shot prompting.
- Type
FewShotPromptTemplate
- llm
Large Language Model to use for generation.
- Type
Optional[BaseLanguageModel]
- example_input_key
Key to use for storing example inputs.
- Type
str
- Usage Example:
>>> template = FewShotPromptTemplate(...)
>>> llm = BaseLanguageModel(...)
>>> generator = SyntheticDataGenerator(template=template, llm=llm)
>>> results = generator.generate(subject="climate change", runs=5)
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param example_input_key: str = 'example'
- param llm: Optional[BaseLanguageModel] = None
- param results: list = []
- param template: FewShotPromptTemplate [Required]
- async agenerate(subject: str, runs: int, extra: str = '', *args: Any, **kwargs: Any) → List[str]
Generate synthetic data using the given subject asynchronously.
Note: The LLM calls run concurrently, so no call sees the output of another and duplicates can occur. Adding specific instructions through the “extra” keyword argument may reduce them.
- Parameters
subject (str) – The subject the synthetic data will be about.
runs (int) – Number of times to generate the data asynchronously.
extra (str) – Extra instructions for steerability in data generation.
args (Any) –
kwargs (Any) –
- Returns
List of generated synthetic data for the given subject.
- Return type
List[str]
- Usage Example:
>>> results = await generator.agenerate(subject="climate change", runs=5, extra="Focus on env impacts.")
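Because agenerate is a coroutine, the await in the example above only works inside an async function or an async-aware REPL. The following is a minimal sketch of driving it from a plain script and dropping exact duplicates afterwards. StubGenerator is a hypothetical stand-in that mirrors the documented agenerate signature so that the snippet runs without an LLM; it is not part of langchain_experimental, and its output strings are made up for illustration. With a real SyntheticDataGenerator you would pass that instance to generate_unique instead.

```python
import asyncio
from typing import Any, List


class StubGenerator:
    """Hypothetical stand-in mirroring the documented agenerate signature."""

    async def agenerate(
        self, subject: str, runs: int, extra: str = "", *args: Any, **kwargs: Any
    ) -> List[str]:
        async def one_run(i: int) -> str:
            await asyncio.sleep(0)  # placeholder for one concurrent LLM call
            # Only two distinct outputs are produced, to imitate duplicates.
            return f"{subject} sample {i % 2} {extra}".strip()

        return list(await asyncio.gather(*(one_run(i) for i in range(runs))))


async def generate_unique(
    generator: Any, subject: str, runs: int, extra: str = ""
) -> List[str]:
    """Await agenerate, then drop exact duplicates in first-seen order."""
    results = await generator.agenerate(subject=subject, runs=runs, extra=extra)
    return list(dict.fromkeys(results))


def demo() -> List[str]:
    # asyncio.run starts an event loop, runs the coroutine, and closes the loop.
    return asyncio.run(generate_unique(StubGenerator(), "climate change", runs=5))
```

Exact-match deduplication only removes identical strings; near-duplicates would still need the kind of steering the “extra” argument is meant for.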
- generate(subject: str, runs: int, *args: Any, **kwargs: Any) → List[str]
Generate synthetic data using the given subject string.
- Parameters
subject (str) – The subject the synthetic data will be about.
runs (int) – Number of times to generate the data.
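extra (str) – Extra instructions for steerability in data generation. Not a named parameter in this signature; pass it as a keyword argument so that it is received through kwargs.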
args (Any) –
kwargs (Any) –
- Returns
List of generated synthetic data.
- Return type
List[str]
- Usage Example:
>>> results = generator.generate(subject="climate change", runs=5, extra="Focus on environmental impacts.")
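The generate signature does not name extra, yet the usage example passes it. The sketch below shows, in plain Python, how a keyword argument such as extra reaches a method with that parameter list through kwargs. KwargsStub is a hypothetical stand-in, not the library class, and what the real method does with the value afterwards is not shown on this page.

```python
from typing import Any, List


class KwargsStub:
    """Hypothetical stand-in with the documented generate parameter list."""

    def generate(self, subject: str, runs: int, *args: Any, **kwargs: Any) -> List[str]:
        # extra is not a named parameter, so it arrives inside kwargs.
        extra = kwargs.get("extra", "")
        return [f"{subject} #{i + 1} [{extra}]" for i in range(runs)]


stub_results = KwargsStub().generate(
    subject="climate change", runs=2, extra="Focus on environmental impacts."
)
```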