langchain.embeddings.openai.OpenAIEmbeddings

class langchain.embeddings.openai.OpenAIEmbeddings

Bases: BaseModel, Embeddings

OpenAI embedding models.

To use, you should have the openai Python package installed and either set the environment variable OPENAI_API_KEY to your API key or pass the key as a named parameter to the constructor.

Example

from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(openai_api_key="my-api-key")
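The key lookup order described above (an explicit constructor argument first, otherwise the OPENAI_API_KEY environment variable) can be sketched with a small helper. The name resolve_api_key is hypothetical and not part of LangChain. It only mirrors the documented precedence and takes the environment as a parameter so that it can run without touching the real one.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: return the explicit key if given, else OPENAI_API_KEY."""
    if explicit_key:
        # A key passed to the constructor takes precedence.
        return explicit_key
    env_key = env.get("OPENAI_API_KEY")
    if env_key:
        # Otherwise fall back to the environment variable.
        return env_key
    raise ValueError("No API key found: pass openai_api_key=... or set OPENAI_API_KEY.")


# Usage with an explicit mapping instead of the real process environment.
print(resolve_api_key(env={"OPENAI_API_KEY": "from-env"}))  # from-env
print(resolve_api_key("my-api-key", env={"OPENAI_API_KEY": "from-env"}))  # my-api-key
```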

To use the library with Microsoft Azure endpoints, set the OPENAI_API_TYPE, OPENAI_API_BASE, OPENAI_API_KEY and OPENAI_API_VERSION environment variables. OPENAI_API_TYPE must be set to ‘azure’, and the others correspond to the properties of your endpoint. In addition, the deployment name must be passed as the deployment parameter.

Example

import os

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://your-endpoint.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "your AzureOpenAI key"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
os.environ["OPENAI_PROXY"] = "http://your-corporate-proxy:8080"

from langchain.embeddings.openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
    deployment="your-embeddings-deployment-name",
    model="your-embeddings-model-name",
    openai_api_base="https://your-endpoint.openai.azure.com/",
    openai_api_type="azure",
)
text = "This is a test query."
query_result = embeddings.embed_query(text)
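Before constructing the client, it can help to check that the Azure variables listed above are present and that OPENAI_API_TYPE is ‘azure’. The sketch below is a hypothetical pre-flight check (azure_config_problems is not a LangChain function). It works on any mapping, so os.environ can be passed in real use.

```python
from typing import List, Mapping

# The variables the Azure setup above says must be set.
AZURE_REQUIRED_VARS = (
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
)


def azure_config_problems(env: Mapping[str, str]) -> List[str]:
    """Hypothetical check: return human-readable problems (empty list if OK)."""
    problems = [f"{name} is not set" for name in AZURE_REQUIRED_VARS if not env.get(name)]
    api_type = env.get("OPENAI_API_TYPE")
    if api_type and api_type != "azure":
        problems.append(f"OPENAI_API_TYPE must be 'azure', got {api_type!r}")
    return problems


example_env = {
    "OPENAI_API_TYPE": "azure",
    "OPENAI_API_BASE": "https://your-endpoint.openai.azure.com/",
    "OPENAI_API_KEY": "your AzureOpenAI key",
    "OPENAI_API_VERSION": "2023-05-15",
}
print(azure_config_problems(example_env))  # []
print(azure_config_problems({"OPENAI_API_TYPE": "openai"}))
```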

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param allowed_special: Union[Literal['all'], Set[str]] = set()
param chunk_size: int = 1000

Maximum number of texts to embed in each batch.

param default_headers: Optional[Mapping[str, str]] = None
param default_query: Optional[Mapping[str, object]] = None
param deployment: Optional[str] = 'text-embedding-ada-002'
param disallowed_special: Union[Literal['all'], Set[str], Sequence[str]] = 'all'
param embedding_ctx_length: int = 8191

The maximum number of tokens to embed at once.

param headers: Any = None
param http_client: Optional[Any] = None

Optional httpx.Client.

param max_retries: int = 2

Maximum number of retries to make when generating embeddings.

param model: str = 'text-embedding-ada-002'
param model_kwargs: Dict[str, Any] [Optional]

Holds any model parameters valid for the create call that are not explicitly specified.

param openai_api_base: Optional[str] = None (alias 'base_url')

Base URL path for API requests. Leave blank if not using a proxy or service emulator.

param openai_api_key: Optional[str] = None (alias 'api_key')

Automatically inferred from env var OPENAI_API_KEY if not provided.

param openai_api_type: Optional[str] = None
param openai_api_version: Optional[str] = None (alias 'api_version')

Automatically inferred from env var OPENAI_API_VERSION if not provided.

param openai_organization: Optional[str] = None (alias 'organization')

Automatically inferred from env var OPENAI_ORG_ID if not provided.

param openai_proxy: Optional[str] = None
param request_timeout: Optional[Union[float, Tuple[float, float], Any]] = None (alias 'timeout')

Timeout for requests to the OpenAI embeddings API. Can be a float, an httpx.Timeout, or None.

param show_progress_bar: bool = False

Whether to show a progress bar when embedding.

param skip_empty: bool = False

Whether to skip empty strings when embedding or raise an error. Defaults to not skipping.

param tiktoken_model_name: Optional[str] = None

The model name to pass to tiktoken when using this class. Tiktoken is used to count the number of tokens in documents so that they can be kept under the embedding_ctx_length limit. By default, when set to None, this is the same as the embedding model name. However, there are cases where you may want to use this Embedding class with a model name that tiktoken does not support, for example when using Azure embeddings or one of the many model providers that expose an OpenAI-like API with different models. In those cases, to avoid errors when tiktoken is called, you can specify a model name to use here.
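To make chunk_size, embedding_ctx_length and tiktoken_model_name concrete, the sketch below picks the tokenizer model name, splits each text's token IDs into windows of at most embedding_ctx_length tokens, and groups those windows into request batches of at most chunk_size items. It is a simplified, hypothetical illustration: plain integer lists stand in for tiktoken output, and the helper names are not LangChain's.

```python
from typing import List, Optional, Tuple


def pick_tokenizer_model(model: str, tiktoken_model_name: Optional[str] = None) -> str:
    # None means "use the embedding model name", as documented above.
    return tiktoken_model_name if tiktoken_model_name is not None else model


def split_into_windows(
    token_lists: List[List[int]], embedding_ctx_length: int = 8191
) -> Tuple[List[List[int]], List[int]]:
    """Split each text's tokens into windows of at most embedding_ctx_length.

    Returns the flat list of windows and, for each window, the index of the
    text it came from.
    """
    windows: List[List[int]] = []
    owners: List[int] = []
    for text_index, tokens in enumerate(token_lists):
        for start in range(0, len(tokens), embedding_ctx_length):
            windows.append(tokens[start : start + embedding_ctx_length])
            owners.append(text_index)
    return windows, owners


def batch(items: List[List[int]], chunk_size: int = 1000) -> List[List[List[int]]]:
    """Group windows into request batches of at most chunk_size items."""
    return [items[i : i + chunk_size] for i in range(0, len(items), chunk_size)]


# Toy sizes: a 10-token text and a 3-token text, ctx length 4, batch size 2.
windows, owners = split_into_windows([list(range(10)), [7, 8, 9]], embedding_ctx_length=4)
print(owners)  # [0, 0, 0, 1]
print([len(b) for b in batch(windows, chunk_size=2)])  # [2, 2]
print(pick_tokenizer_model("text-embedding-ada-002"))  # text-embedding-ada-002
```

Keeping the owner index next to each window is what lets the per-window results be regrouped into one embedding per input text afterwards.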

async aembed_documents(texts: List[str], chunk_size: Optional[int] = 0) → List[List[float]]

Asynchronously call OpenAI’s embedding endpoint to embed search documents.

Parameters
  • texts – The list of texts to embed.

  • chunk_size – Maximum number of texts to embed in each batch. If None or 0 (the default), the chunk_size specified on the class is used.

Returns

List of embeddings, one for each text.
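Because aembed_documents is a coroutine, it has to be awaited inside an event loop. The sketch below shows the calling pattern with a hypothetical FakeAsyncEmbeddings stand-in that returns toy vectors, so it runs offline without an API key. With a real OpenAIEmbeddings instance the await line would look the same.

```python
import asyncio
from typing import List


class FakeAsyncEmbeddings:
    """Hypothetical offline stand-in with the same aembed_documents shape."""

    async def aembed_documents(self, texts: List[str], chunk_size: int = 0) -> List[List[float]]:
        await asyncio.sleep(0)  # placeholder for waiting on the network
        # Toy "embedding": [text length, number of spaces].
        return [[float(len(t)), float(t.count(" "))] for t in texts]


async def main() -> List[List[float]]:
    embeddings = FakeAsyncEmbeddings()  # replace with OpenAIEmbeddings(...) for real use
    return await embeddings.aembed_documents(["hello world", "foo"])


vectors = asyncio.run(main())
print(vectors)  # [[11.0, 1.0], [3.0, 0.0]]
```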

async aembed_query(text: str) → List[float]

Asynchronously call OpenAI’s embedding endpoint to embed query text.

Parameters

text – The text to embed.

Returns

Embedding for the text.
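When several independent queries need embedding, the caller can run aembed_query for each one concurrently with asyncio.gather. This is a user-side pattern, not something the class does for you, and sending many requests at once may run into API rate limits. The sketch uses a hypothetical FakeAsyncEmbeddings stand-in so that it runs offline.

```python
import asyncio
from typing import List


class FakeAsyncEmbeddings:
    """Hypothetical offline stand-in with an aembed_query coroutine."""

    async def aembed_query(self, text: str) -> List[float]:
        await asyncio.sleep(0)  # placeholder for the network round trip
        return [float(len(text))]


async def embed_queries(queries: List[str]) -> List[List[float]]:
    embeddings = FakeAsyncEmbeddings()
    # gather runs the coroutines concurrently and returns results in input order.
    return list(await asyncio.gather(*(embeddings.aembed_query(q) for q in queries)))


results = asyncio.run(embed_queries(["a", "bb", "ccc"]))
print(results)  # [[1.0], [2.0], [3.0]]
```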

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model

Creates a new model, setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set, since it adds all passed values.

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) → Model

Duplicate a model, optionally choosing which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

embed_documents(texts: List[str], chunk_size: Optional[int] = 0) → List[List[float]]

Call OpenAI’s embedding endpoint to embed search documents.

Parameters
  • texts – The list of texts to embed.

  • chunk_size – Maximum number of texts to embed in each batch. If None or 0 (the default), the chunk_size specified on the class is used.

Returns

List of embeddings, one for each text.
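A text longer than embedding_ctx_length has to be embedded in several pieces, and the piece embeddings recombined into the single vector per text that this method returns. One common approach, and what LangChain's implementation appears to do at the time of writing, is a token-count-weighted average of the piece embeddings followed by L2 normalization. The pure-Python sketch below illustrates that idea; combine_piece_embeddings is a hypothetical name.

```python
import math
from typing import List


def combine_piece_embeddings(
    piece_vectors: List[List[float]], piece_token_counts: List[int]
) -> List[float]:
    """Weighted average of piece embeddings (weights = token counts), then L2-normalize."""
    if not piece_vectors or len(piece_vectors) != len(piece_token_counts):
        raise ValueError("need one token count per piece vector")
    total = sum(piece_token_counts)
    dim = len(piece_vectors[0])
    averaged = [
        sum(vec[d] * n for vec, n in zip(piece_vectors, piece_token_counts)) / total
        for d in range(dim)
    ]
    norm = math.sqrt(sum(x * x for x in averaged))
    if norm == 0:
        return averaged  # avoid dividing by zero for an all-zero average
    return [x / norm for x in averaged]


# Two pieces: 3 tokens pointing along x, 1 token pointing along y.
print(combine_piece_embeddings([[1.0, 0.0], [0.0, 1.0]], [3, 1]))
```

Weighting by token count keeps a short trailing piece from pulling the combined vector as strongly as a full-length piece would.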

embed_query(text: str) → List[float]

Call OpenAI’s embedding endpoint to embed query text.

Parameters

text – The text to embed.

Returns

Embedding for the text.
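A typical use of embed_query together with embed_documents is ranking documents by cosine similarity to a query. The sketch below uses small hand-written vectors in place of real API output so that it runs offline. With real embeddings you would pass in the lists those two methods return. The helper names are illustrative, not part of LangChain.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vector: List[float], doc_vectors: List[List[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vector, v)) for i, v in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Stand-ins for embed_query(...) and embed_documents([...]) output.
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
print([i for i, _ in rank_documents(query_vec, doc_vecs)])  # [1, 2, 0]
```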

classmethod from_orm(obj: Any) → Model
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) → unicode

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

classmethod parse_file(path: Union[str, Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model
classmethod parse_obj(obj: Any) → Model
classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model
classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') → DictStrAny
classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) → unicode
classmethod update_forward_refs(**localns: Any) → None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

classmethod validate(value: Any) → Model
