langchain_openai.embeddings.base.OpenAIEmbeddings¶
- class langchain_openai.embeddings.base.OpenAIEmbeddings[source]¶
Bases: BaseModel, Embeddings
OpenAI embedding models.
To use, you should have the environment variable OPENAI_API_KEY set with your API key, or pass it as a named parameter to the constructor. To use the library with Microsoft Azure endpoints, use AzureOpenAIEmbeddings instead.
Example

from langchain_openai import OpenAIEmbeddings

model = OpenAIEmbeddings(model="text-embedding-3-large")
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
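For illustration, a minimal sketch of both configuration styles; the key values below are placeholders, not real credentials:

import os

from langchain_openai import OpenAIEmbeddings

# Option 1: rely on the OPENAI_API_KEY environment variable.
os.environ["OPENAI_API_KEY"] = "<your-api-key>"  # placeholder
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Option 2: pass the key explicitly as a named parameter (alias "api_key").
embeddings = OpenAIEmbeddings(model="text-embedding-3-large", api_key="<your-api-key>")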
- param allowed_special: Optional[Union[Literal['all'], Set[str]]] = None¶
- param check_embedding_ctx_length: bool = True¶
Whether to check the token length of inputs and automatically split inputs longer than embedding_ctx_length.
- param chunk_size: int = 1000¶
Maximum number of texts to embed in each batch.
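For example (a sketch; the batch size shown is arbitrary), a smaller batch can be configured at construction time:

from langchain_openai import OpenAIEmbeddings

# Send at most 200 texts per embeddings request.
embeddings = OpenAIEmbeddings(model="text-embedding-3-large", chunk_size=200)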
- param default_headers: Optional[Mapping[str, str]] = None¶
- param default_query: Optional[Mapping[str, object]] = None¶
- param deployment: Optional[str] = 'text-embedding-ada-002'¶
- param dimensions: Optional[int] = None¶
The number of dimensions the resulting output embeddings should have.
Only supported in text-embedding-3 and later models.
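For example (a sketch; the dimension value is arbitrary), shortened vectors can be requested from a text-embedding-3 model:

from langchain_openai import OpenAIEmbeddings

# Request 256-dimensional vectors (supported by text-embedding-3 models).
embeddings = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=256)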
- param disallowed_special: Optional[Union[Literal['all'], Set[str], Sequence[str]]] = None¶
- param embedding_ctx_length: int = 8191¶
The maximum number of tokens to embed at once.
- param headers: Any = None¶
- param http_async_client: Optional[Any] = None¶
Optional httpx.AsyncClient. Only used for async invocations. Must specify http_client as well if you’d like a custom client for sync invocations.
- param http_client: Optional[Any] = None¶
Optional httpx.Client. Only used for sync invocations. Must specify http_async_client as well if you’d like a custom client for async invocations.
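A minimal sketch of supplying custom httpx clients (the timeout value is an arbitrary example); provide both if you use both sync and async invocations:

import httpx

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    http_client=httpx.Client(timeout=30.0),  # used for sync calls
    http_async_client=httpx.AsyncClient(timeout=30.0),  # used for async calls
)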
- param max_retries: int = 2¶
Maximum number of retries to make when generating.
- param model: str = 'text-embedding-ada-002'¶
- param model_kwargs: Dict[str, Any] [Optional]¶
Holds any model parameters valid for the create call that are not explicitly specified.
- param openai_api_base: Optional[str] = None (alias 'base_url')¶
Base URL path for API requests. Leave blank if not using a proxy or service emulator.
- param openai_api_key: Optional[SecretStr] = None (alias 'api_key')¶
Automatically inferred from env var OPENAI_API_KEY if not provided.
- Constraints
type = string
writeOnly = True
format = password
- param openai_api_type: Optional[str] = None¶
- param openai_api_version: Optional[str] = None (alias 'api_version')¶
Automatically inferred from env var OPENAI_API_VERSION if not provided.
- param openai_organization: Optional[str] = None (alias 'organization')¶
Automatically inferred from env var OPENAI_ORG_ID if not provided.
- param openai_proxy: Optional[str] = None¶
- param request_timeout: Optional[Union[float, Tuple[float, float], Any]] = None (alias 'timeout')¶
Timeout for requests to the OpenAI embeddings API. Can be a float, an httpx.Timeout, or None.
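For illustration (the timeout values are arbitrary), either form can be passed via the timeout alias:

import httpx

from langchain_openai import OpenAIEmbeddings

# A plain float is interpreted as seconds; httpx.Timeout allows per-phase limits.
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    timeout=httpx.Timeout(30.0, connect=5.0),
)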
- param retry_max_seconds: int = 20¶
Max number of seconds to wait between retries.
- param retry_min_seconds: int = 4¶
Min number of seconds to wait between retries.
- param show_progress_bar: bool = False¶
Whether to show a progress bar when embedding.
- param skip_empty: bool = False¶
Whether to skip empty strings when embedding or raise an error. Defaults to not skipping.
- param tiktoken_enabled: bool = True¶
Set this to False for non-OpenAI implementations of the embeddings API, e.g. the --extensions openai extension for text-generation-webui.
- param tiktoken_model_name: Optional[str] = None¶
The model name to pass to tiktoken when using this class. Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit. By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.
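As a sketch of that scenario (the model name, base URL, and API key below are hypothetical placeholders for an OpenAI-compatible server):

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="local-embedding-model",  # hypothetical model behind an OpenAI-like API
    base_url="http://localhost:5000/v1",  # hypothetical endpoint
    api_key="not-needed",  # placeholder; many local servers ignore the key
    tiktoken_enabled=False,  # skip tiktoken for non-OpenAI backends
    # or keep tiktoken enabled and count tokens as if using a known OpenAI model:
    # tiktoken_model_name="text-embedding-ada-002",
)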
- async aembed_documents(texts: List[str], chunk_size: Optional[int] = 0) → List[List[float]] [source]¶
Call out to OpenAI’s embedding endpoint async for embedding search docs.
- Parameters
texts (List[str]) – The list of texts to embed.
chunk_size (Optional[int]) – The chunk size of embeddings. If None, will use the chunk size specified by the class.
- Returns
List of embeddings, one for each text.
- Return type
List[List[float]]
- async aembed_query(text: str) → List[float] [source]¶
Call out to OpenAI’s embedding endpoint async for embedding query text.
- Parameters
text (str) – The text to embed.
- Returns
Embedding for the text.
- Return type
List[float]
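A minimal usage sketch of the async methods (the input strings are arbitrary examples):

import asyncio

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

async def main() -> None:
    # Embed a batch of documents via the async endpoint.
    doc_vectors = await embeddings.aembed_documents(["doc one", "doc two"])
    # Embed a single query string.
    query_vector = await embeddings.aembed_query("what is in doc one?")
    print(len(doc_vectors), len(query_vector))

asyncio.run(main())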
- classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model¶
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it adds all passed values
- Parameters
_fields_set (Optional[SetStr]) –
values (Any) –
- Return type
Model
- copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) → Model¶
Duplicate a model, optionally choose which fields to include, exclude and change.
- Parameters
include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) – fields to include in new model
exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) – fields to exclude from new model, as with values this takes precedence over include
update (Optional[DictStrAny]) – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data
deep (bool) – set to True to make a deep copy of the model
self (Model) –
- Returns
new model instance
- Return type
Model
- dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → DictStrAny¶
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
- Parameters
include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
by_alias (bool) –
skip_defaults (Optional[bool]) –
exclude_unset (bool) –
exclude_defaults (bool) –
exclude_none (bool) –
- Return type
DictStrAny
- embed_documents(texts: List[str], chunk_size: Optional[int] = 0) → List[List[float]] [source]¶
Call out to OpenAI’s embedding endpoint for embedding search docs.
- Parameters
texts (List[str]) – The list of texts to embed.
chunk_size (Optional[int]) – The chunk size of embeddings. If None, will use the chunk size specified by the class.
- Returns
List of embeddings, one for each text.
- Return type
List[List[float]]
- embed_query(text: str) → List[float] [source]¶
Call out to OpenAI’s embedding endpoint for embedding query text.
- Parameters
text (str) – The text to embed.
- Returns
Embedding for the text.
- Return type
List[float]
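A minimal usage sketch of the synchronous methods (the input strings are arbitrary examples):

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Embed a batch of documents; returns one vector per input text.
doc_vectors = embeddings.embed_documents(["doc one", "doc two"])

# Embed a single query; returns a single vector.
query_vector = embeddings.embed_query("what is in doc one?")

print(len(doc_vectors), len(query_vector))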
- classmethod from_orm(obj: Any) → Model¶
- Parameters
obj (Any) –
- Return type
Model
- json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) → unicode¶
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
- Parameters
include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –
by_alias (bool) –
skip_defaults (Optional[bool]) –
exclude_unset (bool) –
exclude_defaults (bool) –
exclude_none (bool) –
encoder (Optional[Callable[[Any], Any]]) –
models_as_dict (bool) –
dumps_kwargs (Any) –
- Return type
unicode
- classmethod parse_file(path: Union[str, Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model¶
- Parameters
path (Union[str, Path]) –
content_type (unicode) –
encoding (unicode) –
proto (Protocol) –
allow_pickle (bool) –
- Return type
Model
- classmethod parse_obj(obj: Any) → Model¶
- Parameters
obj (Any) –
- Return type
Model
- classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model¶
- Parameters
b (Union[str, bytes]) –
content_type (unicode) –
encoding (unicode) –
proto (Protocol) –
allow_pickle (bool) –
- Return type
Model
- classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') → DictStrAny¶
- Parameters
by_alias (bool) –
ref_template (unicode) –
- Return type
DictStrAny
- classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) → unicode¶
- Parameters
by_alias (bool) –
ref_template (unicode) –
dumps_kwargs (Any) –
- Return type
unicode
- classmethod update_forward_refs(**localns: Any) → None¶
Try to update ForwardRefs on fields based on this Model, globalns and localns.
- Parameters
localns (Any) –
- Return type
None
- classmethod validate(value: Any) → Model¶
- Parameters
value (Any) –
- Return type
Model