langchain_text_splitters.base.Tokenizer

class langchain_text_splitters.base.Tokenizer(chunk_overlap: int, tokens_per_chunk: int, decode: Callable[[List[int]], str], encode: Callable[[str], List[int]])

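Data class that groups the settings needed for token-based splitting: the chunk size and overlap (both measured in tokens) and the encode and decode functions.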

Attributes

chunk_overlap

Overlap in tokens between chunks

tokens_per_chunk

Maximum number of tokens per chunk

decode

Function to decode a list of token ids to a string

encode

Function to encode a string to a list of token ids
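
Example

The following is a minimal sketch of how the four attributes fit together, not the library's own splitting routine. It assumes a toy whitespace "tokenizer" (`toy_encode`, `toy_decode`) and a hypothetical helper `sliding_window_chunks`, all invented for illustration. If `langchain_text_splitters` is not installed, a stand-in data class with the same four fields is used instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

try:
    from langchain_text_splitters.base import Tokenizer
except ImportError:
    # Stand-in mirroring the documented fields (assumption: used only
    # so this sketch runs without the library installed).
    @dataclass(frozen=True)
    class Tokenizer:
        chunk_overlap: int
        tokens_per_chunk: int
        decode: Callable[[List[int]], str]
        encode: Callable[[str], List[int]]

# Toy vocabulary: each distinct whitespace-separated word gets an id.
_vocab: Dict[str, int] = {}
_words: List[str] = []


def toy_encode(text: str) -> List[int]:
    """Hypothetical encoder: map each word to an integer id."""
    ids = []
    for word in text.split():
        if word not in _vocab:
            _vocab[word] = len(_words)
            _words.append(word)
        ids.append(_vocab[word])
    return ids


def toy_decode(ids: List[int]) -> str:
    """Hypothetical decoder: map ids back to words joined by spaces."""
    return " ".join(_words[i] for i in ids)


def sliding_window_chunks(text: str, tokenizer: Tokenizer) -> List[str]:
    """Illustrative helper: split text into overlapping token windows."""
    if not 0 <= tokenizer.chunk_overlap < tokenizer.tokens_per_chunk:
        raise ValueError("chunk_overlap must be in [0, tokens_per_chunk)")
    ids = tokenizer.encode(text)
    # Each new window starts this many tokens after the previous one.
    step = tokenizer.tokens_per_chunk - tokenizer.chunk_overlap
    chunks = []
    for start in range(0, len(ids), step):
        window = ids[start:start + tokenizer.tokens_per_chunk]
        chunks.append(tokenizer.decode(window))
        if start + tokenizer.tokens_per_chunk >= len(ids):
            break  # the last window already reached the end of the text
    return chunks


tokenizer = Tokenizer(
    chunk_overlap=1,
    tokens_per_chunk=3,
    decode=toy_decode,
    encode=toy_encode,
)
chunks = sliding_window_chunks("a b c d e f g", tokenizer)
print(chunks)  # ['a b c', 'c d e', 'e f g']
```

With `tokens_per_chunk=3` and `chunk_overlap=1`, each chunk holds at most three tokens and repeats the last token of the previous chunk. In practice `encode` and `decode` would come from a real tokenizer rather than the toy functions shown here.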

Methods

__init__(chunk_overlap, tokens_per_chunk, ...)

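__init__(chunk_overlap: int, tokens_per_chunk: int, decode: Callable[[List[int]], str], encode: Callable[[str], List[int]]) → None
Parameters
  • chunk_overlap (int) – Overlap in tokens between chunks

  • tokens_per_chunk (int) – Maximum number of tokens per chunk

  • decode (Callable[[List[int]], str]) – Function to decode a list of token ids to a string

  • encode (Callable[[str], List[int]]) – Function to encode a string to a list of token ids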
Return type

None