langchain_experimental.data_anonymizer.presidio.PresidioAnonymizerBase

class langchain_experimental.data_anonymizer.presidio.PresidioAnonymizerBase(analyzed_fields: Optional[List[str]] = None, operators: Optional[Dict[str, OperatorConfig]] = None, languages_config: Optional[Dict] = None, add_default_faker_operators: bool = True, faker_seed: Optional[int] = None)

Base Anonymizer using Microsoft Presidio.

See more: https://microsoft.github.io/presidio/

Parameters
  • analyzed_fields (Optional[List[str]]) – List of fields to detect and then anonymize. Defaults to all entities supported by Microsoft Presidio.

  • operators (Optional[Dict[str, OperatorConfig]]) – Operators to use for anonymization. Operators allow for custom anonymization of detected PII. Learn more: https://microsoft.github.io/presidio/tutorial/10_simple_anonymization/

  • languages_config (Optional[Dict]) – Configuration for the NLP engine. First language in the list will be used as the main language in self.anonymize(…) when no language is specified. Learn more: https://microsoft.github.io/presidio/analyzer/customizing_nlp_models/

  • faker_seed (Optional[int]) – Seed used to initialize faker. Defaults to None, in which case faker will be seeded randomly and provide random values.

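  • add_default_faker_operators (bool) – Whether to register the default Faker-based operators, which replace detected entities with synthetic values. Defaults to True.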
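The constructor parameters above can be sketched as follows. This is a minimal illustration, not the library's canonical usage: make_languages_config and build_anonymizer are hypothetical helpers, the dictionary layout follows the spaCy NLP-engine configuration described in the Presidio documentation linked above, and PresidioAnonymizer is assumed to be a concrete subclass that accepts the same constructor arguments as this base class. The named spaCy model is assumed to be installed.

```python
from typing import Any, Dict, Optional


def make_languages_config(models: Dict[str, str]) -> Dict[str, Any]:
    """Build an NLP-engine config from a {lang_code: spaCy model name} mapping.

    Hypothetical helper. The first entry becomes the main language.
    """
    return {
        "nlp_engine_name": "spacy",
        "models": [
            {"lang_code": code, "model_name": name}
            for code, name in models.items()
        ],
    }


def build_anonymizer(seed: Optional[int] = 42):
    """Create an anonymizer limited to two entity types (hypothetical helper)."""
    # Imported lazily so the config helper works without the optional packages.
    # Assumption: PresidioAnonymizer shares the base class constructor.
    from langchain_experimental.data_anonymizer import PresidioAnonymizer

    return PresidioAnonymizer(
        analyzed_fields=["PERSON", "EMAIL_ADDRESS"],
        languages_config=make_languages_config({"en": "en_core_web_lg"}),
        faker_seed=seed,  # a fixed seed makes the generated fake values repeatable
    )
```

Passing a fixed faker_seed is useful in tests; leaving it as None gives different synthetic values on each run.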

Methods

__init__([analyzed_fields, operators, ...])

Initialize the anonymizer.

add_operators(operators)

Add operators to the anonymizer.

add_recognizer(recognizer)

Add a recognizer to the analyzer.

anonymize(text[, language, allow_list])

Anonymize text.

__init__(analyzed_fields: Optional[List[str]] = None, operators: Optional[Dict[str, OperatorConfig]] = None, languages_config: Optional[Dict] = None, add_default_faker_operators: bool = True, faker_seed: Optional[int] = None)
Parameters
  • analyzed_fields (Optional[List[str]]) – List of fields to detect and then anonymize. Defaults to all entities supported by Microsoft Presidio.

  • operators (Optional[Dict[str, OperatorConfig]]) – Operators to use for anonymization. Operators allow for custom anonymization of detected PII. Learn more: https://microsoft.github.io/presidio/tutorial/10_simple_anonymization/

  • languages_config (Optional[Dict]) – Configuration for the NLP engine. First language in the list will be used as the main language in self.anonymize(…) when no language is specified. Learn more: https://microsoft.github.io/presidio/analyzer/customizing_nlp_models/

  • faker_seed (Optional[int]) – Seed used to initialize faker. Defaults to None, in which case faker will be seeded randomly and provide random values.

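  • add_default_faker_operators (bool) – Whether to register the default Faker-based operators, which replace detected entities with synthetic values. Defaults to True.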

add_operators(operators: Dict[str, OperatorConfig]) → None

Add operators to the anonymizer.

Parameters

operators (Dict[str, OperatorConfig]) – Operators to add to the anonymizer.

Return type

None
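A custom operator might be registered along these lines. This is a sketch under stated assumptions: fake_id and register_custom_operator are hypothetical helpers, POLISH_ID is an illustrative entity name, and the operator uses Presidio's "custom" operator type, which takes a callable under the "lambda" key and expects it to return a string.

```python
import random
import string


def fake_id(_: str) -> str:
    """Return three random uppercase letters followed by six random digits.

    The detected value is ignored; only the replacement is produced.
    """
    letters = "".join(random.choices(string.ascii_uppercase, k=3))
    digits = "".join(random.choices(string.digits, k=6))
    return letters + digits


def register_custom_operator(anonymizer, entity: str = "POLISH_ID") -> None:
    """Attach fake_id as the replacement function for one entity type."""
    # Imported lazily so fake_id can be used without Presidio installed.
    from presidio_anonymizer.entities import OperatorConfig

    anonymizer.add_operators(
        {entity: OperatorConfig("custom", {"lambda": fake_id})}
    )
```

The dictionary key is the entity type the operator applies to, so it should match an entity that some recognizer actually reports.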

add_recognizer(recognizer: EntityRecognizer) → None

Add a recognizer to the analyzer.

Parameters

recognizer (EntityRecognizer) – Recognizer to add to the analyzer.

Return type

None
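As an illustration, a regex-based recognizer could be added like this. The entity name POLISH_ID, the pattern, and the helper register_id_recognizer are assumptions made for the example; Pattern and PatternRecognizer come from the presidio_analyzer package.

```python
import re

# Illustrative pattern: three uppercase letters followed by six digits.
POLISH_ID_REGEX = r"[A-Z]{3}\d{6}"


def register_id_recognizer(anonymizer) -> None:
    """Register a pattern recognizer for the hypothetical POLISH_ID entity."""
    # Imported lazily so the regex constant can be checked on its own.
    from presidio_analyzer import Pattern, PatternRecognizer

    recognizer = PatternRecognizer(
        supported_entity="POLISH_ID",
        patterns=[
            Pattern(name="polish_id_pattern", regex=POLISH_ID_REGEX, score=1.0)
        ],
    )
    anonymizer.add_recognizer(recognizer)
```

A recognizer only controls detection. How the detected value is replaced depends on the operator configured for that entity type.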

anonymize(text: str, language: Optional[str] = None, allow_list: Optional[List[str]] = None) → str

Anonymize text.

Parameters
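  • text (str) – Text to anonymize.

  • language (Optional[str]) – Language of the text. Defaults to None, in which case the first language in languages_config is used.

  • allow_list (Optional[List[str]]) – Words to leave unchanged even if they are detected as PII. Defaults to None.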

Return type

str
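A call to anonymize might look like the sketch below. anonymize_all is a hypothetical helper that works with any object exposing the anonymize signature documented above; the commented usage assumes the concrete PresidioAnonymizer subclass and an installed English spaCy model.

```python
from typing import List, Optional


def anonymize_all(
    anonymizer,
    texts: List[str],
    language: Optional[str] = None,
    allow_list: Optional[List[str]] = None,
) -> List[str]:
    """Anonymize each text with the same language and allow list."""
    return [
        anonymizer.anonymize(text, language=language, allow_list=allow_list)
        for text in texts
    ]


# Usage (requires langchain_experimental, presidio and a spaCy model):
#
#   from langchain_experimental.data_anonymizer import PresidioAnonymizer
#   anonymizer = PresidioAnonymizer(faker_seed=42)
#   anonymize_all(
#       anonymizer,
#       ["My name is Jane Doe, email jane@example.com"],
#       language="en",
#       allow_list=["Jane Doe"],  # keep this name as-is
#   )
```

Leaving language as None falls back to the main language from languages_config, so it only needs to be set for text in one of the other configured languages.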