- class FasterWhisperParser(*, device: Optional[str] = 'cuda', model_size: Optional[str] = None)
Transcribe and parse audio files with faster-whisper.
faster-whisper is a reimplementation of OpenAI’s Whisper model using CTranslate2, which is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
It can automatically detect the following 14 languages and transcribe speech in each of them into text in that same language: en, zh, fr, de, ja, ko, ru, es, th, it, pt, vi, ar, tr.
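The language list above can be kept as a set, which is handy for checking a detected language code before any downstream processing. This is a minimal sketch: `SUPPORTED_LANGUAGES` and `is_supported_language` are hypothetical names, not part of the parser, and the set simply mirrors the 14 codes documented on this page.

```python
# The 14 language codes listed on this page (ISO 639-1 style).
SUPPORTED_LANGUAGES = frozenset(
    ["en", "zh", "fr", "de", "ja", "ko", "ru", "es", "th", "it", "pt", "vi", "ar", "tr"]
)


def is_supported_language(code: str) -> bool:
    """Return True if `code` is one of the languages listed above.

    Hypothetical helper for illustration only; it is not part of
    FasterWhisperParser or faster-whisper.
    """
    # Normalize surrounding whitespace and case before the lookup.
    return code.strip().lower() in SUPPORTED_LANGUAGES


print(len(SUPPORTED_LANGUAGES))       # 14
print(is_supported_language(" JA "))  # True
print(is_supported_language("nl"))    # False
```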
The GitHub repository for faster-whisper is:
- Example: Load a YouTube video and transcribe the video speech into a document.
    from langchain.document_loaders.generic import GenericLoader
    from import FasterWhisperParser
    from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

    url=""
    save_dir="your_dir/"
    loader = GenericLoader(
        YoutubeAudioLoader([url], save_dir),
        FasterWhisperParser()
    )
    docs = loader.load()
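The example above splits the work between a blob loader, which fetches the audio, and a parser, which transcribes it. The sketch below mimics that division of labor with toy classes so it runs without LangChain or faster-whisper installed. It assumes the loader simply forwards each blob to the parser; `Blob`, `Document`, `ToyGenericLoader`, `parse_lazily`, `iter_docs`, and `fake_transcribe` are all hypothetical names, not the real LangChain API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Blob:
    """Hypothetical stand-in for one downloaded audio file."""

    path: str


@dataclass
class Document:
    """Hypothetical stand-in for a transcribed document."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyGenericLoader:
    """Hypothetical loader pairing a blob source with a parsing function."""

    def __init__(
        self,
        blobs: Iterable[Blob],
        parse_lazily: Callable[[Blob], Iterator[Document]],
    ) -> None:
        self.blobs = blobs
        self.parse_lazily = parse_lazily

    def iter_docs(self) -> Iterator[Document]:
        # Hand each blob to the parser and yield its documents one by one.
        for blob in self.blobs:
            yield from self.parse_lazily(blob)

    def load(self) -> List[Document]:
        # Eager variant: collect every document into a list.
        return list(self.iter_docs())


def fake_transcribe(blob: Blob) -> Iterator[Document]:
    # Placeholder for the speech-to-text step: no audio is read here.
    yield Document(page_content=f"transcript of {blob.path}", metadata={"source": blob.path})


loader = ToyGenericLoader([Blob("your_dir/clip1.m4a"), Blob("your_dir/clip2.m4a")], fake_transcribe)
docs = loader.load()
print([d.metadata["source"] for d in docs])  # ['your_dir/clip1.m4a', 'your_dir/clip2.m4a']
```

Keeping the parser a separate object means the same download step could be paired with a different transcription backend without changing the loader.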
Initialize the parser.
- Parameters
device (Optional[str]) – Device to run the model on: “cuda” or “cpu”, depending on the hardware available. Defaults to “cuda”.
model_size (Optional[str]) – Size of the model to load. There are four choices: “base”, “small”, “medium”, and “large-v3”; pick one that fits the available GPU memory.
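The two parameters accept only a few documented values, so a caller might check them before constructing the parser. This is a minimal sketch under stated assumptions: `pick_device` and `check_parser_options` are hypothetical helpers, the GPU-availability flag is assumed to come from something like `torch.cuda.is_available()`, and a `model_size` of `None` is passed through untouched because this page does not say how the parser picks a size in that case.

```python
from typing import Optional, Tuple

# Choices listed in the parameter descriptions above.
VALID_DEVICES = ("cuda", "cpu")
VALID_MODEL_SIZES = ("base", "small", "medium", "large-v3")


def pick_device(cuda_available: bool) -> str:
    """Hypothetical helper: prefer "cuda" when a GPU is available, else "cpu".

    The flag could come from e.g. torch.cuda.is_available(); that is an
    assumption, not something this page specifies.
    """
    return "cuda" if cuda_available else "cpu"


def check_parser_options(
    device: Optional[str] = "cuda", model_size: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Hypothetical pre-flight check of the two constructor arguments.

    Raises ValueError for values outside the documented choices. A
    model_size of None is returned unchanged.
    """
    if device not in VALID_DEVICES:
        raise ValueError(f"device must be one of {VALID_DEVICES}, got {device!r}")
    if model_size is not None and model_size not in VALID_MODEL_SIZES:
        raise ValueError(f"model_size must be one of {VALID_MODEL_SIZES}, got {model_size!r}")
    return device, model_size


print(check_parser_options(pick_device(False), "small"))  # ('cpu', 'small')
```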
(*[, device, model_size]) – Initialize the parser.
(blob) – Lazily parse the blob.
(blob) – Eagerly parse the blob into a document or documents.
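The two blob methods listed above differ in when the work happens: the lazy one yields results as they are produced, while the eager one returns them all at once. This is a minimal sketch of that pattern, assuming the eager method is simply the lazy one drained into a list. `ToyParser`, `parse_lazily`, and `parse_eagerly` are hypothetical names, not the parser's actual method names, and a plain string stands in for an audio blob.

```python
from typing import Iterator, List


class ToyParser:
    """Hypothetical parser showing a lazy/eager method pair.

    Splitting on ". " stands in for transcription; no audio
    processing happens here.
    """

    def parse_lazily(self, blob: str) -> Iterator[str]:
        # Generator: each segment is produced only when the caller asks for it.
        for i, segment in enumerate(blob.split(". ")):
            yield f"[{i}] {segment}"

    def parse_eagerly(self, blob: str) -> List[str]:
        # Eager variant: drain the lazy generator into a list in one go.
        return list(self.parse_lazily(blob))


parser = ToyParser()
segments = parser.parse_lazily("first sentence. second sentence")
print(next(segments))                # [0] first sentence
print(parser.parse_eagerly("a. b"))  # ['[0] a', '[1] b']
```

The lazy form lets a caller start handling early segments of a long recording before the rest is finished, while the eager form is simpler when everything fits in memory.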
- __init__(*, device: Optional[str] = 'cuda', model_size: Optional[str] = None)