Speech Transcription & Diarization
Turning hours of multi-speaker interview audio into labeled transcripts.
Overview
A local pipeline built to transcribe and diarize research-interview recordings, converting many hours of multi-speaker, multi-language audio into accurate, speaker-attributed transcripts for qualitative analysis.
Approach
Whisper large-v3-turbo handles transcription across many languages; diarization pairs ECAPA-TDNN speaker embeddings with eigengap-based clustering, followed by VBx refinement. Benchmarked across model-quality configurations; runs entirely locally.
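To illustrate the eigengap idea, here is a minimal sketch of how a speaker count might be estimated from segment embeddings. It is not the pipeline's actual code: the function name `estimate_num_speakers`, the `max_speakers` cap, the clipped-cosine affinity, and the synthetic 16-dimensional vectors standing in for ECAPA-TDNN embeddings are all assumptions made for this example.

```python
import numpy as np


def estimate_num_speakers(embeddings: np.ndarray, max_speakers: int = 8) -> int:
    """Guess the speaker count from the largest gap between the smallest
    eigenvalues of a normalized graph Laplacian (hypothetical helper)."""
    # L2-normalize so that dot products are cosine similarities.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Affinity: cosine similarity with negatives clipped to zero, no self-loops.
    affinity = np.clip(x @ x.T, 0.0, None)
    np.fill_diagonal(affinity, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    degree = np.maximum(affinity.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # eigvalsh returns eigenvalues in ascending order.
    eigvals = np.linalg.eigvalsh(laplacian)

    # Only consider speaker counts from 1 to max_speakers.
    limit = min(max_speakers, len(x) - 1)
    gaps = np.diff(eigvals[: limit + 1])

    # The largest gap sits right after the k-th smallest eigenvalue.
    return int(np.argmax(gaps)) + 1


# Toy data (an assumption): three well-separated "speakers", 20 segments each.
rng = np.random.default_rng(0)
centers = np.eye(16)[:3]
toy_embeddings = np.vstack(
    [c + 0.05 * rng.standard_normal((20, 16)) for c in centers]
)
print(estimate_num_speakers(toy_embeddings))
```

In a setup like the one described, the estimated count would then drive the clustering step, with a refinement pass such as VBx applied afterwards. Real embeddings are far noisier than this toy data, so the affinity is often pruned or smoothed before the eigendecomposition.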
Scope
A focused solo build supporting an academic research programme.