Joint timbral and non-timbral speaker anonymisation

Bakari, Rayane; Le Blouch, Olivier; Gengembre, Nicolas; Evans, Nicholas
ODYSSEY 2026, Speaker and Language Recognition Workshop, 23-26 June 2026, Lisbon, Portugal

Voice anonymisation aims to conceal speaker identity while preserving linguistic content. Most approaches predominantly target timbral cues and often overlook non-timbral cues such as prosody, rhythm, speaking style and accent, which may still leak speaker-specific information after anonymisation. In this paper, we propose a speaker anonymisation system that explicitly obfuscates both timbral and non-timbral cues. Extensive experiments conducted within the VoicePrivacy Challenge framework show improved protection against attacks exploiting non-timbral information compared to state-of-the-art systems. For evaluation, we use a pair of complementary automatic speaker verification models and demonstrate a 32% relative improvement in anonymisation robustness against attacks which target either timbral or non-timbral cues. Results also show that stronger anonymisation comes at the cost of only moderate degradation to intelligibility and naturalness.


Type:
Conference
City:
Lisbon
Date:
2026-06-23
Department:
Digital Security
Eurecom Ref:
8840
Copyright:
© ISCA. Personal use of this material is permitted. The definitive version of this paper was published in ODYSSEY 2026, Speaker and Language Recognition Workshop, 23-26 June 2026, Lisbon, Portugal and is available at :

PERMALINK : https://www.eurecom.fr/publication/8840