Joint timbral and non-timbral speaker anonymisation

Bakari, Rayane; Le Blouch, Olivier; Gengembre, Nicolas; Evans, Nicholas

ODYSSEY 2026, Speaker and Language Recognition Workshop, 23-26 June 2026, Lisbon, Portugal

Voice anonymisation aims to conceal speaker identity while preserving linguistic content. Most approaches target predominantly timbral cues and often overlook non-timbral cues such as prosody, rhythm, speaking style and accent, which may still leak speaker-specific information after anonymisation. In this paper, we propose a speaker anonymisation system that explicitly obfuscates both timbral and non-timbral cues. Extensive experiments conducted within the VoicePrivacy Challenge framework show improved protection against attacks exploiting non-timbral information compared to state-of-the-art systems. For evaluation, we use a pair of complementary automatic speaker verification models to demonstrate a 32% relative improvement in anonymisation robustness against attacks which target either timbral or non-timbral cues. Results also show that stronger anonymisation comes at the cost of only moderate degradation to intelligibility and naturalness.

Type:

Conference

City:

Lisbon

Date:

2026-06-23

Department:

Digital Security

Eurecom Ref:

8840

© ISCA. Personal use of this material is permitted. The definitive version of this paper was published in ODYSSEY 2026, Speaker and Language Recognition Workshop, 23-26 June 2026, Lisbon, Portugal and is available at :