NATIVE ACCENT SENSITIVE VOICE CLONING USING PAIRWISE RANKING BASED DECODER MODELS, 122-129.

Chetan Madan, Harshita Diddee, Deepika Kumar, Shilpa Gupta, Shivani Jindal, Mansi Lal, and Chiranjeev

Keywords

Audio, encoding, speech enhancement, voice analysis, accent classification

Abstract

Voice cloning has become one of the most significant applications of artificial intelligence (AI), owing to its widespread use in the education, multimedia, and security domains. Although the field has been researched extensively, most existing systems do not produce optimally natural-sounding voices; one probable reason is their inability to accurately capture the user’s native dialect. To address this shortcoming, this research proposes a generative decoder model that clones the user’s voice while capturing the speaker’s native accent and linguistic features, yielding a more natural synthesized voice. The proposed methodology achieves an equal error rate of 0.051 for the speaker encoder module and an accuracy of 98.12% for the accent adaptation module.
