Speech conveys linguistic, speaker, and environmental information. The objective of voice conversion is to transform the source speaker's voice characteristics into those of the target speaker while preserving the other two kinds of information. Based on the human voice production mechanism, the classical source-filter model represents the speech signal by decomposing it into a glottal excitation signal and vocal tract acoustic filters. Each part of the signal is then represented by acoustic correlates that relate to anatomical or control aspects of the speech production mechanism. These acoustic correlates consist of pitch level, pitch range, and pitch dynamics; formant locations and formant dynamics; and spectral shape and its dynamics. The glottal pulse and the aspiration and constriction noise correlate with voice quality. Timing parameters, i.e., speaking rate and fluency, also contribute to the speaker's voice characteristics. No existing voice conversion system transforms all of these acoustic correlates between the two speakers. Instead, the prevailing method maps only the spectrum and the pitch. The converted utterance produced by such a speaker-dependent voice conversion system maintains its naturalness and achieves good similarity to the target speaker's voice. Unfortunately, the mapping function, which is estimated from a large number of training utterances, is suitable only for conversion between that particular pair of speakers. To overcome this limitation, a novel source-speaker-independent conversion framework is proposed.
First, the target tracks of the pitch and the first three formants are predicted from the MFCC (Mel-Frequency Cepstral Coefficient) vectors of the source utterance using trained SVR (Support Vector Regression) models. Then, the parameters obtained from STRAIGHT (Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum) analysis are transformed according to the predicted tracks. Finally, the converted voice is synthesized from the modified parameters.
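
The second step, transforming the analysis parameters according to the predicted tracks, can be illustrated with a minimal sketch. The abstract does not specify how the transformation is carried out, so the piecewise-linear frequency warping below is an assumed stand-in rather than the proposed method. All function names (`interp`, `warp_envelope`, `convert_frame`) are hypothetical, the source formants are assumed to be measured separately, and the spectral envelope is assumed to be a list of magnitudes on a uniform frequency grid from 0 Hz to the Nyquist frequency.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def warp_envelope(envelope, src_formants, tgt_formants, sample_rate):
    """Warp one frame's spectral envelope so that the source formant
    frequencies move to the predicted target formant frequencies.

    Assumption: formants are in Hz, strictly increasing, and below Nyquist.
    """
    n_bins = len(envelope)
    nyquist = sample_rate / 2.0
    bin_freqs = [k * nyquist / (n_bins - 1) for k in range(n_bins)]

    # Anchor points of the warp: 0 Hz and Nyquist stay fixed,
    # each source formant is mapped onto its target formant.
    src_axis = [0.0] + list(src_formants) + [nyquist]
    tgt_axis = [0.0] + list(tgt_formants) + [nyquist]

    warped = []
    for f_out in bin_freqs:
        # Find which source frequency should appear at this output bin,
        # then read the source envelope at that frequency.
        f_in = interp(f_out, tgt_axis, src_axis)
        warped.append(interp(f_in, bin_freqs, envelope))
    return warped


def convert_frame(f0_src, envelope, src_formants, pred_f0, pred_formants,
                  sample_rate):
    """Apply the predicted pitch and formant tracks to one analysis frame.

    Assumption: f0_src == 0 marks an unvoiced frame, which is left unchanged.
    """
    if f0_src <= 0:
        return 0.0, list(envelope)
    warped = warp_envelope(envelope, src_formants, pred_formants, sample_rate)
    return pred_f0, warped


if __name__ == "__main__":
    # Toy frame: 9 bins at 16 kHz (1 kHz per bin), one peak at 1 kHz.
    env = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    f0, out = convert_frame(120.0, env, [1000.0, 2000.0, 3000.0],
                            200.0, [2000.0, 4000.0, 6000.0], 16000)
    print(f0, out)
```

In this toy frame the envelope peak at the first source formant (1 kHz) is moved to the first target formant (2 kHz), and the frame's pitch is replaced by the predicted value. In a full system the modified pitch and envelope would be passed, together with the aperiodicity parameters, to the synthesis stage.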