A Mutual Information Algorithm for Text-Independent Voice Conversion
Citation
Seyed Mehdi Iranmanesh, Behnam Dehghan, "A Mutual Information Algorithm for Text-Independent Voice Conversion", International Journal of Engineering Trends and Technology (IJETT), V30(8), 400-404, December 2015. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.
Abstract
Most voice conversion systems require parallel corpora for training, meaning that the source and target speakers must utter the same sentences. In many practical applications, however, parallel corpora are impossible to obtain. Text-independent voice conversion has been introduced to address this problem, and its main difficulty is data alignment. In this paper we introduce a novel mutual-information-based algorithm for data alignment that achieves results similar to those of text-dependent systems. The algorithm requires no phonetic labeling and can be used in practical applications.
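To illustrate the general idea of mutual-information-based alignment (not the paper's specific algorithm, which is not reproduced here), the sketch below estimates mutual information between two feature sequences with a 2-D histogram and searches for the global frame shift that maximizes it. The function names, the histogram estimator, and the shift-search strategy are all illustrative assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of mutual information I(X;Y) in nats.

    Illustrative estimator only; the paper's actual MI computation
    may differ (e.g. multivariate features, other density estimates).
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                     # joint distribution
    px = pxy.sum(axis=1, keepdims=True)           # marginal of X
    py = pxy.sum(axis=0, keepdims=True)           # marginal of Y
    nz = pxy > 0                                  # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def best_alignment_shift(src, tgt, max_shift=20):
    """Pick the frame shift that maximizes MI between two 1-D sequences."""
    best_shift, best_mi = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = src[s:], tgt[:len(tgt) - s]
        else:
            a, b = src[:len(src) + s], tgt[-s:]
        n = min(len(a), len(b))
        if n < 2:
            continue
        mi = mutual_information(a[:n], b[:n])
        if mi > best_mi:
            best_mi, best_shift = mi, s
    return best_shift, best_mi

# Toy demo: the target is a noisy copy of the source delayed by 5 frames,
# so the MI-maximizing shift should recover that delay.
rng = np.random.default_rng(0)
src = rng.normal(size=300)
tgt = np.concatenate([rng.normal(size=5), src[:-5] + 0.1 * rng.normal(size=295)])
shift, mi = best_alignment_shift(src, tgt)
print(shift)
```

In practice, text-independent alignment must handle non-uniform timing differences rather than a single global shift, but the same MI score can serve as the matching criterion between candidate frame pairings.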
Keywords
Text-independent voice conversion, mutual information, frame alignment, Mel-cepstral frequency warping.