Sound Group

We conduct research on sound and speech processing for embedded platforms. Our work spans fundamental research and practical implementation of 3D sound effects, voice technologies, and related topics.

3D Sound Effects for Embedded Systems

[Image: 3d.jpg]

Sound is shaped by the surrounding environment through effects such as diffraction and reflection. It then arrives at both of the listener's eardrums and is perceived by the auditory system. The spatial image perceived by human hearing in this way is referred to as a "sound image," and the perception of the direction and distance of a sound image is referred to as "sound localization."

If the cues underlying sound localization can be identified and controlled by digital signal processing, a listener can experience virtual sound as if hearing the actual sound in the real world. Our group is researching methods to simulate the transfer functions from a sound source to a listener at low computational cost. Our method reduces computational cost through feature extraction of these functions, so that real-time implementation on embedded platforms becomes feasible.
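As a minimal sketch of the underlying idea (not the group's specific low-cost method), spatializing a sound with measured transfer functions amounts to convolving the mono source with a head-related impulse response (HRIR) for each ear. The 4-tap HRIRs below are placeholder values purely for illustration:

```python
import numpy as np

def apply_hrtf(mono, hrir_left, hrir_right):
    """Spatialize a mono signal by convolving it with a pair of
    head-related impulse responses (one per ear)."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy example: a unit impulse as the "source" and hypothetical 4-tap HRIRs.
mono = np.array([1.0, 0.0, 0.0, 0.0])
hrir_l = np.array([0.9, 0.3, 0.1, 0.05])   # placeholder left-ear response
hrir_r = np.array([0.5, 0.2, 0.08, 0.02])  # placeholder right-ear response
stereo = apply_hrtf(mono, hrir_l, hrir_r)
print(stereo.shape)  # (2, 7): two ears, len(mono) + len(hrir) - 1 samples
```

Direct convolution like this is exactly what becomes expensive with realistically long HRIRs, which is why reducing the cost of these filters matters on embedded hardware.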

For more details, please visit here.

Direction of Arrival Estimation of Speech using Microphone Array

[Image: doa.jpg]

There are numerous applications that incorporate direction of arrival (DOA) estimation of human speech, such as video conferencing, sound recording enhancement, speech recognition, and, more recently, interactive robots.

Current DOA estimation systems face an unavoidable trade-off between accuracy and system size: to achieve higher accuracy, the number of microphone elements must be increased, and the array size grows with it. However, large arrays have a number of issues preventing them from being used in small, low-cost applications. We are developing highly accurate DOA estimation systems for speech using a two-channel microphone array, based on non-utterance frame omission and fundamental frequency selection.
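A common baseline for two-microphone DOA, on top of which refinements such as frame selection are built, is to estimate the time difference of arrival (TDOA) by cross-correlation and convert it to an angle with a far-field model. The sketch below is that generic baseline, not the group's specific method; microphone spacing and sampling rate are illustrative values:

```python
import numpy as np

def estimate_doa(sig_l, sig_r, fs, mic_distance, c=343.0):
    """Estimate direction of arrival from the time delay between two
    microphone signals via cross-correlation (far-field assumption)."""
    corr = np.correlate(sig_l, sig_r, mode="full")
    # Positive lag means the left signal lags the right one.
    lag = np.argmax(corr) - (len(sig_r) - 1)
    tdoa = lag / fs
    # Far-field model: sin(theta) = c * tdoa / d, clipped to a valid range.
    s = np.clip(c * tdoa / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(s))

# Synthetic check: the right channel arrives 2 samples later than the left.
fs, d = 16000, 0.1                       # 16 kHz, 10 cm spacing (examples)
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
sig_l = x
sig_r = np.concatenate([np.zeros(2), x[:-2]])
print(estimate_doa(sig_l, sig_r, fs, d))
```

With only two microphones the correlation peak is easily corrupted by silence and noise-dominated frames, which is one motivation for omitting non-utterance frames and selecting fundamental-frequency components before estimation.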

Speech Recognition for Embedded Systems

[Image: onnsei_ninnshiki.png]

Speech recognition technology is expected to provide a more convenient and natural interface than existing input devices such as keyboards and ten-key pads. Although continuous speech recognition allows a user to speak without pauses between words, the process incurs a high computational load, making it difficult to realize a practical recognition system on an embedded platform.

Motivated by this technical issue, we are devising an efficient pipeline technique for continuous speech recognition on embedded systems. In our work, the continuous speech recognition system Julius is employed, and an ARM processor is used as the embedded processor.
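The general idea of pipelining recognition can be illustrated with a producer/consumer sketch: feature extraction for frame t+1 overlaps with decoding of frame t, so neither stage idles. This is a generic illustration with stand-in computations, not Julius's actual internals:

```python
import queue
import threading

def feature_extractor(frames, q):
    """Producer stage: analyze each audio frame and pass features on."""
    for t, frame in enumerate(frames):
        feat = sum(frame) / len(frame)       # stand-in for MFCC analysis
        q.put((t, feat))
    q.put(None)                              # end-of-stream marker

def decoder(q, results):
    """Consumer stage: decode features as they arrive."""
    while True:
        item = q.get()
        if item is None:
            break
        t, feat = item
        results.append((t, feat > 0))        # stand-in for Viterbi search

frames = [[0.1, 0.2], [-0.3, -0.1], [0.5, 0.4]]
q = queue.Queue(maxsize=4)                   # bounded queue caps memory use
results = []
t1 = threading.Thread(target=feature_extractor, args=(frames, q))
t2 = threading.Thread(target=decoder, args=(q, results))
t1.start(); t2.start(); t1.join(); t2.join()
print(results)
```

The bounded queue matters on an embedded target: it limits buffering memory and naturally throttles the producer when the decoder is the bottleneck.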


Last-modified: 2010-03-20 (Sat) 22:35:33