For my last post on this blog, I want to say a little about the current state and the future of speaker recognition.
Although speaker recognition has been researched for several decades, it still has a long way to go. We have good algorithms that can push recognition accuracy above 98%, and even to 100% with very long training. These prototypes aim for high performance rather than low computation cost, so computation time is a barrier that keeps such systems out of real deployments like banking. A group of 630 people needs a few hours to generate the model; it is hard to imagine how much time a bank with millions of customers would need. The calculation can at least be divided across multiple processors, but that means investing more money.
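To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes that training time grows linearly with the number of enrolled speakers and that the work splits perfectly across processors; neither assumption was measured in our experiments. The 3.0 hours standing in for "a few hours" is a placeholder, and the function name and parameters are purely illustrative.

```python
def estimated_training_hours(num_speakers, ref_speakers=630, ref_hours=3.0, num_workers=1):
    """Extrapolate training time from a reference run.

    Assumptions (not measured): cost is linear in the number of speakers,
    and the work divides evenly across num_workers processors.
    ref_hours=3.0 is a placeholder for "a few hours".
    """
    if num_speakers < 0 or ref_speakers <= 0 or num_workers < 1:
        raise ValueError("speaker counts and worker count must be positive")
    hours_per_speaker = ref_hours / ref_speakers
    return num_speakers * hours_per_speaker / num_workers


if __name__ == "__main__":
    single = estimated_training_hours(1_000_000)
    parallel = estimated_training_hours(1_000_000, num_workers=100)
    print(f"1 processor:    {single:.0f} hours ({single / 24:.0f} days)")
    print(f"100 processors: {parallel:.0f} hours")
```

Under these assumptions, one million customers on a single processor would take about 4,762 hours, or roughly 198 days, and a hundred processors would bring that down to about 48 hours. A real system would likely scale less cleanly, so treat this as an order-of-magnitude guess only.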
Another barrier is that the high accuracy figures are obtained on very clean speech, with no silence and no voices other than the speaker's. In real applications and devices, however, it is difficult to filter out all the noise and other sounds. This poses a big challenge for front-end processing and calls for innovation in recording hardware.
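As a small illustration of what the front end has to do, here is a minimal sketch of energy-based silence removal. It is not the front end we used; it only drops low-energy frames and does nothing about background noise or competing voices, which are the harder part. The function names, the frame length of 160 samples, and the threshold ratio of 0.1 are all assumptions chosen for the example.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (trailing partial frame ignored)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def drop_silent_frames(samples, frame_len=160, threshold_ratio=0.1):
    """Keep only frames whose energy exceeds threshold_ratio times the loudest frame's energy."""
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    kept = []
    for idx, energy in enumerate(energies):
        if energy > threshold:
            start = idx * frame_len
            kept.extend(samples[start:start + frame_len])
    return kept


if __name__ == "__main__":
    # Synthetic test signal: silence, then a 440 Hz tone at an assumed 16 kHz rate, then silence.
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
    signal = [0.0] * 320 + tone + [0.0] * 160
    voiced = drop_silent_frames(signal)
    print(len(signal), "samples in,", len(voiced), "samples kept")
```

A fixed energy threshold like this breaks down quickly once the background is noisy, which is exactly why the front end is a challenge in real devices.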
The biggest problem will still be security, since someone can simply mimic a person's voice or play back a recording of it. To defend against this, the speaker recognition system has to become more complex, for example by detecting the channel difference between the source and the receiver, and that will involve some sophisticated algorithms.
So a very good speaker recognition system needs a large number of peripheral components, along with a balance between the core system and those outside processing parts. All in all, it seems we still have a long way to go to achieve a simple, accurate, and secure speaker recognition system!