CMUSphinx-Open Source Toolkit For Speech Recognition
2012-03-06 13:00
http://cmusphinx.sourceforge.net/wiki/
You can learn some concepts from the wiki above, for example:
Models
According to the speech structure, three models are used in speech recognition to do the match:
An acoustic model contains acoustic properties for each senone. There are context-independent models that contain properties (most probable feature vectors for each phone) and context-dependent ones (built from senones with context).
A phonetic dictionary contains a mapping from words to phones. This mapping is not very effective: for example, only two or three pronunciation variants are noted for a word. It is practical enough most of the time, though. A dictionary is not the only way to map words to phones; the mapping could also be a complex function learned by a machine learning algorithm.
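As a rough illustration of the word-to-phone mapping described above, here is a minimal sketch of a dictionary lookup. The sample entries, the `(2)` suffix for an alternative pronunciation, and the names `load_dictionary` and `words_to_phones` are assumptions made for this example, not the toolkit's own API.

```python
import re
from collections import defaultdict

# Hypothetical sample entries: a word followed by its phones.
# A "(2)" suffix is assumed to mark an alternative pronunciation.
SAMPLE_DICT = """\
hello HH AH L OW
read R IY D
read(2) R EH D
world W ER L D
"""

def load_dictionary(text):
    """Parse dictionary text into {word: [phone sequence, ...]}."""
    pronunciations = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Strip the variant marker so "read(2)" is filed under "read".
        word = re.sub(r"\(\d+\)$", "", parts[0])
        pronunciations[word].append(parts[1:])
    return dict(pronunciations)

def words_to_phones(words, pronunciations):
    """Map each word to its first listed pronunciation."""
    phones = []
    for word in words:
        if word not in pronunciations:
            # A plain dictionary cannot handle out-of-vocabulary words.
            raise KeyError(f"word not in dictionary: {word}")
        phones.extend(pronunciations[word][0])
    return phones

lexicon = load_dictionary(SAMPLE_DICT)
```

Calling `words_to_phones(["hello", "world"], lexicon)` returns the concatenated phone sequence. The `KeyError` branch shows why a learned mapping function can be attractive: a fixed dictionary has no answer for words it does not list.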
A language model is used to restrict word search. It defines which words could follow previously recognized words (remember that matching is a sequential process) and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by a finite state automaton, sometimes with weights. To reach a good accuracy rate, your language model must be very successful at restricting the search space, which means it should be very good at predicting the next word. A language model usually restricts the vocabulary considered to the words it contains. That is an issue for name recognition. To deal with this, a language model can contain smaller chunks like subwords or even phones. Please note that search space restriction in this case is usually worse, and the corresponding recognition accuracy is lower than with a word-based language model.
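To make the idea of "statistics of word sequences" concrete, the following is a minimal sketch of an unsmoothed bigram model, the simplest n-gram case. The toy corpus and the names `train_bigrams` and `next_word_probs` are hypothetical; real language models are estimated from far more text and use smoothing.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus, chosen only for illustration.
CORPUS = [
    "open the door",
    "open the window",
    "close the door",
]

def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark sentence start and end.
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Return relative-frequency estimates P(next | prev), without smoothing."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}  # unseen history: the model allows no continuation
    return {word: count / total for word, count in counts[prev].items()}

bigrams = train_bigrams(CORPUS)
the_probs = next_word_probs(bigrams, "the")
```

After "the", only "door" and "window" receive any probability, so a decoder using this model would not consider other words at that position. This is the search space restriction the paragraph describes, and it also shows the vocabulary limit: a word absent from the corpus can never be predicted.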
Those three entities are combined in an engine to recognize speech. If you are going to apply your engine to another language, you need to get these structures in place for it. For many languages, acoustic models, phonetic dictionaries and even large-vocabulary language models are available for download.