CSLM

The CSLM toolkit is open-source software that implements the continuous space language model (CSLM).

 

LIUM, University of Le Mans, France

The basic idea of this approach is to project the word indices onto a continuous space and to use a probability estimator operating on this space. Since the resulting probability functions are smooth functions of the word representation, better generalization to unseen events can be expected. A neural network can be used to simultaneously learn the projection of the words onto the continuous space and to estimate the n-gram probabilities. This is still an n-gram approach, but the LM probabilities are interpolated for any possible context of length n-1 instead of backing off to shorter contexts. This approach has been used successfully in large-vocabulary continuous speech recognition and in phrase-based SMT systems.
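The idea described above can be illustrated with a small sketch. It is not the toolkit's code or API: the class name `TinyCSLM`, the layer sizes, and the random untrained weights are all assumptions made for illustration, and bias terms and training are omitted. It only shows the forward pass: each of the n-1 context word indices is looked up in a projection table, the projections are concatenated, passed through a hidden layer, and a softmax yields a probability for every word in the vocabulary.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyCSLM:
    """Untrained toy feed-forward n-gram model (hypothetical, for illustration only)."""

    def __init__(self, vocab_size, order=3, proj_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.context_len = order - 1
        # Projection layer: one continuous vector per word index.
        self.projection = make_matrix(vocab_size, proj_dim, rng)
        # Hidden layer reads the concatenated projections of the context words.
        self.w_hidden = make_matrix(hidden_dim, self.context_len * proj_dim, rng)
        # Output layer produces one score per vocabulary word.
        self.w_output = make_matrix(vocab_size, hidden_dim, rng)

    def predict(self, context):
        """Return P(w | context) for every word index w in the vocabulary."""
        if len(context) != self.context_len:
            raise ValueError("context must contain exactly n-1 word indices")
        # Look up and concatenate the projections of the n-1 context words.
        features = [x for idx in context for x in self.projection[idx]]
        hidden = [math.tanh(a) for a in matvec(self.w_hidden, features)]
        scores = matvec(self.w_output, hidden)
        # Softmax, shifted by the maximum score for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


model = TinyCSLM(vocab_size=5)
probs = model.predict([2, 4])  # distribution over the next word given two context words
```

Because the output is a softmax over the whole vocabulary, every context of length n-1 receives a full, non-zero distribution, including contexts that never occurred in training. This is the sense in which the model interpolates rather than backing off to shorter contexts.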

Detailed information is available in the following publications:

  • Holger Schwenk, Continuous Space Language Models, Computer Speech and Language, volume 21, pages 492-518, 2007.
  • Holger Schwenk, Continuous Space Language Models for Statistical Machine Translation, The Prague Bulletin of Mathematical Linguistics, number 93, pages 137-146, 2010.

When using this software, please cite these references.

 


 

Downloads

Version  Date         Description                     Download
1.0      Jan 28 2010  Initial version of the toolkit  cslm_v1.0.tgz

 

The toolkit will be updated frequently. You can subscribe to the mailing list to be informed when new versions become available.

 
© 2017 Les outils du lium