Succinct Language Model Links

Double Array Language Model

N-gram language models are essential components in ASR and MT systems. A practical problem is fitting large models into limited memory. This paper proposes a double array structure and compares its memory usage and speed with kenlm. The slides and code are also available online.
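To make the idea concrete, here is a minimal sketch of a generic double-array trie keyed by n-gram word-ID sequences. It is not the paper's implementation: the class name `DoubleArrayTrie`, storing a single float (e.g. a log-probability) per node, and the naive first-fit search for a free `base` offset are all assumptions for illustration. The core invariant is the classic double-array one: from state `s`, symbol `c` leads to `t = base[s] + c`, and that transition exists only if `check[t] == s`.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class DoubleArrayTrie:
    """Static double-array trie mapping word-ID sequences to a float payload.

    From state s, symbol c leads to t = base[s] + c, and the transition
    exists only if check[t] == s. Word IDs must be ints >= 1.
    """

    FREE = -1  # check value marking an unused slot

    def __init__(self, entries: Dict[Tuple[int, ...], float]) -> None:
        self.base: List[int] = [0]
        self.check: List[int] = [0]  # slot 0 is the root state
        self.value: List[Optional[float]] = [None]
        # Build a temporary pointer-based trie; the None key holds the payload.
        tree: dict = {}
        for key, val in entries.items():
            node = tree
            for sym in key:
                if sym < 1:
                    raise ValueError("word IDs must be >= 1")
                node = node.setdefault(sym, {})
            node[None] = val
        self._layout(tree)

    def _grow(self, size: int) -> None:
        # Extend all three arrays with free slots up to `size` entries.
        while len(self.check) < size:
            self.base.append(0)
            self.check.append(self.FREE)
            self.value.append(None)

    def _layout(self, root: dict) -> None:
        # Assign array slots to every pointer-trie node via a worklist.
        work = [(0, root)]
        while work:
            state, node = work.pop()
            if None in node:
                self.value[state] = node[None]
            children = sorted(k for k in node if k is not None)
            if not children:
                continue
            # First-fit search for an offset whose child slots are all free.
            b = 1
            while True:
                self._grow(b + children[-1] + 1)
                if all(self.check[b + c] == self.FREE for c in children):
                    break
                b += 1
            self.base[state] = b
            for c in children:
                self.check[b + c] = state  # claim the slot for this parent
                work.append((b + c, node[c]))

    def get(self, key: Sequence[int]) -> Optional[float]:
        """Return the payload stored for `key`, or None if absent."""
        state = 0
        for c in key:
            t = self.base[state] + c
            if c < 1 or t >= len(self.check) or self.check[t] != state:
                return None
            state = t
        return self.value[state]


# Toy table with made-up log-probabilities.
# Hypothetical vocabulary: 1 = "<s>", 2 = "the", 3 = "cat".
lm = {(1,): -1.0, (2,): -1.2, (3,): -2.0, (1, 2): -0.5, (1, 2, 3): -0.3}
trie = DoubleArrayTrie(lm)
print(trie.get((1, 2, 3)))  # -0.3
print(trie.get((2, 3)))     # None (n-gram not stored)
```

Each lookup step is two array reads with no per-node pointers or child lists. How much memory this saves in practice depends on how tightly the `base` offsets pack the arrays, which the naive first-fit search here does not try to optimize.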

Both Watanabe and Sorenson have proposed succinct LM structures which are not mentioned in this paper and have freely available implementations.

Pyfst on OSX 10.9 Mavericks

These are the steps to install pyfst on Mavericks. If CPPFLAGS doesn’t work, try CFLAGS or CXXFLAGS. Without -stdlib=libstdc++, the Cython wrapper will not compile correctly.

sudo easy_install pip
sudo pip install virtualenv
mkdir ~/pyfst
cd ~/pyfst
virtualenv .
source bin/activate
pip install ipython
export CPPFLAGS="-stdlib=libstdc++"
pip install pyfst
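Because the flag may need to go in CPPFLAGS, CFLAGS or CXXFLAGS, a quick check before the final pip install can confirm which of them actually carries it in the current shell. The sketch below is a hypothetical helper (`vars_with_flag` is not part of pip or pyfst); it only inspects environment variables and says nothing about whether the compiler will honor the flag.

```python
import os
from typing import List, Mapping

# The flag from the install notes and the variables they suggest trying.
FLAG = "-stdlib=libstdc++"
CANDIDATE_VARS = ("CPPFLAGS", "CFLAGS", "CXXFLAGS")


def vars_with_flag(env: Mapping[str, str]) -> List[str]:
    """Return which candidate variables contain FLAG as a separate token."""
    return [name for name in CANDIDATE_VARS if FLAG in env.get(name, "").split()]


if __name__ == "__main__":
    found = vars_with_flag(os.environ)
    if found:
        print("found", FLAG, "in:", ", ".join(found))
    else:
        print("warning: none of", ", ".join(CANDIDATE_VARS), "contains", FLAG)
```

Run it inside the activated virtualenv, after the export and before pip install pyfst; if it prints the warning, the export did not reach the current shell.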