Reading Gzipped FSTs with gzstream

refr has a really nice and simple-to-use gzip stream implementation, gzstream. It is possible to gzip several OpenFst FSTs together (or anything else) into a single file and read them back using the following snippet.

#include <fst/fstlib.h>
#include <iostream>
#include "gzstream.h"

using namespace std;
using namespace fst;

int main() {
  // Open the gzipped file as an ordinary input stream.
  igzstream strm("fsts.gz");
  // Read the first FST from the stream.
  StdFst* fst1 = StdFst::Read(strm, FstReadOptions());
  cout << "# of states " << CountStates(*fst1) << endl;
  //
  // ...
  // Read some more FSTs from the same stream.
  // ...
  //
  delete fst1;
  return 0;
}

Extract the gzstream.{C,h} files and build on OSX 10.9 with the following command (remove the -stdlib flag for other compilers/platforms):

   g++ -O2 -o read gzstream.C test.cc -I/usr/local/include/ -stdlib=libstdc++ -lz -lfst
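
Writing the combined file is symmetric: gzstream also provides an ogzstream, and each FST's Write method accepts an output stream. Here is a minimal sketch (the two trivial single-state FSTs are just for illustration):

#include <fst/fstlib.h>
#include "gzstream.h"

using namespace fst;

int main() {
  // Two trivial single-state FSTs, just for illustration.
  StdVectorFst a, b;
  a.AddState(); a.SetStart(0); a.SetFinal(0, StdArc::Weight::One());
  b.AddState(); b.SetStart(0); b.SetFinal(0, StdArc::Weight::One());

  // Write both FSTs back-to-back into one gzipped file.
  ogzstream strm("fsts.gz");
  a.Write(strm, FstWriteOptions("fsts.gz"));
  b.Write(strm, FstWriteOptions("fsts.gz"));
  return 0;
}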

Unique n-best list with OpenFst

A wrapper function to generate a unique n-best list with OpenFst’s ShortestPath algorithm:

#include <vector>
#include <fst/fstlib.h>

using namespace std;
using namespace fst;

// Return the n shortest paths, keeping only paths with distinct strings.
template<class Arc>
void UniqueNbest(const Fst<Arc>& fst, int n, MutableFst<Arc>* ofst) {
  // ShortestPath with unique=true requires an epsilon-free acceptor,
  // so project onto the output labels and remove epsilons first.
  VectorFst<Arc> ifst(fst);
  Project(&ifst, PROJECT_OUTPUT);
  RmEpsilon(&ifst);
  vector<typename Arc::Weight> d;
  typedef AutoQueue<typename Arc::StateId> Q;
  AnyArcFilter<Arc> filter;
  Q q(ifst, &d, filter);
  ShortestPathOptions<Arc, Q, AnyArcFilter<Arc> > opts(&q, filter);
  opts.nshortest = n;  // number of paths to return
  opts.unique = true;  // keep only paths with distinct strings
  ShortestPath(ifst, ofst, &d, opts);
}
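
For example, to pull the five best distinct hypotheses out of a lattice (a sketch; "lattice.fst" is a hypothetical input file):

// Sketch: read a lattice and extract the 5 best unique hypotheses.
StdFst* lattice = StdFst::Read("lattice.fst");
StdVectorFst nbest;
UniqueNbest(*lattice, 5, &nbest);
// nbest now holds (up to) 5 paths, each with a distinct string.
delete lattice;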

Upcoming ICASSP 2014 Paper Titles

This year’s ICASSP accepted paper list is viewable in the technical program. Neural networks are a huge force in speech recognition, and this conference has three sessions just on deep neural networks.
In this ICASSP there are many interesting titles about recurrent neural networks for non-acoustic modeling and a few decoding-related papers. Here is a list of the upcoming papers that I’m interested in so far.

  • Real-time one-pass decoding with recurrent neural network language model for speech recognition
    Takaaki Hori, Yotaro Kubo, Atsushi Nakamura (NTT Corporation, Japan)

  • Cache based Recurrent Neural Network Language Model Inference for First Pass Speech Recognition
    Zhiheng Huang, Geoffrey Zweig, Benoit Dumoulin (Microsoft, USA)

  • Contextual Domain Classification in Spoken Language Understanding Systems Using Recurrent Neural Network
    Ruhi Sarikaya (Microsoft, USA)

  • ASR Error Detection using Recurrent Neural Network Language Model and Complementary ASR
    Yik-Cheung Tam, Yun Lei, Jing Zheng, Wen Wang (Google, USA and SRI International, USA)

  • Recurrent Conditional Random Field for Language Understanding
    Kaisheng Yao, Baolin Peng, Geoffrey Zweig, Dong Yu, Xiaolong Li, Feng Gao (Microsoft, P.R. China)

  • Efficient Lattice Rescoring Using Recurrent Neural Network Language Models
    Xunying Liu, Yongqiang Wang, Xie Chen, Mark Gales, Phil Woodland

  • Phone sequence modeling with recurrent neural networks
    Nicolas Boulanger-Lewandowski, Jasha Droppo, Mike Seltzer, Dong Yu (University of Montreal, Canada and Microsoft Research, USA)

  • Translating TED Speeches by Recurrent Neural Network based Translation Model
    Youzheng Wu, Xinhui Hu, Chiori Hori (NICT, Japan)

  • Reshaping Deep Neural Network for Fast Decoding by Node-pruning
    Tianxing He, Yuchen Fan, Yanmin Qian, Tian Tan, Kai Yu (Shanghai Jiao Tong University, P.R. China)

  • Accelerating Large Vocabulary Continuous Speech Recognition on Heterogeneous CPU-GPU Platforms
    Jungsuk Kim, Ian Lane (Carnegie Mellon University, USA)

  • Progress in Dynamic Network Decoding
    David Nolden, Hagen Soltau, Hermann Ney (IBM and RWTH Aachen)

  • Gradient-Free Decoding Parameter Optimization on Automatic Speech Recognition
    Thach Le Nguyen, Daniel Stein, Michael Stadtschnitzer (Fraunhofer IAIS, Germany)

  • Multi-Stream Combination for LVCSR and Keyword Search on GPU-Accelerated Platform
    Wonkyum Lee, Jungsuk Kim, Ian Lane (Carnegie Mellon University, USA)

  • Accurate client-server based speech recognition keeping personal data on the client
    Munir Georges, Stephan Kanthak, Dietrich Klakow (Nuance, Germany and Saarland University, Germany)

Succinct Language Model Links

Double Array Language Model

N-gram language models are essential components in ASR and MT systems. A practical problem is using large models with limited memory. This paper proposes a double-array structure and compares its memory usage and speed to KenLM. The slides and code are also available online.
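
For readers unfamiliar with the idea, here is a minimal sketch of the classic double-array trie transition rule (the general mechanism behind double-array structures, not the paper’s exact LM layout): state s moves on symbol c to slot t = base[s] + c, which is valid only when check[t] == s.

#include <vector>

// Minimal sketch of the classic double-array transition rule,
// not the paper's exact language-model layout.
struct DoubleArray {
  std::vector<int> base;   // base offset per state
  std::vector<int> check;  // owner state of each slot
  // Follow symbol c from state s; return -1 if there is no transition.
  int Next(int s, int c) const {
    int t = base[s] + c;
    if (t >= 0 && t < static_cast<int>(check.size()) && check[t] == s)
      return t;
    return -1;
  }
};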

Both Watanabe and Sorensen have proposed succinct LM structures that are not mentioned in this paper and have freely available implementations.

Pyfst on OSX 10.9 Mavericks

These are the steps to install pyfst on Mavericks; if CPPFLAGS doesn’t work, try CFLAGS or CXXFLAGS. Without the -stdlib=libstdc++ flag the Cython wrapper will not compile correctly.

sudo easy_install pip
sudo pip install virtualenv
mkdir ~/pyfst
cd ~/pyfst
virtualenv .
source bin/activate
pip install ipython
export CPPFLAGS="-stdlib=libstdc++"
pip install pyfst

A Couple of Recent WFST/OpenFst Papers

Longer versions of previous conference papers:

Both papers use Lexicographic Semirings (and write OpenFST not OpenFst).
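
As a reminder of what a lexicographic semiring looks like in OpenFst, here is a minimal sketch (my own illustration, not code from either paper) defining an arc type whose weights are pairs of tropical weights compared lexicographically:

#include <fst/fstlib.h>
#include <fst/lexicographic-weight.h>

using namespace fst;

// Weights are <W1, W2> pairs: compare on W1 first, break ties with W2.
typedef LexicographicWeight<TropicalWeight, TropicalWeight> LexWeight;
typedef ArcTpl<LexWeight> LexArc;

int main() {
  VectorFst<LexArc> f;
  LexArc::StateId s = f.AddState();
  f.SetStart(s);
  // The final weight carries both components of the pair.
  f.SetFinal(s, LexWeight(TropicalWeight(1.0), TropicalWeight(2.0)));
  return 0;
}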

It seems two of the biggest areas of speech research are low-resource speech recognition and spoken dialogue understanding using deep learning techniques.