Lookahead Composition in OpenFst 1.2

OpenFst 1.2 was just released. One of the big additions is the lookahead composition filters; more details can be found in [2] and [3]. Get the latest source from here. By default the lookahead filters are not built, so run configure with all the options.

configure --enable-compact-fsts --enable-const-fsts --enable-far --enable-lookahead-fsts --enable-pdt

Here is a quick example of using the lookahead filters to compose two speech recognition transducers, a lexicon L and a language model G. Using these filters it is now possible to determinize the lexicon before composition with G; the advantage is that the costly determinization of LG is avoided.

These are the steps to perform an offline lookahead composition. The below assumes a closed lexicon L with auxiliary symbols and an n-gram language model G constructed according to [1]. In this case L has 563232 states and 656819 arcs and G has 997740 states and 4764980 arcs.

First determinize and convert the lexicon to an olabel_lookahead type fst. The save_relabel_opairs option, although not shown in the fstconvert help output, is a valid option for the command.

fstdeterminize l.fst | fstconvert --fst_type=olabel_lookahead --save_relabel_opairs=relabel.txt > lookahead.l.fst

Next relabel the input of the language model to match the lookahead lexicon. The arcsort command will sort the input labels of the language model; this sorting is essential for efficient composition in the following stage.

fstrelabel --relabel_ipairs=relabel.txt g.fst | fstarcsort --sort_type=ilabel > relabel.g.fst
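The relabeling can also be done in code. Below is a minimal sketch, assuming relabel.txt holds one whitespace-separated old/new integer label pair per line (the format written by the fstconvert step above).

// Sketch mirroring the fstrelabel | fstarcsort pipeline: apply the
// saved relabel pairs to G's input side, then arc-sort on input labels.
// Assumes relabel.txt holds "old new" pairs, one pair per line.
#include <fstream>
#include <utility>
#include <vector>
#include <fst/fstlib.h>

int main() {
  fst::StdVectorFst *g = fst::StdVectorFst::Read("g.fst");
  if (!g) return 1;

  typedef fst::StdArc::Label Label;
  std::vector<std::pair<Label, Label> > ipairs, opairs;
  std::ifstream in("relabel.txt");
  Label from, to;
  while (in >> from >> to) ipairs.push_back(std::make_pair(from, to));

  fst::Relabel(g, ipairs, opairs);  // opairs empty: output side untouched
  fst::ArcSort(g, fst::ILabelCompare<fst::StdArc>());
  g->Write("relabel.g.fst");
  delete g;
  return 0;
}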

Finally compose the L and G machines together. The compose tool can automatically detect that the L transducer is a lookahead type and select the optimized filter.

fstcompose lookahead.l.fst relabel.g.fst > lg.fst
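The same composition can be driven from the C++ API. Below is a minimal sketch (file names taken from the commands above; it assumes an OpenFst build with the lookahead extensions linked in so the olabel_lookahead type is registered). Reading through the generic Fst interface preserves the lookahead type, and Compose should then pick the optimized filter just as fstcompose does.

// Minimal sketch: compose the lookahead lexicon with the relabeled,
// arc-sorted language model produced by the commands above.
#include <fst/fstlib.h>

int main() {
  // Read through the generic interface so the olabel_lookahead
  // type of the lexicon is preserved.
  fst::Fst<fst::StdArc> *l = fst::Fst<fst::StdArc>::Read("lookahead.l.fst");
  fst::Fst<fst::StdArc> *g = fst::Fst<fst::StdArc>::Read("relabel.g.fst");
  if (!l || !g) return 1;

  // Compose chooses the matcher and filter from the fst types,
  // so the lookahead filter is selected automatically.
  fst::StdVectorFst lg;
  fst::Compose(*l, *g, &lg);
  lg.Write("lg.fst");

  delete l;
  delete g;
  return 0;
}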

Here is the output of fstinfo for lg.fst

# of states 5863514
# of arcs 9630754
# of final states 1
# of input/output epsilons 1414832
# of input epsilons 1904971
# of output epsilons 5863511
# of accessible states 5863514
# of coaccessible states 5863514
# of connected states 5863514
# of strongly conn components 150128


For comparison, here is the same information for a more standard det(LG) cascade.

# of states 6294455
# of arcs 10053167
# of final states 1
# of input/output epsilons 1995247
# of input epsilons 1995247
# of output epsilons 6302523
# of accessible states 6294455
# of coaccessible states 6294455
# of connected states 6294455
# of strongly conn components 168471

The difference in size is partly due to the epsilon closure in the non-determinized L transducer.


[1] Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Speech recognition with weighted finite-state transducers. In Larry Rabiner and Fred Juang, editors, Handbook on Speech Processing and Speech Communication, Part E: Speech Recognition. Springer-Verlag, Heidelberg, Germany, 2008.

[2] Cyril Allauzen, Michael Riley, and Johan Schalkwyk. A generalized composition algorithm for weighted finite-state transducers. In Proceedings of Interspeech, 2009.

[3] Cyril Allauzen, Michael Riley, and Johan Schalkwyk. Filters for efficient composition of weighted finite-state transducers. In Proceedings of the Fifteenth International Conference on Implementation and Application of Automata (CIAA 2010), Winnipeg, MB.

Importance of Sorting when Composing WFSTs

For efficient composition of large WFSTs it is essential to correctly sort the input and output labels of the component WFSTs. When performing the composition A○B, sorting A by output label and B by input label should give the fastest composition. The OpenFst documentation states that only one of the transducers needs to be sorted, and that it is preferable to sort the transducer with the larger out-degree.
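As a minimal C++ sketch of this rule (the file names are placeholders), sorting A on output labels and B on input labels before composing looks like this:

// Sketch of the sorting rule for composing A with B: sort A by
// output label and B by input label. Only one side is strictly
// required, but sorting both never hurts correctness.
#include <fst/fstlib.h>

int main() {
  fst::StdVectorFst *a = fst::StdVectorFst::Read("a.fst");
  fst::StdVectorFst *b = fst::StdVectorFst::Read("b.fst");
  if (!a || !b) return 1;

  fst::ArcSort(a, fst::OLabelCompare<fst::StdArc>());  // sort A by olabel
  fst::ArcSort(b, fst::ILabelCompare<fst::StdArc>());  // sort B by ilabel

  fst::StdVectorFst c;
  fst::Compose(*a, *b, &c);
  c.Write("c.fst");

  delete a;
  delete b;
  return 0;
}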

Below is a quick comparison of composition speed when the joining dimensions of the two FSTs are sorted differently.

The first composition pair is a speech recognition lexicon L and a backoff n-gram language model G.

L sort    G sort    Time (seconds)
None      None      Error
None      Input     1322
Input     None      Error
Input     Input     1399
Output    None      14
Output    Input     14

Error indicates that the OpenFst composition command detected an FST was not sorted and terminated. Here we observe that it is essential to sort the output side of the lexicon for fast composition.

The previous LG combination was determinized and composed with a context dependency transducer C.

C sort    det(LG) sort    Time (seconds)
None      Output          14
Output    Output          14
Input     Output          Error
None      Input           14
Output    Input           14
Input     Input           24

For these specific transducers there is not much difference, but incorrectly sorting C by input label either fails or gives a slower composition.

Translate and Speak Test


Using the Boost codecvt facet for reading UTF-8

C++ has codecvt facets which make it possible to read UTF-8 or another text encoding into wstring containers. I couldn’t find much information on how to implement the facet or any sample code, apart from a UTF-8 facet implementation in the Boost libraries.

The minimal sample below reads from a text file named utf8.txt. For example, if utf8.txt contains a single line with the text “日本語”, reading it with a standard narrow stream treats the UTF-8 as a stream of bytes and reports a length of 9. Reading through a wide stream imbued with the facet reports the expected length of 3 characters.

Printing the wide string or individual characters will require converting back to UTF-8, or a terminal that supports UTF-16/32.

Tested on Windows and Linux. The Fedora Boost package didn’t include the utf8_codecvt_facet.cpp file; the file can be obtained here.

#include <iostream>
#include <fstream>
#include <locale>

#define BOOST_UTF8_BEGIN_NAMESPACE
#define BOOST_UTF8_END_NAMESPACE
#define BOOST_UTF8_DECL

#include <boost/detail/utf8_codecvt_facet.hpp>

// This file wasn't present in the Fedora Boost package
// but was under Windows:
//#include <libs/detail/utf8_codecvt_facet.cpp>
#include "utf8_codecvt_facet.cpp"

int main(int argc, char** argv)
{
    // Wide stream imbued with the UTF-8 facet: length is in characters.
    std::wifstream wifs("utf8.txt");
    wifs.imbue(std::locale(std::locale(), new utf8_codecvt_facet));
    std::wstring wstr;
    wifs >> wstr;
    wifs.close();
    std::cout << "wstr.length()=" << wstr.length() << std::endl;

    // Narrow stream: the UTF-8 is read as raw bytes, so the length
    // counts bytes (9 for the three-character example).
    std::ifstream ifs("utf8.txt");
    std::string str;
    ifs >> str;
    std::cout << "str.length()=" << str.length() << std::endl;
    ifs.close();
    return 0;
}
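With the 日本語 example, the facet-imbued wide stream counts characters while the narrow stream counts UTF-8 bytes, so the program should print:

wstr.length()=3
str.length()=9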

import site failed; use -v for traceback

I kept getting this message when I ran the Python interpreter under Windows:

import site failed; use -v for traceback

It turns out the PYTHONPATH environment variable was damaged. Fix by right-clicking My Computer –> Properties (or Windows key + Break) –> Advanced –> Environment Variables.

Edit (or New)

Variable name = PYTHONPATH

Variable value = C:\PYTHON26;C:\PYTHON26\DLLS;C:\PYTHON26\LIB;

Running Multiple GPGPU Programs at the Same Time

The speech recognition decoder I work on has the ability to perform some of the computation on the graphics processing unit (GPU). I was recently asked if multiple instances of the decoder could access the GPU at the same time and if so what the performance penalty was. I knew multiple instances could access the GPU at the same time but was unsure of the performance cost.

To evaluate the slowdown, a single instance of the decoder was run on a standard Japanese evaluation task using the Corpus of Spontaneous Japanese (CSJ). The models comprised a tri-gram language model compiled into an integrated weighted finite-state transducer (WFST) and a 3000-state acoustic model with 32 Gaussians per mixture. The same evaluation was then re-run, but with two instances of the decoder performing the same evaluation at the same time. The times for the GPU simultaneous plot were calculated as the average of these concurrent runs. The beam widths of the decoder were altered to generate the RTF vs. accuracy plot below.

[Figure: RTF vs. accuracy for the GPU-accelerated decoder, run exclusively and simultaneously]

There is a noticeable slowdown when two decoders are running at the same time. The next question is whether this is due to overall system loading or to the concurrent accesses to the GPU. The next experiment is a re-run with the decoder using the standard on-demand CPU (SSE accelerated) acoustic scoring.

[Figure: RTF vs. accuracy for the CPU-only decoder, run exclusively and simultaneously]

Overall the slowdown for the CPU-only decoder looks larger than for the GPU-accelerated decoder. A possible explanation is that the GPU decoder has lower memory bandwidth requirements, since only the acoustic model scores, rather than the entire set of acoustic model parameters, need to be moved across the memory buses.

The slowdown factor (simultaneous/exclusive time) is shown below and illustrates that the GPU-accelerated decoder is less affected when multiple decoders are running together. Because the slowdown factor is less than two, it also shows it is more efficient to process the data in parallel across two slower decoders than to push all of the data through a single faster decoder. Another important area to investigate is whether a single multi-threaded decoder can perform any better.

[Figure: slowdown factor (simultaneous/exclusive) for the GPU and CPU decoders]


Finally, for completeness, the performance of all decoding runs is shown below.

[Figure: RTF vs. accuracy for all decoding runs]

Further things to consider would be running four decoders together on a single machine, or running the decoder on a machine with multiple GPUs.

Compiling HTS Engine for the CLR

For a project I would really like to be able to get HTS Engine to run in Silverlight. Managed C++ isn’t supported in Silverlight, so I’m trying to think of other ways to compile it to MSIL.

The first technique I tried was to compile it as managed C++ and then to try to convert the MSIL to C# and recompile again for Silverlight. It seems this won’t work because of the dependencies on unmanaged functions such as fopen. However, it might be useful for someone who wants to create a .NET-compatible library for use in .NET programs on the desktop CLR.

  1. First grab the hts_engine source code and unpack it
  2. Rename the .c files to .cpp
  3. Edit the makefile.mak in the lib and bin dirs: add the /clr flag to the CFLAGS and change /TC to /TP. In bin/makefile.mak change .c to .cpp
  4. Copy HTS_engine.h into the lib and bin dirs
  5. Open the Visual Studio command prompt and cd into the hts_engine dir
  6. nmake -f makefile.mak all

If the process worked, you can verify the result is managed-compatible by looking at hts_engine.exe in ildasm.

[Screenshot: hts_engine.exe opened in ildasm]

Other options I thought to try next for a fully managed port are:

  • GCC4NET
  • A C++ to C# converter
  • MSIL backend for LLVM

Getting Lidia Mangu’s Consensus Code to Run on a Recent Machine

I just noticed that the confusion network code is available online. The code is a little bit vintage; these are the steps I took to get it to run on Fedora Core 10.

  1. Locate an old Linux machine (or install one in a VM) with a pre-3 version of gcc installed; mine was gcc296.
  2. Unpack the code and in the src directory make the changes to the following files.
  3. Comment out line 25 in GetOpt.cc: //extern "C" void *__builtin_alloca(...);
  4. Comment out line 35 in Zio.cc: //extern int errno;
  5. In CLP/src type the following:
  6. make -f Make.depend depend
  7. Edit the makefile based on your compiler location. I added static linkage so I could move the binary to another machine: CXX = /usr/bin/g++296 CC = /usr/bin/gcc296 LDLIBS = -lm -static
  8. Now build the code with make Consensus

If it worked, the CLP/bin dir should contain a program called Consensus. I copied everything to an F10 machine and successfully ran it by:

cd List

../bin/Consensus -i latlist -R ../data/prons

This produced the output

OKAY GOOD (sw2121A-ws97-l-0001)

The -R and -i options are essential; if they are missing or invalid the application segmentation faults.

Unable to load DLL The specified module could not be found. (Exception from HRESULT: 0x8007007E)

I kept getting this error when P/Invoking a C DLL from C#. The outer error just said it couldn’t find the unmanaged DLL, which wasn’t very helpful as the DLL was definitely in the same dir as the .NET exe. Even after putting the unmanaged DLL in System32 the error persisted.

It turns out the unmanaged DLL had a dependency on another DLL which was missing; Windows throws the standard can’t-find-file error but doesn’t give the name of the next DLL in the dependency chain. The inner message was “Unable to load DLL The specified module could not be found. (Exception from HRESULT: 0x8007007E)”.
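A quick way to confirm this outside the CLR is to probe the DLL from a tiny native program (a sketch; the DLL name is the one from my case below). LoadLibrary fails with the same module-not-found error when anything in the dependency chain is missing:

// Sketch: load the unmanaged dll directly. Error 126
// (ERROR_MOD_NOT_FOUND) even though the named dll is present
// usually means one of its dependencies is missing.
#include <windows.h>
#include <cstdio>

int main() {
  HMODULE h = LoadLibraryA("libgvc-4.dll");
  if (h == NULL) {
    std::printf("LoadLibrary failed, error %lu\n", GetLastError());
    return 1;
  }
  std::printf("loaded OK\n");
  FreeLibrary(h);
  return 0;
}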

Dependency Walker can be used on a standard DLL to display dependency information; unfortunately it doesn’t ship with Visual Studio anymore, so grab a copy here: http://dependencywalker.com/. Open the unmanaged DLL and see what the other dependencies are.

In my case libgvc-4.dll was present but the C# app was throwing errors as if it were missing. Dependency Walker shows the missing dependencies in yellow, as shown below. After adding the missing DLLs the problem was solved.

[Screenshot: Dependency Walker highlighting the missing dependencies in yellow]

PDF Kanji Dictionary for the Pre-2010 JLPT Level 2

A few years ago I was really into studying Japanese and interested in dictionaries and typesetting. I built a system to typeset Japanese kanji dictionaries, but for the last few years I haven’t been working on the system. One of the dictionaries I made was based on the kanji and vocabulary which appear in the (pre-2010) JLPT level 2 examination. Quite a few people have asked about the dictionary, so I am making a version available to download. Below is an example entry from the dictionary (the PDF version is much higher quality).

[Image: sample pages from the JLPT level 2 dictionary]

I’m also making another version available that features the animations from www.kanjireactor.com embedded right into the document. When I was using the full version of Adobe Acrobat 6.0 I discovered that it was possible to embed Flash applets. The possibility of creating beautifully printed documents that could offer interactive, program-like functionality when viewed on screen really interested me. For a kanji dictionary it allows stroke order diagrams to be displayed with full animation, showing the stroke direction in a very clean manner. I’ve successfully viewed the animations in Acrobat Reader 9.2 and 9.3 with Japanese language support installed and Flash Player 10 on Windows. The animations do not work with the Mac OS X native PDF viewer and also do not seem to work under Ubuntu.

When viewing the document Acrobat will first prompt for permission to allow embedded content; after a small delay an animation like the one below should appear in the document. Single-clicking on the character should start the animation. After allowing the embedded content there is normally a slight delay; scrolling the document or clicking around seems to speed things up.

The files are available at www.kanjijisho.com

Flite+HTS Engine for Flash - An HMM Speech Synthesizer Running Entirely in Adobe Flash

Flite+HTS Engine for Flash is a Hidden Markov Model (HMM)-based speech synthesis system (HTS) running entirely in Flash. The idea is to make HTS run in any modern browser and allow dynamic text-to-speech to be added to any webpage, with the entire synthesis performed on the client.

It requires Flash version 10 to run. If the above applet doesn’t load, a demo is also available here. It has been tested on Windows and Mac, although I’ve seen it throw exceptions a few times.

I have wanted to learn more about HTS and try out the engine for some time. Installing under Windows requires building from source on Cygwin. After seeing Quake running in the browser, I thought it would be interesting to try to get the HTS engine running in Flash and make it widely accessible. Instead of re-writing the entire system for Flash, I first took the HTS_Engine and Flite code and compiled it to a Flash component (swc) using the Alchemy compiler. With the core engine compiled, I used FlashDevelop to build a small interface; I looked at QuakeFlash to understand how to perform the interop. The UI elements are from the Minimalcomps library and it also makes use of the audio playback from Standingwave.

By far the trickiest part was getting the audio to play in the Flash applet. First, the ByteArray for moving data between the swc seems to default to big-endian (thanks Heiga Zen). Second, the Flash player was unable to output 16 kHz audio and I couldn’t get HTS to increase the sample rate to 44.1 kHz without pitching up the speech. The current implementation uses a very crude up-sampling technique, which is why the pitch of the voice may sound different when playing back through the applet.
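The applet’s actual resampler isn’t shown here; as an illustrative sketch only, a crude nearest-neighbour up-sampler from 16 kHz to 44.1 kHz (no interpolation or filtering, hence audible artifacts) might look like this:

// Illustrative sketch, not the applet's code: nearest-neighbour
// up-sampling from 16 kHz to 44.1 kHz. Each output sample copies
// the nearest input sample; no interpolation or filtering.
#include <vector>

std::vector<short> CrudeUpsample(const std::vector<short>& in) {
  const double step = 16000.0 / 44100.0;  // input samples per output sample
  std::vector<short> out;
  out.reserve(static_cast<size_t>(in.size() / step) + 1);
  for (double pos = 0.0; pos < in.size(); pos += step)
    out.push_back(in[static_cast<size_t>(pos)]);
  return out;
}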

Assuming the HTS engine has the functionality underneath, I think it should be possible to perform streaming output and write the buffer to Flash as soon as speech is ready. Similarly, if it’s possible to adjust engine parameters without having to re-initialize, adding the ability to adjust them would be a nice way of exploring HTS.

Code for the demo is available on GitHub: https://github.com/edobashira/Flite-hts_engine-for-Flash

FSMJS

On Google Code, Georg Jaehnig has created a finite state machine library in JavaScript called fsmjs. It covers the algorithms from the Mohri paper Weighted automata algorithms. There are demos of fsmjs here. I was playing around with it and wondered if I could compile it to the .NET framework using the JScript.NET compiler; I made a few changes and got it to compile and run! The modified file is here.

To compile it, at the Visual Studio command prompt type

jsc fsm.js

The default demo performs a star closure on a toy machine and dumps the result in dot format. Piping the output through dot:

fsm.exe | dot -Tpdf > test.pdf

will create a PDF of the machine.

The JScript.NET compiler doesn’t seem to like globals, so I can’t currently compile it to a library, but if it can be compiled to a .NET library it should be possible to call it from .NET programs.

OpenKernel and far style files for OpenFst

Just noticed there is a new version of OpenKernel that came out on 11th January 2010. I spotted some nice code in the last version, so I was eager to check this version. My expectations were correct: extracting the archive revealed a directory named far. Could this be compatibility with the AT&T far format?

First, to compile I had to grab the ICU 4.0 library from here as it wasn’t available on yum. These are the far commands that are available:

  • farextract – seems to be the same as the old AT&T farsplit command
  • farinfo
  • farprintstrings
  • farcompilestrings
  • farcreate – not in the AT&T tools; allows compiled fsts to be merged into a far file

I quickly tried it out with a far file created from the AT&T tools and it is not compatible (kind of obvious really). However, it’s nice to have far-like files for OpenFst. One thing I noticed when checking out the commands is that the output file must be specified; it doesn’t default to standard output.

Now to update my fstcount utility to support the OpenFst far format.