Running Multiple GPGPU Programs at The Same Time

The speech recognition decoder I work on has the ability to perform some of the computation on the graphics processing unit (GPU). I was recently asked if multiple instances of the decoder could access the GPU at the same time and if so what the performance penalty was. I knew multiple instances could access the GPU at the same time but was unsure of the performance cost.

To evaluate the slowdown, a single instance of the decoder was run on a standard Japanese evaluation task using the Corpus of Spontaneous Japanese (CSJ). The models comprised a tri-gram language model compiled into an integrated weighted finite state transducer (WFST) and a 3000-state acoustic model with 32 Gaussians per mixture. The same evaluation was then re-run, but with two instances of the decoder performing the same evaluation at the same time. The times for the simultaneous GPU plot were calculated as the average of these concurrent runs. The beam widths of the decoder were varied to generate the RTF vs. accuracy plot below.
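The real-time factor (RTF) on the plot's axis is simply decoding time divided by audio duration. A minimal sketch of the calculation (the numbers are hypothetical, not taken from these runs):

```python
# Real-time factor (RTF): processing time divided by audio duration.
# RTF < 1.0 means the decoder runs faster than real time.
def rtf(decode_seconds: float, audio_seconds: float) -> float:
    return decode_seconds / audio_seconds

# Hypothetical example: 600 s of speech decoded in 300 s.
print(rtf(300.0, 600.0))  # 0.5
```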


There is a noticeable slowdown when two decoders are running at the same time. The next question is whether this is due to overall system load or to the concurrent accesses to the GPU. The next experiment re-runs the evaluation with the decoder using the standard on-demand CPU (SSE accelerated) acoustic scoring.


Overall, the slowdown for the CPU-only decoder looks larger than for the GPU-accelerated decoder. A possible explanation is that the GPU decoder has lower memory bandwidth requirements: only the acoustic model scores, rather than the entire set of acoustic model parameters, need to be moved across the memory buses.

The slowdown factor (simultaneous time / exclusive time) is shown below and illustrates that the GPU-accelerated decoder is less affected when multiple decoders are running together. Because the slowdown factor is less than two, it also shows it is more efficient to process the data in parallel across two slower decoders than to push all of the data through a single faster decoder. Another important area to explore is whether a single multi-threaded decoder can perform any better.
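The throughput argument behind "slowdown factor less than two" can be sketched as follows (the timings are hypothetical, chosen only to illustrate the arithmetic):

```python
def slowdown_factor(simultaneous_time: float, exclusive_time: float) -> float:
    """Ratio of a run's time when sharing the machine to its time running alone."""
    return simultaneous_time / exclusive_time

# Hypothetical timings for one decoder, in seconds.
exclusive = 100.0     # decoder running with exclusive access
simultaneous = 150.0  # same decoder with a second instance alongside

s = slowdown_factor(simultaneous, exclusive)

# Two concurrent decoders finish two jobs in `simultaneous` seconds,
# while one exclusive decoder needs 2 * exclusive seconds for the same work.
# As long as s < 2, the concurrent pair has the higher overall throughput.
throughput_gain = 2.0 * exclusive / simultaneous  # equals 2 / s
print(s, throughput_gain)  # s = 1.5, gain ~ 1.33
```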



Finally, for completeness, the performance of all decoding runs is shown below.


Further things to consider would be running four decoders together on a single machine, or running the decoder on a machine with multiple GPUs.

This entry was posted by Edobashira.
