Flite+HTS Engine for Flash - An HMM Speech Synthesizer Running Entirely in Adobe Flash

Flite+HTS Engine for Flash is a Hidden Markov Model (HMM)-based speech synthesis system (HTS) running entirely in Flash. The idea is to make HTS run in any modern browser, so that dynamic text-to-speech can be added to any webpage with the entire synthesis performed on the client.

It requires Flash version 10 to run. If the above applet doesn’t load, a demo is also available here. It has been tested on Windows and Mac, although I’ve seen it throw exceptions a few times.

I have wanted to learn more about HTS and try out the engine for some time. Installing it under Windows requires building from source on Cygwin. After seeing Quake running in the browser, I thought it would be interesting to try to get the HTS engine running in Flash and make it widely accessible. Instead of rewriting the entire system for Flash, I first took the HTS_Engine and Flite code and compiled it to a Flash component (swc) using the Alchemy compiler. With the core engine compiled, I used FlashDevelop to build a small interface, and I looked at QuakeFlash to understand how to perform the interop. The UI elements are from the Minimalcomps library, and audio playback uses Standingwave.

By far the trickiest part was getting the audio to play in the Flash applet. First, the ByteArray used for moving data to and from the swc seems to default to big-endian (thanks Heiga Zen). Second, the Flash player was unable to output 16 kHz audio, and I couldn’t get HTS to increase the sample rate to 44.1 kHz without pitching up the speech. The current implementation uses a very crude up-sampling technique, which is why the pitch of the voice may sound different when played back through the applet.
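To illustrate the two audio issues above, here is a minimal sketch in Python rather than ActionScript. It is not the code from the demo: the function names, the repetition factors, and the assumption that the engine emits signed 16-bit PCM are mine, chosen only for illustration. It shows how reading the same bytes with the wrong byte order changes the sample values, and how repeating samples by an integer factor leaves a rate mismatch that shifts the pitch.

```python
import struct

ENGINE_RATE = 16_000  # Hz, the engine output rate mentioned in the post
PLAYER_RATE = 44_100  # Hz, the rate the post says the player needed


def decode_pcm16(raw: bytes, big_endian: bool) -> list[int]:
    """Unpack signed 16-bit PCM samples using an explicit byte order."""
    count = len(raw) // 2
    fmt = (">" if big_endian else "<") + f"{count}h"
    return list(struct.unpack(fmt, raw[: count * 2]))


def repeat_upsample(samples: list[int], factor: int) -> list[int]:
    """Crude integer up-sampling: emit every sample `factor` times."""
    return [s for s in samples for _ in range(factor)]


def nearest_upsample(samples: list[int], src_rate: int, dst_rate: int) -> list[int]:
    """Sample-and-hold resampling to an arbitrary rate (no filtering)."""
    out_len = len(samples) * dst_rate // src_rate
    return [samples[i * src_rate // dst_rate] for i in range(out_len)]


def speed_ratio(src_rate: int, factor: int, player_rate: int) -> float:
    """Playback speed after repetition: >1 is faster/higher, <1 is slower/lower."""
    return player_rate / (src_rate * factor)


if __name__ == "__main__":
    raw = struct.pack("<3h", 1, -2, 300)      # little-endian test samples
    print(decode_pcm16(raw, big_endian=False))  # values as written
    print(decode_pcm16(raw, big_endian=True))   # same bytes, wrong byte order
    print(speed_ratio(ENGINE_RATE, 2, PLAYER_RATE))
    print(speed_ratio(ENGINE_RATE, 3, PLAYER_RATE))
```

Under these assumptions, repeating each sample twice turns 16 kHz into 32 kHz, which plays about 1.38 times too fast at 44.1 kHz, while repeating three times gives 48 kHz, which plays about 0.92 times too slow. Neither integer factor lands on 44.1 kHz exactly, so a non-integer mapping such as `nearest_upsample` (or a properly filtered resampler) would be needed to keep the pitch unchanged.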

Assuming the HTS engine has the functionality underneath, I think it should be possible to perform streaming output and write the buffer to Flash as soon as speech is ready. Likewise, if it’s possible to adjust engine parameters without re-initializing, exposing those parameters in the interface would be a nice way of exploring HTS.

Code for the demo is available on GitHub: https://github.com/edobashira/Flite-hts_engine-for-Flash

This entry was posted by Edobashira.

19 thoughts on “Flite+HTS Engine for Flash - An HMM Speech Synthesizer Running Entirely in Adobe Flash”

  1. I think this is awesome. Do you know what the effort would be to get this working in non-English languages as well? Congrats are in order either way!

  2. Thanks for the comment. The hts_engine component should be pretty easy because it's language independent and only requires changing the embedded resource files to the new voice. The Flite component, which performs the text analysis, may be more difficult, as I think it is hardwired to English.

  3. Thanks so much for sharing this! What an incredible application - I've already started playing with it and can see so many possibilities!

    I do have one quick question -- I've noticed that certain words cause the text to speech engine to render very garbled audio. For example, type the word "with" or "down" into the demo movie on your site. The garbling also happens when you end a sentence with a word like that (i.e. "the system is down" or "let's go with"). I was wondering if you had any thoughts on this. I tried installing another series of language files but that didn't seem to help. Any guidance would be most appreciated!

    Thanks again for everything -- this is truly amazing!

  4. Thanks for the comment.

    One thing that could be causing the garbled audio: there were some limitations in the sampling rates the Flash player would accept. I used some very crude up-sampling, where I just repeat samples in Main.as, and there is still a mismatch between the sample rate the engine generates and the Flash player sampling rate. Perhaps this is causing the corruption.

    If you compare the Flash version with the standalone Flite+hts_engine, it's possible to hear the difference in pitch. Does the standalone Flite+hts_engine sound fine for the same text input?

  5. Hello, I would like to know how I can use a different voice. I downloaded a Spanish voice ("cstr_upc_upm_spanish_hts") that appears to have nearly the same files as the included one. I embedded them in the application, but I always get an "Alchemy Exit" error with no further information.

    Could you write a short tutorial about how to use other voices?
    That would help a lot.

    By the way, this utility is AMAZING! :O

