If recognition accuracy is not good enough, there are a few things you can check to improve it:

  • Speak clearly.  Don’t mumble.
  • Speak at a steady pace, as if reading aloud.
  • Use MMDAgent in a quiet environment.  Suppress background noise.
  • Use a good external microphone; built-in PC microphones often cause trouble.

To check how your speech is being recognized, look at the recognition result in the MMDAgent log window.  The log window can be enabled by pressing the “D” key.

Please note that these are general tips for speech recognition systems and may not always work for you.

=== Japanese ========================================


  • 口を大きく動かしてはっきりと発音する.
  • 適度な速度で話す.
  • 静かな場所で使い,周りの雑音が入るのを抑える.
  • 外付けの性能の良いマイクを使う.




MMDAgent generates a speech recognition result as a sequence of keywords.  You can see how your speech input is being recognized in the log window, which can be enabled by pressing the “D” key.

You can write a dialog scenario file (.fst) that performs actions in response to the recognition result.  When you expect only one keyword in the result, the relevant part of the dialog scenario file (.fst) should look like this:

  1     11     RECOG_EVENT_STOP|こんにちは    SYNTH_START|mei|mei_voice_normal|こんにちは
  (こんにちは means hello.)
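
To make the structure of the line above easier to see, here is a minimal Python sketch that splits it into its parts.  It assumes, based only on the examples in this document, that each arc line has four whitespace-separated fields (current state, next state, event, command) and that “|” separates a name from its arguments.  The names Arc and parse_arc are hypothetical; this is not MMDAgent’s own parser, and it ignores anything else a real .fst file may contain.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    """One arc of a dialog scenario file, as read by this sketch (hypothetical type)."""
    state: str
    next_state: str
    event: str               # e.g. "RECOG_EVENT_STOP"
    event_args: str          # e.g. "こんにちは"
    command: str             # e.g. "SYNTH_START"
    command_args: List[str]  # e.g. ["mei", "mei_voice_normal", "こんにちは"]


def parse_arc(line: str) -> Arc:
    """Split one arc line into fields (assumption: exactly four fields per line)."""
    fields = line.split()  # any run of whitespace separates fields
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    state, next_state, condition, action = fields
    # Everything after the first "|" is treated as the event's arguments.
    event, _, event_args = condition.partition("|")
    # The command name comes first; the remaining "|"-separated parts are its arguments.
    command, *command_args = action.split("|")
    return Arc(state, next_state, event, event_args, command, command_args)


hello_arc = parse_arc(
    "1     11     RECOG_EVENT_STOP|こんにちは    SYNTH_START|mei|mei_voice_normal|こんにちは"
)
```

Because fields are split on whitespace in this sketch, a stray space inside a field produces a fifth field and parse_arc raises ValueError.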

To match several keywords, specify them as follows:

  • For an AND condition, where all of the keywords must be included in an utterance, separate the keywords with commas.  The arc matches only when all of them are recognized.  Do not insert spaces around the commas!
  1     11     RECOG_EVENT_STOP|名古屋,天気 SYNTH_START|mei|mei_voice_normal|晴れ
  (名古屋 means Nagoya, 天気 means weather and 晴れ means sunny.)
  • For an OR condition, where any one of the keywords should match, define the same arc once for each keyword.  Equivalent words or synonyms can be specified this way.  Note that when several arcs are defined between the same pair of states, they are evaluated in the order in which they are written in the dialog scenario file (.fst).
  1     11     RECOG_EVENT_STOP|昼ごはん  SYNTH_START|mei|mei_voice_normal|いいですね
  1     11     RECOG_EVENT_STOP|ランチ   SYNTH_START|mei|mei_voice_normal|いいですね
  1     11     RECOG_EVENT_STOP|昼食     SYNTH_START|mei|mei_voice_normal|いいですね
  (昼ごはん, ランチ and 昼食 mean lunch and いいですね means good.)
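
The AND and OR rules above can be modeled with a short Python sketch.  This is a simplified illustration, not MMDAgent’s actual matching code: it assumes the recognition result is a list of keywords, that a keyword matches only when it is exactly equal to one of them, and that arcs are tried in file order.  The function names arc_matches and first_matching_action are hypothetical.

```python
from typing import List, Optional, Tuple


def arc_matches(arc_keywords: str, recognized: List[str]) -> bool:
    """AND condition: every comma-separated keyword must appear in the result."""
    required = arc_keywords.split(",")  # no trimming, so "a, b" requires " b"
    return all(keyword in recognized for keyword in required)


def first_matching_action(arcs: List[Tuple[str, str]],
                          recognized: List[str]) -> Optional[str]:
    """OR condition: arcs are tried in the order written; the first match wins."""
    for arc_keywords, action in arcs:
        if arc_matches(arc_keywords, recognized):
            return action
    return None


# (keywords, reply) pairs taken from the examples above, in file order.
arcs = [
    ("名古屋,天気", "晴れ"),
    ("昼ごはん", "いいですね"),
    ("ランチ", "いいですね"),
    ("昼食", "いいですね"),
]

weather_reply = first_matching_action(arcs, ["今日", "名古屋", "天気"])  # both keywords present
partial_reply = first_matching_action(arcs, ["名古屋"])                  # only one AND keyword
lunch_reply = first_matching_action(arcs, ["ランチ"])                    # one of the synonyms
spaced_match = arc_matches("名古屋, 天気", ["名古屋", "天気"])            # space after the comma
```

In this model, the space after the comma becomes part of the second keyword, so the spaced version never matches.  That is one way to see why spaces around commas must be avoided.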

=== Japanese ========================================

MMDAgent では,音声認識の結果をキーワードの列で取得しています.「D」キーを押すと,ログを見ながら自分の発話がどのように認識されているかを確認できます.


  1     11     RECOG_EVENT_STOP|こんにちは    SYNTH_START|mei|mei_voice_normal|こんにちは


  • 複数のキーワードが全て含まれる時に反応させる場合,下記のように半角コンマ(,)で区切って複数のキーワードを記述することで,それら全てのキーワードが認識された時に反応します.半角コンマの直後にはスペース等を入れないでください.
  1     11     RECOG_EVENT_STOP|名古屋,天気 SYNTH_START|mei|mei_voice_normal|晴れです.
  • 複数のキーワードのどれかが含まれる時に反応させる場合,下記のように複数のキーワードを記述することで,それらのキーワードのいずれかが認識された時に反応します.なお,音声対話スクリプト(.fst)の中である状態からの遷移が複数定義されている場合,上から順に評価されます.
  1     11     RECOG_EVENT_STOP|昼ごはん     SYNTH_START|mei|mei_voice_normal|Aランチがおすすめです.
  1     11     RECOG_EVENT_STOP|ランチ      SYNTH_START|mei|mei_voice_normal|Aランチがおすすめです.
  1     11     RECOG_EVENT_STOP|昼食       SYNTH_START|mei|mei_voice_normal|Aランチがおすすめです.