Simon: Problems while compiling the speech model

clemensf (Registered Member):
Hello everyone,

Simon has had problems compiling the speech model since I upgraded to 13.04. After a training session, Simon takes a long time to compile the speech model and never finishes. The training was 54 words long, so at first I thought this was normal. I then started Simond (in my case KSimond) from bash and saw that Simon kept repeating the same error:
Code:
Analyzing file:  "/tmp/kde-clemens/simond/default/compile/sphinx//default{2caec4ff-03fd-4285-b22e-c7728e6a4814}/etc/default{2caec4ff-03fd-4285-b22e-c7728e6a4814}.jsgf"
Analyzing file:  "/tmp/kde-clemens/simond/default/compile/sphinx//default{2caec4ff-03fd-4285-b22e-c7728e6a4814}/etc/default{2caec4ff-03fd-4285-b22e-c7728e6a4814}.dic"
Analyzing file:  "/tmp/kde-clemens/simond/default/compile/sphinx//default{2caec4ff-03fd-4285-b22e-c7728e6a4814}/etc/default{2caec4ff-03fd-4285-b22e-c7728e6a4814}_train.transcription"
Analyzing file:  "/tmp/kde-clemens/simond/default/compile/sphinx//default{2caec4ff-03fd-4285-b22e-c7728e6a4814}/etc/default{2caec4ff-03fd-4285-b22e-c7728e6a4814}_train.fileids"
Analyzing file:  "/tmp/kde-clemens/simond/default/compile/sphinx//default{2caec4ff-03fd-4285-b22e-c7728e6a4814}/etc/default{2caec4ff-03fd-4285-b22e-c7728e6a4814}.phone"

simond(3869) ModelCompilationManagerSPHINX::run: Model compilation failed for user  "default"


This message goes on forever (until you kill the process). Here is some information about my system:
Lubuntu 13.04 64-Bit
Kernel 3.8.0-21-generic
Simon 0.4.0
sphinxtrain 1.0.8
sphinxbase 0.8
pocketsphinx 0.8

I reinstalled Simon, the Sphinx components and Julius, but that didn't solve the problem. I also tried to install HTK instead of Sphinx, but I get errors while compiling it.

Greetings
bedahr (Moderator):
Without more information, this is hard to diagnose.

Can you possibly tar up /tmp/kde-clemens/simond/default/compile/sphinx/ after a failed compilation and upload it somewhere?
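
For readers who would rather script this step, here is a minimal Python sketch that packs the directory into a compressed archive. The function name and the output filename are made up for illustration; only the source path comes from this thread.

```python
import tarfile
from pathlib import Path


def archive_compile_dir(src_dir, out_path):
    """Pack src_dir (and everything below it) into a gzip-compressed tarball."""
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"not a directory: {src}")
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname keeps only the last path component, so the archive
        # unpacks into "sphinx/..." instead of the full /tmp/... path.
        tar.add(src, arcname=src.name)
    return Path(out_path)


# Example call (the output filename is an arbitrary choice):
# archive_compile_dir("/tmp/kde-clemens/simond/default/compile/sphinx",
#                     "simond-compile.tar.gz")
```

Run it right after a failed compilation, before simond starts the next attempt, so the archived files match the failing run.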

Thanks.

Best regards,
Peter
clemensf (Registered Member):
Hello,

default-82ce3268-dbc8-43f2-a648-63f1b193e1b2-.tar
This is the link to the file you asked for. To be honest, I don't understand much of the folder's contents.

Greetings
bedahr (Moderator):
Thanks for the data, this was extremely helpful.

You uncovered some crude dictionary handling in the sphinx adaption engine that caused it to mess up when confronted with the German "ß" (because toLower(toUpper("ß")) != "ß").
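
The round-trip problem is easy to reproduce outside Simon. The sketch below uses Python's built-in Unicode case mapping, not Simon's actual C++/Qt code, and the word list is purely illustrative. It shows that upper-casing "ß" yields "SS", so lower-casing afterwards cannot restore the original spelling.

```python
def survives_case_round_trip(word):
    """True if upper-casing and then lower-casing gives the word back."""
    return word.upper().lower() == word


def find_unstable_words(words):
    """Return the words whose spelling changes in an upper/lower round trip."""
    return [w for w in words if not survives_case_round_trip(w)]


# Illustrative German words; only the ones containing "ß" are affected.
words = ["sieben", "auswählen", "schließen", "größe"]
print(find_unstable_words(words))  # ['schließen', 'größe']
```

A dictionary that stores words in upper case and looks them up again in lower case will therefore miss any entry containing "ß", which is one plausible way for the mismatch described above to arise.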

I was able to reproduce your errors with the German "Tastatur" scenario and fixed them so that everything now builds and works fine for me.

If you want, you can build Simon from git and it should "just work" (it would be great if you would try that and confirm).
However, the changes will also land in Simon 0.4.1 to be released next Monday.

In any case, thank you again, this report really helped a lot as I probably wouldn't have noticed that before 0.4.1.

Best regards,
Peter
clemensf (Registered Member):
Hello,

Thank you very much for your effort. I cloned the code from SourceForge with git, but I get an error message while compiling Simon:
Code:
./build_ubuntu.sh
-- The C compiler identification is GNU 4.7.3
-- The CXX compiler identification is GNU 4.7.3
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
CMake Error at cmake/FindZLIB.cmake:25 (MESSAGE):
  Could not find ZLIB
Call Stack (most recent call first):
  julius/libsent/CMakeLists.txt:3 (find_package)


-- Configuring incomplete, errors occurred!
touch: cannot touch './julius/gramtools/mkdfa/mkfa-1.44-flex/*': No such file or directory

I used the following address:
"git clone git://speech2text.git.sourceforge.net/g ... peech2text speech2text"
I cloned the code twice, so it's unlikely that I have a corrupted file. The error log might be useful to you; I will just wait until Monday. Thanks a lot anyway.

Greetings,
Clemens
bedahr (Moderator):
The current Simon version can be found at git://anongit.kde.org/simon.git, not on SourceForge.

However, the error clearly states that you are missing zlib (or its development files).

From this, I take it that you were using the binary packages? Those will probably not be released on Monday; I have no influence on when Ubuntu will ship updated binaries, I can only tell you when the new version (in source form) will be available.

Best regards,
Peter
bcooksley (Administrator):
You need to install the appropriate "ZLib" development package using your package manager. On openSUSE at least, this is called "zlib-devel", but may differ on other distributions.


KDE Sysadmin
clemensf (Registered Member):
Hello everyone,

Thank you all very much for your effort. I cloned the program with git from "git://anongit.kde.org/simon.git". There were no problems while compiling the program, but I again ran into problems while compiling the speech model. I get the same output in the terminal three times. After the first and second attempts there is a message "try again: true"; after the third, "try again: false". There is also a new error window popping up in Simon. I guess this is an improvement in the new version. It shows the following text:
While compiling the model, the server responded with the following error (this line was translated from German):
Too little training material available.

Please train your acoustic model by recording samples.

Details

/usr/local/bin/sphinxtrain -t default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75} setup
Sphinxtrain path: /usr/local/lib/sphinxtrain
Sphinxtrain binaries path: /usr/local/libexec/sphinxtrain
Setting up the database default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}

/usr/local/bin/sphinxtrain run
MODULE: 000 Computing feature from audio files
Feature extraction is done
MODULE: 00 verify training files
Phase 1: Checking to see if the dict and filler dict agrees with the phonelist file.
Found 69 words using 43 phones
Phase 2: Checking to make sure there are not duplicate entries in the dictionary
WARNING: This word (AUSWÄHLEN(2)) has duplicate entries in (/tmp/kde-clemens/simond/default/compile/sphinx/default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}/etc/default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}.dic)
WARNING: This word (SYSTEM(2)) has duplicate entries in (/tmp/kde-clemens/simond/default/compile/sphinx/default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}/etc/default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}.dic)
Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
Phase 4: Checking number of lines in the transcript file should match lines in fileids file
Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: 0.0442805555555556
This is a small amount of data, no comment at this time
Phase 6: Checking that all the words in the transcript are in the dictionary
Words in dictionary: 66
Words in filler dictionary: 3
Phase 7: Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
MODULE: 0000 train grapheme-to-phoneme model
Skipped (set $CFG_G2P_MODEL = 'yes' to enable)
Feature type is s2_4x which is 4 streams
LDA/MLLT only has sense for single stream features, for example 1s_c_d_dd
Skipping LDA training
Feature type is s2_4x which is 4 streams
LDA/MLLT only has sense for single stream features, for example 1s_c_d_dd
Skipping MLLT training
MODULE: 05 Vector Quantization
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check the log file for details.
MODULE: 10 Training Context Independent models for forced alignment and VTLN
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 11 Force-aligning transcripts
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
MODULE: 12 Force-aligning data for VTLN
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 20 Training Context Independent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...models...
Phase 2: Flat initialize
Phase 3: Forward-Backward
Baum-Welch iteration 1 Average log-likelihood 14.5827677059156
Baum-Welch iteration 2 Average log-likelihood 23.7236246157707
Baum-Welch iteration 4 Average log-likelihood 26.3370052066997
Baum-Welch iteration 5 Average log-likelihood 26.6560378897183
Baum-Welch iteration 7 Average log-likelihood 27.0306818894674
Baum-Welch iteration 9 Average log-likelihood 27.2602408882755
Training completed after 10 iterations
MODULE: 30 Training Context Dependent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...
Phase 2: Initialization
This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
Phase 3: Forward-Backward
Training failed in iteration 1
MODULE: 40 Build Trees
Phase 1: Cleaning up old log files...
Phase 2: Make Questions
Phase 3: Tree building
Processing each phone with each state
Skipping SIL
MODULE: 45 Prune Trees
This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
Phase 1: Tree Pruning
This step had 2 ERROR messages and 0 WARNING messages. Please check the log file for details.
Phase 2: State Tying
This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
MODULE: 50 Training Context dependent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...
Phase 2: Copy CI to CD initialize
This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
Phase 3: Forward-Backward
Training failed in iteration 1
MODULE: 60 Lattice Generation
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 61 Lattice Pruning
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 62 Lattice Format Conversion
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 65 MMIE Training
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 90 deleted interpolation
Phase 1: Cleaning up directories: logs...
Phase 2: Doing interpolation...
This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
Phase 3: Dumping senones for PocketSphinx...
This step had 2 ERROR messages and 0 WARNING messages. Please check the log file for details.
MODULE: DECODE Decoding using models previously trained
Aligning results to find error rate
Sphinxtrain path: /usr/local/lib/sphinxtrain
Sphinxtrain binaries path: /usr/local/libexec/sphinxtrain
Running the training

Failed to copy /tmp/kde-clemens/simond/default/compile/sphinx/default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}/model_architecture/default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}.200.mdef to /tmp/kde-clemens/simond/default/compile/sphinx/default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}/model_parameters/default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}.cd_semi_200_delinterp/mdef: No such file or directory at /usr/local/lib/sphinxtrain/scripts/90.deleted_interpolation/deleted_interpolation.pl line 110.
Can't open /tmp/kde-clemens/simond/default/compile/sphinx/default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}/result/default{c919283b-3a85-4ec9-98f9-9c4c8df8dd75}-1-1.match
word_align.pl failed with error code 65280 at /usr/local/lib/sphinxtrain/scripts/decode/slave.pl line 173.


Good night,
Clemens
bedahr (Moderator):
Nope, I can't reproduce that at all here.
Please send me another archive of your current /tmp/kde-clemens/simond/default/compile/sphinx/

Thanks.

Best regards,
Peter
clemensf (Registered Member):
Hello,

This time it's only 2 MB. I hope you find what you are looking for: default-dfccb977-caea-4e65-ad55-7ad7555212b8-.tar

Greetings,
Clemens
bedahr (Moderator):
Thanks. I did.

Please try again.

Best regards,
Peter
clemensf (Registered Member):
Hello,

Thanks a lot. It works now. The strange thing is that it only works on the third try, so it takes a while.

Greetings,
Clemens

P.S.: The Simon websites show a MySQL error on their pages. I don't know whether you are the admin of these pages.
bedahr (Moderator):
Perfect.
That it works only on the third try is actually expected.

Here's the deal: The keyboard scenario is a bit tricky because it's rather large (lots of words) and we want it to work asap (little training data).
So we include alternate pronunciations in the scenario (e.g.: "Sieben" may be pronounced either like it is written or as "Siebn").

While compiling, the forced alignment picks the better-fitting of those pronunciations and trains the model based on that. Because the training data is too small to give good phoneme coverage, the transcriptions that were not picked can leave phonemes in the model that have never been trained (e.g. the nasal "n" from the word "Siebn" may not be observed in your training data). This causes the training to fail.

But Simon picks up on the error and fixes it by removing the missing phoneme (and with it the alternate transcription that wasn't picked anyway). It then starts the compilation again.
In your case, you have two such "errors" in succession, which is why it takes a little while.

The removal of the alternate transcriptions is temporary, btw. So if you train the alternate pronunciation sometime in the future, the full set of possible transcriptions will be considered.
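
The retry behaviour described above can be sketched roughly as follows. This is a toy simulation in Python, not Simon's actual implementation: the function names, the phoneme symbols and the `trained_phonemes` set (a stand-in for what the real trainer would report as an error) are all assumptions made for illustration.

```python
def drop_phoneme(dictionary, phoneme):
    """Return a copy of the dictionary without pronunciations using `phoneme`."""
    pruned = {}
    for word, pronunciations in dictionary.items():
        kept = [p for p in pronunciations if phoneme not in p]
        if kept:  # a word with no pronunciation left is dropped entirely
            pruned[word] = kept
    return pruned


def compile_with_retries(dictionary, trained_phonemes, max_attempts=3):
    """Simulate compiling, pruning one untrained phoneme per failed attempt."""
    working = dictionary  # only pruned copies are made; the original is untouched
    for attempt in range(1, max_attempts + 1):
        used = {ph for prons in working.values() for p in prons for ph in p}
        missing = sorted(used - trained_phonemes)
        if not missing:
            return attempt, working  # "compilation" succeeded
        working = drop_phoneme(working, missing[0])
    raise RuntimeError("model compilation failed")


# Made-up phoneme symbols: "n=" and "X" appear only in alternate pronunciations.
dictionary = {
    "SIEBEN": [["z", "i:", "b", "@", "n"], ["z", "i:", "b", "n="]],
    "ACHT": [["a", "x", "t"], ["a", "X", "t"]],
}
trained = {"z", "i:", "b", "@", "n", "a", "x", "t"}
attempt, used_dictionary = compile_with_retries(dictionary, trained)
print(attempt)  # 3
```

With two untrained phonemes, the first two attempts fail and the third succeeds, which matches the behaviour reported in this thread. Because pruning works on copies, the full dictionary stays available for later compilations, mirroring the temporary removal described above.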

Best regards,
Peter

P.S.: simon-listens.org is the homepage of the organization Simon Listens. I don't have access there as I am no longer a member. If you're looking for updates on Simon, you may want to have a look at the Simon website: http://simon.kde.org

