Registered Member
|
I downloaded and installed Simon 0.4 for Windows 7, then downloaded the HTK exe files and copied them to the folder c:\Program Files\Simon\bin\.
I downloaded and installed the shadow dictionary of the Russian language from here: Link. I imported the language model from here: Link. I added 40 words from the shadow dictionary to the active vocabulary and trained each of them 4 times. When the language model is processed, there is an error:
Am I doing something wrong, or is this a bug in the program? P.S.: Sorry for my English. |
Registered Member
|
Maybe the dictionary is wrong?
Did you try another version of the program and the dictionaries? |
Registered Member
|
Maybe. I do not know.
I think that early versions of the program are even more unstable.
I found only this. |
Registered Member
|
Did you try Simon on Linux?
Maybe only the Windows version has that bug. |
Registered Member
|
For linux I have too low IQ)
I want to start Simon under Windows. |
Moderator
|
Hi Ivan!
Simon dev here. Did you set up an appropriate grammar? If you haven't read it already, have a look at our manual! (I just set up the model with the Russian model you linked and it is indeed compatible with Simon 0.4.) Best regards, Peter |
Registered Member
|
Hi Peter!
No. I'll try.
I read The Simon Handbook. Its best part)
I tested the [EN/H4W/SPHINX] HUB4 WSJ 1.0 base model with the [EN/H4W] Mouse 0.1 scenario. It worked perfectly without any critical errors. But when I load the Russian base model with my own scenario, an error occurs: "The recognition reported the following error:Failed to setup recognition: ". Peter, please test my base model (used as an adapted model) and my scenario under Windows 7. Download link |
Registered Member
|
Great progress! After I set up an appropriate grammar, I got two new error messages! Not right away, though, but only after training the dictionary.
My .kde folder: Download link |
Moderator
|
Hi!
I've had a similar problem from another user using adapted SPHINX base models on Windows. I think there could be a deeper issue there but as I'm not running Windows, it's hard to investigate. Could you please compress the %appdata%\.kde\tmp-* folder(s), upload and link the archive here so that I can take a look? Thanks. Best regards, Peter |
Registered Member
|
Link to the archive of the tmp folder
Link to the whole .kde folder. Click on the big black button with a strange Russian word) |
Moderator
|
Thank you Ivan. Because of your data files, I have finally been able to reproduce the problem and pinpoint the issue.
The reason why the compilation fails is that Simon names the training samples according to the words that are recorded. So if you record "Test", the sample will be stored as something like "Test_<date>.wav". It also stores the transcription (in this case: "TEST") and other information in a text file (that is UTF-8 encoded). However, because Russian uses lots of special characters, those file names need to be encoded as well - and Windows does this with a local 8-bit character set (I presume - it doesn't appear to be UTF-8 and I haven't had time to look it up). In any case, the files are not found during the adaptation because the file names no longer match (due to the different encoding).
I'm afraid there is no fix that doesn't require a bit of coding and, more importantly, updated binaries. There are basically two workarounds for the meantime:
a.) Don't use special characters in your words. If you want to write the words out afterwards, you can always "link" the safe ASCII command to the UTF-8 text with e.g. a text-macro command. If you go this route, don't forget to remove your old samples (main screen > manage training data > clear training data).
b.) Keep everything the same but manually rename the training samples later to a "safe" (ASCII) file name. You can find the samples in the path "%appdata%\.kde\share\apps\simon\model\training.data". Afterwards, you also need to rewrite the "prompts" file in "%appdata%\.kde\share\apps\simon\model" to reflect the changed file names. Then, update the "TrainingDate" field in "%appdata%\.kde\share\apps\simon\model\modelsrcrc" to the current date / time to let Simon know about the changes and trigger a synchronization. Don't forget to quit Simon before you do this!
I've also added a ticket at https://bugs.kde.org/show_bug.cgi?id=315460. You can add yourself to the CC list if you want to get a message as soon as the bug is fixed.
Sorry for the inconvenience.
Best regards, Peter
Edit: Some additional information: The Linux version should generally not be affected, as most distributions use UTF-8 as the default file system encoding. Using a static base model of course also avoids this problem altogether. The HTK backend already includes the appropriate workarounds for Windows. However, you'd have to check whether someone has built and released a Russian HTK model. If not, you could still use a user-generated model as well. |
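The renaming in workaround b.) could be scripted roughly as follows. This is only a minimal Python sketch under several assumptions that the thread does not confirm: that each line of the "prompts" file starts with the sample name followed by a space and the transcription, that cp1251 is the local 8-bit character set in question, and that the small transliteration table below is acceptable. The helper names (`to_ascii`, `plan_renames`, `rewrite_prompts`) are made up for illustration and are not part of Simon.

```python
from typing import Dict, Iterable

# Partial Cyrillic-to-Latin table (an illustrative assumption, not an official scheme).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def to_ascii(name: str) -> str:
    """Return an ASCII-only version of a sample name."""
    out = []
    for ch in name:
        lower = ch.lower()
        if lower in TRANSLIT:
            latin = TRANSLIT[lower]
            out.append(latin.upper() if ch.isupper() else latin)
        elif ch.isascii():
            out.append(ch)   # digits, underscores, Latin letters stay as they are
        else:
            out.append("_")  # anything else non-ASCII becomes a placeholder
    return "".join(out)


def plan_renames(stems: Iterable[str]) -> Dict[str, str]:
    """Map each non-ASCII sample name (without extension) to an ASCII name."""
    return {s: to_ascii(s) for s in stems if not s.isascii()}


def rewrite_prompts(text: str, renames: Dict[str, str]) -> str:
    """Replace the first token of each prompts line if it was renamed.

    Assumes the line layout "<sample name> <transcription>"; the
    transcription itself is left untouched.
    """
    lines = []
    for line in text.splitlines(keepends=True):
        name, sep, rest = line.partition(" ")
        lines.append(renames.get(name, name) + sep + rest)
    return "".join(lines)


# Why the names stop matching: the same word gives different bytes per encoding.
# cp1251 is only a guess at the local 8-bit character set.
word = "Тест"
assert word.encode("utf-8") != word.encode("cp1251")

renames = plan_renames(["Тест_2013-02-19", "ok_2013-02-19"])
print(renames)
print(rewrite_prompts("Тест_2013-02-19 ТЕСТ\nok_2013-02-19 OK\n", renames), end="")
```

The sketch only computes the new names and the new prompts text; it does not touch any files. Before renaming anything on disk, quit Simon, back up the model folder, and check the plan by hand: the names on disk may already look garbled, and two different words can collapse to the same ASCII name.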
Registered Member
|
Thanks!!!
You're a real digital magician! Simon worked! Following your advice, I am transliterating the Russian letters to Latin. |