|
steps towards German ASR
timo | weblog | Thu May 15
"Klotz hin Gnubbel": this is what I get with the current acoustic model and an LM whose training data even included the correct sentence ("und füge es ein in den Bauch des Elephanten"). Even using just the correct sentence as a grammar returns "und füge es" instead of the complete sentence. The alignment shows that "es" supposedly spans the complete "ein in den Bauch des". I read that the current models are severely overtrained on one speaker, so I tried one of his utterances (de43-01, "die Anwendung wird entwickelt"). It is correctly understood if I use it as a grammar (effectively resulting in forced alignment), but it comes out as the beautiful "phantasie wird entwickelt" if I include this one sentence in the statistical LM as above. Thus, the bad results are probably due to the bad acoustic model. I've already uploaded the PentoNamingCorpus to Voxforge, so hopefully the acoustic models will improve eventually. But if worst comes to worst, we'll have to train based on KCoRS and Verbmobil...
|
|
eclipse plugins
timo | weblog | Wed Apr 16
I am currently investigating the ton of classes that implement the Sphinx interface "SearchSpace" or one of its three sub-interfaces. There are 19 in total, and I will likely have to add another one for the feature that I have in mind. Anyway, I decided that I need something, preferably an Eclipse plugin, to visualize class dependencies, and there are actually a few options:
- X-Ray would probably do the job, but it doesn't work. Maybe I just don't know how to install it correctly.
- Byecycle has a great screencast on its page and friendlier installation instructions. It shows a dependency graph between classes and automatically and incrementally optimizes the graph layout. Infinitely. Using 20% of your processor(s). It's quite slow, and it seems to be limited to showing dependencies within a package, while the dependencies I'm interested in often cross package boundaries (classes from different packages implementing an interface).
Also, I found Fat Jar, an Eclipse plugin that turns your whole project into a single jar. That's something my colleague asked me about the other day.
|
|
maɪ̯ iːpaː tɛkstfiːld
timo | weblog | Sat Apr 12
This has been programmed before, but here it is for you to see (and use): IPATextField, a simple descendant of JTextField that only accepts phonetic input (either SAMPA, or IPA if you know your Unicode code points by heart) and shows IPA symbols. You can try it out directly, as a main routine is included. It's even useful as a tiny copy-and-paste IPA editor.
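The SAMPA-to-IPA idea can be sketched in a few lines. This is not the IPATextField code (which is Java and Swing), only a minimal Python illustration of the character mapping; the table below is a deliberately partial selection of standard SAMPA symbols, and the names `SAMPA_TO_IPA` and `sampa_to_ipa` are made up for this example.

```python
# Partial SAMPA-to-IPA table (illustrative only; a real table has many more entries).
SAMPA_TO_IPA = {
    "I": "ɪ", "E": "ɛ", "O": "ɔ", "U": "ʊ", "Y": "ʏ",
    "@": "ə", "6": "ɐ", "2": "ø", "9": "œ",
    "S": "ʃ", "Z": "ʒ", "N": "ŋ", "C": "ç",
    ":": "ː", '"': "ˈ", "%": "ˌ",
}


def sampa_to_ipa(text: str) -> str:
    """Map each SAMPA character to its IPA symbol.

    Characters without an entry (plain letters, spaces) pass through unchanged.
    """
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(sampa_to_ipa("maI i:pa: tEkstfi:ld"))
```

Note that this one-character-at-a-time mapping cannot produce diacritics such as the non-syllabic mark in the title of this post; multi-character sequences would need a slightly smarter tokenizer.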
|
|
Higgins
timo | page | Fri Apr 11
Higgins how-to - Install Mozart via Synaptic:
- mozart
- mozart-gtk
- mozart-stdlib
- Check out Higgins from svn+ssh://helios/projekte/inpro/INPRO_SVN/Code/Higgins.
- export HIGGINS_RUN=<path-to-higgins> (best put this straight into your .bashrc, so that every newly opened terminal knows it)
- ozengine launcher/Launcher.exe Potsdam/number.higgins
- from there, start the individual modules (the inspectors don't work yet)
- for the Sphinx connection:
- svn update in Eclipse (the packages are called de.cocolab.inpro.higgins.*)
- TODO: adjust the build path, include something else
- voilà
|
|
hours 20080407
timo | page | Wed Apr 09
- Meeting notes 2008-04-07
- present: Michaela, David, Timo
- TOP 1: Timo has not applied for any professorship and does not plan to do so for the time being.
- TOP 2: roadmap until July
- working (albeit simple) automatic system
- ASR, some slot filling
- dialogue management
- domain reasoning
- information fission (concept to action)
- concept to GUI action, GUI control
- working (more complex) WOZ-system
- pento server (GUI control)
- wizard guidelines detailing abilities of the simulated system
- modules shared by both tasks
- TTS (actually, concept to speech)
- pento-client
- TOP 3: variables to control domain complexity
- user
- has solution?
- elephant priming?
- task
- shared visual space?
- tiles visible (or system describes which tile comes next, together with unnumbered solution)
- board visible?
- random placement of tiles?
- correct orientation of tiles?
- coloring of the tiles?
- block-markings on board and tiles? (may trigger different naming of tiles)
- continuous/discrete actions? (cursor and tiles visible while they are being dragged)
- must the task be completed or should the system play the last (three) moves on its own?
- system
- has solution?
- allow for wrong moves by the user? how to cope with user errors?
- free/slot filling
- allows for overanswering?
- continuous/discrete actions? (ability to disrupt GUI actions, only necessary for the continuous task, the continuous task could still be coupled with discrete actions of the system, but would likely lead to bad usability)
- helpfulness? (somewhat coupled with has-solution and domain knowledge)
|
|
Higgins
timo | page | Tue Apr 01
Higgins how-to - Install Mozart via Synaptic
- Check out Higgins from svn+ssh://helios/projekte/inpro/INPRO_SVN/Code/Higgins.
- export HIGGINS_RUN=`pwd`
- ozengine launcher/Launcher.exe Potsdam/number.higgins
- voilà
|
|
experiment definitions
timo | page | Tue Feb 26
| | unit | selection | class |
| bc.utt | word | annot. EOU | EOT |
| 3-Xms | word | pause ≥ X ms | EOT |
| us | word | any | EOU |
| 2 (aw) | word | any | EOT |
| af.bin-Xms-ns-now (was: 1, 4) | frame | any | vicinity of X ms incl.(n)/excl. following silence |
|
|
experiment definitions
timo | page | Tue Feb 26
| | unit | selection | class |
| bc.utt | word | annot. EOU | EOT |
| 3-Xms | word | pause ≥ X ms | EOT |
| us | word | any | EOU |
| 2 (aw) | word | any | EOT |
| af.bin-Xms-ns-now | frame | any | vicinity of X ms incl.(n)/excl. following silence |
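To make the "pause ≥ X ms" selection of the 3-Xms setting concrete, here is a rough sketch. It assumes that the setting means "keep every word that is followed by at least X ms of silence before the next word"; that reading, the `Word` record, and the function name are assumptions for illustration, not the scripts actually used in the experiments.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    start_ms: int  # word onset in milliseconds
    end_ms: int    # word offset in milliseconds


def words_before_pause(words: List[Word], min_pause_ms: int) -> List[Word]:
    """Return the words followed by at least min_pause_ms of silence.

    The final word has no successor and is therefore never selected here;
    whether it should count is a design decision left open in this sketch.
    """
    selected = []
    for current, following in zip(words, words[1:]):
        if following.start_ms - current.end_ms >= min_pause_ms:
            selected.append(current)
    return selected


if __name__ == "__main__":
    utterance = [Word("so", 0, 500), Word("well", 1000, 1250), Word("yes", 1250, 1500)]
    print(words_before_pause(utterance, 200))
```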
|
|
Trigger on data not working in OAA?
timo | weblog | Mon Feb 11
Continuing from the last post, assume you want your OAA agent to react to certain data changes. You set up a trigger with something like this:
    oaaAddTrigger(data, otherSpeechEnd(_), oaaSolve(startTalking(), [reply(none)]), [on(add), recurrence(whenever)])
Right? No! Well, yes, but that's not enough. You have to make sure that the data (otherSpeechEnd(X)) is already known to the facilitator. So, in order for the trigger to work, you need two lines:
    oaaAddData(otherSpeechEnd(_), [])
    oaaAddTrigger(data, otherSpeechEnd(_), oaaSolve(startTalking(), [reply(none)]), [on(add), recurrence(whenever)])
Very nasty behaviour, because the bug only shows up after you've restarted the facilitator, while the data type is still unknown to it.
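The ordering pitfall can be mimicked with a toy model. The following Python sketch is not OAA and does not use its API; `ToyFacilitator` and `add_trigger_safely` are hypothetical names, and the assumption that a trigger on an unknown data type is silently ignored is only my guess at what happens inside the facilitator.

```python
class ToyFacilitator:
    """Toy model of the behaviour described above; NOT OAA's implementation."""

    def __init__(self):
        self.known_predicates = set()  # data types the facilitator has seen
        self.triggers = {}             # predicate name -> list of callbacks

    def add_data(self, predicate, value=None):
        # Adding data makes the predicate known and fires matching triggers.
        self.known_predicates.add(predicate)
        for action in self.triggers.get(predicate, []):
            action(value)

    def add_trigger(self, predicate, action):
        # ASSUMPTION: a trigger on an unknown predicate is silently ignored.
        if predicate not in self.known_predicates:
            return False
        self.triggers.setdefault(predicate, []).append(action)
        return True


def add_trigger_safely(facilitator, predicate, action):
    """Make the predicate known first (if needed), then install the trigger."""
    if predicate not in facilitator.known_predicates:
        facilitator.add_data(predicate)
    return facilitator.add_trigger(predicate, action)


if __name__ == "__main__":
    fired = []
    fresh = ToyFacilitator()  # think: a facilitator that was just restarted
    print(fresh.add_trigger("otherSpeechEnd", fired.append))         # trigger is lost
    print(add_trigger_safely(fresh, "otherSpeechEnd", fired.append))  # trigger sticks
```

Wrapping both calls in one helper means the agent behaves the same whether or not the facilitator has been restarted in the meantime.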
|
|
hours 20071126
timo | page | Mon Nov 26
- present: Michaela, David, Timo
- General:
- usual presence in Golm
- it is generally nice to say briefly, in the evening before leaving, when you will be in next
- David: Mon, (Tue), Fri
- Timo: meeting Mondays, Tue, (Wed), Thu
- Michaela: let's wait and see how nice her flat turns out
- otherwise, don't hesitate to communicate via IM/e-mail as well (set your IM status correctly)
- realistically, there are three options for better communication within the institute
- coffee talks/morning talks
- irregular poster sessions
- a more frequent colloquium with internal talks
- this is not only about scientific exchange, but also about general organizational matters.
- David will write an e-mail to the professors to find out whether others are also interested in more exchange
- Holiday planning for Christmas:
- Michaela: not sure yet; in Augsburg between Christmas and New Year, then probably moving house
- David: 21-26 December, otherwise hard at work
- Timo: lazy from 21 December to 4 January in Hamburg/Kiel
- so our results have to be in place as far as possible before Christmas
- Paper
- Selling point: Syntax (esp. incrementally)
- What's a turn?
- we should probably distinguish more between turn yield and turn hold instead of just EOT
- we want to be able to try different experiments, so our feature vectors will be "verdongelt" afterwards and must remain identifiable in different settings
- Syntax: dialogID, wordID
- Prosody: dialogID, channel, time
- currently, the mapping in *words.csv uses information from MSaligned/ and from /projekte/korpora/orig-korpora/pennTreebank/dysfl/mgd/swbd
- Timo will review the scripts that create the mapping by next Tuesday
- class feature
- n words still missing
- time to EOT (from end of this word)
- make it easier for the parser:
- possible completion points (PCP?)
- I'm still not sure we completely acknowledge the fact that we are in some areas dealing with EOT and sometimes with EOU.
- features
- additional features
- expected POS after current word
- flags: seen a verb? how many NPs?
- better documentation of our features
- Gold standard
- how do we compare to the Penn treebank data directly?
- how would that data have to be "incrementalized"?
- difference between inproPitch and GoldPitch
- infrastructure
- Verdengelung of data independent of turns (thus we can still change the notion of turn)
- identification via dialogID/wordID for syntax and dialogID/channel/time for acoustics
- master table has mapping between the two
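A minimal sketch of how such a master table could tie the two key schemes together. Everything concrete here is an assumption for illustration: the record layout, times as integer milliseconds, the example IDs, and the idea that a word's prosody features are the frames falling inside its time span.

```python
from typing import Dict, List, NamedTuple, Tuple


class WordSpan(NamedTuple):
    channel: str
    start_ms: int
    end_ms: int


# Master table: syntax key (dialogID, wordID) -> location of the word in the audio.
MasterTable = Dict[Tuple[str, int], WordSpan]
# Prosody features keyed by (dialogID, channel, time in ms).
ProsodyTable = Dict[Tuple[str, str, int], List[float]]


def prosody_for_word(dialog_id: str, word_id: int,
                     master: MasterTable, prosody: ProsodyTable) -> List[List[float]]:
    """Collect the prosody feature vectors inside a word's time span.

    The result is ordered by time; span boundaries are inclusive.
    """
    span = master[(dialog_id, word_id)]
    frames = [
        (time_ms, features)
        for (dialog, channel, time_ms), features in prosody.items()
        if dialog == dialog_id
        and channel == span.channel
        and span.start_ms <= time_ms <= span.end_ms
    ]
    frames.sort(key=lambda frame: frame[0])
    return [features for _, features in frames]


if __name__ == "__main__":
    master = {("d1", 7): WordSpan("A", 1200, 1500)}
    prosody = {("d1", "A", 1200): [0.5], ("d1", "A", 1400): [0.7], ("d1", "B", 1300): [0.9]}
    print(prosody_for_word("d1", 7, master, prosody))
```

Because the lookup only depends on word and time keys, the notion of a turn can change later without touching either feature table.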
|