- InPro, meeting, minutes, 14/04/08
- present: M, T, G, D
- Gabriel demo'ed current state of Higgins. Displays duration of
vocal action (both recognised and own) on timeline, and bases its
decisions on a simple boundary tone classification (up, down)
obtained by thresholding. (This is mostly a test of the architecture
at the moment; the strategies are very simple.)
- dysfluencies: what to do with aborted words? Most likely, sphinx
will recognise rubbish. Including aborted versions of all words
would leave the recogniser too unconstrained; adding other methods
(e.g., using prosodic info) would require too many changes at the
low level of the ASR. (Hm. But at some point we'll have frame-level /
syllable-level prosodic info anyway. Shouldn't be too hard to let a
classifier judge whether a word was perhaps misrecognised because it
was a different, aborted one.)
- Timo and Gabriel will work together on getting a better classifier
for boundary tone detection to work. Does it need to do speaker
adaptation?
- first step on syntax side: toy grammar for Pento domain.
(``Nimm das {Kreuz | Teil | lange Ding} aus der Mitte links
oben'') in Higgins parser.
- using a grammar as language model in sphinx apparently doesn't
work incrementally (doesn't return results before top category has
been found), but using a statistical LM does work. (There are still
technical problems, but it looks promising.)
- even if we can't use a grammar, we can still bootstrap an n-gram
LM with utterances generated from a domain grammar.
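
To make the last point concrete, here is a minimal sketch (not the project's actual tooling) of how utterances expanded from the toy Pento grammar could seed bigram counts. The slot layout in TOY_GRAMMAR is an assumed reading of the example sentence above, and the names TOY_GRAMMAR, generate_utterances, bigram_counts and bigram_prob are hypothetical. The estimates are unsmoothed; a usable LM would still need smoothing and conversion into whatever format the recogniser expects.

```python
import itertools
from collections import Counter

# Assumed reading of the toy grammar from the minutes:
# "Nimm das {Kreuz | Teil | lange Ding} aus der Mitte links oben".
# Each slot lists its alternatives; only the noun slot has more than one.
TOY_GRAMMAR = [
    ["Nimm"],
    ["das"],
    ["Kreuz", "Teil", "lange Ding"],
    ["aus der Mitte links oben"],
]


def generate_utterances(grammar):
    """Yield every utterance licensed by the grammar as a list of tokens."""
    for choice in itertools.product(*grammar):
        yield " ".join(choice).split()


def bigram_counts(utterances):
    """Count bigrams, padding each utterance with <s> and </s> markers."""
    counts = Counter()
    for tokens in utterances:
        padded = ["<s>"] + tokens + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


def bigram_prob(counts, prev, word):
    """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, word)] / total if total else 0.0


utterances = list(generate_utterances(TOY_GRAMMAR))
counts = bigram_counts(utterances)
```

With only three generated utterances the counts are tiny, but the same loop would scale to a larger domain grammar; bigram_prob(counts, "das", "Kreuz") then gives the relative frequency of "Kreuz" after "das" in the generated corpus.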
das, 04/14/08 11:40 (GMT)
Keywords: grammar, Higgins, InPro, meetings, minutes, sphinx