minutes221007
- present: Michaela, Timo, David
- re. end of turn project / paper:
- prosody: add online speaker adaptivity? (i.e., learning the speaker's
average f0, intensity, etc.)
- syntax: how do we evaluate an incremental parser?
- infrastructure:
- `incrementalizer': a text input field that sends out a new word
each time the space bar is hit. Perhaps it could also fake dysfluencies
when backspace is hit. E.g., if the input is "I saw Pe^B^B^John",
the output is "I saw Pe- erm John".
This module should be able to take the place of the ASR in the
architecture, sending to the parser exactly what it expects.
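The online speaker adaptivity idea under prosody could look roughly like this. This is a minimal sketch, assuming frame-level f0 and intensity values are already extracted (voiced frames only), and assuming "adaptivity" means z-normalizing each frame against running per-speaker statistics computed with Welford's online mean/variance algorithm. The class names `RunningStats` and `SpeakerNormalizer` are hypothetical.

```python
import math


class RunningStats:
    """Welford's online algorithm: running mean and sample variance, O(1) per update."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; undefined for fewer than two values, so report 0.0.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


class SpeakerNormalizer:
    """Hypothetical per-speaker adapter: learns average f0/intensity as frames arrive."""

    def __init__(self, features=("f0", "intensity")):
        self.stats = {f: RunningStats() for f in features}

    def normalize(self, frame):
        """Update the speaker statistics with this frame, then return its z-scores."""
        out = {}
        for name, stats in self.stats.items():
            x = frame[name]
            stats.update(x)
            out[name] = (x - stats.mean) / stats.std if stats.std > 0 else 0.0
        return out
```

Because the statistics are updated frame by frame, the normalization adapts to a new speaker without a separate enrolment phase; the first frames are of course poorly normalized.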
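The `incrementalizer' could be sketched as follows. This assumes keystrokes arrive one character at a time, that `"\b"` stands for backspace (the ^B above), and that `emit` is a hypothetical callback standing in for whatever interface the parser expects from the ASR. As a simplification, words already sent out on a space are never retracted; only backspaces inside the current word produce a faked dysfluency.

```python
BACKSPACE = "\b"


class Incrementalizer:
    """Turns keystrokes into a word stream, faking a dysfluency on backspace."""

    def __init__(self, emit):
        self.emit = emit          # hypothetical: called once per outgoing word
        self.buffer = []          # characters of the word currently being typed
        self.in_repair = False    # True while a run of backspaces is in progress

    def key(self, ch):
        if ch == BACKSPACE:
            # First backspace of a run: send the broken-off fragment plus a filler.
            if not self.in_repair and self.buffer:
                self.emit("".join(self.buffer) + "-")
                self.emit("erm")
                self.in_repair = True
            if self.buffer:
                self.buffer.pop()
        elif ch == " ":
            self.in_repair = False
            self.flush()
        else:
            self.in_repair = False
            self.buffer.append(ch)

    def flush(self):
        """Send the pending word (called on space and at end of input)."""
        if self.buffer:
            self.emit("".join(self.buffer))
            self.buffer = []


words = []
inc = Incrementalizer(words.append)
for ch in "I saw Pe\b\bJohn":
    inc.key(ch)
inc.flush()
print(" ".join(words))  # I saw Pe- erm John
```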
- discussed domain for InPro system:
- VM, pro:
- corpus
- symmetric task: both partners talk about the same amount and have
the same roles
- VM, contra:
- fairly complex domain, perhaps not easy to restrict enough to make
modelling it realistic.
[ Then again, perhaps not. Could always be modelled as
slot-filling ("at what day do you want to meet?", "at what
time?", "sorry, that doesn't work. I can offer ___. Does
this work, yes or no?"). Lots of system initiative. But then
the same holds for almost any domain, certainly also
pento. And, if it is modelled with more system-initiative,
it becomes less symmetric, of course, and one loses the
advantage that the system can model either partner (or both
can be modelled by system). ]
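The slot-filling view of the VM domain mentioned in the bracketed remark can be sketched as a small system-initiative dialogue. This is an illustration only, assuming a hypothetical calendar `free_times` (day to list of free times) and exact-match answers; the class name `AppointmentDialogue` is made up. The system always asks and the user only answers, which is exactly what makes the task asymmetric.

```python
class AppointmentDialogue:
    """System-initiative slot filling: the system asks, the user only answers."""

    def __init__(self, free_times):
        self.free_times = free_times  # hypothetical: day -> list of free times
        self.day = None
        self.time = None
        self.offer = None             # time proposed by the system, awaiting yes/no

    def prompt(self):
        if self.day is None:
            return "At what day do you want to meet?"
        if self.offer is not None:
            return (f"Sorry, that doesn't work. I can offer {self.offer}. "
                    "Does this work, yes or no?")
        if self.time is None:
            return "At what time?"
        return f"OK: {self.day}, {self.time}."

    def answer(self, text):
        if self.day is None:
            self.day = text
        elif self.offer is not None:
            if text == "yes":
                self.time = self.offer
            self.offer = None         # on "no", ask for a time again
        elif self.time is None:
            free = self.free_times.get(self.day, [])
            if text in free:
                self.time = text
            elif free:
                self.offer = free[0]  # counter-proposal
            else:
                self.day = None       # nothing free that day: ask for another day
        return self.prompt()
```

For example, with `{"monday": ["10:00", "14:00"]}`, answering "monday", "9:00", "yes" leads to the offer of 10:00 and then "OK: monday, 10:00."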
- Pentomino / DEAWU setting:
The computer is the Instruction Follower (IF) and moves pieces on the
board. The human is the Instruction Giver (IG) and directs the IF in
placing the pieces. Most likely, the situation should be one where the
IG sees the board and the outline, and sees what the IF is doing.
- several variations possible, including DEAWU setting where IG
has numbered solution.
- pro:
- corpus
- existing modules (reference resolution for pieces on board)
- perhaps more like SDS? (But see above: VM domain can be made
SDS-like as well.)
- contra:
- asymmetric task -- perhaps with little talk by system?
- brings in new issues like reference resolution and
clarification. [ But these issues would surface in any kind
of practical system. ]
- more specific new issues:
- action; i.e., moving mouse pointer to target location; and
allowing user to barge in on *action*, "no, not there!"
[ but avoidable, if feature of acting incrementally (=
before end of turn) is not added ]
- the question is WHY: it is difficult to see the need for a voice
interface to a game that is more easily and readily controlled by
direct manipulation (i.e., dragging and clicking).
- same here: system-initiative can be imposed here as well, even
if it is not necessarily natural. E.g. "which piece do you want
to place?".. "how shall I rotate it?" etc. etc.
- would bring in additional way of showing use of incremental
processing (see above), namely beginning to act
(non-linguistically) as soon as partial information from the
utterance allows it. ("Now take the green [moves in the direction
of green piece] piece")
- turn-taking-wise, the challenge here would probably be not so
much detecting turn endings and acting fast, but rather detecting
hesitations and *not* acting. I.e., avoiding wrong time-outs.
--> this could be done on our existing corpus / corpora.
- nice uses of DEAWU setting (IG has solution): solution could be
known to system as well.
- only do the part of identifying the piece on the board; the system
then places it automatically. I.e., only interface Alex Siebert's
module with the ASR (plus some more GUI).
- complete fake: system only detects turn ends, then plays
hesitations, asks the occasional (fake) clarification question,
and then does what it knows is correct anyway.
It occasionally places pieces wrongly, etc.
Actually, any kind of intermediate step is possible: use
keyword spotting to detect which task is being done at the
moment (identifying piece, orientation, placement); etc. etc.
- This could be seen as another point in favour of this domain:
modules (reference resolution parts, logic) could be faked
and system / application could still be interesting. Not sure
how something like this could be done in the VM domain.
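The idea of beginning to act as soon as partial information allows it ("Now take the green [moves] piece") can be illustrated with a toy incremental reference resolver. This is not Alex Siebert's module; it is a hypothetical sketch assuming each piece carries a set of word-level `attributes`, and that words matching none of the remaining pieces (e.g. "now", "the") leave the candidate set unchanged. The IF could start moving once a single candidate remains.

```python
def incremental_candidates(words, pieces):
    """After each word, yield the ids of the pieces still compatible with the utterance so far."""
    candidates = list(pieces)
    for word in words:
        matching = [p for p in candidates if word in p["attributes"]]
        # Uninformative words ("now", "take", "the") match nothing and are skipped.
        if matching:
            candidates = matching
        yield word, [p["id"] for p in candidates]


def first_actionable_word(words, pieces):
    """Index of the first word after which exactly one piece remains (the IF may start moving)."""
    for i, (_, ids) in enumerate(incremental_candidates(words, pieces)):
        if len(ids) == 1:
            return i
    return None


# Hypothetical board: piece ids and attribute words are made up for illustration.
board = [
    {"id": "F1", "attributes": {"green", "f"}},
    {"id": "X1", "attributes": {"red", "x"}},
    {"id": "L1", "attributes": {"red", "l"}},
]
print(first_actionable_word("now take the green piece".split(), board))  # 3, i.e. at "green"
```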
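The turn-taking point (detect hesitations and *not* act, i.e. avoid wrong time-outs) and its evaluation on existing corpora could start from something as simple as the following. It is a minimal sketch under loud assumptions: the filled-pause inventory `HESITATIONS`, the two time-out values, and the corpus format (one `(last_word, silence_ms, is_real_turn_end)` tuple per pause) are all hypothetical, and the example data is a toy, not corpus results.

```python
HESITATIONS = {"erm", "ehm", "uh", "um"}  # hypothetical filled-pause inventory


def predicts_end_of_turn(last_word, silence_ms, base_timeout_ms=700, hesitation_timeout_ms=2000):
    """Silence time-out that waits longer after a hesitation (time-out values are made up)."""
    timeout = hesitation_timeout_ms if last_word.lower() in HESITATIONS else base_timeout_ms
    return silence_ms >= timeout


def score_on_corpus(pauses, **kwargs):
    """pauses: (last_word, silence_ms, is_real_turn_end) per pause.

    Returns (wrong_timeouts, missed_turn_ends).
    """
    wrong = missed = 0
    for last_word, silence_ms, real_end in pauses:
        predicted = predicts_end_of_turn(last_word, silence_ms, **kwargs)
        if predicted and not real_end:
            wrong += 1   # system would have acted in the middle of the IG's turn
        elif real_end and not predicted:
            missed += 1  # system would have kept waiting
    return wrong, missed


toy = [("erm", 1000, False), ("the", 900, False), ("piece", 800, True)]
print(score_on_corpus(toy))  # (1, 0): the pause after "the" still triggers a wrong time-out
```

The toy example also shows the limits of keying only on filled pauses: a pause after an unfinished phrase ("the") is not recognized as a hesitation.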
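For the intermediate step of using keyword spotting to detect which subtask is under way (identifying the piece, orientation, placement), a bag-of-keywords vote might suffice for a faked system. The keyword lists below are hypothetical English placeholders; for a real corpus they would have to be taken from the data. Ties go to the subtask listed first.

```python
SUBTASK_KEYWORDS = {  # hypothetical keyword lists, for illustration only
    "identify": {"take", "piece", "green", "red", "cross"},
    "orientation": {"rotate", "turn", "flip", "mirror"},
    "placement": {"put", "place", "left", "right", "top", "bottom", "there"},
}


def spot_subtask(words):
    """Return the subtask whose keywords occur most often in the utterance, or None."""
    counts = {task: 0 for task in SUBTASK_KEYWORDS}
    for word in words:
        for task, keywords in SUBTASK_KEYWORDS.items():
            if word.lower() in keywords:
                counts[task] += 1
    best = max(counts, key=counts.get)  # first subtask wins on ties
    return best if counts[best] > 0 else None


print(spot_subtask("take the green piece".split()))  # identify
```

A faked IF could then use the spotted subtask to pick a plausible hesitation or clarification question ("how shall I rotate it?") while executing the known solution.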
- general point: the turn-taking module (detect end, play hesitation if
necessary) should be general enough to be wrapped around a standard
SDS, e.g. one built with the CSLU toolkit. The minimal lag of an FSA
system (which detects turn ends with a time-out) is known (it is the
time-out setting), so our wrapper could produce "erms" of at least this
length.
One could then test whether having these "erms" improves the
perception of an FSA system that is otherwise kept constant.
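One way to read "erms of at least this length" for the wrapper: the filler starts when our module detects the end of the user's turn and lasts until the earliest moment the wrapped FSA system can respond (end of speech plus its time-out). When our detection coincides with the end of speech, the filler is exactly the time-out long. This is a sketch of that timing arithmetic only; `ErmPlan` and `plan_erm` are hypothetical names, and actual audio playback is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErmPlan:
    start_ms: int     # when to start playing the filler
    duration_ms: int  # how long the filler must last


def plan_erm(speech_end_ms: int, detected_end_ms: int, fsa_timeout_ms: int) -> Optional[ErmPlan]:
    """Fill the gap between our end-of-turn detection and the FSA system's earliest response."""
    fsa_earliest_response_ms = speech_end_ms + fsa_timeout_ms
    gap_ms = fsa_earliest_response_ms - detected_end_ms
    if gap_ms <= 0:
        return None  # the wrapped system can already respond: no filler needed
    return ErmPlan(start_ms=detected_end_ms, duration_ms=gap_ms)


print(plan_erm(speech_end_ms=1000, detected_end_ms=1000, fsa_timeout_ms=800))
# ErmPlan(start_ms=1000, duration_ms=800)
```

Since the FSA system itself stays unchanged, the perception experiment would only toggle whether these planned fillers are played.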
das, 10/22/07 04:39 (GMT)
Keyword: inpro, meetings, minutes