David Schlangen : Home Page > minutes030208b > 030208cont

  - short-term projects:
    - bababa2, SIGdial poster
      - TO DOs, unprioritised: a) syllable boundaries, taken from
        the pronunciation dictionary; b) use real audio,
        Kiel Corpus; c) use ASR, words, n-grams; d) better
        speech states, phrase boundaries (for BCs); e) better
        TT strategies; f) simulation, constant time < (or >)
        real-time; g) better evaluation; h) interruption management;
        i) BC management; j) parametrisation (chattiness,
        interruption probability, etc.); k) adaptivity
      - possible angles for the paper:
        - in the direction of David T., `believable, non-scripted content-free
          background chatter'
          Not very convincing; to be generated online, it is after all
          rather resource-hungry. Nobody would seriously deploy it
          for background chatter alone.
        - `simple rules create realistic turn-taking patterns'
          SSJ rules as *generative* rules, not just descriptive. Shows
          that such a set of rules, together w/ some audio magic, is
          enough to produce patterns that are `natural' (in a way that
          needs to be defined properly). Again sort of upper-bound; to
          get something like this working properly within a real
          system, here's what we would need in terms of components.
          - to do first: b), d), e), g).
          - needed: more principled metric for `naturalness' of
            resulting corpus. Multi-dimensional: distribution of gaps
            & overlaps, balance btw speakers, turn length (in time,
            but also # of utterances).
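
A minimal sketch of what such a multi-dimensional profile could look like, not the metric itself. It assumes a hypothetical corpus format in which each turn is a tuple (speaker, start, end, n_utterances) with times in seconds; the function name and the dictionary keys are made up for illustration. How to compare the resulting distributions against a reference corpus is left open here.

```python
from statistics import mean


def naturalness_profile(turns):
    """Summarise a dialogue along the dimensions listed above.

    turns: list of (speaker, start, end, n_utterances), sorted by start
    time. This input format is an assumption made for the sketch.
    """
    # Offset at each speaker change: next start minus previous end.
    # Positive offsets are gaps, negative offsets are overlaps.
    offsets = [
        nxt[1] - prev[2]
        for prev, nxt in zip(turns, turns[1:])
        if prev[0] != nxt[0]
    ]
    gaps = [o for o in offsets if o > 0]
    overlaps = [-o for o in offsets if o < 0]

    # Balance between speakers: share of the total speaking time.
    talk_time = {}
    for speaker, start, end, _ in turns:
        talk_time[speaker] = talk_time.get(speaker, 0.0) + (end - start)
    total = sum(talk_time.values())
    balance = {s: t / total for s, t in talk_time.items()}

    # Turn length, in time and in number of utterances.
    durations = [end - start for _, start, end, _ in turns]
    utterance_counts = [n for _, _, _, n in turns]

    return {
        "gaps": gaps,
        "overlaps": overlaps,
        "balance": balance,
        "mean_turn_duration": mean(durations),
        "mean_utterances_per_turn": mean(utterance_counts),
    }


if __name__ == "__main__":
    example = [("A", 0.0, 2.0, 1), ("B", 2.5, 4.5, 2), ("A", 4.0, 5.0, 1)]
    print(naturalness_profile(example))
```

The gaps and overlaps are returned as raw lists rather than as single averages, since the note above asks for their distribution.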
    - `syntactic and prosodic language modelling for incremental
      utterance segmentation', for Coling
      utterance endpointing, but in an incremental setup. Needed to
      know where to clear the chart of the parser. Connected to a
      well-researched task (i.e., easy to motivate & compare), but
      different in that we don't allow (as much?) right context.
      - method:
        - select only multi-utterance turns; EOUs to find are the
          turn-internal ones.
        - use original data & variants w/ various WER.
          Those need plausible time information. How much does
          this degrade performance?

      - what's a good way to evaluate this? follow-on effects of wrong
        decisions: an insert for example makes us restart the parser,
        and hence get other things wrong?
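
One possible starting point, sketched under assumptions: score hypothesised EOU times against reference EOU times with a time tolerance, counting hits, inserts and deletes. The function name, the greedy one-to-one matching and the 0.2 s default tolerance are illustrative choices, not something decided in the meeting.

```python
def score_boundaries(reference, hypothesis, tolerance=0.2):
    """Match hypothesised EOU times (seconds) to reference EOU times.

    Each reference boundary can be matched at most once. The matching
    is greedy, which is a simplification chosen for this sketch.
    """
    ref = sorted(reference)
    hyp = sorted(hypothesis)
    matched = set()
    hits = 0
    for h in hyp:
        best = None
        for i, r in enumerate(ref):
            if i in matched or abs(r - h) > tolerance:
                continue
            # Keep the closest still-unmatched reference boundary.
            if best is None or abs(r - h) < abs(ref[best] - h):
                best = i
        if best is not None:
            matched.add(best)
            hits += 1

    inserts = len(hyp) - hits   # hypothesised EOUs with no reference match
    deletes = len(ref) - hits   # reference EOUs that were missed
    precision = hits / len(hyp) if hyp else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {
        "hits": hits,
        "inserts": inserts,
        "deletes": deletes,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    print(score_boundaries([1.0, 3.0, 5.0], [1.1, 2.0, 5.05]))
```

This only counts local boundary errors. It says nothing about the follow-on effects raised above, such as what an insert costs once the parser has been restarted.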



das, 03/03/08 10:37 (GMT)
