CSE 891

Jared Wein

on 28 April 2010

Transcript of CSE 891

Harnessing the Internet for Automatic Generation of an In-Domain Language Model

Focus: Improving accuracy of Large Vocabulary Continuous Speech Recognition System that is used for transcription of class lectures

Current implementation uses a BBC corpus and augments the corpus with text from the PowerPoint slides

I focused on using internet resources to augment the current implementation

Use the presentation's title to find related content online

Google search specific to wikipedia.org
Google general search
Google search specific to *.edu

Data from the internet requires a lot of work to clean it up:
Restricting binary data
Removing JavaScript and CSS
Removing HTML tags

In conclusion:

The added value depends heavily on a good choice of presentation title by the presenter
Language on the internet (such as internet lingo) does not necessarily reflect spoken language
Ambiguous topics can have unexpected augmentations if the unintended usage is more prevalent
Surprisingly, Wikipedia did not contribute much gain, but the Google general and .edu-specific searches each independently increased accuracy by about five percent
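
The three searches and the cleanup steps listed earlier (restricting binary data, removing JavaScript and CSS, removing HTML tags) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used in the project. The slides do not say how the searches were restricted, so the use of Google's `site:` operator is an assumption, and the names `build_queries`, `is_textual`, and `clean_html` are hypothetical.

```python
from html.parser import HTMLParser


def build_queries(title):
    """Build the three searches from the slides for a presentation title.

    Assumption: domain restriction is expressed with Google's `site:` operator.
    """
    return {
        "wikipedia": f"{title} site:wikipedia.org",
        "general": title,
        "edu": f"{title} site:edu",
    }


def is_textual(content_type):
    """Restrict binary data: accept only responses declared as HTML or plain text."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in {"text/html", "text/plain"}


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping everything inside <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text outside script/style is kept; the tags themselves are dropped.
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_html(html):
    """Return the visible text of an HTML page with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Chunks are joined with spaces, so inline tags act as word boundaries.
    return " ".join(" ".join(parser.chunks).split())
```

Text cleaned this way would still need sentence splitting and normalization before it could be added to a language-model corpus; those steps are outside what the slides describe.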