Show and Tell: A Neural Image Caption Generator

by Martin Bulín on 9 June 2016

Transcript of Show and Tell: A Neural Image Caption Generator

How well does it work...
How was it tested...
An end-to-end neural network system that can automatically view an image and generate a reasonable description in plain English.
Key design choices
Experiments
Achievements
METRICS

:: graders
   grades 1-4
   two workers/image (65% agree)

:: BLEU
   human evaluation comparison

:: literature
   possible image description
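
BLEU, listed among the metrics above, scores a generated caption by how many of its n-grams also occur in the human reference captions. The following is a minimal sketch of the standard definition (clipped n-gram precision combined with a brevity penalty). It is not the evaluation script behind the scores reported in the talk, and the function names `modified_precision` and `bleu` are illustrative.

```python
from collections import Counter
import math


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_ngrams = Counter(
        tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)
    )
    if not cand_ngrams:
        return 0.0
    # For each n-gram, the highest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        for gram, count in ref_ngrams.items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as a reference contains it.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_ngrams.items())
    return clipped / sum(cand_ngrams.values())


def bleu(candidate, references, max_n=1):
    """BLEU-max_n with uniform weights; inputs are lists of tokens."""
    precisions = [
        modified_precision(candidate, references, n) for n in range(1, max_n + 1)
    ]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_avg)


if __name__ == "__main__":
    refs = [
        "a dog is running on the grass".split(),
        "the dog runs across the lawn".split(),
    ]
    print(bleu("a dog runs on the grass".split(), refs, max_n=1))
```

Clipping is what stops a caption such as "the the the" from scoring well simply by repeating a common word. Because the score only counts n-gram overlap, it is worth comparing against human grades, as the slides do.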
DATASETS

BLEU scores:
Martin Bulín
Tuesday, March 3, 2015
SDU/MMMI/SCM
What does it do...
Show and Tell :: Introduction
What have we observed...
Conclusions
It automatically describes the content of an image
:: BLEU is not a perfect metric

:: NIC is very robust (sentences reasonable)

:: increasing size of the available datasets => improving performance of approaches like NIC

:: how can one use unsupervised data?
A Neural Image Caption Generator
:: visually impaired people...

:: artificial environment perception

:: isn't it just cool?
How does it work...
Why is it worth working on...
:: 'reading' an image

:: forming sentences in plain English
Challenging points:
how do objects, their attributes, and the activities they are involved in relate to each other?
Image classification task
Sentence formulation task
CNN
:: pre-trained (classification)
:: last hidden layer
=> input to...
maximizing p(T|S)
done by RNN
inspired by translation task
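
The objective above can be written out with the chain rule. This is a sketch in the slide's translation-style notation, assuming T is the output sentence with words T_1, ..., T_N and S is the input being conditioned on (the source sentence in translation; for captioning, the image representation taken from the CNN). The symbols N and θ (the model parameters) are introduced here for illustration and do not appear on the slides.

```latex
\log p(T \mid S) = \sum_{t=1}^{N} \log p\left(T_t \mid S,\, T_1, \dots, T_{t-1}\right)
\qquad
\theta^{*} = \arg\max_{\theta} \sum_{(S,\,T)} \log p(T \mid S;\, \theta)
```

Each factor conditions on all previously generated words, which is why a recurrent network is a natural fit: its hidden state summarizes the prefix T_1, ..., T_{t-1} together with the input.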
Questions to be answered:

how does dataset size affect generalization?

how would it deal with weakly labeled examples?
several metrics, data sources and model architectures
current state-of-the-art