content based video retrieval and video analysis

mostafa fouad

on 21 June 2014

Transcript of content based video retrieval and video analysis

Content Based Video Retrieval And Video Analysis
Let's Remember !!
A content-based retrieval system processes the information contained in image data and creates an abstraction of its content in terms of visual attributes.
1- Let's Remember !
2-Problem definition and solution
3-input and expected output
4-System architecture
5-Shot Detection (Abrupt transition Algo. and demo )
6-Key Frame Extraction (Entropy Algo.)
7-Module(1) : Low Level.
8-Module(2) : Audio.
9-Module(3) : Face Recognition.
10-Time plan and Current state
Problem definition and solution
Most videos nowadays are long, a single video may span many categories, and its title may not be descriptive enough.
Input and expected output
System Architecture
Supervised By:
Prof. Dr/El-Syaed Al-Horbaty
Dr/Mohamed Abd Al-Majeed
Dr/Ahmed Salah
Team Members:
Ahmed Saeed Ibrahim
Aya Ahmed Serry
Ibrahim Mohamed Amer
Hajar Adel Ahmed
Mostafa Fouad Mahmoud
Zainab Mohamed Fouad
Categorize and annotate Video by analyzing it.
Video Analysis
Shot Detection
A shot represents a sequence of frames captured from a unique and continuous record from a camera

Module (1):Low Level Features

Key Frame Extraction
A key frame is the frame that best represents the salient content and information of the shot.

Speech Recognition
Module (3):Text Recognition

Module (2):Face Recognition

Integration Phase
Category & Logos

Category & Persons

Category & Keywords

Category & Keywords

1)Abrupt transitions

2)Gradual transitions

a)Fade in
b)Fade out

Not Yet
1) Compute the gray-level histogram of each frame.

2) Compute the histogram difference
between each pair of consecutive frames using:

3) Compute the threshold value using:

4) For each pair of consecutive frames: if (SD > Tb), it is a new transition.
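
The four steps above can be sketched roughly as follows. The formulas for SD and Tb are not preserved in this transcript, so this sketch assumes SD is the sum of absolute bin-wise differences and Tb is the mean of all differences plus `alpha` times their standard deviation. The function names and the `alpha` parameter are hypothetical.

```python
from statistics import mean, pstdev


def gray_histogram(frame, bins=256):
    """Count gray levels in a frame given as rows of ints in 0..bins-1."""
    hist = [0] * bins
    for row in frame:
        for pixel in row:
            hist[pixel] += 1
    return hist


def histogram_difference(h1, h2):
    """SD (assumed form): sum of absolute bin-wise differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_abrupt_transitions(frames, alpha=1.0):
    """Return the indices of frames that start a new shot."""
    if len(frames) < 2:
        return []
    hists = [gray_histogram(f) for f in frames]
    diffs = [histogram_difference(a, b) for a, b in zip(hists, hists[1:])]
    # Tb (assumed form): mean + alpha * standard deviation of the differences.
    tb = mean(diffs) + alpha * pstdev(diffs)
    # diffs[i] compares frame i with frame i + 1, so the cut is at i + 1.
    return [i + 1 for i, sd in enumerate(diffs) if sd > tb]


dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
print(detect_abrupt_transitions([dark, dark, dark, bright, bright, bright]))
```

A larger `alpha` makes the detector stricter, so fewer transitions are reported.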

New abrupt transition
1) Shot boundary based approach :
First /Middle /Last

Key Frame 1 :

Key Frame 2 :

Key Frame 3 :

Previous Demo

3) Visual content based approach :

What is Entropy ?
Entropy is used to measure the amount of Information.
It describes how much randomness (or uncertainty) there is in a signal or an image.
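
As a small illustration, the Shannon entropy of a gray-level frame can be computed from its normalized histogram. This is a minimal sketch that assumes a frame is a list of rows of integer gray levels; the function name is hypothetical.

```python
import math


def frame_entropy(frame, bins=256):
    """Shannon entropy (in bits) of the gray-level distribution of a frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[pixel] += 1
            total += 1
    # Empty bins contribute nothing to the sum, so they are skipped.
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


flat = [[7, 7], [7, 7]]          # one gray level: no uncertainty
mixed = [[0, 255], [255, 0]]     # two equally likely gray levels
print(frame_entropy(flat), frame_entropy(mixed))
```

A flat frame carries no uncertainty, while a frame whose gray levels are spread evenly has higher entropy.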

Then, how do we extract the key frames?
1) Sort the entropies.
2) Get the threshold value:

3) Calculate the difference between two consecutive frames:

If the difference > a specific value, it is a key frame.
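
One possible reading of these steps is sketched below. The threshold formula is not preserved in this transcript, so the threshold is passed in by the caller. The sketch also assumes that "consecutive" refers to neighbours in the sorted entropy order, and that the first frame in that order is always kept so every shot yields at least one key frame. Both are assumptions, and the function name is hypothetical.

```python
def select_key_frames(entropies, threshold):
    """Return indices of key frames given one entropy value per frame."""
    if not entropies:
        return []
    # Step 1: sort frame indices by entropy.
    order = sorted(range(len(entropies)), key=lambda i: entropies[i])
    key_frames = [order[0]]  # assumption: always keep one frame per shot
    # Step 3: compare each frame with its predecessor in the sorted order.
    for prev, cur in zip(order, order[1:]):
        if entropies[cur] - entropies[prev] > threshold:
            key_frames.append(cur)
    return sorted(key_frames)


print(select_key_frames([1.0, 1.05, 3.0, 3.02, 5.0], threshold=0.5))
```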

Why Entropy ??
The main advantage of this algorithm is that it segments shots into key frames with high accuracy using a semantically meaningful measure, "entropy".

Global Features
-Two Main Topics :
1)Feature Vector Extraction
2)Similarity Measure

Local Features

1) Texture Features :
Gabor Wavelet :
a) Mean-squared energy
b) Mean Amplitude

Why Gabor Wavelet?
-It is Scale & orientation invariant.
-Distortion tolerance space.
-Good performance .
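
To make the two texture features concrete, the sketch below filters an image with the real part of a single Gabor kernel and computes the mean-squared energy and mean amplitude of the response. The kernel parameters (`size`, `wavelength`, `theta`, `sigma`) and all function names are assumptions for illustration. A real system would typically use a bank of kernels at several scales and orientations.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine wave."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction in which the wave oscillates.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_valid(image, kernel):
    """Correlate image with kernel, keeping only fully overlapping positions."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        sum(kernel[a][b] * image[i + a][j + b]
            for a in range(k) for b in range(k))
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


def gabor_features(image, kernel):
    """Return (mean-squared energy, mean amplitude) of the filter response."""
    response = filter_valid(image, kernel)
    energy = sum(v * v for v in response) / len(response)
    amplitude = sum(abs(v) for v in response) / len(response)
    return energy, amplitude


# Vertical stripes with a period of 4 pixels.
stripes = [[math.cos(2 * math.pi * x / 4) for x in range(16)] for _ in range(16)]
matched = gabor_features(stripes, gabor_kernel(9, 4, 0.0, 2.0))
crossed = gabor_features(stripes, gabor_kernel(9, 4, math.pi / 2, 2.0))
print(matched[0] > crossed[0])
```

The kernel whose orientation matches the stripes responds far more strongly than the one rotated by 90 degrees, which is what makes the pair of values usable as a texture descriptor.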
2) Color Features :

HSV histogram
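
A minimal sketch of an HSV color histogram follows. It assumes pixels are given as (R, G, B) tuples in 0..255; the bin counts of 8 x 2 x 2 are an arbitrary choice for illustration, since the transcript does not give the quantization used.

```python
import colorsys


def hsv_histogram(pixels, bins=(8, 2, 2)):
    """Normalized histogram over quantized (hue, saturation, value) bins."""
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    if not pixels:
        return hist
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Each channel lies in [0, 1]; clamp so that 1.0 falls in the last bin.
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1
    return [c / len(pixels) for c in hist]


hist = hsv_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
print([(i, c) for i, c in enumerate(hist) if c])
```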

Global Feature Extraction

Normalization :

Z-score (most common, but sensitive to outliers).

Med-MAD (insensitive to outliers, but low efficiency).
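
The difference between the two normalizations can be seen in a short sketch. It assumes Med-MAD means subtracting the median and dividing by the median absolute deviation; the function names are hypothetical, and both functions assume the feature values are not all identical.

```python
from statistics import mean, median, pstdev


def z_score(values):
    """Subtract the mean and divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def med_mad(values):
    """Subtract the median and divide by the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [(v - med) / mad for v in values]


feature = [1, 2, 3, 4, 100]  # the last value is an outlier
print(z_score(feature))
print(med_mad(feature))
```

With the outlier present, the Z-score squeezes the four ordinary values close together, while Med-MAD keeps them evenly spaced and pushes only the outlier far away.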

Similarity Measure

How to measure the similarity between two Matrices/Histograms:
Bin-by-bin comparison

Cross-bin comparison

The diffusion distance is robust to :
1) Deformation
2) Lighting change
3) Noise

Modified Hausdorff distance (fast and accurate)
Quadratic form distance
Euclidean distance (high efficiency)
Earth mover's distance (low efficiency / accurate results)
Kolmogorov distance
Histogram intersection (high efficiency)
Chi-square

Used For Histograms
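
A few of the measures listed above can be sketched for normalized histograms as follows. The chi-square form with the 0.5 factor is one common variant, and the earth mover's distance shown is the simple one-dimensional case for histograms of equal total mass; both are assumptions about what was intended here.

```python
import math


def euclidean_distance(h1, h2):
    """Bin-by-bin: straight-line distance between the two histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def histogram_intersection(h1, h2):
    """Bin-by-bin similarity: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def chi_square_distance(h1, h2):
    """Bin-by-bin: 0.5 * sum((a - b)^2 / (a + b)), skipping empty bin pairs."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def earth_mover_distance_1d(h1, h2):
    """Cross-bin: work needed to move mass between neighbouring bins."""
    total, carried = 0.0, 0.0
    for a, b in zip(h1, h2):
        carried += a - b       # mass still to be moved to later bins
        total += abs(carried)
    return total


near, mid, far = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(euclidean_distance(near, mid), euclidean_distance(near, far))
print(earth_mover_distance_1d(near, mid), earth_mover_distance_1d(near, far))
```

The bin-by-bin Euclidean distance cannot tell a shift of one bin from a shift of two, while the cross-bin earth mover's distance grows with the size of the shift.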

HOW ??
Query Image

Face Detection
We use MATLAB to detect face locations within each frame, then extract the detected faces for use in the face recognition module.

Algorithm used :
- Viola-Jones Algorithm
It is a generic object detection method that can be trained to detect a variety of object classes, but it is most commonly used for face detection.

Feature Computation
The features are computed using rectangular filters by subtracting the sum of the image pixels in the shaded region from the sum of the pixels in the white region.

A, B, C, D : Rectangular filters for features computation

The existence of a feature is determined by comparing the value computed from the rectangular filter with a threshold learned from the training data.
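
The rectangular feature computation can be sketched with an integral image, which Viola-Jones uses so that any rectangle sum costs four lookups. This is a minimal sketch of one two-rectangle feature with a white left half and a shaded right half; the layout and the function names are assumptions for illustration, and the project itself used MATLAB for detection.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """White (left half) sum minus shaded (right half) sum."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    shaded = rect_sum(ii, top, left + half, height, half)
    return white - shaded


# Bright left half, dark right half: a strong vertical edge.
img = [[10, 10, 2, 2] for _ in range(4)]
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 4, 4))
```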

Results And Practical Work
Face Recognition
After detecting faces in the input frame, we attempt to recognize them against our database.

Currently we use a computer vision library called EMGU to recognize faces. The underlying algorithm is Eigenfaces, which is based on PCA.

The problem with this method is that recognition results get worse as the number of subjects in the database increases, and also with varying illumination and face orientation.

System Architecture
Face Recognition Result
Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format.

Output Sample
Problem Faced
-According to the most repeated keyword.


Different languages.
Audio extraction from the shot.

Database Sample