Final Seminar

by Mostafa Fouad, 8 October 2015



Agenda
1- Problem Definition and Solution.
2- Goal.
3- Project Phases.
4- Why All These Modules?
5- Shot Detection.
6- Module (1): Low-Level Features.
7- Module (2): Speech Recognition.
8- Module (3): Face Recognition.
9- Time Plan.
10- Implementation Environment.
11- Final Demo.
12- Future Work.
Content-Based Video Retrieval
Problem Definition and Solution
Most news videos are long and span many categories, so the title alone may not be descriptive enough.
Goal
(1) Retrieve annotations and classify the type of news (Weather, Sports, ...).
Examples: weather, sport.
(2) Summarize the video contents (keywords, key frames, persons, ...).
Examples: Obama, sport, Tahrir Square, cold day.
Project Phases
Why All These Modules?
Isn't one enough?
Shot Detection
A shot is a sequence of frames captured in one continuous recording from a single camera.
Key Frame Extraction:
A key frame is a frame that represents the salient content and information of the shot.
3) Visual Content-Based Approach:
Using Entropy
What is Entropy?
Entropy measures the amount of information in a signal or an image: it describes how much randomness (or uncertainty) the signal or image contains.

Steps to Calculate the Global Entropy of a Frame:
1) Calculate the probability of appearance of each gray level.
2) Calculate the information quantity of that gray level.
3) Calculate the local entropy of that gray level.
4) Calculate the global entropy of the frame.
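Under the standard gray-level entropy formulation (assumed here, since the slide formulas are not reproduced in the text), for a frame with N pixels in which gray level i occurs n_i times, the four steps correspond to:

\[ p_i = \frac{n_i}{N}, \qquad I_i = -\log_2 p_i, \qquad H_i = p_i\,I_i = -p_i \log_2 p_i, \qquad H = \sum_{i=0}^{255} H_i \]

where p_i is the probability of appearance, I_i the information quantity, H_i the local entropy of gray level i, and H the global entropy of the frame.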


Why Entropy?
The main advantage of this algorithm is that it segments shots into key frames with high accuracy, using the semantically meaningful measure of entropy.
Then, how are the key frames extracted?
1) Sort the entropies.
2) Compute the threshold value.
3) Calculate the entropy difference between two consecutive frames.
If the difference exceeds a specified threshold, the frame is a key frame.
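A minimal MATLAB sketch of this selection step is given below. It assumes the decoded frames of a shot are held in a cell array of RGB images and, because the threshold formula is not given above, it simply uses the mean absolute entropy difference as the threshold; the sort-based threshold of steps 1-2 is not reproduced.

function keyIdx = extractKeyFrames(frames)
    % frames: cell array of RGB frames belonging to one shot (illustrative)
    n = numel(frames);
    H = zeros(1, n);
    for k = 1:n
        g = rgb2gray(frames{k});            % gray-level image of frame k
        p = imhist(g) / numel(g);           % probability of each gray level
        p = p(p > 0);                       % drop empty bins before taking logs
        H(k) = -sum(p .* log2(p));          % global entropy of the frame
    end
    d = abs(diff(H));                       % entropy difference of consecutive frames
    T = mean(d);                            % assumed threshold (original formula not shown)
    keyIdx = find(d > T) + 1;               % frames whose entropy jump exceeds the threshold
end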

Speech Recognition
Introduction:
What is speech recognition?
Speech recognition (SR) is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format; in other words, it is the translation of spoken words into text.
Purpose
Categorization and Annotation: categorize and annotate the video by analyzing its speech.
Algorithm to Categorize the Shot
1- Calculate rp(x) by counting the repetitions of each word x that matches the database.
2- Then calculate the sum of rp(x) over each category.
3- Calculate the probability of each category using rule (1).
4- The category with the highest probability is the category of this shot.
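A rough MATLAB sketch of these four steps follows; the variable names (words, keywordDB) are illustrative, and rule (1) is assumed here to be a simple normalization of the per-category sums, since the rule itself is not reproduced above.

function best = categorizeShot(words, keywordDB)
    % words:     cell array of words recognized in the shot (illustrative)
    % keywordDB: containers.Map from category name to its keyword list (illustrative)
    cats  = keys(keywordDB);
    score = zeros(1, numel(cats));
    for c = 1:numel(cats)
        kw = keywordDB(cats{c});
        score(c) = sum(ismember(lower(words), lower(kw)));  % sum of rp(x) for this category
    end
    prob = score / max(sum(score), 1);      % probability of each category (assumed form of rule (1))
    [~, idx] = max(prob);
    best = cats{idx};                       % category with the highest probability
end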
Face Detection
Face detection is a computer technology that determines the locations and sizes of human faces in digital images. It detects faces and ignores anything else, such as buildings, trees, and bodies.
Viola-Jones Face Detector
The Viola-Jones object detection framework, proposed in 2001 by Paul Viola and Michael Jones, was the first object detection framework to provide competitive object detection rates in real time. Although it can be trained to detect a variety of object classes, it was motivated primarily by the problem of face detection.

Feature Types and Evaluation
The features employed by the detection framework universally involve sums of image pixel values within rectangular areas.
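As a reminder of why this choice is cheap to evaluate (standard in the Viola-Jones framework, though not spelled out above): using the integral image

\[ ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y'), \]

the sum of pixels inside any upright rectangle with corners A (top-left), B (top-right), C (bottom-left), and D (bottom-right) needs only four lookups:

\[ \sum_{\text{rectangle}} i = ii(D) - ii(B) - ii(C) + ii(A). \]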
Cascade Architecture
The evaluation of the strong classifiers generated by the learning process can be done quickly, but it is not fast enough to run in real time. For this reason, the strong classifiers are arranged in a cascade in order of complexity, where each successive classifier is trained only on the selected samples that pass through the preceding classifiers.
Practical Work
To implement the proposed algorithm we use the MATLAB 2013 built-in function vision.CascadeObjectDetector, a generic object detector that can be adapted (trained) to detect any object.
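A minimal usage sketch of that detector follows; the frame file name is just an example.

detector = vision.CascadeObjectDetector();   % default model detects frontal faces
frame    = imread('frame001.jpg');           % example key frame
bboxes   = step(detector, frame);            % one [x y width height] row per detected face
imshow(frame); hold on;
for k = 1:size(bboxes, 1)
    rectangle('Position', bboxes(k, :), 'EdgeColor', 'y', 'LineWidth', 2);
end
hold off;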
Results
Video 1
Video 2
Face Alignment
After we extract the detected faces from an input image, the extracted faces may contain unwanted background regions that can severely affect the recognition results. So we need to extract only the facial features and then use them in the recognition phase. For this phase we use another face detector that can detect face parts: eyes, nose, mouth, and other face regions.
Model
The model is based on a mixture of trees with a shared pool of parts V. We model every facial landmark as a part and use global mixtures to capture topological changes due to viewpoint.
Results
Face Recognition
This is the final phase of the face detection, alignment, and recognition pipeline.
We use a new method called MSPCR (Multi-Scale Patch-based Collaborative Representation). This method partitions the query image into a set of overlapping patches of the same size, applies recognition to each patch individually, and finally combines the results from all patches to produce the final recognition result.
Multi-Scale Ensemble
The algorithm is run at several patch sizes (4, 6, 8, 10, 12, 14, 16) for better recognition results, and finally all results are combined to produce the final recognition result.
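For reference, the per-patch collaborative-representation step can be sketched with the standard CRC formulation (assumed here; the equations are not given above). Each patch y is coded over the dictionary D of corresponding training patches by

\[ \hat{\alpha} = \arg\min_{\alpha} \; \|y - D\alpha\|_2^2 + \lambda \|\alpha\|_2^2 = (D^\top D + \lambda I)^{-1} D^\top y, \]

and the patch votes for the class c with the smallest residual \( \|y - D_c \hat{\alpha}_c\|_2 \), where D_c and \( \hat{\alpha}_c \) are the dictionary columns and coefficients belonging to class c; the votes from all patches and all scales are then combined into the final label.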
Training Database
Our database consists of 110 training examples collected from news videos. It contains 11 different subjects (persons), with 10 images per subject, all detected and aligned using the proposed methods.
Implementation Environment
The system was implemented using MATLAB and C#. Most of the system logic was implemented in MATLAB, except for speech recognition. The interface was designed in C# (WPF).
Future Work
Add more modules, such as:
1) Text Recognition.
2) Motion Recognition.
3) Object Recognition.
4) Logo Detection.

Supervised By
Prof. Dr. El-Sayed El-Horbaty
Dr. Mohamed Abdel-Mageed
T.A. Ahmed Salah
Team Members
1-Ibrahim Mohammed Amer
2-Ahmed Saeed Ibrahim
3-Mostafa Fouad Mahmoud
4-Aya Ahmed Serry
5-Zeinab Mohammed Fouad
6-Hajer Adel Ahmed