Transcript of Final Seminar
1- Problem Definition And Solution.
3- Project Phases.
4- Why All These Modules?
5- Shot Detection.
6- Module(1): Low Level Feature.
7- Module(2): Speech Recognition.
8- Module(3): Face Recognition.
9- Time Plan.
10- Implementation Environment.
11- Final Demo.
12- Future Work.
Content Based Video Retrieval
Problem Definition And Solution
Most news videos are long and cover many categories, and the title may not be descriptive enough.
(1) Retrieve annotations and classify the type of news (Weather, Sports, ...)
(2) Summarize the video contents (key words, key frames, persons, ...)
Why All These Modules?
Isn't one enough?
A shot represents a sequence of frames captured from a unique and continuous record from a camera
Key Frame Extraction:
A key frame is the frame that best represents the salient content and information of the shot.
3) Visual Content-Based Approach:
What is Entropy?
Entropy is used to measure the amount of Information.
It describes how much randomness (or uncertainty) there is in a signal or an image.
Steps To Calculate the Global Entropy of a Frame:
1) Calculate the probability of appearance of each gray level.
2) Calculate the information quantity of each gray level.
3) Calculate the local entropy for each gray level.
4) Calculate the global entropy for the frame by summing the local entropies.
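The formulas for these four steps are not included in this transcript. The sketch below assumes the standard Shannon definitions: the probability of a gray level is its pixel count divided by the total number of pixels, the information quantity is the negative base-2 logarithm of that probability, the local entropy is their product, and the global entropy is the sum over all gray levels. The function name `global_entropy` and the frame representation (a list of rows of gray levels) are illustrative assumptions, not the project's MATLAB code.

```python
import math
from collections import Counter


def global_entropy(gray_frame):
    """Return the global entropy (in bits) of a frame given as rows of gray levels."""
    pixels = [level for row in gray_frame for level in row]
    total = len(pixels)
    counts = Counter(pixels)

    entropy = 0.0
    for count in counts.values():
        p = count / total                # step 1: probability of appearance
        information = -math.log2(p)      # step 2: information quantity (assumed base 2)
        local_entropy = p * information  # step 3: local entropy of this gray level
        entropy += local_entropy         # step 4: accumulate the global entropy
    return entropy
```

A frame with a single gray level has zero entropy under these definitions, while a frame whose pixels are spread evenly over many gray levels has high entropy.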
Why Entropy?
The main advantage of this algorithm is that it segments shots into key frames with high accuracy, using a measure that carries semantic meaning: entropy.
Then, How To Extract the Key Frames?
1) Sort the entropies.
2) Get the threshold value.
3) Calculate the difference between two consecutive frames:
If the difference is greater than a specific value, the frame is a key frame.
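Step 3 can be sketched as follows. The transcript does not say how the threshold is derived from the sorted entropies, so the sketch takes `threshold` as a plain parameter. The function name `select_key_frames` and the use of the absolute difference are assumptions made for illustration.

```python
def select_key_frames(entropies, threshold):
    """Return indices of frames whose entropy differs from the previous
    frame's entropy by more than `threshold`.

    `entropies` holds one global entropy value per frame, in temporal order.
    `threshold` is a caller-supplied value (its derivation is not given in the source).
    """
    key_frames = []
    for i in range(1, len(entropies)):
        difference = abs(entropies[i] - entropies[i - 1])
        if difference > threshold:
            key_frames.append(i)
    return key_frames
```

For example, with entropies `[5.0, 5.1, 6.5, 6.4, 4.0]` and a threshold of `1.0`, the frames at indices 2 and 4 are selected, because only those frames differ from their predecessor by more than the threshold.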
What is Speech Recognition?
It is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format.
Speech recognition (SR) is the translation of spoken words into text.
Categorize and annotate video by analyzing it
Algorithm To Categorize The Shot
1- Calculate rp(x) by summing the number of repeated words matched in the database.
2- Calculate the summation of rp(x) for each category.
3- Calculate the probability of each category using rule (1).
4- The category with the highest probability is the category of this shot.
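The four steps can be sketched as below. Rule (1) is not reproduced in this transcript, so the sketch assumes the probability of a category is its match count divided by the total match count over all categories. The function `categorize_shot` and the keyword dictionary standing in for the database are hypothetical.

```python
def categorize_shot(words, category_keywords):
    """Pick the most likely news category for a shot from its recognized words.

    `words` is the list of words recognized in the shot.
    `category_keywords` maps each category name to its keyword list
    (a stand-in for the keyword database).
    Returns (best_category, probabilities); best_category is None if nothing matched.
    """
    totals = {}
    for category, keywords in category_keywords.items():
        keyword_set = {k.lower() for k in keywords}
        # Steps 1 and 2: count every occurrence of a matched keyword, summed per category.
        totals[category] = sum(1 for w in words if w.lower() in keyword_set)

    grand_total = sum(totals.values())
    if grand_total == 0:
        return None, {category: 0.0 for category in totals}

    # Step 3 (assumed form of rule (1)): normalize the counts into probabilities.
    probabilities = {c: t / grand_total for c, t in totals.items()}
    # Step 4: the category with the highest probability wins.
    best = max(probabilities, key=probabilities.get)
    return best, probabilities
```

With the words of "rain and wind expected while the team won the match after rain" and the keyword lists `{"Weather": ["rain", "wind", "storm"], "Sports": ["team", "match", "goal"]}`, Weather collects three matches and Sports two, so the shot is labelled Weather.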
Face detection is a computer technology that determines the locations and sizes of human faces in digital images. It detects faces and ignores anything else, such as buildings, trees and bodies.
Viola–Jones Face Detector
The Viola–Jones object detection framework, proposed in 2001 by Paul Viola and Michael Jones, was the first object detection framework to provide competitive object detection rates in real time. Although it can be trained to detect a variety of object classes, it was motivated primarily by the problem of face detection.
Feature Types And Evaluation
The features employed by the detection framework universally involve the sums of image pixels within rectangular areas.
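Rectangle sums of this kind are commonly computed with an integral image (summed-area table), which is the technique described in the Viola–Jones paper. The sketch below is a pure-Python illustration with hypothetical function names, not the project's MATLAB implementation. It builds the table once and then evaluates a two-rectangle feature (left half minus right half) with a handful of lookups.

```python
def integral_image(image):
    """Return a (rows+1) x (cols+1) summed-area table for a 2-D list of pixel values."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle feature: sum of the left half minus sum of the right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```

Because every rectangle sum costs four lookups regardless of the rectangle's size, a feature can be evaluated in constant time at any position and scale.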
The strong classifiers generated by the learning process can be evaluated quickly, but not quickly enough to run in real time. For this reason, the strong classifiers are arranged in a cascade in order of complexity, where each successive classifier is trained only on those samples that pass through the preceding classifiers.
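The speed gain of a cascade comes from early rejection: a window that fails a simple stage never reaches the more complex ones. A minimal sketch follows, in which each stage is assumed to be a callable that returns True to accept a window; `cascade_classify` is an illustrative name, not part of any library.

```python
def cascade_classify(window, stages):
    """Return True only if every stage accepts the window.

    `stages` is a list of callables ordered from simplest to most complex.
    Evaluation stops at the first stage that rejects the window.
    """
    for stage in stages:
        if not stage(window):
            return False  # rejected early; later stages are never evaluated
    return True
```

Most windows in an image do not contain a face, so most of them are discarded by the first, cheapest stages.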
To implement the proposed algorithm we use vision.CascadeObjectDetector, which is built into MATLAB 2013. It is a generic object detector that can be adapted (trained) to detect any object.
After we extract the detected faces from an input image, the extracted faces may contain unwanted background regions that can strongly affect the recognition results. We therefore need to extract only the facial features and use them in the recognition phase. For this phase we use another face detector that can detect face parts: eyes, nose, mouth and any other face region.
The model is based on a mixture of trees with a shared pool of parts V. We model every facial landmark as a part and use global mixtures to capture topological changes due to viewpoint.
The final phase of Face Detection, Alignment and Recognition.
We are using a new method called MSPCR (Multi-scale Patch based Collaborative Representation). This method partitions the query image into a set of overlapping patches of the same size, applies recognition to each patch individually, and then combines the results from all patches.
For better recognition results, this algorithm is performed at different scales (patch sizes 4, 6, 8, 10, 12, 14 and 16), and the results from all scales are combined to output the final recognition result.
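The multi-scale patch idea can be sketched as below. This is a simplification under stated assumptions: the per-patch recognizer is passed in as a hypothetical callable `classify_patch`, patches overlap by half their size, and the per-patch results are combined by a plain majority vote. The transcript does not describe how MSPCR weights or combines patch results, so the vote here is a stand-in and not the method's actual fusion rule.

```python
from collections import Counter


def extract_patches(image, size, stride):
    """Return all size x size patches of a 2-D list, stepping by `stride` pixels."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def recognize_multiscale(image, patch_sizes, classify_patch):
    """Classify every patch at every scale and return the majority label.

    `classify_patch` is a hypothetical callable mapping a patch to a subject label.
    Returns None if no patch fits inside the image.
    """
    votes = Counter()
    for size in patch_sizes:
        stride = max(1, size // 2)  # assumption: patches overlap by half their size
        for patch in extract_patches(image, size, stride):
            votes[classify_patch(patch)] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Calling `recognize_multiscale(face, [4, 6, 8, 10, 12, 14, 16], classify_patch)` would run the patch sizes listed above on an aligned face image.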
Our database consists of 110 training examples collected from news videos. It contains 11 different subjects (persons) with 10 images per subject, all detected and aligned using the proposed methods.
The system was implemented using MATLAB and C#. Most of the system logic was implemented in MATLAB, except speech recognition. The interface was implemented using C# WPF.
Add more modules, such as:
1) Text Recognition.
2) Motion Recognition.
3) Object Recognition.
4) Logo Detection.
Prof. Dr. El-Sayed El-Horbaty
Dr. Mohamed Abdel-Mageed
T.A. Ahmed Salah
1-Ibrahim Mohammed Amer
2-Ahmed Saeed Ibrahim
3-Mostafa Fouad Mahmoud
4-Aya Ahmed Serry
5-Zeinab Mohammed Fouad
6-Hajer Adel Ahmed