Apply Speech recognition to Serve Holy Quran

on 10 December 2013


Transcript of Apply Speech recognition to Serve Holy Quran

Apply Speech Recognition Techniques to Serve the Holy Quran
Experiments and Results

Quranic Recognizer
Testing Phase
Training Phase
Comparison between the Speech Recognition Techniques of the E-Hafiz Application and the Quranic Recognizer Application

By Bayan Bagasi

Data Preparation

Quranic Recognizer
Pre-emphasis

Silence removal
Speech segment extraction
- Uses the short-term energy method.
- Uses a threshold, set to -10 dB, to separate speech from silence.
[Figure: audio signal of "Al-Rahman" before and after silence removal]
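
The short-term energy method with a -10 dB threshold can be sketched as follows. This is only an illustration, not the application's code: the frame length of 160 samples, the function names, and the choice to measure each frame's energy relative to the loudest frame are assumptions, because the slides give only the method and the threshold value.

```python
import math

def frame_energy_db(frame):
    """Mean energy of one frame in dB (samples assumed to be floats in [-1, 1])."""
    energy = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(energy) if energy > 0 else float("-inf")

def remove_silence(signal, frame_len=160, threshold_db=-10.0):
    """Drop frames whose energy is more than |threshold_db| below the loudest frame."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    if not frames:
        return []
    energies = [frame_energy_db(f) for f in frames]
    peak = max(energies)
    if peak == float("-inf"):
        return []  # the whole signal is digital silence
    kept = []
    for frame, e in zip(frames, energies):
        if e - peak >= threshold_db:  # assumption: threshold is relative to the peak
            kept.extend(frame)
    return kept

# Hypothetical input: 320 silent samples followed by a 320-sample tone.
tone = [0.5 * math.sin(2 * math.pi * n / 16) for n in range(320)]
speech_only = remove_silence([0.0] * 320 + tone)
```

With this input the two silent frames are discarded and only the tone remains.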
Pre-emphasis filter
- Used to reduce the differences between the components of the signal.
- It filters noise.
- A 2-stage pre-emphasis filter with different factor values (0.92 and 0.97).
- A first-order Finite Impulse Response (FIR) filter applied to the digitized speech signal by means of

$H(z) = 1 - a z^{-1}$

where $a$ is the pre-emphasis parameter, whose value is close to 1; they used 0.935.
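
In the time domain, the filter H(z) = 1 - a z^-1 is the difference equation y[n] = x[n] - a x[n-1]. A minimal sketch follows; the function names are illustrative, and chaining two first-order filters is only one possible reading of "2-stage", not something the slides confirm.

```python
def pre_emphasis(signal, a=0.935):
    """Apply y[n] = x[n] - a * x[n-1], the time-domain form of H(z) = 1 - a z^-1."""
    if not signal:
        return []
    # The first sample has no predecessor, so it is passed through unchanged.
    return [signal[0]] + [signal[n] - a * signal[n - 1] for n in range(1, len(signal))]

def two_stage_pre_emphasis(signal, factors=(0.92, 0.97)):
    """Chain two first-order filters, one per factor (an assumed reading of '2-stage')."""
    for a in factors:
        signal = pre_emphasis(signal, a)
    return signal

# A constant (zero-frequency) input is strongly attenuated after the first sample.
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])
staged = two_stage_pre_emphasis([1.0, 1.0, 1.0])
```

Because a is close to 1, slowly varying (low-frequency) content is attenuated while rapid changes pass through.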
Feature Extraction
Mel-Frequency Cepstral Coefficients (MFCC)
Quranic Recognizer
Short-Time Fourier Transform (STFT)
- A signal processing method used for analyzing non-stationary signals, whose statistical characteristics vary with time.
- This analysis is useful for extracting properties of the sound such as its local amplitude and frequency.

Mel filter bank
- The Mel frequency filter bank is a series of triangular band-pass filters that mimics the human auditory system.
- This is achieved by using a set of 30 triangular Mel filters.

Discrete Cosine Transform (DCT)
- Transforms the audio data into a form that lends itself well to compression.
- Expresses a finite sequence of data as a sum of cosine functions at different frequencies.
- The result of this step is the MFCC.
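
As a rough illustration of the DCT step, the sketch below applies an unnormalized DCT-II to a list of log filter-bank energies. The DCT variant, the function name, and the input values are assumptions; the slides do not say which form or how many coefficients the application keeps.

```python
import math

def dct_ii(values, num_coeffs=None):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * k * (n + 0.5) / N)."""
    n_values = len(values)
    if num_coeffs is None:
        num_coeffs = n_values
    return [
        sum(x * math.cos(math.pi * k * (n + 0.5) / n_values) for n, x in enumerate(values))
        for k in range(num_coeffs)
    ]

# Hypothetical filter-bank energies; a real front end would produce one per Mel filter.
log_energies = [math.log(e) for e in [1.0, 2.0, 4.0, 8.0]]
mfcc = dct_ii(log_energies, num_coeffs=3)  # keep only the first few coefficients
```

Keeping only the first few coefficients is what gives the compact representation mentioned above.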
Cepstral Mean Normalization (CMN)
- Used to reduce the distortion introduced by the transmission medium (microphone).
- Subtracts the cepstral mean, calculated across the utterance, from each frame.
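
Cepstral mean normalization amounts to a per-coefficient mean subtraction over the utterance. A minimal sketch, assuming the utterance is a list of equal-length cepstral vectors (the function name and the sample values are illustrative):

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean, taken over all frames of the utterance."""
    if not frames:
        return []
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(num_coeffs)]
    return [[f[d] - means[d] for d in range(num_coeffs)] for f in frames]

# Two hypothetical frames with two coefficients each.
normalized = cepstral_mean_normalize([[1.0, 2.0], [3.0, 6.0]])
```

After normalization every coefficient has zero mean across the utterance, which removes a constant offset such as a fixed microphone response.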

- The signal is segmented into frames; in this application the frames are 20-40 ms long.
- This step makes the Fourier transform possible.

- A Hamming window function is used.
- It minimizes the discontinuities at the start and end of each frame of the signal.

$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$

where $N$ is the total number of speech samples in the frame and $n$ is the index of the $n$th speech sample in the frame.
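
Framing and Hamming windowing can be sketched together as follows. The 16 kHz sample rate, the 25 ms frame length (inside the 20-40 ms range above), and the 10 ms hop are assumptions chosen for illustration; the slides do not state whether frames overlap.

```python
import math

def hamming_window(num_samples):
    """w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)) for 0 <= n <= N - 1."""
    if num_samples < 2:
        return [1.0] * num_samples
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (num_samples - 1))
            for n in range(num_samples)]

def windowed_frames(signal, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split the signal into overlapping frames and apply the Hamming window to each."""
    frame_len = sample_rate * frame_ms // 1000
    hop = sample_rate * hop_ms // 1000
    window = hamming_window(frame_len)
    return [
        [s * w for s, w in zip(signal[start:start + frame_len], window)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

# Hypothetical constant signal: each frame then simply reproduces the window shape.
frames = windowed_frames([1.0] * 1000)
```

The window tapers each frame toward 0.08 at both ends, which is what reduces the edge discontinuities.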

Discrete Fourier Transform (DFT)
- The DFT converts each frame of samples from the time domain into the frequency domain.
- Y1[k] is the windowed signal, where k = 0, 1, 2, ..., N-1.
- Y2[n] is the Fourier transform of Y1[k]: a complex number representing the magnitude and phase of a frequency component.
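
A direct (non-fast) DFT of one windowed frame can be written as below, using the slide's notation of Y1[k] for the input and Y2[n] for the output. This is a sketch for clarity only; a real implementation would normally use an FFT routine.

```python
import cmath

def dft(y1):
    """Y2[n] = sum_k Y1[k] * exp(-2j*pi*k*n / N), for n = 0 .. N-1."""
    size = len(y1)
    return [
        sum(y1[k] * cmath.exp(-2j * cmath.pi * k * n / size) for k in range(size))
        for n in range(size)
    ]

spectrum = dft([1.0, 0.0, -1.0, 0.0])     # one cycle of a cosine over four samples
magnitudes = [abs(c) for c in spectrum]    # magnitude of each frequency component
phases = [cmath.phase(c) for c in spectrum]  # phase of each frequency component
```

For this input the energy appears in bins 1 and 3, the positive and negative frequency of the single cosine cycle.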
Mel filter bank
- Emphasizes the low-frequency components of speech.
- A number of triangular filters are created and used during the MFCC calculation.

$\text{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$

where $f$ is the linear frequency in Hz.
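
The Mel mapping and the placement of triangular filters can be sketched as follows. The code uses the commonly cited form of the Mel formula, 2595 * log10(1 + f/700); the 0-8000 Hz range, the helper names, and the use of 30 filters for this example are assumptions for illustration.

```python
import math

def hz_to_mel(freq_hz):
    """Mel value of a linear frequency: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def filter_edges_hz(num_filters, low_hz=0.0, high_hz=8000.0):
    """Edge frequencies of num_filters triangular filters, equally spaced in Mel."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(num_filters + 2)]

def triangle_weight(freq_hz, left, center, right):
    """Weight of one triangular filter at freq_hz: 0 at the edges, 1 at the center."""
    if left < freq_hz <= center:
        return (freq_hz - left) / (center - left)
    if center < freq_hz < right:
        return (right - freq_hz) / (right - center)
    return 0.0

edges = filter_edges_hz(30)  # filter i spans edges[i], edges[i + 1], edges[i + 2]
```

Because the edges are equally spaced in Mel rather than in Hz, the filters are narrow at low frequencies and wide at high frequencies, which is how the bank emphasizes the low-frequency part of speech.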

- The logarithm function is applied to the Mel filter-bank output by replacing each value with its natural log.
- The log command in MATLAB is used.

Inverse Discrete Fourier Transform (IDFT)
- The final step applies the inverse DFT to the result of the logarithm step.
- x[k] is the logged value of each Mel-filtered speech segment obtained from the previous step.
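
The last two steps, natural log followed by an inverse DFT, can be sketched in Python as below (the original work used MATLAB). The flooring of very small values and the sample filter-bank values are assumptions added so the example runs safely.

```python
import cmath
import math

def natural_log(values, floor=1e-12):
    """Replace each filter-bank value by its natural log (floored to avoid log(0))."""
    return [math.log(max(v, floor)) for v in values]

def idft(x):
    """Inverse DFT: y[n] = (1/N) * sum_k x[k] * exp(+2j*pi*k*n / N)."""
    size = len(x)
    return [
        sum(x[k] * cmath.exp(2j * cmath.pi * k * n / size) for k in range(size)) / size
        for n in range(size)
    ]

# Hypothetical Mel filter-bank values; their logs are approximately 0, 1, 2, 1.
filter_outputs = [1.0, math.e, math.e ** 2, math.e]
logged = natural_log(filter_outputs)
cepstrum = [c.real for c in idft(logged)]  # real part used as cepstral coefficients
```

Taking the real part is a simplification for this symmetric example; how the application handles the complex result is not stated in the slides.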
General Model

Quranic Recognizer
- Features: feature vectors (cepstrum, delta cepstrum and delta-delta cepstrum).
1- Decoding phase with the acoustic model to create phonemes.
2- Search of the search graph to find the matching phonemes.

E-Hafiz
- Features: an array of MFCC features.
1- Produce a feature vector using the same MFCC steps.
2- Obtain the same verse from the database.
3- Calculate the average value of each vector, then the difference between the averages.
4- Compare the result with a threshold, which is set by default.
5- If the result is greater than the threshold, the recitation has an error.
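
The comparison in steps 3 to 5 can be sketched as follows. The function names, the use of an absolute difference, the threshold of 0.5, and the feature values are all hypothetical; the slides say only that the averages are compared against a default threshold.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of feature values."""
    return sum(values) / len(values)

def has_recitation_error(recited_features, reference_features, threshold):
    """Flag an error when the two feature means differ by more than the threshold."""
    difference = abs(mean(recited_features) - mean(reference_features))
    return difference > threshold

reference = [1.0, 2.0, 3.0]  # hypothetical features of the stored verse
close_match = has_recitation_error([1.1, 2.0, 2.9], reference, threshold=0.5)
far_match = has_recitation_error([4.0, 5.0, 6.0], reference, threshold=0.5)
```

Comparing only the averages makes the check simple, but two different feature vectors with the same mean would not be distinguished by it.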
- Testers recited Surat "Al-Ikhlas" several times.
- The number of ayat that were recognized correctly was recorded.
- A mean recognition ratio was calculated for each tester.
- The sample was 20 men and 20 women.
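
One way to compute a per-tester mean recognition ratio is shown below. The counts are invented purely to illustrate the arithmetic and are not results from the study; the only fixed fact used is that Surat Al-Ikhlas has 4 ayat.

```python
def recognition_ratio(correct_ayat, total_ayat):
    """Fraction of ayat recognized correctly in one recitation."""
    return correct_ayat / total_ayat

def mean_recognition_ratio(ratios):
    """Average ratio over all recitations by one tester."""
    return sum(ratios) / len(ratios)

AYAT_IN_AL_IKHLAS = 4
# Hypothetical counts of correctly recognized ayat over five recitations by one tester.
correct_per_recitation = [4, 3, 4, 2, 4]
ratios = [recognition_ratio(c, AYAT_IN_AL_IKHLAS) for c in correct_per_recitation]
tester_mean = mean_recognition_ratio(ratios)
```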

- The reciter selects a verse from the list available in the E-Hafiz system.
- For testing purposes, the reciters intentionally made some mistakes.
- The mean recognition ratio is calculated by taking the ratio of wrongly read verses pointed out by E-Hafiz

- The results of the two applications are similar for male reciters.
- Quranic Recognizer:
  - It is more complex.
  - But it is a machine-learning system, since it uses HMMs.
  - The system can be improved.
  - It does not tolerate even one mistake, and it does not show the mistake.

- E-Hafiz:
  - It has the simplest workflow.
  - It uses the toolboxes available in MATLAB, which are accurate.
  - It is not a machine-learning system; it depends only on the feature vectors from expert reciters.

Future work: study how to improve the applications to help Arabic-speaking and non-Arabic-speaking Muslims recite and memorize the Quran correctly.
[1] M.S. Bashir, S.F. Rasheed, M.M. Awais, S. Masud, S. Shamail, "Simulation of Arabic Phoneme Identification through Spectrographic Analysis," Department of Computer Science, LUMS, Lahore, Pakistan (2003).

[2] R. Zaidi, I. Noor, I. Mohd , T. Emran , Y. Mohd, Y. Zulkifli and A. Noor, “Quranic Verse Recitation Recognition Module for Support in j-QAF Learning: A Review,” IJCSNS International Journal of Computer Science and Network Security,VOL.8 No.8, (August 2008).

[3] H. Tabbal, W. El-Falou and B. Monla, "Analysis and Implementation of a "Quranic" verses delimitation system in audio files using speech recognition techniques," Proc. of the IEEE Conf. of 2nd Information and Communication Technologies. vol. 2, pp. 2979-2984 (2006).

[4] W. M. Muhammad, R. Muhammad, A. Muhammad and M.-E. A.M., "Voice Content Matching System for Quran Readers," in Ninth Mexican International Conference on Artificial Intelligence, 2010.

[5] H. Tabbal, W. El-Falou and B. Monla, "Analysis and Implementation of an Automated Delimiter of "Quranic" Verses in Audio Files using Speech Recognition Techniques," Robust Speech Recognition and Understanding, Michael Grimm and Kristian Kroschel (Ed.), ISBN: 978-3-902613-08-0 (2007).

[6] M. Habash, “How to memorize the Quran”, Dar al-Khayr, Beirut 1986.

[7] A. Sheikh, “Teachings and reflections of Qur’an and Sunnah about Science, Engineering, Technology and Management,” Quaid-e-awan University research journal of engineering, science and technology, volume 1, No. 1, (Jan-Jun 2000).
[8] http://cmusphinx.sourceforge.net/sphinx4/
[9] Sphinx group, “Sphinx-4: A flexible Open Source Framework for Speech Recognition”, Sun Microsystems, 2004.

[10] http://www.mathworks.com/products/matlab/
[11] D. Ning, "Developing an Isolated Word Recognition System in MATLAB," MATLAB Digest, The MathWorks.

[12] http://www.animations.physics.unsw.edu.au/jw/dB.htm
[13] http://www.thefreedictionary.com/threshold
[14] http://www.ivoronline.com/Science/Signals/Digital%20Filters%20-%20Preemphasis/Digital%20Filters%20-%20Preemphasis.pdf
[15] http://www.originlab.com/index.aspx?go=Products/Origin/DataAnalysis/SignalProcessing/STFT
[16] http://haroon.99k.org/page21.html
[17] http://www.uio.no/studier/emner/matnat/math/MAT-INF1100/h07/undervisningsmateriale/kap6.pdf
[18] http://www.ee.uwa.edu.au/~roberto/research/speech/local/entropic/HAPIBook/node85.html
[19] http://www.cic.unb.br/~lamar/te073/Aulas/mfcc.pdf
[20] http://cmusphinx.sourceforge.net/wiki/tutorialam
[21] http://en.wikipedia.org/wiki/Acoustic_model
[22] http://ayesha.lti.cs.cmu.edu/twiki/pub/Main/UploadsAndWritings/Mark_Gales_ASRU_talk09.pdf

- The reason for choosing this subject
The reasons for choosing:
- Quranic Recognizer
- E-Hafiz
- Acoustic model
- dB (decibel, the unit used to measure sound level)
- Threshold (here, a level of energy in the signal)


i- Necessary prolongation of 6 vowels.
ii- Obligatory prolongation of 4 or 5 vowels.
iii- Permissible prolongation of 2, 4 or 6 vowels.
iv- Normal prolongation of 2 vowels.
v- Nasalization (ghunnah) of 2 vowels.
vi- Silent unannounced letters.
vii- Emphatic pronunciation of the letter R.

Literature review
- Art of Tajweed
- The impact on the acoustic model

Dealing with the rules:
- Repetition of the vowel the corresponding number (n) of times:
  - Prolongation
  - Nasalization (ghunnah)
- Use of another phoneme or voice:
  - Emphatic pronunciation of the letter R.
  - Kaf, Khaa when they are voweled by a fatha
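
Handling prolongation by repeating a vowel n times can be pictured as expanding a word's phoneme sequence. This is a sketch only: the phoneme symbols and the function name are hypothetical and are not taken from the application's dictionary, while the counts 2, 4 and 6 follow the permissible prolongation rule listed earlier.

```python
def prolong_vowel(phonemes, index, count):
    """Repeat the phoneme at `index` so that it appears `count` times in a row."""
    return phonemes[:index] + [phonemes[index]] * count + phonemes[index + 1:]

base = ["R", "AA", "H"]  # hypothetical phoneme sequence, not from the source
# Permissible prolongation allows 2, 4 or 6 vowel lengths, so each gets its own variant.
variants = {n: prolong_vowel(base, 1, n) for n in (2, 4, 6)}
```

Each variant could then be listed as an alternative pronunciation, so the recognizer accepts any of the permitted lengths.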

Speech recognition techniques

- Quranic Recognizer application
  - By Tabbal Hassan, Al-Falou Wassim and Monla Bassem from the Lebanese University.
  - In 2007
  - Uses the Sphinx framework
- E-Hafiz
  - By Waqar Mirza Muhammad, Rizwan Muhammad, Aslam Muhammad, Martinez-Enriquez A.M
  - In 2010
  - Uses the MATLAB framework

The environments of the applications
- MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming.
- It is used to analyze data, develop algorithms, and create models and applications.

It has important toolboxes:
- Data Acquisition Toolbox™
- Signal Processing Toolbox™
- Statistics Toolbox™

Sphinx-4
- It is an open source framework.
- It is a state-of-the-art speech recognition system written entirely in Java™.
- It is based on the HMM (Hidden Markov Model).
- The Sphinx-4 framework has been designed with a high degree of flexibility and modularity.

Language Model
- The LanguageModel module of the Linguist provides word-level language structure.

Grammar formats
- Java Speech Grammar Format (JSGF): why?

- The Dictionary provides pronunciations for words found in the LanguageModel.
- The pronunciations break words into sequences of sub-word units found in the AcousticModel.

- The AcousticModel module provides a mapping between a unit of speech and an HMM.
- It is created by taking audio recordings of speech and their text transcriptions, and using software to create statistical representations of the sounds that make up each word.

Table of phonemes