**Applying Speech Recognition Techniques to Serve the Holy Quran**

Experiments and results

Quranic Recognizer

Experiments and Results

E-Hafiz

Testing Phase

Training Phase

**Conclusion**

**Comparison between the Speech Recognition Techniques of the E-Hafiz and Quranic Recognizer Applications**

By Bayan Bagasi

Data Preparation

E-Hafiz

Quranic Recognizer

Pre-emphasis

Silence removal

Speech Segment Extraction

- Short-term energy method

Audio signal of "Al-Rahman" before and after silence removal

Using a threshold

- Sets the threshold to -10 dB.
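The short-term energy method with a -10 dB threshold can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code. The frame length (160 samples, i.e. 10 ms at an assumed 16 kHz sampling rate) is an assumption, and so is measuring the -10 dB threshold relative to the loudest frame.

```python
import math


def remove_silence(signal, frame_len=160, threshold_db=-10.0):
    """Keep only frames whose short-term energy is within threshold_db of the loudest frame.

    Assumptions (not stated in the source): non-overlapping frames of frame_len
    samples, and the threshold is relative to the maximum frame energy (0 dB).
    """
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    energies = [sum(s * s for s in frame) for frame in frames]
    max_energy = max(energies, default=0.0)
    if max_energy == 0.0:
        return []  # empty or completely silent input
    kept = []
    for frame, energy in zip(frames, energies):
        # Short-term energy in dB relative to the loudest frame.
        rel_db = 10.0 * math.log10(energy / max_energy) if energy > 0 else -math.inf
        if rel_db >= threshold_db:
            kept.extend(frame)
    return kept


# Usage: a silent frame, a loud frame and a very quiet frame (-40 dB).
signal = [0.0] * 160 + [1.0] * 160 + [0.01] * 160
print(len(remove_silence(signal)))  # 160
```

Making the threshold relative keeps it independent of the recording level. An absolute energy threshold would have to be re-tuned for every microphone gain.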

- Used to reduce the differences between components of the signal.

- Pre-emphasis filter

A 2-stage pre-emphasis filter with different factor values (0.92 and 0.97).

$H(z) = 1 - a z^{-1}$

- It filters noise.

- A first-order Finite Impulse Response (FIR) filter applied to the digitized speech signal.

where $a$ is the pre-emphasis parameter, whose value is close to 1; they used 0.935.
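In the time domain, the filter $H(z) = 1 - a z^{-1}$ computes $y[n] = x[n] - a\,x[n-1]$. The sketch below implements that difference equation. It treats the sample before the signal starts as 0. The two-stage variant is assumed to cascade the 0.92 stage and then the 0.97 stage, because the source does not say how the stages are combined.

```python
def pre_emphasis(signal, a=0.935):
    """First-order FIR pre-emphasis, H(z) = 1 - a*z^-1, i.e. y[n] = x[n] - a*x[n-1].

    The sample before the start of the signal is taken as 0, so y[0] = x[0].
    """
    out = []
    prev = 0.0
    for x in signal:
        out.append(x - a * prev)
        prev = x
    return out


def two_stage_pre_emphasis(signal, a1=0.92, a2=0.97):
    """Assumption: the two stages are cascaded, applying factor a1 first and then a2."""
    return pre_emphasis(pre_emphasis(signal, a1), a2)


print(pre_emphasis([1.0, 1.0, 1.0], a=0.5))  # [1.0, 0.5, 0.5]
```

A constant (low-frequency) input is strongly attenuated after the first sample, while rapid sample-to-sample changes pass through. This is the high-frequency boost that pre-emphasis provides.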

Feature Extraction

Mel-Frequency Cepstral Coefficients (MFCC)

E-hafiz

Quranic Recog.

Short-Time Fourier Transform (STFT)

- A signal processing method used for analyzing non-stationary signals, whose statistical characteristics vary with time.

- This analysis is useful for extracting properties of the sound, such as its local amplitude and frequency.

- The Mel frequency filter bank is a series of triangular band pass filters, which mimics the human auditory system.

- This is achieved using a set of 30 triangular Mel filters.
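A bank of 30 triangular Mel filters can be built as below. This is a sketch under stated assumptions, not either application's code. The FFT size (512), the sampling rate (16 kHz), filter edges running from 0 Hz to the Nyquist frequency, unit-height triangles, and the common 2595·log10(1 + f/700) form of the Mel scale are all choices made here.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(num_filters=30, n_fft=512, sample_rate=16000):
    """Build num_filters triangular filters over the n_fft // 2 + 1 DFT bins.

    Assumptions: filters cover 0 Hz to the Nyquist frequency, their edges are
    equally spaced on the Mel scale, and each triangle peaks at 1.0.
    """
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 equally spaced Mel points: left edge, centres, right edge.
    mel_points = [low + i * (high - low) / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, centre):  # rising edge
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge, peak of 1.0 at the centre bin
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank


bank = mel_filter_bank()
print(len(bank), len(bank[0]))  # 30 257
```

The centres are equally spaced in Mel but not in Hz. As a result, the filters are narrow at low frequencies and wide at high frequencies, roughly as in human hearing.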

Mel filter bank

Discrete Cosine Transformation (DCT)

- Transforms the audio data into a form that lends itself well to compression.

- Expresses a finite sequence of data as a sum of cosine functions at different frequencies.

- The result gives the MFCCs.
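The DCT step that turns the log Mel filter-bank energies into MFCCs can be sketched as an unnormalized DCT-II. The source does not state how many coefficients either application keeps. Keeping the first 13 is a common convention and is assumed here.

```python
import math


def dct_ii(values, num_coeffs=None):
    """Unnormalized DCT-II: c[k] = sum_n x[n] * cos(pi * k * (2n + 1) / (2N))."""
    n_vals = len(values)
    if num_coeffs is None:
        num_coeffs = n_vals
    return [
        sum(x * math.cos(math.pi * k * (2 * n + 1) / (2 * n_vals)) for n, x in enumerate(values))
        for k in range(num_coeffs)
    ]


def mfcc_from_log_energies(log_energies, num_ceps=13):
    """Keep the first num_ceps DCT coefficients of the log Mel energies as the MFCCs."""
    return dct_ii(log_energies, num_ceps)


# 30 log filter-bank energies (one per Mel filter) -> 13 MFCCs
mfcc = mfcc_from_log_energies([0.5] * 30)
print(len(mfcc))  # 13
```

A flat log spectrum puts all its energy in c[0]. The higher coefficients describe increasingly fine ripples in the spectral envelope, which is why truncating them compresses the data.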

Cepstral Mean Normalization (CMN)

- Used to reduce the distortion effect introduced by the transmission medium (microphone).

- Subtracts the cepstral mean, calculated across the utterance, from each frame.
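CMN can be sketched on a list of cepstral frames, with one list of coefficients per frame. A fixed channel, such as a given microphone, multiplies the spectrum and so adds roughly the same offset to every frame's cepstrum. Subtracting the utterance mean therefore removes that offset. The function name is illustrative.

```python
def cepstral_mean_normalization(frames):
    """Subtract, coefficient by coefficient, the mean over all frames of the utterance."""
    if not frames:
        return []
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


print(cepstral_mean_normalization([[1.0, 2.0], [3.0, 6.0]]))  # [[-1.0, -2.0], [1.0, 2.0]]
```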

Framing

- Segments the signal into frames.

- In this application, the signal is segmented into 20-40 ms frames.

- As a result of this process, the Fourier transform can be applied to each frame.
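The framing step can be sketched as below. It assumes 25 ms frames (inside the 20-40 ms range above) that start every 10 ms, so consecutive frames overlap. The overlap and the exact lengths are assumptions, not values taken from either application.

```python
def frame_signal(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a signal into frames of frame_ms milliseconds, starting every hop_ms milliseconds.

    Assumptions: 25 ms frames (within the 20-40 ms range) overlapping with a 10 ms hop;
    samples that do not fill a last complete frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


frames = frame_signal([0.0] * 16000, sample_rate=16000)  # one second at 16 kHz
print(len(frames), len(frames[0]))  # 98 400
```

Over a few tens of milliseconds, speech is close to stationary. This is what makes a per-frame Fourier transform meaningful.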

Windowing

- Uses the Hamming window function.

- Used to minimize discontinuities at the start and end of each frame of the signal.

$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right)$

where $N$ is the total number of speech samples in the frame, $n$ is the index of the speech sample within the frame, and $0 \le n \le N-1$.
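The Hamming window formula translates directly into Python. The names hamming and apply_window are illustrative. MATLAB's Signal Processing Toolbox has its own hamming function, which this sketch does not reproduce exactly.

```python
import math


def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), for 0 <= n <= N-1."""
    if N == 1:
        return [1.0]  # the formula is undefined for N = 1
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def apply_window(frame):
    """Multiply a frame sample by sample by a Hamming window of the same length."""
    return [x * w for x, w in zip(frame, hamming(len(frame)))]


print([round(w, 2) for w in hamming(5)])  # [0.08, 0.54, 1.0, 0.54, 0.08]
```

The window tapers to 0.08 at both ends and reaches 1.0 in the middle. This tapering is what reduces the discontinuities at frame boundaries.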

- The DFT is used to convert each frame's samples from the time domain into the frequency domain.

$Y_2[n] = \sum_{k=0}^{N-1} Y_1[k]\, e^{-j 2\pi k n / N}, \quad n = 0, 1, \dots, N-1$

where $Y_1[k]$ is the windowed signal, with $k = 0, 1, \dots, N-1$, and $Y_2[n]$ is the complex number representing the magnitude and phase of the corresponding frequency component.
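The DFT can be evaluated directly from its definition, as in the sketch below. This is an O(N²) loop for illustration only; practical code, such as MATLAB's fft, uses a fast Fourier transform. The power-spectrum helper is an assumption: it reflects a common MFCC pipeline step, not a detail confirmed by the source.

```python
import cmath


def dft(y1):
    """Y2[n] = sum_{k=0}^{N-1} Y1[k] * exp(-j*2*pi*k*n/N), for n = 0..N-1.

    Direct O(N^2) evaluation for illustration; practical code uses an FFT.
    """
    N = len(y1)
    return [sum(y1[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N)) for n in range(N)]


def power_spectrum(y1):
    """|Y2[n]|^2, the power of each frequency component."""
    return [abs(v) ** 2 for v in dft(y1)]


print([round(p, 6) for p in power_spectrum([1.0, 1.0, 1.0, 1.0])])  # [16.0, 0.0, 0.0, 0.0]
```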

Discrete Fourier Transform (DFT)

- Emphasizes the low-frequency components of speech.

- A bank of triangular filters is created and used during the MFCC calculation.

$\text{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$

where $f$ is the linear frequency in Hz.
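As a worked example, using the common form $2595 \log_{10}(1 + f/700)$ of the Mel scale, two frequencies map as follows:

```latex
\begin{aligned}
\text{Mel}(1000) &= 2595 \log_{10}\!\left(1 + \tfrac{1000}{700}\right) \approx 2595 \times 0.3854 \approx 1000 \\
\text{Mel}(4000) &= 2595 \log_{10}\!\left(1 + \tfrac{4000}{700}\right) \approx 2595 \times 0.8270 \approx 2146
\end{aligned}
```

A fourfold increase in frequency, from 1000 Hz to 4000 Hz, only about doubles the Mel value. For this reason, filters equally spaced on the Mel scale become wider in Hz at higher frequencies.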

Mel Filter-bank

- Applies the logarithm to the Mel filter-bank outputs, replacing each value with its natural log.

- Uses the log command in MATLAB.

Logarithm:

The final step applies the inverse DFT to the result of the logarithm step.

Inverse Discrete Fourier Transformation (IDFT)

Here, x[k] is the log value of each Mel filter output for the speech segment, obtained from the previous step.
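The logarithm and IDFT steps can be combined into a small sketch that computes cepstral coefficients from Mel filter-bank outputs. Python's math.log plays the role of MATLAB's log here. Two details are assumptions of this sketch: the small floor that avoids log(0), and keeping only the real part of the IDFT output. Many MFCC implementations use the DCT for this last step instead.

```python
import cmath
import math


def idft(x):
    """Inverse DFT: out[n] = (1/N) * sum_{k=0}^{N-1} x[k] * exp(+j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N for n in range(N)]


def cepstrum_from_mel_energies(mel_energies, floor=1e-10):
    """Natural log of each Mel filter output (like MATLAB's log), then the inverse DFT.

    Assumptions: a small floor avoids log(0), and only the real part is kept.
    """
    logged = [math.log(max(e, floor)) for e in mel_energies]
    return [c.real for c in idft(logged)]


# A flat set of Mel energies puts all of its weight in the first cepstral coefficient.
print(cepstrum_from_mel_energies([math.e] * 4))
```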

General Model

E-hafiz

Quranic Recog.

- Feature vectors (cepstrum, delta cepstrum, and delta-delta cepstrum).

- An array of MFCC features
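Delta and delta-delta cepstra are commonly computed with a regression over neighbouring frames. The sketch below uses the common formula $d_t = \sum_{n=1}^{N} n\,(c_{t+n} - c_{t-n}) / (2\sum_{n=1}^{N} n^2)$ with a window of N = 2 frames, and repeats the first and last frames at the edges. Both choices are assumptions, because the source does not give the formula.

```python
def deltas(frames, N=2):
    """Regression deltas: d_t = sum_{n=1..N} n*(c[t+n] - c[t-n]) / (2*sum_{n=1..N} n^2).

    Edges are handled by repeating the first/last frame (an assumption of this sketch).
    """
    T = len(frames)
    if T == 0:
        return []
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = []
        for i in range(dim):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][i]
                prv = frames[max(t - n, 0)][i]
                acc += n * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out


def stack_features(cepstra, N=2):
    """Concatenate cepstrum, delta cepstrum and delta-delta cepstrum for each frame."""
    d1 = deltas(cepstra, N)
    d2 = deltas(d1, N)
    return [c + a + b for c, a, b in zip(cepstra, d1, d2)]


print(deltas([[0.0], [1.0], [2.0], [3.0], [4.0]])[2])  # [1.0]
```

For a coefficient that rises by 1 per frame, the delta in the middle of the sequence is exactly 1. Each stacked frame is three times as long as the original cepstral frame.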

1- Decoding phase: the acoustic model is used to produce phonemes.

2- Search the search graph to find the matching phonemes.
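In Quranic Recognizer, decoding and search are performed by Sphinx-4's own components over its search graph. The toy Viterbi decoder below only illustrates the underlying idea of finding the most likely phoneme sequence through an HMM. The two "phoneme" states, the discrete observation labels, and all the probabilities are hypothetical. A real acoustic model scores continuous feature vectors rather than symbols.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a list of discrete observations.

    Probabilities are plain dicts; sums of log-probabilities avoid numeric underflow.
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    # best[s]: best log-probability of any path that ends in state s at the current step.
    best = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    back = []  # back[t][s]: predecessor of s on the best path at step t + 1
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[r] + log(trans_p[r][s]))
            new_best[s] = best[prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the most likely final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical two-"phoneme" model with discrete observation labels "x" and "y".
states = ["a", "b"]
start_p = {"a": 0.9, "b": 0.1}
trans_p = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.3, "b": 0.7}}
emit_p = {"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.1, "y": 0.9}}
print(viterbi(["x", "x", "y", "y"], states, start_p, trans_p, emit_p))  # ['a', 'a', 'b', 'b']
```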

1- Produce a feature vector using the same MFCC steps.

2- Obtain the same verse from the database.

3- Calculate the average value of each vector and the difference between the two averages.

4- Compare the result with a threshold that is set by default.

5- If the result is greater than the threshold, the recitation contains an error.
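Steps 3-5 of the E-Hafiz model can be sketched as a small comparison function. The names are hypothetical. Taking the absolute difference of the averages is an assumption, and so is passing the threshold as an argument, because the source only says that the threshold is set by default and does not give its value.

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def recitation_has_error(user_features, reference_features, threshold):
    """Steps 3-5: compare the averages of the user's and the expert's MFCC vectors.

    Hypothetical helper: the absolute difference and the threshold argument are
    assumptions; the source only says the threshold is set by default.
    """
    difference = abs(average(user_features) - average(reference_features))
    return difference > threshold  # True means the recitation is flagged as wrong


reference = [1.0, 2.0, 3.0]  # stands in for the expert reciter's stored vector
print(recitation_has_error([1.0, 2.0, 3.0], reference, threshold=0.5))  # False
print(recitation_has_error([2.0, 3.0, 4.0], reference, threshold=0.5))  # True
```

Reducing each vector to a single average keeps the comparison very cheap. However, two different recitations can share the same average, which is one reason this approach is simpler but less discriminative than HMM decoding.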

- Testers recited surat “Al-Ikhlas” several times.

- Recorded the number of verses ("Ayat") that were recognized correctly.

- Calculated a mean recognition ratio for each tester.

- The sample was 20 men and 20 women.

- The verse is selected from the list available in the E-Hafiz system.

- For testing purposes, the reciters made some mistakes intentionally.

- The mean recognition ratio is calculated as the ratio of incorrectly recited verses that were pointed out by E-Hafiz.
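The ratio calculations described for both tests can be sketched as follows. The source does not give the exact definitions. This sketch assumes a per-recitation ratio of correctly handled verses (recognized verses for Quranic Recognizer, correctly flagged mistakes for E-Hafiz) over the total, averaged over one tester's recitations. The numbers are illustrative and are not the study's results.

```python
def recognition_ratio(correct, total):
    """Fraction of verses handled correctly in one recitation."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def mean_recognition_ratio(sessions):
    """Mean of the per-recitation ratios for one tester; sessions is a list of (correct, total)."""
    ratios = [recognition_ratio(correct, total) for correct, total in sessions]
    return sum(ratios) / len(ratios)


# Illustrative numbers only: two recitations of the 4 verses of Al-Ikhlas,
# with all 4 verses recognized the first time and 3 the second time.
print(mean_recognition_ratio([(4, 4), (3, 4)]))  # 0.875
```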

- The results of the two applications are similar for male reciters.

- Quranic Recognizer:

- More complex.

- However, it is a machine-learning system that uses HMMs.

- The system can be improved.

- It does not allow even one mistake, and it does not show where the mistake is.

- E-Hafiz:

- It has the simpler workflow.

- Uses the available MATLAB toolboxes, which are accurate.

- It is not a machine-learning system; it depends only on the feature vectors of expert reciters.

Future work: study how to improve the applications to help Arab and non-Arab Muslims recite and memorize the Quran correctly.

- The reasons for choosing this subject

The reasons for choosing:

- Quranic Recognizer

- E-hafiz

- Acoustic model

- dB (decibel, a unit used to measure sound level)

- Threshold: the mean energy level of the signal

**Terminology**

i- Necessary prolongation of 6 vowels.

ii- Obligatory prolongation of 4 or 5 vowels.

iii- Permissible prolongation of 2, 4 or 6 vowels.

iv- Normal prolongation of 2 vowels.

v- Nasalization (ghunnah) of 2 vowels.

vi- Silent unannounced letters.

vii- Emphatic pronunciation of the letter R.

Literature review

- Art of Tajweed

- The impact on the acoustic model

- Repetition of the vowel the corresponding n times.

- Prolongation

- Nasalization (ghunnah)

- Use of another phoneme or sound

- Emphatic pronunciation of the letter R.

- Kaf and Khaa when they are voweled with a fatha

Dealing with the rules:

- By Tabbal Hassan, Al-Falou Wassim and Monla Bassem from Lebanese University.

- In 2007

- Used the Sphinx framework

Speech Recognition Techniques

- Quranic recognizer application

- E-hafiz

- By Waqar Mirza Muhammad, Rizwan Muhammad, Aslam Muhammad, Martinez-Enriquez A.M

- In 2010

- Used the MATLAB framework

**The Environments of the Applications**

- MATLAB® is a high-level language

- Interactive environment for numerical computation, visualization, and programming.

- Analyze data, develop algorithms, and create models and applications.

MATLAB

It has important toolboxes:

- Data Acquisition Toolbox™

- Signal Processing Toolbox™

- Statistics Toolbox™

- It is an open-source framework.

- It is a state-of-the-art speech recognition system written entirely in the Java™ programming language.

- It is based on HMM (Hidden Markov Model)

- The Sphinx-4 framework has been designed with a high degree of flexibility and modularity.

Sphinx iv


- The LanguageModel module of the Linguist provides word-level language structure.

Language Model

SimpleWordListGrammar

JSGFGrammar

LMGrammar

FSTGrammar

LargeTrigramModel

Grammar formats

Why the Java Speech Grammar Format (JSGF)?

- The Dictionary provides pronunciations for words found in the LanguageModel.

- The pronunciations break words into sequences of sub-word units found in the AcousticModel.

Dictionary

- The AcousticModel module provides a mapping between a unit of speech and an HMM

- Created by taking audio recordings of speech, and their text transcriptions, and using software to create statistical representations of the sounds that make up each word

AcousticModel

Table of phonemes