Arabic reCAPTCHA

by Menna Nabil on 14 October 2013


Transcript of Arabic reCAPTCHA

Arabic RECAPTCHA
Menna Nabil Bakry
Supervised by:
Prof. Dr. Slim Abdennadher
M.Sc. TA Mohamed Khamis

What is CAPTCHA & RECAPTCHA?
RECAPTCHAs are used to digitize books

* 100 million CAPTCHAs are typed every day
* Each CAPTCHA takes 10 seconds to fill in
Can we use this effort for the good of humanity?
Books Digitization
How can we trust the user?
RECAPTCHA shows two words: one is known, and the other is unknown
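
The two-word idea can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the in-memory vote store, and the threshold of 3 agreeing answers are all assumptions made for the example. An answer for the unknown word is counted only when the user typed the known word correctly.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: agreeing answers needed before an unknown word is accepted.
VOTES_NEEDED = 3

# votes[unknown_id] counts transcriptions from users who passed the known word.
votes = defaultdict(Counter)


def submit(known_answer, typed_known, unknown_id, typed_unknown):
    """Return True if the user passes; record a vote for the unknown word if so."""
    if typed_known.strip() != known_answer:
        return False  # known word is wrong: reject and ignore the other answer
    votes[unknown_id][typed_unknown.strip()] += 1
    return True


def accepted_transcription(unknown_id):
    """Return the agreed transcription, or None while votes are insufficient."""
    if not votes[unknown_id]:
        return None
    text, count = votes[unknown_id].most_common(1)[0]
    return text if count >= VOTES_NEEDED else None
```

The known word acts as the trust check, and repeated agreement between independent users stands in for a ground truth that the OCR could not provide.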
[Diagram labels: Image, Text, OCR, CAPTCHA, Books Digitization]
Example from a Book
Top Arabic OCR engines
1. Sakhr Automatic Reader
2. OmniPage
3. APTI Arabic Words
4. Word Spotting
5. Tesseract
6. ABBYY
My project (Arabic RECAPTCHA)
Recent events made us lose many Egyptian books

Arabic RECAPTCHA is introduced for digitizing Arabic books

Newspaper
ABBYY Vs. Tesseract
Output Information
Tesseract, which is developed by Google, was chosen for 3 main reasons:
1- Its word-level recognition is more suitable for the application.

2- It is free, so it has no limit on the amount digitized per request.

3- It recognizes images with multiple columns and gives better results.

ABBYY Output
Tesseract Output
Samples
Database insertion
Database Schema
Client side code

* Adding degradations of random dots and lines
* Following the same standards as the English RECAPTCHA
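
The degradation step can be illustrated with a small sketch. It assumes the word image is a simple 0/1 bitmap held as a list of rows, which is a simplification chosen for the example; the function name `degrade` and its parameters are hypothetical and not taken from the project.

```python
import random


def degrade(bitmap, dots=20, lines=2, seed=None):
    """Return a copy of a 0/1 bitmap with random dots and straight lines drawn on it."""
    rng = random.Random(seed)
    height, width = len(bitmap), len(bitmap[0])
    noisy = [row[:] for row in bitmap]  # leave the input untouched

    # Random dots: set isolated pixels to ink.
    for _ in range(dots):
        noisy[rng.randrange(height)][rng.randrange(width)] = 1

    # Random lines: join a point on the left edge to a point on the right edge.
    for _ in range(lines):
        y0, y1 = rng.randrange(height), rng.randrange(height)
        for x in range(width):
            # Linear interpolation between the two edge points.
            y = round(y0 + (y1 - y0) * x / max(width - 1, 1))
            noisy[y][x] = 1
    return noisy
```

A real image pipeline would draw on pixel data with an imaging library, but the idea is the same: noise is added on top of the word so that automatic recognition becomes harder while a human can still read it.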
The Backend
The Arabic RECAPTCHA web service receives two HTTP requests:
1. The first request retrieves a new AreCAPTCHA.
2. The second validates the input of the user.
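
The two requests can be sketched as a pair of operations on shared state. This is a hedged, in-memory stand-in written in Python rather than the project's own server code: the class name `AreCaptchaService`, the token scheme, and the shape of the word data are assumptions, and the example uses a single word per challenge for brevity.

```python
import secrets


class AreCaptchaService:
    """In-memory stand-in for the two requests: fetch a challenge, then validate it."""

    def __init__(self, words):
        # words: mapping of image reference -> expected text (hypothetical data shape)
        self._words = dict(words)
        self._pending = {}  # challenge token -> expected answer

    def new_challenge(self):
        """First request: pick a word image and return it with a fresh token."""
        image = secrets.choice(list(self._words))
        token = secrets.token_hex(8)
        self._pending[token] = self._words[image]
        return {"token": token, "image": image}

    def validate(self, token, answer):
        """Second request: check the answer; each token can be used only once."""
        expected = self._pending.pop(token, None)
        return expected is not None and answer.strip() == expected
```

Removing the token on validation means a solved challenge cannot be replayed, which is one common way to tie the second request back to the first.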
The code is divided into 3 components
Client Side
Server Side
Linking File
Sends HTML code
Receives HTML code
Arabic_recaptcha_get_html()
english_recaptcha_get_html()
http://recaptchadomain.webege.com/AreCaptcha/AreCaptcha.php
Future Work:
1. Preserve the diacritics (formation signs) of Arabic text after digitization.

2. Use an ICR or an OCR that gives better detection for Arabic.

3. Make a game whose purpose is to classify the words, so that the admin does not have to do it and saves time.
Word classification process
Different CSS Styling colors
Testing Phase
Testing Phase output
ABBYY
Tesseract
CAPTCHA Examples:
Word Extraction