Developing an Arabic Test Collection for Information Retrieval Evaluation
March 2014

Information Retrieval
Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources.
The most visible IR applications are:
• Search engines
• Online libraries
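
To make the definition concrete, here is a minimal sketch of ranked retrieval over a toy collection, scoring documents by TF-IDF overlap with the query. The documents and names are invented for illustration, not taken from the slides.

    import math
    from collections import Counter

    docs = {
        "d1": "arabic information retrieval evaluation",
        "d2": "test collection for arabic search",
        "d3": "english web search engines",
    }

    def tokenize(text):
        return text.split()

    # Document frequency of each term, used for IDF weighting.
    df = Counter()
    for text in docs.values():
        for term in set(tokenize(text)):
            df[term] += 1

    def score(query, doc_text):
        """Sum of TF-IDF weights of query terms found in the document."""
        tf = Counter(tokenize(doc_text))
        n = len(docs)
        return sum(
            tf[t] * math.log(n / df[t])
            for t in tokenize(query)
            if df[t] > 0
        )

    query = "arabic retrieval"
    ranking = sorted(docs, key=lambda d: score(query, docs[d]), reverse=True)
    print(ranking)  # documents ordered by estimated relevance: ['d1', 'd2', 'd3']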

IR Test Collection
A test collection has three components: a collection of documents, a set of topics, and relevance judgments.
The topics should represent typical user information needs and are often expressed in the form of queries.
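
TREC-style topics are distributed in a simple SGML-like markup. The sketch below shows the general shape and a tiny parser for it; the topic number and text are invented placeholders, not a real TREC topic.

    import re

    # Shape of a TREC-style topic (content invented for illustration).
    raw_topic = """
    <top>
    <num> Number: 26
    <title> Arabic satellite television channels
    <desc> Description:
    Find documents about Arabic-language satellite TV channels.
    <narr> Narrative:
    Relevant documents discuss the launch or programming of a channel.
    </top>
    """

    def parse_topic(text):
        """Pull the topic number and title out of one <top> block."""
        num = re.search(r"<num>\s*Number:\s*(\d+)", text).group(1)
        title = re.search(r"<title>\s*(.+)", text).group(1).strip()
        return {"num": num, "title": title}

    print(parse_topic(raw_topic))
    # {'num': '26', 'title': 'Arabic satellite television channels'}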
Relevance judgments are the links between the topics and the relevant documents. Relevance assessment provides the relevance information that is required for subsequent experimentation.
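
The judgments themselves are conventionally stored in a "qrels" file of (topic, iteration, document, relevance) records, which an evaluation script joins against a system's ranked output. A minimal sketch, with invented document IDs and an invented run:

    from collections import defaultdict

    # TREC-style qrels lines: topic  iteration  docno  relevance
    qrels_lines = """
    26 0 DOC001 1
    26 0 DOC002 0
    26 0 DOC005 1
    """.strip().splitlines()

    relevant = defaultdict(set)
    for line in qrels_lines:
        topic, _, docno, rel = line.split()
        if int(rel) > 0:
            relevant[topic].add(docno)

    def precision_at_k(topic, ranked_docs, k=10):
        """Fraction of the top-k retrieved documents judged relevant."""
        top = ranked_docs[:k]
        hits = sum(1 for d in top if d in relevant[topic])
        return hits / len(top) if top else 0.0

    # A hypothetical system run for topic 26, best document first.
    run = ["DOC005", "DOC003", "DOC001"]
    print(precision_at_k("26", run, k=3))  # 2 of the top 3 are relevant: 0.666...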

Test Collection for Arabic IR
(The evaluation campaign)
In 2001 and 2002, the Text REtrieval Conference (TREC) dedicated a track, the CLIR tasks of 2001/2002, to testing the effectiveness of techniques for Arabic monolingual and cross-lingual retrieval.
Evaluation campaigns such as TREC contribute by:
- Providing standard collections for evaluation, where different systems can be compared and evaluated fairly and effectively.
- Offering to the public resources and test data collections that are expensive to build (beyond the capability of any individual researcher), for example:
  - TREC-1&2
  - TREC-6 ad hoc topics (350 English topics)
  - TREC-9 web topics (600 English topics)

The Problem Statement
- The text collection used for testing Arabic IR is small compared with those used for English: only 75 queries, which cover a tiny proportion of all Arabic terms.
- Spelling of translated or transliterated proper names in Arabic tends to be inconsistent, and some variants could even be considered typos. This problem will affect the performance of any Arabic IR (AIR) system and will cause it to report incomplete results. (A common mitigation is sketched below.)
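
One standard way AIR systems cope with such spelling variation is orthographic normalization, conflating characters that writers use interchangeably. The sketch below shows the common conflations (alef variants, teh marbuta, alef maqsura); it illustrates the general technique and is not the method proposed in these slides.

    import re

    # Conflate Arabic characters that are frequent sources of spelling variants.
    ALEF_VARIANTS = "أإآ"   # hamza-above, hamza-below, madda -> bare alef

    def normalize(text):
        text = re.sub(f"[{ALEF_VARIANTS}]", "ا", text)
        text = text.replace("ة", "ه")   # teh marbuta -> heh
        text = text.replace("ى", "ي")   # alef maqsura -> yeh
        return text

    # Two variant spellings of the same transliterated name now match.
    print(normalize("إسرائيل") == normalize("اسرائيل"))  # True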

Arabic Researchers' Stance
- EventCorps (2013) consists of 5,393 documents.
- International Corpus of Arabic (ICA) (2006-2013)

The proposed idea
Developing a new Arabic test collection for information retrieval evaluation.

Conclusion
- The cost of producing relevance judgments for a large collection is very high and dominates the cost of developing test collections (see the pooling sketch after this list).
- Evaluation tracks or shared tasks in campaigns that pertain to Arabic are few compared to those for other languages, such as TREC for English and FIRE, which focuses on Indian languages.
- There is only one standard large collection for Arabic IR evaluation, the TREC 2001/2002 collection.
- The next generation of Arabic IR systems will need to consider working with richer data.
- The amount of research reported for Arabic IR is considerably limited compared to what is done for other languages.
- Current data found on the Internet could be a starting point for building a considerable resource, adequate to challenge systems and reflect their ability to handle real data.
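
Campaigns like TREC contain the judging cost by pooling: assessors judge only the union of the top-ranked documents from all participating systems, not the whole collection. Pooling is standard campaign practice rather than something detailed in these slides; the run data below is invented for illustration.

    def build_pool(runs, depth=100):
        """Union of the top-`depth` documents from each system's ranked run.

        Only this pool is shown to human assessors, so judging effort grows
        with (number of systems x depth) instead of with collection size.
        """
        pool = set()
        for ranked_docs in runs:
            pool.update(ranked_docs[:depth])
        return pool

    # Three hypothetical system runs for one topic, best document first.
    runs = [
        ["DOC005", "DOC001", "DOC009"],
        ["DOC001", "DOC002", "DOC007"],
        ["DOC005", "DOC003", "DOC002"],
    ]
    print(sorted(build_pool(runs, depth=2)))
    # ['DOC001', 'DOC002', 'DOC003', 'DOC005'] -- 4 documents to judge, not the whole collection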