Arabic Researchers' Stance

  • EventCorps (2013) consists of 5,393 documents

TREC CLIR tasks in 2001/2002

TREC-1&2

  • International Corpus of Arabic (ICA) (2006-2013)

The Proposed Idea

CLIR tasks in 2001/2002

(The evaluation campaign)

In 2001 and 2002, the Text REtrieval Conference (TREC) dedicated a track to testing the effectiveness of techniques for Arabic monolingual and cross-lingual retrieval.

The spelling of translated or transliterated proper names in Arabic tends, in general, to be inconsistent; some variants could even be considered typos.
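One common mitigation is to apply light orthographic normalization to documents and queries before matching. The sketch below is illustrative only (the rules and the example name are assumptions, not part of the original work); it collapses some orthographic variants, although genuinely different transliterations of the same name remain distinct.

```python
import re

# Arabic diacritics (tashkeel) plus the tatweel/kashida character.
_DIACRITICS = re.compile(r'[\u064B-\u0652\u0640]')

def normalize(text: str) -> str:
    """Light Arabic orthographic normalization, as commonly used in Arabic IR."""
    text = _DIACRITICS.sub('', text)                        # drop diacritics and tatweel
    text = re.sub('[\u0622\u0623\u0625]', '\u0627', text)   # alef variants -> bare alef
    text = text.replace('\u0649', '\u064A')                 # alef maqsura -> ya
    text = text.replace('\u0629', '\u0647')                 # ta marbuta -> ha
    return text

# Two spellings of the same proper name ("Israel", with and without the hamza on
# the initial alef) become identical after normalization.
assert normalize('إسرائيل') == normalize('اسرائيل')
```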

  • Providing standard collections for evaluation, where different systems can be compared and evaluated fairly and effectively.
  • The text collection used for testing is small compared with those used for English, with only 75 queries covering a tiny proportion of all Arabic terms.
  • TREC-6 ad hoc topics (350 English topics)
  • TREC-9 web topics (600 English topics)

  • Offering to the public resources and test data collections that are expensive to build (beyond the capability of any individual researcher).

This problem will affect the performance of any Arabic IR (AIR) system and will cause it to report incomplete results.

The Problem Statement

Conclusion

1. The cost of producing relevance judgments for a large collection is very high and dominates the cost of developing test collections.

2. Evaluation tracks or shared tasks in campaigns that pertain to Arabic are few compared to those for other languages, such as TREC for English and FIRE, which focuses on Indian languages.

  • There is only one standard large collection for Arabic IR evaluation, the TREC 2001/2002 collection.
  • The next generation of Arabic IR systems will need to consider working with richer data.
  • The amount of research reported for Arabic IR is very limited compared to that for other languages.
  • Data currently found on the Internet could be a starting point for building a substantial resource, adequate to challenge systems and reflect their ability to handle real data.

Relevance Assessments

IR Test Collection

These are the links between the topics and the relevant documents. Relevance assessments provide the relevance information required for subsequent experimentation.
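In TREC-style test collections these links are distributed as a plain-text qrels file, one judgment per line: topic id, an unused iteration field, document id, and a relevance label. The following is a minimal sketch of loading such a file; the file name and topic id in the usage comment are hypothetical.

```python
from collections import defaultdict

def load_qrels(path: str) -> dict:
    """Read TREC-style qrels lines of the form: topic_id iteration doc_id relevance."""
    qrels = defaultdict(dict)
    with open(path, encoding='utf-8') as f:
        for line in f:
            if not line.strip():
                continue                        # skip blank lines
            topic_id, _iter, doc_id, rel = line.split()
            qrels[topic_id][doc_id] = int(rel)  # keep the judgment for later evaluation
    return qrels

# Hypothetical usage: list the judged-relevant documents for one topic.
# qrels = load_qrels('arabic_2001_2002.qrels')
# relevant = [doc for doc, rel in qrels['AR26'].items() if rel > 0]
```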

Developing an Arabic Test Collection for Information Retrieval Evaluation

Test Collection for Arabic IR

Information Retrieval

Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources.

March 2014

Document collection

Thank you

Topics

The topics should represent typical user information needs and are often expressed in the form of queries.
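In TREC-style collections each topic is typically given as a short title (the query-like statement of the need), a longer description, and a narrative that guides the assessors. The sketch below shows how such a topic might be represented; the topic number and wording are invented for illustration and are not taken from the actual topic set.

```python
# Illustrative TREC-style topic; the usual tagged layout uses <num>, <title>, <desc>, <narr>.
topic = {
    "num": "AR01",                                    # invented topic number
    "title": "Arabic press coverage of oil prices",   # short, query-like statement
    "desc": "Find reports in the Arabic press discussing changes in oil prices.",
    "narr": "Relevant documents report or analyse oil price movements; "
            "documents that merely mention oil in passing are not relevant.",
}

# The title is commonly submitted as the query to the system under test.
query = topic["title"]
```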

The most visible IR applications are:

• Search engines

• Online libraries
