Abstract: |
This paper introduces a novel methodology for delimiting named entities and nominal successive mentions in the Arabic Wikipedia. It presents a linguistic analysis of both named entities and successive mentions in terms of coverage, complexity and predictability. The study employs a supervised machine learning classifier that uses a domain-transferred corpus and extracts lexical, contextual, morphological and syntactic features. In addition to the classifier, a post-processing step and a mention detection algorithm were developed to determine the boundaries of named entity phrases efficiently and to identify successive mentions. In an extended experiment, the approach achieves an F-measure of 80.30%.
|