المستخلص: |
In this paper we present a statistical measure that is used, for the first time, to evaluate the quality of Arabic corpora. The measure is based entirely on statistical data and is language-independent; however, the values obtained in experiments may differ considerably between corpora written in different languages. Our experiments were conducted on Arabic corpora. We chose four corpora of different types in order to determine which corpus characteristics our quality measure reflects. The preliminary results show that the measure is significantly correlated with the writing style and the nature of the text.
|