I’ve gotten an answer back from “a friend”, which I’m relaying:
Generally, the problems are:
1) parsing and POS tagging must be done at the same time (there are no good corpora, so almost everything is based on rules)
2) severe polysemy when the text has no vowel marks (99.9% of texts are written without vowels)
3) no capitalization, which means all entity extraction rules have to be context-based. At the same time, Arabic named entities include a lot of names that are also ordinary words: a city called “City”, a village called “Village”, a man's name meaning “Good”, etc.
These are just the first, most crucial things that came to mind.
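
To make point 2 a bit more concrete, here is a small Python sketch (my own illustration, not part of the friend's answer). It assumes the vowel marks in question are the Arabic diacritics in the Unicode range U+064B to U+0652; the names `HARAKAT` and `strip_vowel_marks` are made up for the example. It shows three different words from the root k-t-b collapsing into one written form once the marks are dropped.

```python
# Arabic short-vowel and related marks (tanwin, fatha, damma, kasra,
# shadda, sukun). Assumption: this range covers the "vowel marks" meant above.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_vowel_marks(word: str) -> str:
    """Return the word as it usually appears in print, without vowel marks."""
    return "".join(ch for ch in word if ch not in HARAKAT)


KAF, TA, BA = "\u0643", "\u062a", "\u0628"
FATHA, DAMMA, KASRA = "\u064e", "\u064f", "\u0650"

# Three fully vowelled words built from the same consonants.
vowelled = {
    "kataba (he wrote)": KAF + FATHA + TA + FATHA + BA + FATHA,
    "kutub (books)": KAF + DAMMA + TA + DAMMA + BA,
    "kutiba (it was written)": KAF + DAMMA + TA + KASRA + BA + FATHA,
}

# Without the marks, all three become the same three-letter string.
unvowelled = {strip_vowel_marks(w) for w in vowelled.values()}
print(len(unvowelled))  # 1
```

A tagger that only sees the bare form has to pick between a verb, a noun and a passive verb from context alone, which is roughly why tagging and parsing end up tangled together.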