What are the main current challenges in doing Arabic NLP?

I’ve gotten an answer back from “a friend”, which I’m relaying:

Generally, the problems are:
1) Parsing and POS tagging must be done at the same time (the lack of good corpora means almost everything is rule-based)
2) Severe polysemy when the text has no vowel marks (99.9% of texts are written without vowels)
3) No capitalization, which means all entity extraction rules have to be context-based. At the same time, many Arabic names are also ordinary words: a city called “City”, a village called “Village”, a man named “Good”, etc.
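To make point 2 concrete, here is a minimal sketch (mine, not my friend's) of how distinct vocalized words collapse into one written form once the vowel marks are dropped. The three example words and their glosses are my own illustration, and the helper names `strip_vowel_marks` and `group_by_bare_form` are made up for this sketch. It assumes the vowel marks are encoded as Unicode nonspacing marks (category `Mn`), which holds for the standard Arabic short-vowel signs.

```python
import unicodedata
from collections import defaultdict


def strip_vowel_marks(word: str) -> str:
    """Drop Unicode nonspacing marks (category Mn), which include
    the Arabic short-vowel signs such as fatha, damma and kasra."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# Illustrative examples (not from the original answer), written with
# escapes so the code points are explicit:
#   U+0643 kaf, U+062A ta, U+0628 ba
#   U+064E fatha, U+064F damma, U+0650 kasra
VOCALIZED = {
    "\u0643\u064E\u062A\u064E\u0628\u064E": "kataba, 'he wrote'",
    "\u0643\u064F\u062A\u064F\u0628": "kutub, 'books'",
    "\u0643\u064F\u062A\u0650\u0628\u064E": "kutiba, 'it was written'",
}


def group_by_bare_form(vocalized: dict) -> dict:
    """Group glosses by the unvowelled spelling a reader would actually see."""
    groups = defaultdict(list)
    for word, gloss in vocalized.items():
        groups[strip_vowel_marks(word)].append(gloss)
    return dict(groups)


groups = group_by_bare_form(VOCALIZED)
for bare, glosses in groups.items():
    print(bare, "->", glosses)
```

All three words end up under the same three-letter spelling, so a tagger that sees only the bare form has to choose between a verb, a noun and a passive verb from context alone.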

These are just the first, most crucial things that came to mind.
