What are the main current challenges in doing Arabic NLP?

I’ve gotten an answer back from “a friend”, which I’m relaying:

Generally, the problems are:
1) Parsing and POS tagging must be done at the same time (the lack of good corpora means almost everything is rule-based).
2) Severe polysemy when the text has no vowel marks (99.9% of texts are written without them).
3) No capitalization, which means all entity extraction rules are context-based. At the same time, many Arabic names are ordinary words: a city called “City”, a village called “Village”, a man named “Good”, etc.
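
To make point 2 concrete, here is a minimal Python sketch of how distinct vocalized words collapse into one written form once the vowel marks are dropped. The three example words (all built on the root k-t-b) and the helper name `strip_harakat` are my own illustration, not something from the original answer, and the character range U+064B to U+0652 covers only the common Arabic vowel marks.

```python
import re

# Common Arabic vowel marks (harakat), U+064B..U+0652.
# Assumption: this range is enough for the illustration; it is not exhaustive.
HARAKAT = re.compile("[\u064B-\u0652]")


def strip_harakat(text: str) -> str:
    """Remove vowel marks, leaving the bare consonantal skeleton."""
    return HARAKAT.sub("", text)


# Three different vocalized words on the root k-t-b (illustrative examples).
vocalized = {
    "kataba (he wrote)": "\u0643\u064E\u062A\u064E\u0628\u064E",
    "kutub (books)": "\u0643\u064F\u062A\u064F\u0628",
    "kutiba (it was written)": "\u0643\u064F\u062A\u0650\u0628\u064E",
}

# After stripping, all three become the same three-letter string.
bare_forms = {strip_harakat(word) for word in vocalized.values()}
print(len(vocalized), "vocalized words ->", len(bare_forms), "unvocalized form(s)")
```

A tagger or parser that sees only the bare form has to pick between a verb, a noun, and a passive from context alone, which is why the ambiguity is so severe in practice.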

These are just the first and most crucial things that came to mind.

Answered 2017-05-09

[Originally posted on http://quora.com/What-are-the-main-current-challenges-in-doing-Arabic-NLP/answer/Nick-Nicholas-5]
