Why aren’t more people using machine learning on historical linguistics?

Please God no.

For the sentiment this proposal awakens in the soul of historical linguists, see:

xkcd: Physicists

Plenty of people use machine learning on historical linguistics. They usually end up being picked up by science reporters, getting all the publicity that historical linguists don’t. And when they do, historical linguists roll their eyes and turn the page.

Historical linguistics involves dirty data. Historical linguists know how to clean it up, and they know what the standards of proof are: that’s the comparative method. The Linguistatron 3000 someone built for their Honours thesis usually doesn’t know how to clean it up, and it gets stuck learning noise.

Why yes, I am arrogant. Why do you ask?

The non-arrogant version of this answer is Brian Collins’.

EDIT: See Steve Rapaport’s answer for a most entertaining instance of linguists cleaning up after a Linguistatron 3000 paper in Science. Pro tip: if you want to know about linguistics, don’t read Science.
