What is the degree of intelligibility between Standard Modern Greek and Cretan Greek?

I’ve done the Swadesh list lexicostatistics: 89 of 100 core words are shared, which is comparable to Russian and Ukrainian. (I get the same figure for Cypriot.) Mutually intelligible, but just. Much more so now that the dialect is dying out.

I was exposed to the dialect 30 years ago, when it wasn’t doing as badly, so I’m not necessarily the right person to ask; I’d be interested in others’ opinions.

Is there any NLP tool that can extract affix and stem of English words?

Yes, the Porter Stemmer is the most popular approach by far. See A survey of stemming algorithms in information retrieval for a survey, the nltk.stem package for NLTK implementations, and Porter Stemming Algorithm for Porter’s own description of it. There are tweaks of it around, but no one has gone for anything different; and English being the way it is, there’s no real interest in the more powerful lemmatisers, which would do actual dictionary work.

As a linguist, I (and I’m sure many another linguist) am aghast at what the Porter Stemmer doesn’t do. stupider, for example, goes to stupid, but bigger does not go to big: Porter does not touch bisyllabic words, because there’s too much risk of error. Similarly, Porter has no idea of, or interest in, irregular forms.
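To make the stupider/bigger contrast concrete, here is a minimal sketch of the one rule involved, as I understand Porter’s published description: the suffix -er is only stripped when what is left has a “measure” (the number of vowel-consonant sequences) greater than 1. The function names are my own, and this is a single rule in isolation, not the full stemmer.

```python
VOWELS = set("aeiou")


def is_consonant(word, i):
    """Porter's notion of a consonant: not a, e, i, o, u, and not a y
    that follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # y counts as a vowel only when the previous letter is a consonant
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_was_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_was_vowel:
            m += 1
        prev_was_vowel = not cons
    return m


def strip_er(word):
    """Remove -er only if the remaining stem has measure > 1."""
    if word.endswith("er"):
        stem = word[:-2]
        if measure(stem) > 1:
            return stem
    return word


print(strip_er("stupider"))  # stem "stupid" has measure 2, so -er goes
print(strip_er("bigger"))    # stem "bigg" has measure 1, so the word is left alone
```

The measure is a rough stand-in for syllable count, which is why short words are left alone. Note that even if -er were stripped from bigger, you would still be left with bigg rather than big.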

It is a decent compromise between doing too much and doing too little (and doing too much is a real problem). What people always forget is that it has to be customised with an exceptions list, to deal with the vocabulary you’re likely to encounter. That applies in particular to its use in Lucene/Solr.
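An exceptions list can be as simple as a dictionary consulted before the stemmer runs. The sketch below is illustrative only: toy_stem is a hypothetical stand-in I made up so the example is self-contained, and the entries in EXCEPTIONS are invented. In practice you would pass in a real stemmer, for instance the stem method of NLTK’s PorterStemmer, and build the list from your own vocabulary.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a plural -s only."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


# Invented entries for illustration; a real list comes from your own corpus.
EXCEPTIONS = {
    "news": "news",   # protect a word the stemmer would over-stem
    "bigger": "big",  # supply a form the stemmer would never find
}


def make_stemmer(base_stem, exceptions):
    """Wrap any stemming function so the exceptions list is checked first."""
    def stem(word):
        key = word.lower()
        if key in exceptions:
            return exceptions[key]
        return base_stem(key)
    return stem


stem = make_stemmer(toy_stem, EXCEPTIONS)

for w in ["cats", "News", "bigger", "glass"]:
    print(w, "->", stem(w))
```

Keeping the exceptions outside the stemmer means the list can grow with your data without anyone touching the algorithm itself.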

If I want to learn Greek for the purpose of understanding science etymology, which kind of Greek should I learn?

What Spyros Theodoritsis said. In addition, for medical terminology in particular, many of the words for parts of the body have changed in Modern Greek, and the ancient words only survive in… Greek medical terminology.

Modern Greek has replaced the word for liver with συκώτι, “figged”; it’s the same dish that gave us French foie. Nothing in Modern Greek will tell you that the word used in medicine is hepar, or that the compounds are formed with hepato-.

How do you say “you’re welcome” in Greek? Is there more than one way to say it?

What Niko Vasileas said: παρακαλώ “please”, as in “please don’t mention it”. Add δεν κάνει τίποτα, “it does nothing” (presumably, “it costs nothing”), truncated to τίποτα “nothing!”

Why is it that most of the brilliant philosophers are Germans if the history tells us that philosophy came from Greece?

Why are the best tomato-based pasta sauces Italian, if history tells us that tomatoes came from the Americas?

2500 years is a long time; and in at least some ways, what the Germans were doing with philosophy in the 18th and 19th centuries was far from what the Greeks did in the 5th century BC (though unlike other disciplines, maybe not far enough!) At any rate, just because a people invents a field of endeavour does not mean they get to dominate it forevermore. Cultural artefacts aren’t genetic.

What is the most difficult non-English tongue twister you know?

A couple from Modern Greek:

  • Μια πάπια μα ποια πάπια. mja papja ma pja papja. “A duck, but which duck?” Surprisingly difficult.
  • Άσπρη πέτρα ξέξασπρη κι απ’ τον ήλιο ξεξασπρότερη. aspri petra kseksaspri c ap ton iʎo kseksasproteri. “White stone, utterly white, even more utterly white than the sun.”
  • Ο παπάς ο παχύς έφαγε παχιά φακή. Γιατί παπά παχύ έφαγες παχιά φακή; o papas o paçis efaʝe paça faci. ʝati papa paçi efaʝes paça faci? “The fat priest ate thick lentil soup. Why, fat priest, did you eat thick lentil soup?”

What is language?

Hah. Having lectured Intro To Linguistics, I should be able to come up with a definition without going to Wikipedia.

Ok: a language is a system of signs that are associated with meaning, and which can be combined to express more complex meanings.

That doesn’t limit language to spoken languages, hearing languages, or human languages; it also lets in maths, logic, and computer languages. Which I think is fair. It does however insist on compositionality: signs in isolation don’t make a language. And it insists on them being a system: a wonderfully powerful yet vague term…

Is the Modern Greek letter beta (Ββ) pronounced “b” or “v”?

To make explicit what others are hinting at: it is pronounced /v/, but is often transliterated as b for consistency with ancient Greek. You won’t see it with modern names, but you may see it in library catalogues, for example, which often use the same transliteration for ancient and modern Greek.

And if a name is ancient, or entered English via classicists, or someone is being pretentious, b will be used. So for example Basil, Rhangabe, Bryennius.

What are languages you can understand even though you never learned them?

I have high school French, self-taught Latin, and Esperanto. I’ve never studied Italian, but between working in an Italian languages department, exposure to classical music, and some guesswork, I’ve actually had basic Italian conversations while in Italy.

What is the etymology of the word “egotism”?

ego + ism is just about the complete story, but not quite.

ego + ism = egoism. In fact, when French coined the word in 1755 (Online Etymology Dictionary), they coined it as égoïsme; and when Greek took the word in from French, they kept it as εγωισμός.

But someone somewhere early on found that ego-ism sounded weird. With good reason: Latin nouns ending in –o would not be stuck next to an –ism normally—a bunch of inflectional and derivational affixes would intervene (natio > natio-n-al-ism). Ego is a pronoun, not a noun, so those kinds of affixes were just not available.

So, because ego-ism looks strange, someone decided to stick in a consonant to break up the ego– and the –ism. The consonant here happened to be a –t-; WordReference.com Dictionary of English (which I believe is OED material) thinks it was by analogy with despot-ism.

There is a recherché distinction that some people have made between egotism and egoism in English: egotism is a bad thing, egoism isn’t. But that distinction is pretty much made up, and no one really bothers with it any more.