The Decalogue of Nick #2: I’ve trained as a linguist, and I have done computational linguistics stuff

For Audrey Ackerman and Brian Collins and Zeibura S. Kathau.

Ask a Greek what they can tell you about Byzantium, and they won’t tell you what the millennium of the East Roman Empire achieved. They won’t tell you about the Palaeologan Renaissance, or the ambivalence about the Classical past, or the edifices of Roman Law, or the architectural marvels.

All they’ll tell you is that the Empire fell. And they don’t even pick the right instance when it fell. (The Empire at 1453 wasn’t worth saving.)

Well, so it is with my linguistics career. Scratch me just a little bit, and I will lament the defining woe of my life, that I did not become a professional linguist: Nick Nicholas’ answer to What is your personal experience with obtaining a linguistics degree? (If I’m feeling prudent, I’ll admit that the outcome was the right one: Nick Nicholas’ answer to What are your 3 worst mistakes? Would you fix any of them if you could go back in time? But only if I’m feeling prudent.)

What’s harder for me to do, as a glass half empty kinda guy, is admit how much I gained in the experience.

Linguistics gave me a sense of purpose when I had none. Linguistics gave me friends and companionship and stimulation. Linguistics gave me the place where I could act as glue between my peers—that trait that Clarissa Lohr continues to find approval-worthy in me. Linguistics gave me the opportunity to teach—and once I’d gotten to teach, nunc dimittis: I could have died a happy man, even if it was just three semesters.

And in truth, Linguistics gave me the opportunity to turn away from it, to say that no, I deserved better than to be strung along, and to regain myself even at the cost of losing myself.

I sidled into linguistics and out of electrical engineering towards the end of my undergraduate degree. Through a Master’s I did enough of the bridging undergraduate courses that they let me into the PhD programme. They recognised, I guess, that I had some talent.

The Master’s was in discourse theory: Rhetorical Structure Theory, to be exact. Discourse structure and implicature was my early fascination in linguistics, and RST was all the rage back then (1994) in text generation. (Do people even call it text generation any more?)

When time came for me to pick a PhD topic, though, I wanted to go back to historical linguistics. Actually I wanted to go forward from discourse theory to historical linguistics: grammaticalisation gave a way for implicature to motivate language change that intrigued me. While I initially wanted to work on Tibetan (because Squiggles), I had a conversation with the doyen of grammaticalisation theory, Elizabeth C. Traugott, during which she asked, wouldn’t it be fascinating to look at how grammaticalisation interacted with diglossia.

Two years into my thesis, I’d worked out that no, that was a dumb idea. But I was too far into my work by then. My topic was the development of the modern relativiser pu, and how it had diversified in meaning. I’d intended to work backward, and unearth all sorts of awesome instances of implicature and analogy from Early Modern Greek.

But theses don’t go as you’d planned. On the way towards internal reconstruction, I became captive to the diversity of Greek dialect—nature’s historical linguistic laboratory: they have a common starting point with dozens of divergent endpoints, so you can get an amazing sense of what is possible in language change. By the time I’d worked out what was happening in all the dialects, and detoured into what was happening in the rest of the Balkans, I’d run out of time. And space: the Balkan chapter ended up on the cutting room floor.

From the thesis, I’d gained an encyclopaedic familiarity with modern Greek dialect; a good knowledge of Early Modern Greek anyway (which I put to use in my later coauthored monograph, An Entertaining Tale of Quadrupeds); and a smattering of Balkan linguistics. I’d planned to use my knowledge to write a reference grammar of Early Modern Greek by the time I was 50. That isn’t happening; and the guys who were working on it (Greek Grammar to fill the gap) have run out of funding and have retired.

What I did not get is any Ancient Greek; I don’t have any formal training in that, although once you’re a linguist, you can make sense of a grammar book just fine. (And I picked up what I needed to later.)

I also picked up a fair bit of linguistic typology from a decade of working as a research assistant, mainly under John Hajek. It was a rocky relationship, as you can well imagine from someone with my ego in a second fiddle role. But it was a good schooling too. And working on a phonological survey of Papua New Guinea, I got at least some of the phonology I did not get from the department.

I wrote a bunch of papers after I finished the PhD. Some got published. Some got submitted at the time journals got switched over from paper to electronic submission, and got lost in the mail. It was fun to write the papers; but it was also writing in a vacuum. I didn’t really have a network of peers to care about what I was writing (part of the problem of not being in Europe), and the problems I was working on seem to have been too obscure to have stimulated any interest anyway. In fact, the papers that generated the most interest were about social history (the Greek colony in Corsica). I have 8 finished unsubmitted papers, and 8 more incomplete, from when I stopped writing in 2008. I’m not strongly motivated to do anything with them.

I got more interaction, if anything, out of the Ἡλληνιστεύκοντος blog I used to do (and will do again, if Quora disappears in a puff of smoke). And some of my favourite questions on Quora are when I do my own detective work, to solve a linguistic problem I don’t already know the answer to.


For Amy Dakin.

I have also done some computational linguistic stuff. Most of it has been at the Thesaurus Linguae Graecae, where I worked from February 1999 through to June 2016.

I’ve been reluctant to go publicly into the specifics of why I’m no longer employed there, until now. But then again, my time at the TLG should not suffer the same fate as Byzantine History: what I achieved (what *I* achieved) is more important than the way I ceased to.

I have very high regard for my fellow programmer of 13 years Nishad Prakash; and if anything even more regard for my fellow programmer of 4 years, John Salatas, who is still working there. They are far better craftsmen than I am. And I don’t mean to take anything away from their achievements by what I’m about to say.

But anything you see at the TLG that involves linguistics? Me. Anything that involves stylometrics? Me. Anything that involves Natural Language Processing? Formatting? Peculiar sigla? Comparison of texts? Me.

There’s a lot of computer science things that I’m proud of working out while there. Some algorithmic refinements to recursive Longest Common Subsequence detection, to work out common phrases between passages. Some fiendish DFA and NDFA work, to deal with the quirky ASCII encoding of Greek we have in character-by-character and wildcard search. A lot of cleverness in contextual grammatical disambiguation, that I’m not confident will ever see the light of day (or will be highlighted for users if it does).
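For readers who have not met it, Longest Common Subsequence can be sketched roughly as follows. This is the plain textbook dynamic-programming version over word tokens, not the refined recursive variant described above and not the TLG's code; the function name and the English sample sentences are assumptions made up for illustration.

```python
from typing import List


def longest_common_subsequence(a: List[str], b: List[str]) -> List[str]:
    """Return one longest common subsequence of two token lists."""
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the tokens.
    shared: List[str] = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            shared.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return shared[::-1]


# Hypothetical sample passages, split into words.
first = "the quick brown fox jumps".split()
second = "a quick red fox jumps high".split()
print(longest_common_subsequence(first, second))
```

Working on word tokens rather than characters is one design choice among several; shared phrases between two passages would then show up as runs of adjacent tokens in the result.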

And my crowning work: the morphological analyser of Greek. It originated in Perseus’ Morpheus, but I have stretched and pulled and broadened and narrowed and reranked it over the past 15 years, to deal with all the stages of Greek the TLG has thrown at it, from Homer through to misspelled 17th century Cretan land deeds—and to still yield some semblance of order. In the process, I dare say I have developed as intuitive a sense of what grammatical wackiness Byzantine authors could indulge in as anyone living: I’ve had to deal with it all.

I didn’t get to write the reference grammar of Early Modern Greek. But the morphological analyser I curated, with all the proper names of Athenian courtesans and Albanian chieftains, of Egyptian decans and minor saints, with all the mangled Byzantine optatives and grammarians’ fictional conjugations, with every last utterance of Sappho accounted for, and as much of Theodore Metochites’ as I could disentangle: that has been just as great an achievement.

Which I now no longer can contribute to.

But those of you with access to a TLG subscription: click on some words’ analyses, and do some parallel text comparisons, and look up some of the online lexica. And taste some of the joy of the Greek language and the Greek literary corpus, that I got to savour in my time.

And you’re welcome.

What do linguists think of the movie Arrival?

You have waited a long time, Hansolophontes, for me to answer this A2A. I did not read any spoilers. I did not read any of the other answers (which may make this look silly this late).

I finally watched Arrival last night. Very well made movie: great sense of atmosphere, and fear, and awe. I was annoyed at the plot twist: it’s annoying and cheap whenever it shows up in science fiction (it was a letdown whenever it was used in Star Trek). But given that it was going to happen, I have to say, it was handled poetically by the movie. As long as you don’t think about the plot holes (and associated plot laziness) too closely.

What did I think about it as a linguist?

  • They fast-forwarded the best part, how Louise worked out the language past the first two words. They got the start of the process, but not the heart of the process. But that’s OK: not many people would have found it cinematic.
  • The start of the process of working out the aliens’ language was beautifully handled.
  • The whole non-linearity thing about the aliens’ language? Shoehorned in to connect to the plot twist. It wasn’t explained so as to make sense: all I could see was a circle with a bunch of words in it, I wasn’t persuaded there was anything intrinsically non-linear going on.
  • Movies with any degree of complexity have an obligatory whiteboard scene. The whiteboard scene was well done: the questions Louise was raising about what basic concepts they had to establish were rattled through rather quickly, but they all made sense, and were well thought through.
  • The derision of Ian wanting to talk to the aliens in maths was silly. And mercifully, the guys in Australia did not think it was silly. It’s been accepted for decades that if you want to confirm alien sentience, you use maths that does not occur in nature. (Although that means primes, not the Fibonacci sequence.)
  • What sort of a linguist was she? There’s hints she’s an historical linguist (she knows some Sanskrit, and she knows both the anecdote about kangaroo = “I don’t understand”, and the fact that when someone bothered to record the language where Cook landed, it turned out not to be true). But what historical linguist has a photo of fricking Chomsky at her desk? Chomsky is a big part of the reason why historical linguists don’t get jobs.
  • That was probably a freshman lecture on linguistics; the textbook she was cradling certainly looked like Linguistics 101. But what Linguistics 101 course dedicates a whole lecture to Portuguese (outside the Lusosphere)? And who the hell explains Portuguese by saying that the mediaeval Galicians thought language was art? You don’t say Onde é o banheiro? in Portuguese as an act of art.
  • I’m amused that Ian brought up the Sapir-Whorf hypothesis, and Louise didn’t immediately start guffawing. My own opinion is that there is a little bit (a little bit) to the hypothesis. But derision of Sapir-Whorf within linguistics is universal, and in fact is something of a shibboleth: it is ideologically driven because of how linguistics currently thinks of language. It’s only non-linguists who take Sapir-Whorf seriously.
  • Oh, the army guy dumping the tape recording and saying “translate this”?! Come on. Even grunts aren’t that silly…

The director has obviously talked to linguists, and has certainly read up on linguistics. The details (such as the university setting) did look to have come from someone who wasn’t that clear about how university linguistics actually works. The linguistically challenging bits were swept under the carpet. But the core scenes about establishing communication (including the whiteboard scene) were right.

OK, now to read what everybody else said…

Was Greece created by Germany?

Minority view here, and I’m astonished no one’s picked up on it.

The Modern Greek state was established in 1829; and while Greeks like to think they won the Greek state with their sword, the Greek War of Independence had pretty much been quelled by 1827. It was the Great Powers’ intervention at the Battle of Navarino that guaranteed an independent Greek state, because the Great Powers thought that would be handy. Don’t take my word for it: First Hellenic Republic – Wikipedia.

The Great Powers were Britain, France and Russia. The major political parties of independent Greece, for decades, were the British party, the French party, and the Russian party. There was no Germany, and Germany did not create independent Greece.

There was, however, Bavaria, and the Kingdom of Greece was established in 1832 under a Bavarian king, Otto of Greece. Otto brought Bavarian administrators with him, and they ran the country for the first five years of the kingdom. Per Wikipedia:

During the early years of his reign a group of Bavarian Regents ruled in his name, and made themselves very unpopular by trying to impose German ideas of rigid hierarchical government on the Greeks, while keeping most significant state offices away from them. Nevertheless, they laid the foundations of a Greek administration, army, justice system and education system. […]

The Bavarian Regents ruled until 1837, when at the insistence of Britain and France, they were recalled and Otto thereafter appointed Greek ministers, although Bavarian officials still ran most of the administration and the army. But Greece still had no legislature and no constitution. Greek discontent grew until a revolt broke out in Athens in September 1843. […] Power then passed into the hands of a group of politicians, most of whom had been commanders in the War of Independence against the Ottomans.

So Germany did not create Greece; but the Kingdom of Greece was certainly initially set up by Bavarians.

How offensive is the word “cunt” in Australia?

Just to round off what others have said: yes, it is mostly a more vulgar counterpart of the Australian term bastard, and it almost always refers to men rather than women. (The reductionist misogynist use of cunt to refer to women is unknown here. I only discovered it a few years ago.)

Just like bastard, if it is qualified by an adjective, it is typically informal, jocular, or dismissive, rather than outright offensive, in “lower” social contexts. (Australia does have classes, but it also has a lot of mobility between class registers: the new money millionaire can float between low and high class discourse. Old Money doesn’t, but Old Money isn’t as prominent as it used to be.)

Used on its own, though, it is still vicious. When someone called me a cunt because my dog crapped on his nature strip? He was getting ready to punch me, the roid rage rising to his head, the fists clenching; and cunt was the most hostile term he could spit out at me.

And you do have to judge your registers for appropriateness. There is a jocular, low register with ribbing and swearing and no actual harm done. But that’s not 24/7, even for the so-called lower socioeconomics.

How do I tell a girl she has a nice rack?

When I was 12, I found in my local library a copy of Brush up your pidgin.

It’s a textbook of Tok Pisin, the pidgin of Papua New Guinea, played for laughs. It is hardly a serious textbook: the protagonists are a clueless British missionary and his sex starved wife, the Tok Pisin is respelled to look more familiar to English speakers, it pokes fun (though not, from memory, sneeringly) at the local culture.

Even though it was played for laughs, I actually learned a lot from that book. You could tell, even from that book, that Tok Pisin is a language with its own internal genius, which is quite far removed from English — even if its vocabulary is deceptively English baby talk. It may well have gotten me started as a linguist.

The final dialogue of the book introduces an Australian pilot, who flies the couple into the interior. Up to that point the dialogues are bilingual, British English and Tok Pisin. With the pilot, Australian English is also introduced.

And the pilot sees fit to comment to the missionary’s sex starved wife as follows:

  • Australian English: Geez, you got a beaut pair of norks!
  • Tok Pisin: Mi laikim susu bilong yu.
  • British English: … You have a lovely blouse.

What are the differences between standard modern Greek and the Griko dialect?

I am delighted to be A2A’d this question.

There has been long-running, nationalistically driven, and tedious argument about how old the Greek dialects spoken in Southern Italy are, with to and fro from Italian linguists and Greek linguists, and with the great Romanist Gerhard Rohlfs kinda weighing in on the Greek side.

There is a significant difference between the Griko of Calabria, and the Griko of Salento. The Griko of Calabria, which is moribund, is much more obviously archaic: it has many more fossilised bits of Ancient Greek which only make sense if it was continuously spoken in place. The Calabrian Mafia’s heartland is in Greek-speaking territory, and its name, ‘Ndrangheta, sounds like something straight out of Sparta: Andragathia, “Manly Virtue”.

Salentino Griko, on the other hand, which is much healthier, is closer to Modern Greek both grammatically and lexically. My own pet theory is that Calabrian Greek is a continuation of Magna Graecia, while Salentine Greek reflects resettlement from Greece in Byzantine times. I’m not seeing much to refute that.

As for differences: if you picture Shakespearean English spoken in a Vaudeville Italian accent, you’ll be reasonably close.

  • Salentine Greek, at least, is just about mutually intelligible, though with a fair bit of difficulty.
  • A lot of the difficulty will be around the massive amount of vocabulary taken from Italian (and Calabrian/Salentino dialect)
  • Some of the difficulty will also be because Griko has become aligned to Italian phonotactics. No final -s anywhere. In most villages, no consonants alien to the Romance dialects: [θ, x, ɣ, ð] gone. Clusters alien to the Romance dialects gone: [ks] > [ts], for instance. Geminates all over the place (like in Cypriot, but unlike all other dialects of Modern Greek), and in fact the characteristic /ll/ > /ɖɖ/ of the Romance dialects.
  • The grammar is certainly archaic: the infinitive survives to a similar extent as in Mediaeval Greek, after modals (telo pai ‘I want to go’ instead of θelo na pao ‘I want that I should go’). Participles are much more used as well.
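A toy sketch of the phonotactic alignment in the list above, for anyone who likes to see rules run. It covers only three changes: loss of final -s, [ks] > [ts], and θ > t (the last inferred from telo against θelo). The function names are made up, the input is a rough phonemic transcription of Modern Greek, and real outcomes vary by village and by word, so treat the output as illustrative rather than as attested Griko.

```python
import re


def italianise_word(word: str) -> str:
    """Apply three illustrative Griko-style sound changes to one word."""
    word = re.sub(r"s$", "", word)   # no final -s anywhere
    word = word.replace("ks", "ts")  # cluster alien to Romance: [ks] > [ts]
    word = word.replace("θ", "t")    # consonant alien to Romance: θ > t (assumed from telo)
    return word


def italianise_phrase(phrase: str) -> str:
    """Apply the toy rules word by word to a space-separated phrase."""
    return " ".join(italianise_word(w) for w in phrase.split())


print(italianise_word("θelo"))
print(italianise_phrase("o andras mu pai"))
```

The rules deliberately leave [x, ɣ, ð] alone, since the list says they are gone but not what replaces them.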

Let me try out a parallel text. The Salentine lament on migration “My Man’s Gone Away” (Andra mu pai) was a hit in Greece in the 70s, which means it was mutually intelligible enough. To try and explain what’s going on, I’ll italicise the Italian words (which in fact Italian linguists routinely do), and I’ll boldface words that Greeks won’t recognise because they are too archaic. I’ll then give a calque into pseudo-Modern Greek (and bracketed Italian), so you can see the differences.

Klama (Andramu pai)

Telo na mbriakeftò.. na mi’ ppensefso,
na klafso ce na jelaso telo artevrài;
ma mali rràggia evò e’ nna kantaliso,
sto fengo e’ nna fonaso: o andramu pai!

θelo na meθiso (ubriacare), na mi skeftome (pensare)
na klapso ke na ɣelaso θelo tora to vraði
me meɣali orɣi (rabbia) eɣo θe na traɣuðiso (cantare)
sto feŋɡari θe na fonakso o andras mu pai.

I want to get drunk, not to think,
to cry and laugh is what I want tonight.
I will sing with great rage
I will shout to the moon: my husband is gone!

Fsunnìsete, fsunnìsete, jinèke!
Dellàste ettù na klàfsete ma mena!
Mìnamo manechè-mma, diàike o A’ Vrizie
Ce e antròpi ste‘ mas pane ess‘ena ss’ena!

ksipnisete, ksipnisete, ɣinekes!
elate eðo na klapsete me mena!
miname monaxes mas, ðiavike o ai vritsios
ke i anθropi stekun mas pane eks ena se ena

Wake up, wake up, women!
Come here and cry with me!
We have been left alone, the feast of St Britius has passed
And our men are leaving, one by one.

E antròpi ste‘ mas pane, ste’ ttaràssune!
N’arti kalì ‘us torùme ettù s’ena chrono!
è’ tui e zoì-mma? è’ tui e zoì, Kristè-mu?
Mas pa’ ‘cì sti Germania klèonta ma pono!

i anθropi stekun mas pane, stekun tarasune
na arti kali tus θorume eðo se ena xrono.
ine tuti i zoi mas, ine tuti i zoi, xriste mu?
mas pane eki sti ɣermania kleondas me pono

Our men are leaving on us, they’re going.
If things go well, we will see them back here in a year.
Is this our life? Is this a life, Christ?
They are going over there to Germany, crying with pain.

Linguistically speaking, are Swedish, Danish, and Norwegian different languages or dialects of a modern Norse language?

There’s one hiccup which I’m surprised other respondents have not brought up, Habib le toubib.

There are two standard languages of Norway, and a mess of dialects in between.

Norway used to be ruled by the Danish. The official language of Norway at the time it gained independence, Bokmål (“Book Language”), has been uncharitably described as Danish with a Norwegian accent. That was pretty much the language of Oslo. Given how bizarre Danish accents are (as others have pointed out), that makes Danish with a Norwegian accent quite different from Danish with a Danish accent.

But Norwegians resented their official language being Danish with a Norwegian accent. So Ivar Aasen, one of their language activists, went out to the fjords, recorded the West Norwegian dialects that were the furthest away from the hated Danish with a Norwegian accent of Oslo, mooshed them together, and came up with Nynorsk (“Neo-Norse”). So there are now two official languages of Norway.

Nynorsk advocates will still occasionally snarl that Bokmål is “Dano-Norwegian” (or if they’re being particularly bolshie, “Danish”; I red-lined that out of a colleague’s PhD thesis once). In practice: Bokmål has moved further away from Danish with time, and with some gentle nudging from the government. 10% or so of Norwegians claim to use Nynorsk, but in reality just speak their local West Norwegian dialect.

Are Bokmål and Danish dialects of a modern Dano-Norwegian language, then? Only if you’re being uncharitable and a Nynorsk activist. 🙂

Instead of creating Pinyin, why didn’t the CCP use IPA (International Phonetic Alphabet)?

Practical Roman alphabets do need to stick as close to ASCII as possible. Particularly before computerised typography, getting hold of letters outside the Latin-1 and Latin-2 repertoire (letters and standard diacritics) was painful, and you’d avoid it if you could.

So if you had a choice between

tʰiantɕʰi pu xao

and

Tianqi bu hao

… well, really, that’s not much of a choice at all, is it. Practicality is going to overrule the universality of the IPA, by far: once everyone agrees that <q> corresponds to /tɕʰ/, there’s no reason you have to stock up on those extra odd letters again. Linguists working with Chinese can certainly remember that much.
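To make the "linguists can remember that much" point concrete, here is a minimal sketch that turns the Pinyin example into the broad IPA line above it. It is a toy under loud assumptions: it maps only the four initials that occur in this one phrase, letter by letter, ignores tones and digraphs such as zh, ch and sh, and leaves the vowels untouched exactly as the transcription above does. The names are made up for illustration.

```python
# Only the initials needed for "Tianqi bu hao"; not a full Pinyin table.
PINYIN_TO_IPA = str.maketrans({
    "t": "tʰ",   # aspirated stop
    "q": "tɕʰ",  # the <q> = /tɕʰ/ convention
    "b": "p",    # unaspirated stop
    "h": "x",    # velar fricative
})


def toy_pinyin_to_ipa(text: str) -> str:
    """Convert toneless Pinyin to broad IPA using the four-letter toy table."""
    return text.lower().translate(PINYIN_TO_IPA)


print(toy_pinyin_to_ipa("Tianqi bu hao"))
```

The whole correspondence fits in a four-entry lookup table here, which is roughly why the extra letters were not worth the typesetting cost.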

There was fine print in the history of Pinyin, involving previous transliterations and the initial attempt at a Cyrillic based transcription; but really, this was an issue of practicality, no less than Albanian picking <x> and <xh> for /dz/ and /dʒ/.

In fact, the only practical orthography that in any way significantly depends on the IPA is the Africa Alphabet and its successor the African reference alphabet, which is used for several African languages. And that involved inventing uppercase versions of a lot of IPA letters, because the IPA had never been used in a practical as opposed to scholarly function before 1928. Hence:

Ɓ Ɖ Ɛ Ǝ Ƒ Ɣ Ŋ Ɔ Ʃ Ʋ Ʒ

So yes, there is a capital schwa and a capital esh. Who knew!

How did the pre-Persian Semitic peoples of the Levant, Assyrian and Babylonian call the Greeks?

As OP clearly knows (by his “pre-Persian” restriction), the main Semitic name for Greeks, Yunan, derives from Persian contact with Ionian Greeks.

We know that the Hittites used the term Achiyawa to refer to what we reasonably guess were the Achaeans; that’s contact dating from Mycenaean times.

From Greek Contact with the Levant and Mesopotamia in the first half of the first millennium BC: a view from the East by Amelie Kuhrt, University College London – Academia.edu, it looks like Assyrians in the 7th century BC were already referring to Ionians. (See also Ionians.) I’d be surprised if the Babylonians knew the Greeks at all before then.

The really interesting question is what did the Phoenicians call the Greeks, before the Ionians settled Asia Minor. I’m not finding it online, and you know, there may well not have been an established term. If the Phoenicians had one, I’d have thought it’d have shown up in Wikipedia.