Is there a graph showing the percentage of usage covered by the X most common words in a language?

I think what you are asking for is graphs illustrating Zipf’s law; search for that. The links at the bottom of the Wikipedia page give graphs from various languages using online corpora.

Regrettably the Zipf’s Law topic here doesn’t have any content yet.

Not sure the graphs would look essentially different, whatever the register of language is: frequency drops off pretty steeply with rank anyway, which is the point of it being a power-law distribution (it only looks like a straight line on logarithmic axes).
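If you would rather compute the coverage figure than look it up, here is a minimal sketch. It assumes an idealised Zipf distribution, where the word at rank r has frequency proportional to 1/r^s; real corpora only approximate this. The function name `zipf_coverage`, the vocabulary size, and the default exponent are illustrative choices of mine, not values taken from any corpus.

```python
def zipf_coverage(top_n: int, vocab_size: int, exponent: float = 1.0) -> float:
    """Share of all word tokens covered by the top_n most frequent words,
    assuming an idealised Zipf distribution (frequency ~ 1 / rank**exponent)."""
    if not 0 <= top_n <= vocab_size:
        raise ValueError("top_n must be between 0 and vocab_size")
    # Unnormalised frequency of each rank, from most to least frequent.
    weights = [1.0 / rank ** exponent for rank in range(1, vocab_size + 1)]
    return sum(weights[:top_n]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical vocabulary of 100,000 word types.
    for n in (10, 100, 1_000, 10_000):
        print(f"top {n:>6} words: {zipf_coverage(n, 100_000):.1%} of usage")
```

To get the graph for an actual language, you would replace `weights` with real counts from a corpus, sorted in descending order, and plot the running total.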

In Indo-European languages using a Latin alphabet, what’s up with these two letters “ch” that are pronounced (phonetics) so differently?

Roman alphabet digraphs were invented with the digraphs Latin used to represent Greek aspirated letters: <ch th ph>. So <ch> was available very very early on to languages using the Roman alphabet, to represent new sounds.

Palatal sounds are notoriously unstable phonologically: once /k/ goes to [c] (as it did in late Latin), it can then move on to any of [tɕ, tʃ, ʃ, s].

As a back consonant, <c> could be used to convey anything velar or palatal, or even palatoalveolar, given the possible targets of phonetic change for a fronted /k/.

  • So <ch> could end up being conscripted as something velar—like a velar fricative /x/, in German.
  • Or it can be used to mean that the velar is velar and not palatalised, like <chi> in Italian.
  • Or it can be used to mean something palatal instead of velar—like the palatal stop /c/  in Old French.
  • Or it can be used to represent any phoneme that the unstable /c/ ends up sounding like, including the palatoalveolar /ʃ/ in Modern French, or /tʃ/ in English and Spanish.

Can you say anything using a vocabulary of 100 words?

The claim of Natural semantic metalanguage is that you can with around 60. It was a party trick of Australian linguistics undergrads to speak in NSM; it becomes very stilted very quickly, but in principle you can define a lot of notions with a limited vocabulary, as the asker alludes to. NSM is of course a definition language, rather than a communicative language, but that seems to be what OP is after.

Basic English tries with 850 words, and xkcd’s Up Goer Five English that Robert Collins mentions seems to be of the lineage of Basic English.

Why do Greek and Cyrillic have different collation order than Roman alphabet?

The collation of Greek and Roman are pretty similar, as Philip said, once you factor out archaisms, and the tendency to insert new letters at the end of the alphabet.

The original Roman alphabet matches to the original Greek alphabet pretty well:

A Α
B Β
C Γ
D Δ
E Ε
F Ϝ
G —
—  Ζ
H Η
— Θ
I ~ J  Ι
K  Κ
L Λ
M Μ
N Ν
— Ξ
O Ο
P Π
Q Ϙ
R Ρ
S Σ
T Τ
U ~ V Υ
— Φ
— Χ
— Ψ
— Ω
X (Ξ)
Y (Υ)
Z (Ζ)

The Greek equivalents of F and Q fell out of use. J and V are variants of original I and U, and appended after them. W, when it developed, was a variant of V, and appended after it. Ζ, Θ, Ξ, Φ, Χ, Ψ, Ω were left out of the original Roman alphabet—although as it turns out, Χ in the western Greek alphabet corresponded to Ξ in the eastern, so it was in the right kind of place. X, Y, Z were imports from Greek, and stuck on the end.

The only real oddity is G and Ζ being in the same position. G was invented in the Roman alphabet as a variant of C; the theory is that it was slotted in where Greek Ζ used to be, precisely because Greek Z had dropped out after F. See G

Cyrillic patterns closely to Greek too, if you allow for variants of letters being inserted in place, and new letters being appended at the end.

See early Cyrillic alphabet:
А Α
БВ Β
Г Γ
Д Δ
Е Ε
ЖЅЗ Ζ
И Η
— Θ
І Ι
К Κ
Л Λ
М Μ
Н Ν
— Ξ
О Ο
П Π
Р Ρ
С Σ
Т Τ
Ѹ Υ
Ф Φ
Х Χ
— Ψ
Ѡ Ω
ЦЧШЩЪЫЬѢЮѤѦѨѪѬѮѰѲѴҀ

Б was inserted as a variant of Β. Ж and Ѕ were inserted as variants of З = Ζ. Θ, Ξ, Ψ were left out as unnecessary to Slavic, though they were then re-appended at the end, to transliterate Greek: ѮѰѲ. In fact the very last letter appended, Ҁ, was appended for Greek numerals: ϟ, koppa (which earlier looked like Ϙ).

What language is spoken in Athens, Greece?

To add to the other answers, and to answer a slightly different question 🙂 : between the 1300s and the 1800s, the region *around* Athens was substantially Albanian-speaking (Arvanitika). That’s why the map Brian Collins included in his answer has a patch of white. (A friend of mine once called that patch of white the αλβανότρυπα, “the Albanian hole”.)

In fact, the village of Athens itself had two languages: Old Athenian, an archaic dialect of Modern Greek related to the dialect of Mani; and Arvanitika. Which is where the district of Plaka got its name from, from the Albanian pllakë, “old”: it’s the old town.

Any Jews living in Athens would have spoken Jewish Greek (Yevanic language). Romani would presumably also have been spoken—although speakers of the Agia Varvara variant of Romani, which is famous for having been studied linguistically [A Glossary of Greek Romany As Spoken in Agia Varvara (Athens)], are refugees from Turkey. The Muslims of Athens, I assume, spoke Turkish: I’m not aware of mass conversions of ethnic Greeks to Islam there, as had taken place in Crete.
 
Of course once Athens became the capital of Greece, both Old Athenian and Arvanitika were wiped out by the influx of speakers of what was to become Modern Standard Greek—a mixture of Peloponnesian and Katharevousa.

What is the most interesting grammar in Lojban?

The harder bits. 🙂 In particular:

* The fine differentiations in aspect and tense, including Lexical aspect (achievement, event, accomplishment, state). Hard to speak, not sure how successfully they’ve been taken up, but fascinating.
* The abstraction particles: Events, Qualities, Quantities, And Other Vague Words: On Lojban Abstraction. Even more fascinating, even harder to speak.
* Raising (linguistics) constructions, and the eye opener of how prevalent they are in natural languages: Events, Qualities, Quantities, And Other Vague Words: On Lojban Abstraction [tu’a, jai]
* The genuinely eccentric 🙂 approach to anaphora
* Like James Wood said, the open-ended case roles/prepositions, and their kaleidoscopic version of case grammar
* The argument structures of compounds (lujvo), which was more case grammar, and which I may have had a role in advocating: seljvajvo – La Lojban
* The text structure markers, which I found neat, at least in writing 🙂 Putting It All Together: Notes on the Structure of Lojban Texts

The attitudinals and tenses, actually, not so much: they were quite attention grabbing, so I didn’t find them as appealing. 🙂

The logical connectives were most useful to me in demonstrating how little natural language connectives have to do with truth-conditional logic. 🙂 If is seldom translated successfully as ganai.

How did your life change after learning Lojban?

Gave me a podium to be a language pioneer for a little while. I gather I am still revered in some circles as the first fluent speaker. 🙂

Gave me a severe sequence of intellectual challenges at a time when I needed it; helped me sharpen several of my skills, including writing in English. 🙂

Made me a linguist and not just a linguaphile. The emphasis of Lojban on formal semantics (not just formal logic) made me more acutely aware of language structure, and drove me to study it formally—even if what I ended up studying was rather fluffier (a common outcome for those who come to Linguistics from Computer Science).

How do I convert names from English to Lojban?

The rules are here: The Shape Of Words To Come: Lojban Morphology

The summary of the rules is:
* End in a consonant
* Start in a pause or consonant
* Do not include “la”, “lai”, or “doi”, since they are particles introducing names
* Come close to Lojban phonology
* Allow stress not to be on the penult, but indicate it by capitalisation.

Because of the American basis of the Lojban community, you will see djan /dʒan/ more often than djon /dʒon/ for John.

-s is the default final consonant, though some have used other defaults, including -j or -x, because they are less common.

English being schwa-rich, you will see lots of <y> in English names. So Melissa [mɛˈlɪsə] > melisy + s > .melisys. Washington [wɑʃɪŋtən] would be .UAcintyn.
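The mechanical part of those rules can be sketched in a few lines of Python. This is only an illustration, not an official tool: it assumes you have already respelled the name by ear in Lojban letters (the hard part), and the names `lojbanize`, `LOJBAN_VOWELS` and `NAME_MARKERS` are made up for this example. The check for “la”, “lai” and “doi” is a plain substring test, which is a simplification; the morphology chapter linked above has the exact conditions.

```python
LOJBAN_VOWELS = set("aeiouy")          # y is the schwa
NAME_MARKERS = ("la", "lai", "doi")    # particles that introduce names


def lojbanize(transcribed: str, default_final: str = "s") -> str:
    """Finish off a name that is already respelled in Lojban letters.

    Appends a default consonant if the name ends in a vowel, rejects names
    containing a name-introducing particle (simplified substring test),
    and wraps the result in pauses, written as full stops.
    Capital letters, used to mark non-penultimate stress, are left alone.
    """
    name = transcribed.strip(".")
    if not name:
        raise ValueError("empty name")
    # Rule: a name must end in a consonant.
    if name[-1].lower() in LOJBAN_VOWELS:
        name += default_final
    # Rule: a name must not contain "la", "lai" or "doi".
    lowered = name.lower()
    for marker in NAME_MARKERS:
        if marker in lowered:
            raise ValueError(f"name contains {marker!r}; respell it")
    # Rule: start (and, in writing, end) with a pause.
    return f".{name}."


if __name__ == "__main__":
    print(lojbanize("melisy"))     # the Melissa example above
    print(lojbanize("UAcintyn"))   # the Washington example above
```

Passing `default_final="x"` gives the variant where a rarer consonant is used as the default ending.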

Are there any short expletives that sound the same in different languages?

Nick Enfield [Page on sydney.edu.au] (who I did linguistics with, and boy does he look different twenty years on) just got an Ig Nobel [Improbable Research] for claiming the universality of Huh? (The Syllable Everyone Recognizes, Is ‘Huh?’ a universal word?)

Of course the realisation of Huh? does differ by language; in the Mediterranean, for example, it is E? But the general idea is a mid vowel (as close to a schwa as your language allows), with a questioning tone.

I’ll note anecdotally that the Greek for Ouch! is ox! or ax!—but that because of cartoons, Greek kids now spontaneously say ouch! (I heard my young cousins do it twenty years ago.) Even though /tʃ/ is not even a phoneme of standard Greek. So short expletives, discomfitingly, can be borrowed between languages, just as everything else can.