How To Show Two Languages Are Related

Here is an interesting little chart from an unpublished paper by Stefan Georg. According to linguistic consensus, Eskimo-Aleut and Uralic are simply not related; no relationship between them has ever been proven. Uralic is a group consisting of Finnic (Finnish and related tongues), Ugric (Hungarian and related languages) and Samoyedic (a variety of different languages stretching from the Urals far into Siberia). Uralo-Eskimo does not exist as an accepted family. It is the author’s name for a hypothetical language family intended to show the probable genetic relationship going on here.

Below is the paradigm for personal possessive suffixes in both groups. Look how well they line up. This is the sort of thing we look for when we try to see if two languages are related. For one, personal pronouns and their derivatives are rarely borrowed between languages. For another, entire sets such as the one listed below, which are called paradigms, are almost never borrowed. Morphology is also not borrowed much. Entire paradigm sets of suffixal morphology in personal pronouns are typically considered prima facie evidence of a genetic relationship between tongues. Here we have an entire paradigm of pronoun morphology in two supposedly unrelated language families lining up almost perfectly. The skeptical argument is that this paradigm could have been borrowed. You know what? That didn’t happen. Getting down to brass tacks, there is no way to explain charts like the one below other than genetically.

      Uralo-Eskimo         Samoyedic         Eskimo-Aleut
     Singular Plural    Singular Plural    Singular Plural
1sg  -m       -t-m      -mǝ      -t-mǝ     -m-(ka) -t-m-(ka)
2sg  -t       -t-t      -tǝ      -t-tǝ     -n/t    -tǝ-n/t
3sg  -sa      -i-sa     -sa      -i-sa     -sa     -i-sa
1pl  -mǝ-t    -n/t-mǝ-t -ma-t    -t/n-ma-t -mǝ-t   -mǝ-t
2pl  -tǝ-t    -t-mǝ-t   -ta-t    -t-ta-t   -tǝ-t   -tǝ-t
3pl  -sa-t    -i-sa-t   -i-to-n  -to-n     -sa-t   -i-sa-t
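
If you want to tabulate how closely the Samoyedic and Eskimo-Aleut columns line up, here is a rough sketch. It assumes a crude surface string similarity (Python's `difflib.SequenceMatcher`) as a stand-in for a linguist's eye, which is my assumption and not anything Georg does in the paper. The names `SAMOYEDIC`, `ESKIMO_ALEUT`, `similarity` and `compare` are made up for the illustration, and a high or low score here proves nothing about cognacy on its own.

```python
from difflib import SequenceMatcher

# Suffixes copied from the table above, as (Singular column, Plural column).
SAMOYEDIC = {
    "1sg": ("-mǝ", "-t-mǝ"),
    "2sg": ("-tǝ", "-t-tǝ"),
    "3sg": ("-sa", "-i-sa"),
    "1pl": ("-ma-t", "-t/n-ma-t"),
    "2pl": ("-ta-t", "-t-ta-t"),
    "3pl": ("-i-to-n", "-to-n"),
}
ESKIMO_ALEUT = {
    "1sg": ("-m-(ka)", "-t-m-(ka)"),
    "2sg": ("-n/t", "-tǝ-n/t"),
    "3sg": ("-sa", "-i-sa"),
    "1pl": ("-mǝ-t", "-mǝ-t"),
    "2pl": ("-tǝ-t", "-tǝ-t"),
    "3pl": ("-sa-t", "-i-sa-t"),
}


def similarity(a, b):
    """Crude surface similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def compare(p1, p2):
    """Score each cell of one paradigm against the same cell of the other."""
    return {
        person: tuple(similarity(x, y) for x, y in zip(p1[person], p2[person]))
        for person in p1
    }


scores = compare(SAMOYEDIC, ESKIMO_ALEUT)
for person, (sg, pl) in scores.items():
    print(f"{person}: Singular column {sg:.2f}, Plural column {pl:.2f}")
```

The point of the sketch is only that the comparison is cell by cell across the whole paradigm, not word by word across a random vocabulary list.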

The problem with historical linguistics is that it has gotten away from its roots. Typically, languages were determined to be related through simple observation. Only later on were efforts made at reconstructing the ancient proto-language, with proposed sound laws and regular sound correspondences. Simple observation is what Sir William Jones relied on when he announced the discovery of the Indo-European language family in a speech to an academic society in India in the late 1700s. No one had done any reconstruction at that time, and to this day there are many problems with the reconstruction of Proto-Indo-European, to say nothing of lesser-known large families.

What happened was the reconstruction crowd took over the field and historical linguistics became much more conservative. First you had to do reconstruction and find cognates and regular sound correspondences, and then and only then could two languages be shown to be related. This was not so much true with obviously closely related languages but surely it was the case with the larger macrofamilies. This became known as “the comparative method” and to this day, it remains supreme in our silly field of linguistics.

This is how it used to work:

  1. Determine that the languages are related. First via observation, you look at a group of languages and determine them to be related by finding such dead giveaways as the paradigm above.
  2. Reconstruct. Later, often much later, you reconstruct the proto-language that they descended from and try to find cognates and regular sound correspondences.

The new Comparative Method Conservatives do it like this:

  1. Reconstruct. First you reconstruct the proto-language that a number of possibly related languages descended from, hopefully with regular sound correspondences.
  2. Determine that the languages are related. Then and only then can a group of languages be said to be related.

The new way is ass-backwards, and in recent years, we have not been discovering many new language families due to the conservatism of this silly approach.


Georg, Stefan. 2001. Cross-Bering Comparisons. Unpublished paper presented at Leiden University.

Constituents Exercise

John, having seen Jack’s blue bat, painted his bat blue.

Break that sentence into three constituent clauses, and represent each of them as a complete sentence. See you in the comments.

Shakespearean English in the Original Accent

I definitely could not understand all of that. I think maybe I got ~78%. Sure, you can understand a lot of it, but definitely not all.


600-650 Years of Linguistic Separation

Sounds something like this.

That is from The Canterbury Tales. They were written around 1390, which is about 620 years ago. I do not know about you guys, but my intelligibility score for Middle English was 5%. I think there might be around 100 words in that sample, not sure. Middle English is quite simply not the same language as Modern English. It’s a different language altogether.

So if languages are split for 600-650 years, they may only have 5% intelligibility. That is if they do not continue to have contact with each other. If their speakers keep talking to each other and living in the same vicinity, the score can be a lot higher.

For instance, Scots separated from English ~500 years ago but I can get a lot more of Scots than I can of Chaucer. My intelligibility of Modern Scots is ~40%. But you see, Scots and English continued to be in regular contact. If Scots had taken off to Sweden or someplace like that, the score might be a lot lower. Scots’ continued interaction with English slows the rate of differentiation between tongues.

So after 500-650 years of linguistic separation, you should have separate languages, and intelligibility may only be 5-40% (midpoint about 22%).


What Shakespeare Sounded Like

Quite possibly something like this.

How much of that could you understand? Frankly, I found him a bit hard to listen to, but after I decoded his speech somewhat, I could understand him a lot better. I got 91% intelligibility in the first half of the recording, which might be about 300 words.

That is a modern Bristol accent from 1932. Honestly, I found him rather hard to understand. If you listen to it in bits and pieces you understand it better than if you listen to one long flow. Also some words that you don’t understand the first time you get the second time.

The Universal Language Is Here

The universal language is called “Broken English.”

Well at least we can all communicate now.


Romance Languages and Latin

A linguist named Mario Pei undertook a study of Romance languages to determine how far they had deviated from Latin. This is what he came up with. Lower scores mean closer to Latin and higher scores mean further from Latin:

Sardinian  8% 
Italian    12% 
Spanish    20% 
Romanian   23.5% 
Occitan    25% 
Portuguese 31% 
French     44%

I had always heard that Sardo was like Latin frozen in time. Italian is also said to be quite close to Latin still. In fact, it is from Italy that Latin emerged in the first place. Spanish has deviated quite a bit, but I am not certain why that is. For one thing, quite a bit of Arabic has gone into Spanish. As far as other influences, I am not sure. There are influences from pre-Latin languages, but I am not sure how significant they are. The impact of Basque (which would be included under pre-Latin influences) is also not known, but it has affected Aragonese and Aranese.

Romanian has obviously been flooded with Slavic words.

Occitan is also different, but this is probably due to the French influence as Occitan is sort of a Spanish-French hybrid language like Catalan.

Portuguese is also very different, but I am not sure why that is. Clearly the Portuguese vowels have gone crazy, but why is that? Brazilian Portuguese had influence from Indian languages, but that did not affect European Portuguese.

French is the most different of all. The odd vowels appear to originate from a Celtic base (Gaulish). In addition, quite a bit of Germanic has gone in via the Franks and there was a strong Norse influence in the far north. Basque and Breton influences are not known. It is due to this strong differentiation that other Romance language speakers say that no one can understand the French.


The Roots of English

Here is a little quiz for you.

1. Looking at the 2,000 most commonly used words in English, what language represents the greatest percentage of that total? Extra points if you come close to telling us what the percentage is.

Language A:

2. Looking at a dictionary, name the three languages that have contributed the most words to the English language as a whole. Extra points for approximate percentages, but honestly, all three have contributed close to the same percentage of words.

Language B:

Language C:

Language D:

Note that Language A may also be one of the languages in Languages B-D.


Scots Texts

Here are some texts in the Scots language. I am getting really tired of people who keep insisting that this is just a dialect of English. And I bet if you heard it spoken you would understand even less than you do when it is written. Written down, you can make sense of some of it by figuring out the words. Good luck doing that when it’s spoken.

Embro to the Ploy (Robert Garioch 1909 – 1981)

The tartan tred wad gar ye lauch;
nae problem is owre teuch.
Your surname needna end in –och;
they’ll cleik ye up the cleuch.
A puckle dollar bill will aye
preive Hiram Teufelsdröckh
a septary of Clan McKay
it’s maybe richt eneuch,


In Embro to the ploy.

The Auld High Schule, whaur mony a skelp
of triple-tonguit tawse
has gien a heist-up and a help
towards Doctorates of Laws,
nou hears, for Ramsay’s cantie rhyme,
loud pawmies of applause
frae folk that pey a pund a time
to sit on wudden raws,

gey hard

in Embro to the ploy.

The haly kirk’s Assembly-haa
nou fairly coups the creel
wi Lindsay’s Three Estatis, braw
devices of the Deil.
About our heids the satire stots
like hailstanes till we reel;
the bawrs are in auld-farrant Scots,
it’s maybe jist as weill,


in Embro to the ploy.

From Hannlin Rede [yearly report] 2012–2013 (the Männystèr o Fairms an Kintra Fordèrin, 2012)

We hae cum guid speed wi fettlin tae brucellosis, an A’m mintin at bein haleheidit tae wun tae tha stannin o bein redd o brucellosis aathegither. Forbye, A’m leukkin tae see an ettlin in core at fettlin tae tha TB o Kye, takkin in complutherin anent a screengin ontak, tha wye we’ll can pit owre an inlaik in ootlay sillert wi resydentèrs. Mair betoken, but, we’ll be leukkin forbye tae uphaud an ingang airtit wi tha hannlins furtae redd ootcum disayses. An we’r fur stairtin in tae leukk bodes agane fur oor baste kenmairk gate, ‘at owre tha nixt wheen o yeirs wull be tha ootcum o sillerin tae aboot £60m frae resydentèrs furtae uphaud tha hale hannlin adae wi beef an tha mïlk-hoose.


Intelligibility Figures for Romance Languages

Here is some new work I did on mutual intelligibility in the Romance family. If you speak any of these languages, feel free to chime in. The one figure I am worried about is the 0% for Italian speakers' understanding of Romanian. One informant said that, but I have a feeling it is higher than that.

Intelligibility for Spanish speakers, oral: 80% of Asturian, Aragonese and Extremaduran, 78% of Galician, 62% of Catalan, 50% of Portuguese, 25% of Italian, 6% of Romanian, 1% of French, and 0% of Sicilian.

Spanish has 95% written intelligibility of Ladino, 93% of Galician, 87% of Catalan, 78% of Portuguese, 50% of Italian and Romanian, and 16% of French.

Catalan has 94% oral intelligibility of Valencian, 63% of Balearic, 27% of Italian, and 5% of French.

Catalan has 27% written intelligibility of Italian.

Asturian has 82% oral intelligibility of Mirandese and 71% of Portuguese.

Mirandese has 82% oral intelligibility of Asturian and 71% of Portuguese.

Portuguese has 95% oral intelligibility of Almedilha dialect, 86% of Galician, 71% of Mirandese and Asturian, 58% of Spanish, 55% of Catalan, 40% of Hermisende dialect, 25% of Leonese and Italian, 17% of French, and 5% of Romanian.

Portuguese has 90% written intelligibility of Italian.

Galician has 58% intelligibility of Catalan, and 0% of Extremaduran and Andalusian Spanish.

French has 30% oral intelligibility of Catalan, 27% of Portuguese, 16% of Italian, 13% of Spanish, 7% of Romanian, and 0% of Sicilian.

French has 90% written intelligibility of Catalan and 70% of Portuguese.

Romanian has 70% oral intelligibility of Istroromanian, 40% of Italian, 25% of Spanish, and 15% of French and Portuguese.

Romanian has 60% written intelligibility of French, 45% of Galician and Piedmontese and 33% of Italian.

Italian has 40% oral intelligibility of Catalan, 16% of Portuguese, 11% of French, and 0% of Romanian, Arpitan and Sicilian.

Italian has 75% written intelligibility of French and Spanish, 25% of Portuguese, and 20% of Catalan.

Piedmontese has 0% intelligibility of Arpitan.
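
One quick way to sanity-check figures like these is to look for pairs where the two directions disagree badly. Here is a minimal sketch, assuming the oral figures above for six of the languages are loaded into a nested dictionary. The names `ORAL` and `asymmetries` are made up for the illustration, and pairs where I only have one direction are simply skipped.

```python
# Oral intelligibility in percent, ORAL[listener][language heard],
# copied from the figures above for six of the languages.
ORAL = {
    "Spanish": {"Catalan": 62, "Portuguese": 50, "Italian": 25,
                "Romanian": 6, "French": 1},
    "Catalan": {"Italian": 27, "French": 5},
    "Portuguese": {"Spanish": 58, "Catalan": 55, "Italian": 25,
                   "French": 17, "Romanian": 5},
    "French": {"Catalan": 30, "Portuguese": 27, "Italian": 16,
               "Spanish": 13, "Romanian": 7},
    "Romanian": {"Italian": 40, "Spanish": 25, "French": 15,
                 "Portuguese": 15},
    "Italian": {"Catalan": 40, "Portuguese": 16, "French": 11,
                "Romanian": 0},
}


def asymmetries(table):
    """Return (gap, a, b) for every pair with figures in both directions,
    largest gap first. Each unordered pair is counted once."""
    gaps = []
    for a, heard in table.items():
        for b, a_gets_b in heard.items():
            b_gets_a = table.get(b, {}).get(a)
            if b_gets_a is not None and a < b:
                gaps.append((abs(a_gets_b - b_gets_a), a, b))
    return sorted(gaps, reverse=True)


for gap, a, b in asymmetries(ORAL):
    print(f"{a} / {b}: {ORAL[a][b]}% vs {ORAL[b][a]}% (gap {gap})")
```

The biggest gap in this subset is the Italian and Romanian pair (0% one way, 40% the other), which is the same figure I said I was worried about. A big gap does not have to mean an error, since intelligibility is often lopsided, but it is a good place to go looking for more informants.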


