Category Archives: Comparative

Massive Update of A Reworking of Chinese Language Classification

My Internet enemies (you know who you are) love to rip me to pieces over this stuff, but I suspect that is because they operate under the cover of anonymity, which, combined with the general loud-mouthed jerk “troll culture” of the Internet, produces a Linguisticus Sociopathicus that is seldom found in the hallowed halls of reserved academe.

The funny thing is, if this Chinese work is so horrible, why has it earned praise from some of the world’s top Sinologists, who in fact actually assisted me with the project? Perhaps they should answer that. If I “know less about Linguistics than a Linguistics 10 student,” then why do I sit on the review board of a peer-reviewed linguistics academic journal? Why did an 80-page paper of mine that will soon be published in a book make it through two peer reviews and a dozen editors, including some of the world’s top Turkologists?

The funny thing is that I get along pretty well with other linguists outside of the Internet. We work together calmly, chat about this, that and the other, share papers and gather information from each other, all the things that academics do. I even get addressed as Dear Colleague. And then on the Internet, suddenly I’m so stupid I don’t know what a verb is. Whatever.

Anyway, a huge project of mine, A Reworking of Chinese Language Classification, has received a massive update. It underwent a ton of fixes, a lot of dead links were removed, and many matters were cleared up or explained better. Also, the language count jumped by over 200, from ~360 to 573. Now, some of these may not be full languages, and I may be exaggerating, but I believe that using the 90% intelligibility criterion, there are a good 2,000 separate languages within Sinitic alone.

We simply cannot carve them out because the Chinese government will go crazy, and no Sinologist wants to make the Chinese government mad. The Chinese government lies and says there is one Chinese language with 3,000+ dialects in it, including such massive lects as Cantonese, Hakka, Min, Hui, Wu, Peng, Gan and Ji. Not to mention that Mandarin itself is of course not a single language but is actually a collection of scores of languages or more.

The project involves a brief description in English of the Chinese lects, stating such things as names, where they are spoken, the number of speakers, classification, degree of endangerment, linguistic history and development, classification issues, mutual intelligibility issues, dialects within, membership in language groups, the language/dialect question, anthropological history, sociolinguistic issues historical and modern, future trends, controversies, and sometimes more arcane linguistic data.

I am not trying to brag here and I am not real familiar with the literature, but my account of Chinese dialects is the most thorough such account I have ever run across so far in English. Now there may be better publications out there, but I am not aware of them. Further, most do not seem to have tackled the dialect vs. language problem.

Almost all of the good material on this stuff is in Chinese, and I do not read Chinese, so this caused massive problems. I seem to be able to deal with them OK, though: a lot of the research that I referenced was in Chinese, and I am able to sort of make my way through it to get the gist of it despite the language barrier. I have also come up with a few native speaker informants who have given me excellent information on their particular lects. For instance, I recently ran into a speaker of something called Cambodian Teochew (I had no idea such a thing existed) who told me that the four SE Asian Teochew lects, Malay Teochew, Thai Teochew, Cambodian Teochew and Vietnamese Teochew, were not mutually intelligible. That is, there are four separate languages within Overseas Teochew alone! Unbelievable.

Western Europe: What Native Languages Are Spoken in the Netherlands?

Montleek: Robert, is it possible that in Western Europe, the regional lects have been preserved better, while in Eastern Europe they have been preserved worse? There was communism/socialism in Eastern Europe, and therefore more of a tendency not to continue speaking the regional lect.

In the Netherlands, regional lects of Dutch Low Saxon, Limburgs, Dutch, Frisian, Low Dietsch and Southeast Limburgs are spoken.

Dutch is spoken in a bewildering variety of lects. There is nearly a separate lect in every village or city.

Limburgs is spoken a bit in the far south and there is a different lect in every town here too.

Dutch Low Saxon is spoken in the north and center of the country, once again as a different lect in every town. Whether this is really Macro-German or Macro-Dutch is not certain, but I would call it more Dutch than German.

Frisian is less dialectally diverse.

There are also very strange languages like Low Dietsch and Southeast Limburgs spoken in the far south. These are classification nightmares. After a lot of study, I concluded that these are neither German nor Dutch but actually something completely in between. With Southeast Limburgs and Low Dietsch, you also run into the dialect-in-every-town situation.

There are a number of separate languages within Dutch in the Netherlands, probably over a dozen. There are three Dutch Low Saxon languages, but the situation is very confused and is almost a classification nightmare. There are probably 3-4 languages inside Frisian, though the vast majority speak the standard lect. There are probably two lects inside Limburgs. Southeast Limburgs and Low Dietsch are separate languages, though each seems to have a few languages inside of it.

How To Show Two Languages Are Related

Interesting little graph here from an unpublished paper by Stefan Georg. Now according to linguistic consensus, Eskimo-Aleut and Uralic are simply not related. They have never been proven to be related. Uralic is a group consisting of Finnic (Finnish and related tongues), Ugric (Hungarian and related languages) and Samoyedic (a variety of different languages stretching from the Urals far into Siberia). Uralo-Eskimo does not exist. It is the author’s name for a hypothetical language family intended to show the probable genetic relationship going on here.

Below is the paradigm for personal possessive suffixes in both groups. Look how well they line up. This is the sort of thing we look for when we try to see if two languages are related. For one, personal pronouns and their derivatives are rarely borrowed between languages. For another thing, entire sets such as the one listed below, which are called paradigms, are almost never or never borrowed. Morphology is also not borrowed much. Entire paradigm sets of suffixal morphology in personal pronouns are typically considered prima facie evidence of a genetic relationship between tongues. Here we have an entire paradigm of pronoun morphology between two supposedly unrelated language families lining up almost perfectly. The skeptical argument is that this paradigm could have been borrowed. You know what? That didn’t happen. Getting down to brass tacks, there is no way to explain charts like the one below other than genetically.

      Uralo-Eskimo         Samoyedic         Eskimo-Aleut
     Singular Plural    Singular Plural    Singular Plural
1sg  -m       -t-m      -mǝ      -t-mǝ     -m-(ka) -t-m-(ka)
2sg  -t       -t-t      -tǝ      -t-tǝ     -n/t    -tǝ-n/t
3sg  -sa      -i-sa     -sa      -i-sa     -sa     -i-sa
1pl  -mǝ-t    -n/t-mǝ-t -ma-t    -t/n-ma-t -mǝ-t   -mǝ-t
2pl  -tǝ-t    -t-mǝ-t   -ta-t    -t-ta-t   -tǝ-t   -tǝ-t
3pl  -sa-t    -i-sa-t   -i-to-n  -to-n     -sa-t   -i-sa-t

The problem with historical linguistics is that it has gotten away from its roots. Typically, languages were determined to be related through simple observation. Only later on were efforts made at reconstructing the ancient proto-language with possible sound laws and regular sound correspondences. Simple observation is what Sir William Jones relied on when he announced the discovery of the Indo-European language family in a speech to an academic society in India in the late 1700s. No one had done any reconstruction at that time, and to this day, there are many problems with the reconstruction of Proto-Indo-European, to say nothing of lesser-known large families.

What happened was the reconstruction crowd took over the field and historical linguistics became much more conservative. First you had to do reconstruction and find cognates and regular sound correspondences, and then and only then could two languages be shown to be related. This was not so much true with obviously closely related languages but surely it was the case with the larger macrofamilies. This became known as “the comparative method” and to this day, it remains supreme in our silly field of linguistics.

This is how it works.

  1. Determine that the languages are related. First via observation, you look at a group of languages and determine them to be related by finding such dead giveaways as the paradigm above.
  2. Reconstruct. Later, often much later, you reconstruct the proto-language that they descended from and try to find cognates and regular sound correspondences.

The new Comparative Method Conservatives do it like this:

  1. Reconstruct. First you reconstruct the proto-language that a number of possibly related languages descended from, hopefully with regular sound correspondences.
  2. Determine that the languages are related. Then and only then can a group of languages be said to be related.

The new way is ass-backwards, and in recent years, we have not been discovering many new language families due to the conservatism of this silly approach.

References

Georg, Stefan. 2001. Cross-Bering Comparisons. Unpublished paper. (presented at Leiden University).

600-650 Years of Linguistic Separation

Sounds something like this.

That is from The Canterbury Tales. They were written around 1390, which is about 620 years ago. I do not know about you guys, but my intelligibility score of Middle English was 5%. I think there might be around 100 words in that sample, not sure. Middle English is quite simply not the same language as Modern English. It’s a different language altogether.

So if languages are split for 600-650 years, they may only have 5% intelligibility. That is if they do not continue to have connections with each other. If they continue to have linguistic connections with each other via speaking together and living in the same vicinity as the other tongue, the score can be a lot higher.

For instance, Scots separated from English ~500 years ago but I can get a lot more of Scots than I can of Chaucer. My intelligibility of Modern Scots is ~40%. But you see, Scots and English continued to be in regular contact. If Scots had taken off to Sweden or someplace like that, the score might be a lot lower. Scots’ continued interaction with English slows the rate of differentiation between tongues.

So after 500-650 years of linguistic separation, you should have separate languages, and intelligibility may only be 5-40% (midpoint about 22%).
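
As a quick check on the arithmetic, the 22% figure is just the midpoint of the two scores quoted above (5% for Chaucer, ~40% for Scots). Here is a minimal sketch in Python. The names `CHAUCER_SCORE`, `SCOTS_SCORE` and `midpoint` are made up for illustration, and two personal scores are of course not a statistical sample.

```python
# Intelligibility scores quoted in this post, in percent.
# The names below are illustrative only, not part of any established method.
CHAUCER_SCORE = 5    # Middle English sample, roughly 620 years of separation
SCOTS_SCORE = 40     # Modern Scots, roughly 500 years, with ongoing contact


def midpoint(low, high):
    """Return the simple midpoint of two percentage scores."""
    return (low + high) / 2


if __name__ == "__main__":
    print(midpoint(CHAUCER_SCORE, SCOTS_SCORE))  # prints 22.5
```

The exact midpoint is 22.5%, which the post rounds down to 22%.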

A Scots Lexicon

Here is a brief lexicon of some common words in the Scots language. The notion that Scots is a separate language from English frequently evokes howls of rage from all sorts of ignorant quarters, whereas we calm linguists rarely get worked up about such things.

Look at that list below. Does that look like the English language? If someone came into your house and started talking to you using a lot of words like those, would you be able to understand them? How could you?

Obviously Scots and English are two separate languages. They split apart about 1500 for some reason. Anyone know why they might have split apart around that time? I do not.

a'thing      everything
ablo         lowest
adee         wrong
ae           one
ahint        behind
aiblins      perhaps
airselins    backwards
aisedom      leisure
anent        about, concerning
aneth        beneath
athort       across
atweesh      between
awfu         bursting
awgates      always
ay           always
ayont        beyond
bairnag      little
bairn        child
bann         curse
beard        bread
below        lower
ben          in
bide         live
birling      spinning
bittock      little bit
bosie        hug
bouat        lantern
boun         ready
bowk         retch
brae         slope
braw         fine, handsome
brawlies     splendidly
breeks       britches
brulzie      broil
buiner       upper
buinmaist    topmost
bummer       foggy
burnie       small
burn         stream
byken        wasps' nest
cast         drop
caumie       calm
caur         calves
chap         knock
Cheordag     Geordie
chield       fellow
claik        gossip
cludgie      toilet
clum         climbed
cowp         overturn
cuit         ankle
darg         work
daunter      saunter
dicht        wipe
dous         pigeons
dowp, dock   butt
dree         endure
dreich       dreary
dunch        push
een          eyes
endweys      straight ahead
evyte        avoid
Fa?          Who?
fair         very
Fan?         When?
fauchelt     tired
fauch        fallow
Faur?        Where?
feartie      coward
fell         kill
feth         faith
Filk?        Which?
fillie       long time
Fit?         What?
fly          cup of tea
fon          folly
forenicht    evening
forenuin     morning
forfochten   tired
fowkgates    culture
fuishen      fetched
futrat       weasel
Fy?          Why?
gaberlunzie  a beggar
gaed         went
gamie        gamekeeper
gate         street
gealt        cold
geylies      pretty well
girse        grass
gloamin      dusk, twilight
gnegum       tricky nature
grieve       overseer
gulsochs     sweets, cream cakes, donuts, caramels
haingles     influenza
hauflins     partly
hause        neck
heuch        cliff
hidlins      secretly
hooseockie   small house
hypothec     shebang
ilkagate     everywhere
ilkawey      everywhere
ingangin     reception
kent         knew
knapdarloch  dung knots in wool on a sheep's bottom
kye          cows
lavvy        toilet
ligaun       dusk, day
louns        boys
lown         calm
luif         palm
luitten      let
maistlins    almost
maunna       mustn't
maw          seagull
mayat        meat, food
Menzies      Mackenzie
muith        sultry
nether       lower
ngan         onion
onygate      anyhow
oo           wool
pad          path
piece        food
playock      toy
pooshun      poison
qoho         for whom
queans       girls
rax          stretch
raxt         reached
ream         cream
reive        steal
rhodie       rhododendron
ruise        praise
sark         shirt
scaith       damage
sheuch       ditch
skelpit      smacked
skelp        smack
sour rock    sorrel
spae         foretell
spate        flood
speir        inquire
speirt       asked
stank        a drain
steek        shut
stoursucker  vacuum cleaner
stroup       spout
sybae        onion
the hairst   autumn
the nou      at the moment
thir         these
thrang       busy
tint         lost
twaloors     midday
twalt        twelfth 
weeoors      twilight
wey          at times
whit wey     how
wifeockie    little woman
wyte         blame
yett         gate

A Reclassification of Many Common European Languages

Many common European languages are better seen as more than one language. I have been studying this issue for years, and this is some of my preliminary data. It is not yet in a publishable form, but it will give you some idea of the concepts that I am working with.

 

Kashubian

Really two separate languages as opposed to one.

North and South Kashubian are separate languages. Speakers in the north can’t understand those in the south.

 

Cimbrian

Really three separate languages as opposed to one.

Lusernese Cimbrian, Sette Comuni Cimbrian and Tredici Comuni Cimbrian (Tauch). Based on structural and intelligibility differences, the three dialects could be considered separate languages.

 

West Frisian

Really three separate languages as opposed to one.

Schiermonnikoogs (Skiermuontseagersk) is an archaic West Frisian dialect, poorly understood by the rest of West Frisian, that is spoken on the island of Schiermonnikoog. It is actually spoken more in the north of Groningen than in Friesland.

It has been in serious decline since WW2, due mostly to immigration from the mainland. The newcomers arrive speaking a West Frisian dialect from Groningen, Vastewal. There are only about 100 speakers left. However, many others speak a “weak” Schiermonnikoogs. Courses in Schiermonnikoogs have been popular since the 1960s, and there have been a number of publications in the language.

Hindeloopers is an archaic West Frisian dialect, really a separate language, that is spoken on the SW coast of Friesland in the town of Hindeloopen. It has very conservative phonetics and vocabulary, much of it from Old Frisian. Hindeloopers is slowly becoming more like Standard Frisian due to increased exposure of its speakers to Standard Frisian and immigrants moving to the area. It is hard for other Frisian speakers to understand.

 

North Frisian

Really five separate languages as opposed to one.

North Frisian is four different languages as far as percentage of cognates is concerned: Mainland (including Halligen Frisian), Öömrang-Fering, Sölring and Halunder/Heligolandic. Also, Hallig is not very intelligible with other mainland varieties like Mooring, which brings the count to five.

 

Manx Gaelic

Really a living language as opposed to an extinct one.

There are now 2,000 people who claim to speak Manx. Some are raising their children in Manx.

 

Breton

Really probably five or six separate languages instead of one.

Vannetais is a separate language. It is not intelligible with Leonard, another main dialect. Spoken in Brittany – the entire area of the department of Morbihan (with the exception of Belle-Île and the regions around Le Faouët and Gourin): Vannes, Pontivy, Lorient, Plouay, Guémené-sur-Scorff, Baud, Auray, Quiberon, Sarzeau and the Finistère commune of Arzano.

Further, West Vannetais cannot understand East Vannetais.

Leonard is a separate language, not intelligible with Vannetais. Spoken in Leon (Leon or Bro Leon), the northern third of the department of Finistère (Brest, Morlaix, Plouguerneau, Landerneau, Saint-Pol-de-Léon, Landivisiau, Ouessant).

Leonard is about as far from Vannetais as it is from Cornouaillais. Intelligibility between Vannetais and Cornouaillais is not known.

Cornouaillais may be a separate language due to its distance from Leonard.

Groisillon, spoken on the island of Groix, is reportedly hard to understand for speakers of other dialects. It may be extinct, but more likely there are a few speakers left. Breton reportedly has 77 different dialects.

The new Neo-Breton taught in the schools often can’t be understood by traditional speakers because it is full of borrowings from Cornish and Welsh.

 

Asturian

There are two languages – Eastern Asturian and Central/Western Asturian – instead of one.

 

Leonese

There are two languages – Eastern Leonese/Extremaduran and Central/West Leonese – instead of one. Extremaduran is intelligible with Eastern Asturian.

 

Aragonese

Navarrese is not really spoken anymore, or it is just a Spanish dialect. Benasquese/Ribagorzano is a separate language in between Aragonese and Catalan. Far northern and far southern Aragonese speakers cannot understand each other.

 

Gascon

Apparently more than one language. Aranese is apparently a separate language.

 

Languedocien

Apparently more than one language.

 

Auvergnat

Apparently more than one language.

 

Limousin

Apparently more than one language.

 

Provençal

Apparently more than one language.

 

Walloon

Walloon is four separate languages instead of one.

East Walloon – Barvaux, Huy, Liège, Hesbaye Liégeois, East Liégeois, Verviers, Malmédy. South Walloon – Marche-en-Famenne, Bastogne, Neufchâteau, Saint-Hubert, Bouillon. Central Walloon – Basse-Sambre, Nivelles, Rochefort, Dinant, Namur, Charleroi, Beaumont, Chimay, Philippeville, La Louvière. West Walloon – East Brabançon, Jodoigne, Wavre, Hesbaye Namur, Gembloux, Sombreffe, Eghezée.

 

Francoprovençal

This is more than one language. It may well be up to an incredible 24 different languages or even more.

Dauphinois, Jurassien, Lyonnais, Savoyard, Vaudois, Valdotan and Piedmont are the major dialects, and all are probably separate languages.

Franche-Comté, spoken in Neuchâtel, Vaud North, Pontassilien, Ain and Valserine, is a separate language.

Faetar is a separate language from Arpitan. It split off in 1400 and has undergone heavy influence from Standard Italian and Apulian. It has 1,400 speakers in two towns, Celle and Faeto in Apulia in southern Italy. Language use is still vigorous even though most people in the towns are unemployed or retired. A few work in the fields.

Bressan has some internal diversity. The youngest speakers are about 60 years old now, but there are still dialect associations that promote it strongly. Bressan was the main mode of communication here until the 1970s. Bressan itself is probably a separate language.

Forézien is now almost extinct. Forézien is apparently a separate language.

Geneva, Fribourgeois, Neuchâtel, Valaisan and Vaudois are the dialects of Switzerland, and all of those are probably separate languages too.

Valais has some of the strongest dialectal differentiation in the entire Arpitan region. Valais is divided into two large languages: West Valais, spoken around Lake Geneva, and East Valais, spoken around Sion. Intelligibility is poor between the two poles.

In Valloire, Valmeinier and Valle Arvan at the far southern end of Savoyard, between St. Jean de Maurienne and Modane, a Savoyard dialect – Southern Savoyard – is spoken that is not intelligible with the rest of Savoyard. It is also different in Valloire, Valmeinier and Valle Arvan, but intelligibility among those three varieties is not known. Probably heavy influence of Occitan in this region. Possibly three separate languages here.

In Valloire, all persons over 60 use Arpitan as a daily language. St. Michel-Modane Savoyard is a separate language.

Valloire is a separate language. It is not intelligible with the dialect spoken in Albanne near St. Jean de Maurienne. Valmeinier, Valle Arvan and St. Michel de Maurienne also appear to be separate languages. The speech of Albertville and Chambery could be called South Savoyard. Dauphinois is still widely spoken in the villages around Villard de Lans south of Grenoble.

In the Savoyard area from Mt. Blanc to Geneva to Montreux to Evian to Abondance, there is good intelligibility among dialects. This could be called North Savoyard. As one moves to the south, it gets harder to understand. North Savoyard and South Savoyard seem to be two different languages. In the Val d’Illiez area between Montreux and Martigny, some Arpitan dialects are spoken that are very different from everything else.

 

Romansch

There are actually five or more separate languages instead of one. Each dialect is a separate language.

Upper Engadine: Puter; Lower Engadine: Vallader; Upper Rhine: Surselva; Lower Rhine: Sutselva; in between: Surmeiran. Romansch is actually five different languages, at least. Intelligibility is probably on the order of 80% or so, though testing might be nice.

Val Bregaglia/Valtellina Romansch (Bergajot) is an old Romansch dialect formerly widely spoken in the Val Bregaglia and Valtellina region of Italy. It is now only spoken by the elderly and a few younger people. It is mostly a mixture of Puter Romansch and Ladin with an overlay of Western Alpine Lombard Italian. It was the lingua franca in the region 100 years ago, but has since been replaced by Western Alpine Lombard Italian. Not intelligible with the rest of Romansch or with Italian. Some intelligibility of Ladin, some of Romansch, less of Ticinese Italian.

Bergajot is spoken in the Bregaglia Valley near Chiavenna and upwards towards Switzerland. It is more Italian than Puter Romansch, but Puter Romansch and Bergajot speakers can understand each other. This was probably the natural extension of Romansch to the south, but the language was never written down, and Italian was adopted as the written language, so what developed was a cross between Romansch and Italian.

Unknown whether Bergajot is a separate language or part of Puter Romansch.

 

Ladin

Ladin is a number of separate languages instead of one. Possibly 12 or more different languages.

Western Ladin includes Fassan, Gardenese, Novi, Nones and Solandro.

Fascian Ladin or Fassan Ladin: Spoken in Val di Fassa and variants in Moena and Canazei in the Fassatal Valley of the Dolomites. There are 8,620 residents, of whom 60-75% speak Ladin as a mother tongue. There are two main varieties, Canazei Fascian in the upper valley and Moena in the lower valley. Heavy Italian influence. Fassan is Dolomitic Ladin. Spoken in Trentino Province.

Brach Fascian: Spoken in the center of the valley in Soraga, Pozza di Fassa and Vigo di Fassa. Intelligibility with Moena or Canazei is unknown, but may be nearly intelligible. Possibly not intelligible with Fiemmese Ladin.

Moena Fascian: Spoken in the lower part of the Val di Fassa. Canazei Fascian has problems understanding Moena Fascian. Spoken in Moena, Mazzin, Vigo di Fassa, Pozza and Soraga. Intelligibility with Fiemmese or Brach is unknown but may be nearly intelligible.

Gherdëina Ladin: spoken in Val Gardena or Gröden Valley, South Tyrol, by 8,148 inhabitants, 80-90% of the population. This dialect is close to German. Spoken in Bolzano, extremely protected. Gherdëina is described as “completely different” from Fascian, Anpezan and Cadore. Val Badia can understand Gherdëina but Fassa cannot. Part of South Tyrolean Ladin. Intelligibility between Gherdëina and Novi Ladin is unknown but probably good.

Nones/Solandro Ladin: spoken in Val di Non (as Nones) and with variations in different parts of the valley and the adjacent lower Val di Sole (as Solandro) in Trento Province just north of Trento and just west of Bolzano.

Nones has a lot of German words in it. Two different forms – Nones and Solandro or Solander. Solandro is spoken in Val di Sole, Val di Peio and Val di Rabbi (as Rabies). The last linguistic census of 2001 found that more than 7,000 residents in Val di Non and Val di Sole spoke Ladin. It is uncertain whether Nones/Solandro is a language of its own. Some say it is part of the Trentino language. Nones/Solandro is basically a Ladin dialect transitional to Trentino East Lombard. Often referred to as Anaunico Ladin. Val Badia and Fassa cannot understand Nones.

Intelligibility between Nones and Solandro is uncertain, but they are considered to be part of one language. There are two main dialects of Solandro, one in the lower valley and one in the upper valley. The lower valley has heavy Nones influence, and the upper valley is more conservative and has Celtic influences.

Lower Valley Solandro is spoken by 4,000 people in the towns of Caldes, Terzolas and Male and has heavy Nones influence.

La Montàgna Solandro is very conservative and very different. It is spoken in Termenago and Castello in Pellizzano and in Ortisé and Menàs in Mezzana. It has almost nothing to do with the valley dialects such as Pellizzano and Ossana.

Pellizzano-Ossana Solandro is spoken in the towns of those names, and the two are very similar. This dialect resembles Eastern Lombard. Many miners came from Lecco and Como in the 14th century to work in mines here, and this accounts for the Lombard influences on the lect. It is spoken by 500 people in Pellizzano and 800 in Ossana. May be intelligible with Vermiglio Solandro.

Rabies Solandro spoken in the Val di Rabbi is one of the most conservative forms of Ladin in existence.

Nones has 30,000 speakers, but there is some debate over whether it is Ladin or not. Solandro is also under question about whether or not it is Ladin. It has 15,000 speakers.

Central Ladin: (transitional to Alpine Venetian).

Val Badia-Marebbe Ladin (Maréo/Badiot Enneberg/Abtei): Gadertal and Val Marebbe (formerly in Val Luson and lower Val Badia), South Tyrol, by 9,229 inhabitants, 95% as their mother tongue. Mareo/Enneberg/Marebbe are three names for the Mareo version which is spoken in the lower valley. Badiot is spoken in the upper valley.

The language varies from town to town. Less Germanized than Gherdëina, probably the closest to a pure Ladin. Spoken in Bolzano, extremely protected. Maréo/Badiot is said to be “completely different” from Fascian, Anpezan and Cadore. Part of South Tyrolean Ladin. Intelligible with Gherdëina. Not intelligible with Fodom.

Fodom, Alta Val Cordevole, Buchenstein or Livinallese Ladin: spoken in the municipalities of Livinallongo Col di Lana, Colle Santa Lucia and Arabba in the villages of Cherz, Alfauro and Varda in Belluno by about 80 to 90% of the population as their mother tongue. Fodom has two very different dialects, one in the main valley, Livinallongo Col di Lana Ladin, resembling Val Badia and the other, Colle Santa Lucia Ladin, looking more Italian. Heavy Venetian and Italian influence. Considered part of Dolomitic Ladin. Not intelligible with Val Badia. Similar to Agordo Ladin Venetian.

Intelligibility with Anpezan is not known. Intelligibility with Rocchesano Ladin is unknown but may be good.

Eastern Ladin (transitional to Alpine Venetian-Friulian)
Near Belluno in Belluno Province.

In practice, Eastern Ladin except Anpezan is regarded as a separate language from Dolomitic Ladin.

Eastern Ladin – differences.

Anpezan, Ampezzo or Ampezzano Ladin: Cortina d’Ampezzo, Belluno. Similar to Cadore Ladin. Spoken in the Ampezzo Valley of the Dolomites. Heavy Venetian influence, but has many archaic qualities since it was under Austrian rule for 400 years – longer than the surrounding areas. Halfway between Ladin and Venetian. Anpezan is said to be “completely different” from Fascian, Maréo/Badiot, Gherdëina and Cadore.

Considered part of Dolomitic Ladin. Intelligibility with Fodom is not known, but Anpezan is not intelligible with Val Badia. Anpezan can understand Central Cadore, especially Oltrechiusano Ladin. Oltrechiusano and Anpezan form a sort of a grouping.

Central Cadore Ladin (Cadorino): Spoken in Valle di Cadore, Pieve di Cadore, Perarolo di Cadore, Calalzo di Cadore and Domegge di Cadore, except Comelico and Sappada, with Venetian influences. It is spoken in the Cadore all the way down to Perarolo di Cadore. Below Perarolo, it turns into Venetian. It is not uniform and differs greatly across the area. Pozzale Ladin is very archaic, with Oltrechiusano traits. Calalzo Ladin and Domegge Ladin are also archaic.

Pieve di Cadore Ladin, Tai di Cadore Ladin, Sottocastello Ladin, Valle di Cadore Ladin, Calalzo di Cadore Ladin, Domegge di Cadore Ladin, Ospitale di Cadore Ladin and Perarolo di Cadore Ladin have few speakers left. In these places, a variety of Cadore Venetian is now spoken. Sometimes included in Ladin and sometimes not.

Eastern Cadore Ladin (Cadorino): Spoken in Lozzo di Cadore, Vigo di Cadore, Lorenzago di Cadore and Auronzo di Cadore. More conservative than Central Cadore. The Laggio Ladin of Vigo and Auronzo is very archaic, similar to Comelico. This is apparently a separate language from Central Cadore.

Auronzo di Cadore speaks Auronzo Ladin, an Eastern Cadore dialect. Also spoken in Rizzio. The dialect of Auronzo is very archaic, similar to Comelico. Auronzo is very similar to Oltrepiavano, but it is very different from Comelicese. Oltrepiavano/Auronzo di Cadore may be a single language.

Comelico, Comelicese or Comeliano Ladin: widespread in Comelico, Belluno. It is the most conservative of the Eastern Cadore dialects, even more conservative than Anpezan. Similar to Cadore but could also be confused with Friulian. The Comelico dialect could be divided into two sections: 1) Eastern Comelico: towns of Costalissoio, Campolongo, San Pietro di Cadore, Mare, Presenzio and Cosalta di Cadore; 2) Western Comelico: towns of Candide, Casamazzagno, Dosoledo, San Nicolò, Cosat, Parola, Danta, Santo Stefano, Campitello and Casta.

 

Friulian

Friulian may be up to five separate languages instead of one.

The tiny towns of Erto e Casso (dialects Ertano and Cassanese), Claut and Cimolais in Friuli-Venezia Giulia speak a Rhaeto-Romansch dialect that is transitional between Friulian and Ladin. Later it came under Venetian influence. Ladin was formerly spoken in a nearby area, which explains the Ladin influence.

The people say they speak Friulian, but the towns voted not to be included in the Friulian speaking region. The variety is not intelligible with the rest of Friulian. It is probably not intelligible with Ladin either. The name is Vajontino. The nearby village of Casso speaks some sort of Venetian, possibly Ladino Venetian. It is not really known what this lect is, whether it is Friulian or Ladin at its base. It is probably a Friulian lect that came under serious Cadore Ladin influence.

In the town of Forni di Sotto on the border between the Comelico Ladin and the Friulian region, a dialect called Fornese is spoken that is often considered to be a part of Ladin. However, it is a cross between Carnico or Carnian Friulian and Cadore Ladin, especially Comelicano. It is said to be so different from the rest of Carnico that it is not even a part of that language. At the same time, it does not seem to be Ladin either.

Probably similar to Vajontino, but intelligibility between this lect and Vajontino is not known. Probably not intelligible with Cadore Ladin. This is basically a Friulian dialect that has undergone profound Cadore Ladin influence.

The Central Friulian of Gemona di Friuli in the north of the province has difficult intelligibility with the Northern Friulian dialects spoken in Moggio Udinese only 10-15 miles away.

In addition, Low Friulian has a hard time understanding Carnian Friulian in the far north.

 

Karaim

Karaim is two separate languages instead of one, Halich Karaim and Trakai Karaim.

 

Crimean Tatar

Crimean Tatar is two separate languages instead of one, Crimean Tatar and Turkish Crimean Tatar.

 

Gagauz

Maritime Gagauz and Balkan Gagauz are two separate languages instead of one – see Ethnologue.

 

Basque

Basque is actually four separate languages instead of one: Standard Basque, Souletin, Biscayan and Gipuzkoan.

There is a unified Basque that everyone speaks so that they can understand each other.

However, there are cases where Gipuzkoan cannot understand Biscayan.

Souletin and Biscayan (France) do not understand each other.

Zuberoan or Souletin is spoken in France. It is not intelligible with the other Basque dialects. Souletin has influence from Béarnese, a dialect of Gascon (Occitan).

 

Yiddish

Yiddish is two separate languages instead of one, Western Yiddish and Eastern Yiddish.

 

Ladino

I am not sure Ladino is a separate language as it appears to be intelligible with Spanish.

 

Channel Islands French

This is actually four languages instead of one: Jèrriais, Serquiais, North Guernésiais and South Guernésiais.

Jèrriais or Jersey French is a French language spoken on Jersey Island. Jèrriais has some intelligibility of Guernésiais. There are 2,874 speakers left. 15% of the population understands the language. The language is being revived. It is recognized as a regional language by the British government. Monolingual children were showing up at school as late as 30 years ago. There is a heavy English and some Breton influence.

Serquiais is a separate language spoken on Sark, descended from the Jèrriais of the colonists of the 1500’s. The remaining speakers are mostly elderly. It has suffered in recent years due to the influx of tax exiles. It is not inherently intelligible with Jèrriais or Guernésiais, nor with the Norman spoken on the coast. There are only 20 speakers left. Serquiais is the most different of all compared to Standard French.

Guernésiais is spoken in Guernsey. It is recognized by the British government as a regional language. Guernésiais and Jèrriais have some intelligibility. There are 1,327 speakers. Speakers are mostly over age 64. 14% of the population have some understanding of the language. No intelligibility of Serquiais.

There are two Guernésiais languages, North Guernésiais, spoken in the lower parishes, and South Guernésiais, spoken in the upper parishes. There is poor intelligibility between them. Only one variety is being revived. Most Guernsey residents use some Guernésiais words in everyday speech without even knowing it. Speakers were evacuated to the mainland during WW2, and they quit speaking the language.

 

Arbëreshë Albanian

Arbëreshë Albanian is actually five separate languages instead of one, Sicilian Albanian, Calabrian Albanian, Central Mountain Albanian, Campo Marino Albanian and Molise Albanian.

Arbëreshë Albanian is spoken in Italy and dates from a migration in the 1400’s-1500’s. It is not intelligible with Standard Albanian. There are 80,000 speakers, and it is taught in some schools.

 

Arvanitika Albanian

Arvanitika Albanian is actually three separate languages instead of one.

Arvanitika Albanian is spoken in Greece. Its three dialects, Thracian Arvanitika, Northwestern Arvanitika and South Central Arvanitika, are actually separate languages. There are 50,000 speakers.

 

Greek

Greek is made up of at least seven different languages instead of one – Standard Greek, Cappadocian Greek, Cypriot Greek, Cretan Greek, Pontic Greek, Olympos Greek and Mariupolitan Greek.

Cappadocian Greek is not extinct at all as was previously thought. Thought extinct in the 1960’s, it was rediscovered in 2005.

Cypriot Greek and Cretan have marginal intelligibility with Standard Greek. Cretan has ~80% intelligibility and Cypriot ~60% with Standard Greek. Mariupolitan Greek is probably a dialect of Pontic Greek. See The Story of Pu: The Grammaticalization in Space and Time of a Modern Greek Complementizer by Nick Nicholas.

The dialect of Olympos, a village on the Greek island of Karpathos, is not even intelligible to other residents of the island.

Mariupolitan Greek is spoken in Mariupol in the Ukraine. This is a group of Greeks who moved into the area 200 years ago. Their Greek lect is still spoken to this day. It has a great deal of Turkic in it from Crimean Tatar so it is hard for Greeks to understand.

 

Turkmen

Turkmen and Trukhmen are two separate languages.


Is Dravidian Related to Japanese?

Thirdeye writes:

The Tamil-Japonic connection isn’t quite as off the wall as one might think at first glance. There’s apparently a strong Andaman-Indonesian language connection. The convention of repeat plurals seems to have found its way to Japan. There’s also some similarity between the Finno-Ugric languages, which are Uralic outliers in a sea of Indo-European languages, and Dravidian languages that have a remnant in Pakistan. Contact between proto-Dravidian-Uralic and Altaic languages is a real possibility.

If Uralic is close to anything, it is close to Altaic and Indo-European and probably even closer to Chukotko-Kamchatkan, Eskimo-Aleut, Yukaghir and Nivkhi. Yukaghir may actually be Uralic itself, or maybe the family is called “Uralic-Yukaghir.”

There is no connection between Austronesian (Indonesian) and the Andaman Islanders. Austronesian is indeed related to Thai though (Austro-Tai); in my opinion, this has been proven. If the Andaman languages are related to anything at all, they may be related to some Papuan languages and an isolate in Nepal called Kusunda. A good case can be made connecting Kusunda with some of the Papuan languages.

Typology is not that great of a way to classify. Typology is areal, and it spreads via convergence. What you are looking for in the search for a genetic relationship among languages, more than anything else, is morphology. After that, a nice set of cognates.

There is probably no connection between Dravidian and Uralic in particular. Dravidian is outside of most everything in Eurasia. If it is close to anything, it might be close to Afro-Asiatic. There also looks to be a connection with Elamite.

Dravidian and Afro-Asiatic are probably older than the rest of the Eurasian languages, and they were located further to the south. Afro-Asiatic is very old, probably ~15,000 YBP.


Mutual Intelligibility in the Romance Family (Reading)

Just a personal anecdote. I have been reading a lot of Italian lately (with the help of Google Translate). I already read Spanish fairly well. I have studied French, Portuguese and Italian, and I can read Portuguese and French to some extent, Portuguese better than French.

But I confess that I am quite lost with Italian. This is worse than French and worse than Portuguese. A couple weeks of wading through this stuff hasn’t made me understand it any better.

Portuguese and Galician are said to be so close that they are a single language. I don’t agree with that at all, but they are very close, much closer than Spanish and Portuguese. Intelligibility may be on the order of 80-90%.

Nevertheless, the other day I tried to read a journal article on Galician. It looked like it was written in Portuguese, and who would write in Galician anyway? I copied the whole thing into Google Translate and let it ride. I waded through the whole article, and I must say it was a disaster. I had a very hard time understanding many of the main points of the article.

Then I remembered that Translate works on Galician now, so I decided on an off chance that the guy may have written the piece in Galician for some nutty reason. I ran it through Translate using Galician as target. The article went through perfectly. You could understand the whole thing. It was then that I realized how far apart Portuguese and Galician really are.

You can try some other experiments.

Occitan is said to be nearly intelligible with Spanish or maybe even French, better if you know both. There’s no Google Translate for Occitan yet, but I had to deal with a lot of Occitan texts recently. I couldn’t make heads or tails of them despite my Romance reading background. So I tried using Translate to turn them into Spanish or French. French was a total wreck, and there was no point even bothering with that. Spanish was much better, but even that was a serious mess.

Now we come to the crux. Catalan and Occitan are said to be so close that they are nearly one language. Translate now works in Catalan. So I ran the Occitan texts through Translate using Catalan. The result was a serious mess, but you could at least understand some of what the Occitan texts were about. But no way on Earth were those the same languages.

People keep saying that if you can read Spanish, you can read Portuguese. It’s not true, but you can see why people say it. Try this. Take a Spanish text and run it through Translate using the Portuguese filter. Now take a Portuguese text and run it through Translate using the Spanish filter. See what a mess you end up with!

Despite the fact that I can read Spanish pretty well, I have tried to read texts in Aragonese, Asturian, Extremaduran, Leonese and Mirandese. These are so close that some even say that they are dialects of Spanish. But even if you read Spanish, you can’t really read any of those languages, and they are all separate languages, I assure you. Sure, you get some of it, but not enough, and it’s a very frustrating experience.

There are texts on the Net in something called Churro or Xurro. It’s a Valencian-Aragonese transitional dialect spoken around Teruel in Aragon in Spain. It also has a lot of Old Castilian and a ton of regular Castilian in it. Wikipedia will tell you it’s a Spanish dialect. Running it through both the Spanish and Catalan filters didn’t work and ended up with train wrecks. I doubt if Xurro is a dialect of either Catalan or Spanish. It’s probably a separate language.

There is another odd lect spoken in the same region called Chappurriau. It is spoken in Aguaviva in Teruel in the Franca Strip. The Catalans say these people speak Catalan, but the speakers say that their language is not Catalan. Intelligibility with Catalan is said to be good. So effectively this is a Catalan dialect.

I found some Chappurriau texts on the Net and ran them through Translate using Catalan as the output. The result was an unreadable disaster, and I couldn’t really figure out what they were saying. Then I tried the Spanish filter, and that was even worse. I am starting to think that maybe Chappurriau is a separate language as its speakers say and not a Catalan dialect after all.

I conclude that the ability to cross read across the Romance languages is much exaggerated.

Not only that, but many Romance microlanguages, transitional dialects and lects that are supposedly dialects of larger languages may actually be separate languages.


The Portuguese Language in Spain

Very interesting documentary about a variety of Galician-Portuguese lects spoken in Spain along the border with Portugal.

The first lect is Oliventino, spoken in the town of Olivenza in Badajoz near the Portuguese border. It is an archaic Alentejan Portuguese dialect that dates back to 1801, when Portugal lost control of the area to Spain. Portugal continues to claim the town, but Spain won’t give it back. In the interim, Oliventino has been heavily influenced by Extremaduran Spanish. Standard Portuguese speakers are typically lost with Oliventino.

The language is now spoken by those older than 60 years old and is apparently not being passed on to children. There are few to no young speakers.

In Alcantara in Caceres and Badajoz, several archaic Portuguese lects are spoken. They are close to Alentejan Portuguese. If they are close to Alentejan, then they may be difficult to understand for Portuguese speakers, as many Portuguese find the hard Alentejan lect difficult to follow.

In Herrera de Alcántara in Caceres, an ancient Portuguese from the 1200’s called Firrerenho is spoken. This area was made part of what is now Spain in 1297. Intelligibility with Portuguese is unknown.

In Cedillo and Valencia de Alcántara in Caceres, an archaic Portuguese dialect from the 1700’s called Cedilhero is spoken. Cedilhero is spoken here because Portuguese colonists were the first people to settle in the region at that time. Cedilhero is close to Alentejan Portuguese. The youngest speakers are in their 60’s. It may be difficult to understand for Portuguese speakers.

In the Xalima Valley in the towns of San Martín de Trevejo, Eljas and Valverde del Fresno, Fala is still spoken by almost all inhabitants. This is a Galician dialect that has been influenced by the Castilian and Extremaduran languages. Apparently the Galician settlers moved to the region long ago, got cut off from the rest of Galicia, and the lect underwent independent development. It is said to be fully intelligible with Galician; however, for some reason, the Fala speakers got subtitles in this documentary for Galician-language TV. It is probably not fully intelligible with Portuguese either.

A Portuguese lect is spoken in Almedilha in Salamanca Province. Little is known about this lect.

In the town of Calabor in Zamora, a Galician dialect with heavy Castilian and especially Senabrese Leonese influences is spoken. Little is known about this lect.

Map of the various lects is here.

If you speak Portuguese or Spanish, you might want to listen to these speakers and see if you can understand them. It’s better to cover up the subtitles, though, because otherwise they will help you understand. Covering up the subtitles, I understood very little of what these folks were saying. But I only speak Spanish fairly well, and I don’t speak Portuguese at all, though I can read it a bit since I have studied it.

This video shows us that to some extent, categories like “Spanish” and “Portuguese” are more political than linguistic categories, since with a lot of these lects, it is hard to tell where one language ends or the other begins. It is also hard to put some of these lects into linguistic categories like “Spanish” or “Portuguese.”


Mutual Intelligibility of Languages in the Slavic Family

A more updated version of this paper with working hyperlinks can be found on Academia.edu here.

There is much nonsense said about the mutual intelligibility of the various languages in the Slavic family. It’s often said that all Slavic languages are mutually intelligible with each other. This is simply not the case.

Method: It is important to note that the percentages are in general only for oral intelligibility and only in the case of a pure inherent intelligibility test. A pure inherent intelligibility test would involve a speaker of Slavic lect A listening to a tape or video of a speaker of Slavic lect B.

Written intelligibility is often very different from oral intelligibility in that in a number of cases, it tends to be higher, often much higher, than oral intelligibility. Written intelligibility was only calculated for a number of language pairs. Most pairs have no figure for written intelligibility.

A number of native speakers of various Slavic lects were interviewed about mutual intelligibility, language/dialect confusion, the state of their language, its history and so on. In addition, a Net search was done of forums where speakers of Slavic languages were discussing how much of other Slavic languages they understand. These figures were tallied up for each pair of languages to be tabulated and were then all averaged together. Hence the figures are averages taken from statements by native speakers of the languages in question.
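The tallying and averaging step can be sketched in a few lines of Python. This is only a rough illustration and not the actual procedure used for the paper: the function name `average_estimates`, the tuple layout and the individual estimates below are all hypothetical, made up purely to show the arithmetic. Each estimate is kept directional (who is listening to whom), since intelligibility is often not the same in both directions.

```python
from collections import defaultdict
from statistics import mean


def average_estimates(reports):
    """Average speaker estimates for each directed language pair.

    `reports` is an iterable of (listener, heard, percent) tuples, where
    `percent` is one native speaker's own estimate of how much of the
    `heard` language a speaker of the `listener` language understands.
    """
    grouped = defaultdict(list)
    for listener, heard, percent in reports:
        # Keep the direction: (A hears B) is tallied separately from (B hears A).
        grouped[(listener, heard)].append(percent)
    # One rounded average per directed pair.
    return {pair: round(mean(values)) for pair, values in grouped.items()}


# Hypothetical individual estimates, invented for illustration only.
reports = [
    ("Czech", "Slovak", 70),
    ("Czech", "Slovak", 95),
    ("Czech", "Slovak", 81),
    ("Slovak", "Czech", 80),
    ("Slovak", "Czech", 84),
]

averages = average_estimates(reports)
for (listener, heard), percent in sorted(averages.items()):
    print(f"{listener} has {percent}% intelligibility of {heard}")
```

A plain mean treats every estimate equally, so a pair with only one or two reports is much shakier than a pair with a dozen. Keeping the raw list for each pair also makes it easy to report the range alongside the average.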

Complaints have been made that many of these percentages were simply wild guesses with no science behind them. This is not the case as all figures were derived from estimates by native speakers themselves, often a number of estimates averaged together.

True science would involve scientific intelligibility testing of Slavic language pairs. There has been no scientific intelligibility testing of any Slavic language pairs that I am aware of. Obviously, in order to answer these questions in a scientific manner, scientific intelligibility testing needs to be done. The problem is that most linguists are not interested in scientific intelligibility testing of language pairs.

Conclusion:

Serbo-Croatian (Shtokavian) has 55% intelligibility of Macedonian (varies from 25-90%), 27% of Slovenian, 25% of Slovak, 20% of Ukrainian, 13% of oral Bulgarian and 25% of written Bulgarian, 10% of oral Russian and 22% of written Russian, 10% of Czech, and 5% of Polish.

Chakavian has 82% intelligibility of Kaikavian.

Kaikavian has 82% intelligibility of Chakavian.

Bulgarian has 80% intelligibility of Macedonian, 41% of Russian (varies from 7-75%) and 5% of Polish and Czech.

Macedonian has 65% oral and written intelligibility of Bulgarian.

Czech has 82% intelligibility of Slovak (varies from 70-95%), 12% of Polish and 5% of Russian and Bulgarian.

Polish has over 90% intelligibility of Sorbian, 22% of Silesian, 12% of Czech, 6% of Russian and 5% of Bulgarian.

Russian has 85% intelligibility of Rusyn, 74% of oral Belorussian and 85% of written Belorussian, 60% of Balachka, 50% of oral Ukrainian and 85% of written Ukrainian, 36% of oral Bulgarian (varies from 7-70%) and 80% of written Bulgarian, 38% of Polish, 30% of Slovak and oral Montenegrin and 50% of written Montenegrin, 12% of oral Serbo-Croatian and 25% of written Serbo-Croatian and 10% of Czech.

Belorussian has 80% intelligibility of Ukrainian and 55% of Polish.
Ukrainian has 82% intelligibility of Belorussian and Rusyn and 55% of Polish.

Slovak has 82% intelligibility of Czech (varies from 70-95%).

Eastern Slovak has 82% intelligibility of Rusyn and 72% of Ukrainian.
Saris Slovak has 85% intelligibility of Polish.

 

Reactions: So far there have been few reactions to the paper. However, a Croatian linguist has helped me write part of the Croatian section, and he felt that at least that part of the paper was accurate. A Serbian native speaker felt that the percentages for South Slavic seemed to be accurate.

A professor of Slavic Linguistics at a university in Bulgaria reviewed the paper and felt that the percentages were accurate. He was a member of a group of linguists who met periodically to discuss the field. He printed out the paper and showed it to his colleagues at the next meeting, and they spent some time discussing it. No professional linguist has yet discounted the percentages in this paper. The paper seems to have gone over well in the scientific linguistic community.

Now onto the discussion.

There is much nonsense floating around about Serbo-Croatian or Shtokavian. The main Shtokavian dialects of Croatian, Serbian, Montenegrin and Bosnian are mutually intelligible.

However, the Croatian macrolanguage has strange lects that Standard Croatian (Štokavian) cannot understand.

For instance, Čakavian Croatian is not intelligible with Standard Croatian. It consists of at least four major dialects, Ekavian Chakavian, spoken on the Istrian Peninsula, Ikavian Chakavian, spoken in southwestern Istria, the islands of Brač, Hvar, Vis, Korčula, and Šolta, the Pelješac Peninsula, the Dalmatian coast at Zadar and the outskirts of Split, and inland at Gacka, Middle Chakavian, which is Ikavian-Ekavian transitional, and Ijekavian Chakavian, spoken at the far southern end of the Chakavian language area on Lastovo Island, Janjina on the Pelješac Peninsula, and Bigova in the far south near the border with Montenegro.

Ekavian Chakavian has two branches – Buzet and Northern Chakavian. Buzet is actually transitional between Slovenian and Kaikavian. It was formerly thought to be a Slovenian dialect, but some now think it is more properly a Kaikavian dialect. There are some dialects around Buzet that seem to be the remains of old Kaikavian-Chakavian transitional dialects (Jembrigh 2014).

Ikavian Chakavian has two branches – Southwestern Istrian and Southern Chakavian. The latter is heavily mixed with Shtokavian.

Some reports say there is difficult intelligibility between Ekavian Chakavian in the north and Ikavian Chakavian in the far south, but speakers of Labin Ekavian in the far north say they can understand the Southeastern Istrian speech of the southern islands very well (Jembrigh 2014).

Čakavian differs from the other nearby Slavic lects spoken in the country due to the presence of many Italian words.

Chakavian actually has a written heritage, but it was mostly written down long ago. Writing in Chakavian started very early in the Middle Ages and began to slow down in the 1500’s when writing in Kaikavian began to rise. However, Chakavian magazines are published even today (Jembrigh 2014).

Although Chakavian is clearly a separate language from Shtokavian Croatian, in Croatia it is said that there is only one Croatian language, and that is Shtokavian Croatian. The idea is that the Kaikavian and Chakavian languages simply do not exist, though obviously they are both separate languages. Recently a Croatian linguist forwarded a proposal to formally recognize Chakavian as a separate language, but the famous Croatian Slavicist Radoslav Katičić argued with him about this and rejected the proposal on political, not linguistic grounds. This debate occurred only in Croatian linguistic circles and the public knows nothing about it (Jembrigh 2014).

Kaikavian Croatian, spoken in northwest Croatia and similar to Slovenian, is not intelligible with Standard Croatian.

Kaikavian is fairly uniform across its speech area, whereas Chakavian is more diverse (Jembrigh 2014).

In the 1500’s, Kaikavian began to be developed in a standard literary form. From the 1500’s to 1900, a large corpus of Kaikavian literature was written. Kaikavian was removed from public use after 1900, hence writing in the standard Kaikavian literary language was curtailed. Nevertheless, writing continues in various Kaikavian dialects, which still retain some connection to the old literary language, although some lexicon and grammar are going out (Jembrigh 2014).

However, Chakavian and Kaikavian have high, but not full, mutual intelligibility. Intelligibility between the two is estimated at 82%. Most Croatian linguists recognize Kaikavian as a separate language. However, any suggestions that Kaikavian is a separate language are censored on Croatian TV (Jembrigh 2014).

Nevertheless, the ISO has recently accepted a proposal from the Kaikavian Renaissance Association to list the Kaikavian literary language written from the 1500’s-1900 as a recognized language with an ISO code of kjv. The literary language itself is no longer written, but works written in it are still used in public for instance in dramas and church masses (Jembrigh 2014). This is heartening, although honestly, Kaikavian as an existing spoken lect also needs to be recognized as a living language instead of a dialect of “Croatian,” whatever that word means.

Furthermore, there is a dialect continuum between Kaikavian and Chakavian as there is between Kaikavian and Slovenian, and lects with a dialect continuum between them are always separate languages. There is an old Kaikavian-Chakavian dialect continuum of which little remains, although some of the old Kaikavian-Chakavian transitional dialects are still spoken (Jembrigh 2014).

Kaikavian differs from the other Slavic lects spoken in Croatia in that it has many Hungarian and German loans (Jembrigh 2014). Kaikavian is probably closer to Slovenian than it is to Chakavian.

Nevertheless, although intelligibility with Slovenian is high, Kaikavian lacks full intelligibility with Slovenian. Yet there is a dialect continuum between Slovenian and Kaikavian. Kaikavian, especially the Zagorje Kaikavian dialect around Zagreb, is close to the Shtajerska dialect of Slovene. However, leaving aside Kaikavian speakers, Croatians have poor intelligibility of Slovenian.

Molise Croatian is a Croatian language spoken in a few towns in Italy, such as Acquaviva Collecroce and two other towns. A different dialect is spoken in each town. Despite a lot of commonality between the dialects, the differences between them are significant. Intelligibility issues are not known. A koine is currently under development. The Croatians left Croatia and came to Italy from 1400-1500. The base of Molise Croatian was Shtokavian with an Ikavian accent and a heavy Chakavian base similar to what is now spoken as Southern Chakavian Ikavian on the islands of Croatia. Molise Croatian is not intelligible with Standard Croatian.

Burgenland Croatian, spoken in Austria, is intelligible to Croatian speakers in Austria, Czech Republic, Slovakia and Hungary, but it has poor intelligibility with the Croatian spoken in Croatia.

Therefore, for the moment, there are five separate Croatian languages: Shtokavian Croatian, Kaikavian Croatian, Chakavian Croatian, Molise Croatian and Burgenland Croatian.

Serbian is a macrolanguage made up of two languages: Shtokavian Serbian and Torlak or Gorlak Serbian.

Shtokavian is simply the same Serbo-Croatian language that is also spoken in Croatia, Montenegro and Bosnia. It forms a single tongue and not separate languages as many insist. The claim for separate languages is based more on politics than on linguistic science.

Torlak Serbian is spoken in the south and southeast of Serbia and is transitional to Macedonian. It is not intelligible with Shtokavian, although this is controversial.

Torlakians are often said to speak Bulgarian, but this is not exactly the case. More properly, their speech is best seen as closer to Macedonian than to Bulgarian or Serbo-Croatian. The Serbo-Croatian vocabulary in both Macedonian and Torlakian is very similar, stemming from the political changes of 1912; whereas these words have changed more in Bulgarian.

The Torlakian spoken in the southeast is different. It is not really either Bulgarian or Serbo-Croatian; instead, it is best described as a mixed Bulgarian-Serbo-Croatian language. In the towns of Pirot and Vranje, it cannot be said that they speak Serbo-Croatian; instead they speak this Bulgarian-Serbo-Croatian mixed speech.

It’s also said that Serbo-Croatian can understand Bulgarian and Macedonian, but this is not true. However, the Torlak Serbians can understand both Macedonian and Bulgarian well, as this is a Serbo-Croatian dialect transitional to both languages. Intelligibility figures for Torlakian and Macedonian/Bulgarian are not known.

Intelligibility in the Slavic languages of the Balkans is much exaggerated.

Slovenian speakers find it hard to understand most of the other Yugoslavian lects except for Kaikavian Croatian. Serbo-Croatian intelligibility of Slovenian is 25-30%.

A lect called Čičarija Slovenian is spoken on the Istrian Peninsula in Slovenia just north of Croatia. This is a Chakavian-Slovenian transitional lect that is hard to categorize, but it is usually considered to be a Slovenian dialect.

Bulgarian and Macedonian can understand each other to a great degree (65-80%), but not completely. However, the Ser-Drama-Lagadin-Nevrokop dialect in northeastern Greece and southern Bulgaria and the Maleševo-Pirin dialect in eastern Macedonia and western Bulgaria are transitional between Bulgarian and Macedonian. The Aegean Macedonian dialects mostly spoken in Greece, such as the Lerinsko-Kostursko and Solunsko-Vodensko dialects, sound more Bulgarian than Macedonian.

Russian has decent intelligibility of Bulgarian, possibly on the order of 50% (it varies from 7-75%), but Bulgarian intelligibility of Russian seems lower. Even so, Bulgarian-Russian intelligibility seems much exaggerated: some Russians and Bulgarians say they understand almost nothing of the other language. Nevertheless, most Bulgarians over the age of 30-35 understand Russian well since studying Russian was mandatory under Communism.

However, Bulgarian-Russian written intelligibility is much higher. Bulgarian and Russian are close because the Ottoman rulers of Bulgaria would not allow printing in Bulgaria. Hence, many religious books were imported from Russia, and these books influenced Bulgarian. Russian influence only ended in 1878.

Serbo-Croatian and Bulgarian have 10-15% oral intelligibility; however, there are Bulgarian dialects that are transitional with Torlak Serbian. Written intelligibility is higher at 25%. Macedonian and Bulgarian would be much closer together except that in recent years, Macedonian has been heavily influenced by Serbo-Croatian, and Bulgarian has been heavily influenced by Russian.

The gap between oral and written intelligibility of Serbo-Croatian and Bulgarian arises because Bulgarian, unlike Serbo-Croatian, is not spoken the same way it is written. However, Bulgarians claim to be able to understand Serbo-Croatian better than the other way around. There is a group of Bulgarians living in Serbia in the areas of Bosilegrad and Dimitrovgrad who speak a Bulgarian-Serbian transitional dialect, and Serbs are able to understand these Bulgarians well.

Serbo-Croatian has variable intelligibility of Macedonian, averaging ~55%, while Nis Serbians have ~90% intelligibility with Macedonian. Part of the problem between Serbo-Croatian and Macedonian is that so many of the basic words – be, do, this, that, where – are different; however, much of the rest of the vocabulary is the same. Serbo-Croatian speakers can often learn to understand Macedonian well after some exposure.

Most Macedonians are already able to speak Serbo-Croatian well. This gives rise to claims that Macedonians can understand Serbo-Croatian very well; however, much of this may be due to bilingual learning. In fact, many Macedonians are switching away from the Macedonian language towards Serbo-Croatian.

The Macedonian spoken near the Serbian border is heavily influenced by Serbo-Croatian and is quite a bit different from the Macedonian spoken towards the center of Macedonia. One way to look at Macedonian is that it is a Serbo-Croatian-Bulgarian transitional lect. The intelligibility of Serbo-Croatian and Macedonian is highly controversial, and intelligibility studies are in order.

Croats say Macedonian is a complete mystery to them.

Czech and Polish are incomprehensible to Serbo-Croatian speakers (Czech 10%, Polish 5%), but Serbo-Croatian has some limited comprehension of Slovak, on the order of 25%.

Serbo-Croatian and Russian have 10-15% intelligibility, if that, yet written intelligibility is higher at 25%.

Serbo-Croatian has only 20% intelligibility of Ukrainian.

Slovenians have a very hard time understanding Poles and Czechs and vice versa.

It’s often said that Czechs and Poles can understand each other, but this is not so. Much of the claimed intelligibility is simply bilingual learning. Czechs claim only 10-15% intelligibility of Polish.

The intelligibility of Polish and Russian is very low, on the order of 5-10%.

Polish is not intelligible with Kashubian, a language related to Polish spoken in the north of Poland, but figures are not known. Kashubian itself is a macrolanguage made up of two different languages, South Kashubian and North Kashubian, as the two have difficult intelligibility.

Silesian or Upper Silesian is also a separate language spoken in Poland, often thought to be halfway between Polish and Czech. It may have been split from Polish for up to 800 years, during which time it underwent heavy German influence. Polish lacks full intelligibility of Silesian, although this is controversial (see below). Some Poles say they find Silesian harder to understand than Belarussian or Slovak, which implies intelligibility of 20-25%.

The more German the Silesian dialect is, the harder it is for Poles to understand. In recent years, many of the German words are falling out of use and being replaced by Polish words, especially by young people. Poles who know German and Old Polish can understand Silesian quite well due to the Germanisms and the presence of many older Polish words, but Poles who speak only Polish have a hard time with Silesian.

Many Poles insist that Silesian is a Polish dialect, but this is based more on politics than reality. In fact, people in the north of Poland regard Silesian as incomprehensible. 40% of Silesian vocabulary is different from Polish, mostly Germanisms. The German influence is more prominent in the west; Polish influence is greater in the east. Many Silesian speakers now speak a watered down version of Silesian which is more properly seen as a Polish dialect with some Silesian words. Pure Silesian appears to be a dying language.

Silesian itself appears to be a macrolanguage, as it is more than one language: Opole Silesian speakers cannot understand Katowice Silesian, so Opole Silesian and Katowice Silesian are two different languages.

Cieszyn Silesian or Ponaszymu is a language closely related to Silesian spoken in the Czech Republic in the far northeast of the country near the Polish and Slovak borders. It differs from the rest of Silesian in that it has undergone heavy Czech influence. Some say it is a part of Czech, but more likely it is a part of Polish like Silesian.

People observing conversations between Cieszyn Silesian and Upper Silesian speakers report that the two groups have a hard time understanding each other. Cieszyn Silesian speakers strongly reject the notion that they speak the same language as Upper Silesians. Ponaszymu also has many Germanisms which have been falling out of use lately, replaced by their Czech equivalents. Ponaszymu appears to lack full intelligibility with Czech. In fact, some say the intelligibility between the two is near zero.

Lach is a Czech-Polish transitional lect with a close relationship with Cieszyn Silesian. However, it appears to be a separate language, as Lach is not even intelligible within itself. Instead, Eastern Lach and Western Lach have difficult intelligibility and are separate languages, so Lach itself is a macrolanguage. Lach is not fully intelligible with Czech; indeed, the differences between Lach and Czech appear to be greater than the differences between Silesian and Polish, despite the fact that Lach has been heavily leveling into Moravian Czech for the last 100 years.

Czechs say Lach is a part of Czech, and Poles say Lach is a part of Polish. The standard view among linguists seems to be that Lach is a part of Czech. However, another view is that Lach is indeed Lechitic, albeit with strong Czech influence.

Polish has excellent intelligibility of Upper Sorbian and Lower Sorbian, possibly over 90%. Furthermore, Upper and Lower Sorbian have over 90% intelligibility of each other, so instead of being two different languages, they are dialects of a single tongue, Sorbian.
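The reasoning in the Sorbian case (over 90% intelligibility, therefore dialects of one tongue) can be sketched in a few lines of Python. This is only an illustration, assuming the 90% intelligibility criterion is used as the cutoff. The function name `classify` and the exact figure of 91 for Sorbian are placeholders (the text only says "over 90%"), and the Serbo-Croatian-Slovenian value is the midpoint of the 25-30% range given earlier.

```python
# Assumed cutoff: the 90% intelligibility criterion for calling two lects one language.
LANGUAGE_THRESHOLD = 90.0


def classify(intelligibility_pct, threshold=LANGUAGE_THRESHOLD):
    """Label a pair of lects based on a single intelligibility percentage."""
    if intelligibility_pct >= threshold:
        return "dialects of one language"
    return "separate languages"


# Figures taken from this post, except where marked as placeholders.
figures = {
    ("Upper Sorbian", "Lower Sorbian"): 91.0,    # placeholder for "over 90%"
    ("Czech", "Slovak"): 82.0,
    ("Russian", "Belarussian"): 75.0,
    ("Serbo-Croatian", "Slovenian"): 27.5,       # midpoint of 25-30%
}

results = {pair: classify(pct) for pair, pct in figures.items()}

for (a, b), label in results.items():
    print(f"{a} / {b}: {label}")
```

A single number per pair is a simplification, since many of the figures above are asymmetric or vary by dialect.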

It is often said that Ukrainian and Russian are intelligible with each other or even that they are the same language (a view perpetuated by Russian nationalists). It is not true at all that Ukrainian and Russian are mutually intelligible, as Russian only has 50% intelligibility of Ukrainian. For example, all Russian shows get subtitles on Ukrainian TV. Yet some say that the subtitles are simply put on as a political move due to Ukraine’s puristic language policy. Ukrainian and Russian only have 60% lexical similarity. Polish and Ukrainian have higher lexical similarity at 72%, and Ukrainian intelligibility of Polish is 50% or more.

However, there are dialects in between Ukrainian and Russian such as the Eastern Polissian and Slobozhan dialects of Ukrainian that are intelligible with both languages. Complicating the picture is the fact that many Ukrainians are bilingual and speak Russian also. Ukrainians can understand Russian much better than the other way around. Nevertheless Ukrainian intelligibility of Russian is hard to calculate because presently there are few Ukrainians in Ukraine who do not speak Russian. Most of the Ukrainian speakers who do not speak Russian are in Canada at the moment.

In addition, the Slobozhan dialects of Ukrainian and Russian (Slobozhan Ukrainian and Slobozhan Russian), spoken around Kantemirovka (Voronezhskaya Oblast, Russia), and Kuban Russian or Balachka, spoken in the Kuban area right over the eastern border of Ukraine, are very close to each other. Slobozhan Russian can also be called Kuban Russian or Balachka.

Balachka is best seen as a Ukrainian dialect spoken in Russia – specifically, it is markedly similar to the Poltavian dialect of Ukrainian spoken in Poltava in Central Ukraine. Although the standard view is that Balachka is a Ukrainian dialect, some linguists say that it is actually a separate language closely related to Ukrainian. An academic paper has been published making the case for a separate Balachka language. In addition, Balachka language associations believe it is a separate language. Intelligibility between Balachka and Ukrainian is not known. Russian only has 60% intelligibility of Balachka.

However, Balachka is dying out and is now spoken only by a few old people. Most people in the region speak Russian with a few Ukrainian words.

Slobozhan Russian is very close to Ukrainian, closer to Ukrainian than it is to Russian, and Slobozhan Ukrainian is very close to Russian, closer to Russian than to Ukrainian. Slobozhan Ukrainian speakers in this region find it easier to understand their Russian neighbors than the Upper Dniestrian Ukrainian spoken in the far west in the countryside around Lvov. Upper Dniestrian is influenced by German and Polish.

The Russian language in the Ukraine has been declining recently, mostly because since independence, the authorities have striven to make the new Ukrainian as far away from Russian as possible by adopting the 1927 Kharkiv Standard and jettisoning the 1932 Standard, which brought Ukrainian more in line with Russian. For instance, in 1932, Ukrainian g was eliminated from the alphabet in order to make Ukrainian h correspond perfectly with Russian g. After 1991, the g returned to Ukrainian. Hence, Russians understand the colloquial Ukrainian spoken in the countryside pretty well, but they understand the modern standard heard on TV much less. This is because colloquial Ukrainian is closer to the Ukrainian spoken in the Soviet era, which had huge Russian influence.

The intelligibility of Belarussian with both Ukrainian and Russian is a source of controversy. On the one hand, Belarussian has some dialects that are intelligible with some dialects of both Russian and Ukrainian. For instance, West Palesian is a Belarussian dialect transitional to Ukrainian. Some say that West Palesian is actually a separate language, but the majority of Belarussian linguists say it is a dialect of Belarussian (Mezentseva 2014). Belarussian and Ukrainian have 85% similar vocabulary.

Nevertheless, Russian has high intelligibility of Belarussian, on the order of 75%. Belarussian is nonetheless a separate language from both Ukrainian and Russian.

For some reason, the Hutsul, Lemko and Boiko dialects of the Rusyn language are much more comprehensible to Russians than Standard Ukrainian is. Intelligibility may be 85%. Rusyn-Ukrainian intelligibility is described as similar to Czech-Slovak intelligibility – therefore, the intelligibility between Rusyn and Ukrainian is around 82%.

Rusyn-Ukrainian intelligibility is said to be the same as Ukrainian-Belarussian intelligibility, so Ukrainian and Belarussian also have ~82% intelligibility. However, at least the Lemko dialect of Rusyn has only marginal intelligibility with Ukrainian. Lemko is spoken heavily in Poland, and it differs from Standard Rusyn in that it has a lot of Polish vocabulary, whereas Standard Rusyn has more influences from Hungarian and Romanian.
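The ~82% figures for Rusyn-Ukrainian and Ukrainian-Belarussian intelligibility are not measured directly. They are carried over from the Czech-Slovak figure through two "described as similar" comparisons. A rough Python sketch of that chain of inference follows. The names `known`, `similar_to` and `propagate` are made up for illustration, and treating "similar" as "numerically equal" is an assumption that makes these estimates only as good as the comparisons behind them.

```python
# The one figure in this chain that the post gives directly: Czech-Slovak, 82%.
known = {frozenset({"Czech", "Slovak"}): 82}

# Pairs described in the text as having similar (or the same) intelligibility.
similar_to = [
    (("Rusyn", "Ukrainian"), ("Czech", "Slovak")),
    (("Ukrainian", "Belarussian"), ("Rusyn", "Ukrainian")),
]


def propagate(known, similar_to):
    """Copy a known figure onto any pair said to be similar to a pair with a figure."""
    estimates = dict(known)
    changed = True
    while changed:
        changed = False
        for pair_a, pair_b in similar_to:
            key_a, key_b = frozenset(pair_a), frozenset(pair_b)
            if key_a not in estimates and key_b in estimates:
                estimates[key_a] = estimates[key_b]
                changed = True
            elif key_b not in estimates and key_a in estimates:
                estimates[key_b] = estimates[key_a]
                changed = True
    return estimates


estimates = propagate(known, similar_to)

for pair, pct in estimates.items():
    print(" / ".join(sorted(pair)), f"~{pct}%")
```

Every estimate after the first is an inference two steps removed from any test, which is one more reason intelligibility testing is in order for these pairs.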

The Rusyn language is composed of 50% Slovak roots and 50% Ukrainian roots, so some difficult intelligibility with Ukrainian might be expected. It has also been described as a transitional dialect between Polish and Slovak. Eastern Slovak has ~80% intelligibility of Rusyn.

Pannonian Rusyn is spoken by a group of Rusyns who migrated to northwestern Serbia (the Bachka region in Vojvodina province) and Eastern Croatia from eastern Slovakia and western Ukraine 250 years ago. Pannonian Rusyn is actually a part of Slovak, and Rusyn proper is really a part of Ukrainian. Pannonian Rusyn lacks full intelligibility of Rusyn proper. Not only that, but it is not even fully intelligible with the Eastern Slovak that it resembles most.

The intelligibility of Czech and Slovak is much exaggerated. It is true that Western Slovak dialects can understand Czech well, but Central Slovak, Eastern Slovak and Extraslovakian Slovak dialects cannot.

It is also said that West Slovak (Bratislava) speakers cannot understand East Slovak, so Slovak may actually be two different languages, but this is controversial. Western Slovak speakers say Eastern Slovak sounds idiotic and ridiculous, and some words are different, but other than that, they can basically understand it. Other Western Slovak speakers (Bratislava) say that Eastern Slovak (Kosice) is hard to understand. Bratislava speakers say that Kosice speech sounds half Slovak and half Ukrainian and uses many odd and unfamiliar words. Intelligibility testing between East and West Slovak would seem to be in order.

Much of the claimed intelligibility between Czech and Slovak was simply bilingual learning. Since the breakup, young Czechs and Slovaks understand each other worse since they have less contact with each other. In the former Czechoslovakia, everything was 50-50 bilingual – media, literature, etc. Since then, Slovak has been disappearing from the Czech Republic, so the younger people don’t understand Slovak so well.

Intelligibility of Czech and Slovak is 82% and varies from 70-95% depending on the dialect. Intelligibility problems are mostly on the Czech end because Czechs don’t bother to learn Slovak while many Slovaks learn Czech. There is as much Czech literature and media as Slovak literature and media in Slovakia, and many Slovaks study at Czech universities. When there, they have to pass a language test. Czechs hardly ever study at Slovak universities.

Czechs see Slovaks as country bumpkins – backwards and folksy but optimistic, outgoing and friendly. Czechs are more urbane. The written languages differ much more than the spoken ones.

The languages really split about 1,000 years ago, but written Slovak was based on written Czech, and there was a lot of interlingual communication. A Moravian Czech speaker (Eastern Czech) and a Bratislavan Slovak (Western Slovak) speaker understand each other very well. They are essentially speaking the same language.

However, in recent years, there has also been quite a bit of bilingual learning. Young Czechs and Slovaks talk to each other a lot via the Internet. There are also some TV shows that show Czech and Slovak contestants untranslated (like in Sweden where Norwegian comics perform untranslated), and most people seem to understand these shows.

All foreign movies in both the Czech Republic and Slovakia are translated into Czech, not Slovak. Far Northeastern Slovak (Saris Slovak) near the Polish border is close to Polish and Ukrainian. Intelligibility data for Saris Slovak and Ukrainian is not known. Saris Slovak has high but not complete intelligibility of Polish, possibly 85%. Eastern Slovak may have 72% intelligibility of Ukrainian.

Southern Slovak speakers on the Hungarian border have a harder time understanding Polish because they do not hear it much. This implies that some of the high intelligibility between Slovak and Polish may be due to bilingual learning on the part of Slovaks.

Russian has low intelligibility with Czech and Slovak, maybe 30%.

References

Jembrigh, Mario. Croatian linguist. December 2014. Personal communication.

Mezentseva, Inna. English professor. Vitebsk State University. Vitebsk, Belarus. December 2014. Personal communication.
