Daily Archives: March 13, 2009

Questions About the German Language Classification

Updated March 28. In response to some criticisms, I agreed with Professor den Besten in part by removing Limburgs and South Guelderish from Macro-German. I put them in Macro-Dutch, which I will hopefully do a post on in the future. With languages like these, it’s pretty hard to tell where Dutch ends and German begins. I also split Veluws into East Veluws, which stayed in German, and West Veluws, which moved over to Dutch.

In response to the German language post, Hans den Besten, a top Dutch Germanist linguist, first agrees broadly with my classification, but then makes a critique about the scope of Macro-German:

But what is lacking is a definition of what counts as a German dialect. This may differ from case to case. The Low Saxon dialects of the Netherlands may have been included because they are an extension of Low Saxon Niederdeutsch, even though they are under the roof of Schrift-Dutch rather than Schrift-Deutsch. That Limburgish has been included may be due to a couple of High German Low German isoglosses running through that area. But if even South Guelderish and Veluws are included (in so far as I know Veluws is Franconian but for the eastern strip along the Overijssel) I don’t see any reason why Brabantian-East Flemish, West Flemish, Zealandic and Hollandic should be excluded. Furthermore, the exclusion of West and North Frisian also needs some justification. Referring to Anglo-Frisian isoglosses will not be enough because unlike English these languages share a lot of syntax with the rest of Continental West Germanic, since they are SOV cum Verb Second. And if we try to set the two Frisians apart by referring to the idiosyncrasies of the verbal cluster (no Ersatz-Infinitiv, strict head-final order but for the Frisian “kortstaarten” litt. ’short tails’) then — maybe — Gronings should be taken out of your list of German dialects/languages because it shares a lot of verbal cluster syntax with West Frisian.

Hans poses a number of interesting questions about the borders of Macro-German. Let’s look at them one by one.

The Dutch dialects and languages are on this page. As Brabantian-East Flemish, West Flemish, Zealandic and Hollandic are listed as Dutch on that page, I am treating them as Macro-Dutch and not Macro-German.

As far as Gronings goes, I am treating it as Macro-German due to Ethnologue’s grouping here. As you can see, Low Saxon is treated as separate from Macro-Dutch there and also in my treatment. To me, it is better to see Macro-Dutch as something equal to Low Franconian and to put Low Saxon in with Macro-German.

In terms of Frisian, Ethnologue places it outside of Macro-German altogether along with English in the West Germanic Family. Keep in mind that Germanic and German are not synonymous. After all, Swedish is Germanic but not German.

As far as Veluws, Ethnologue sees it as Low Saxon and not as Low Franconian.

Ethnologue has no listing for South Guelderish or for anything similar. South Guelderish is a very confusing classification, that, if anything, looks like a sister language to Limburgs, if not a part of Limburgs itself.

After conferring with a Dutch linguistics professor, I have now decided to remove Limburgs, South Guelderish and the related Low Rhenish lects spoken across the border in Germany from Macro-German.

I have put them in Macro-Dutch, and hopefully will redo the classification of Dutch soon in a separate post. As far as two languages, one called Southeast Limburgs/Aachen, and another called Low Dietsch, they have stayed in Macro-German, because my professor friend described at least the SE Limburgs lects as Ripuarian, with Ripuarian automatically going to Macro-German.

The languages described above are collectively known as Meuse-Rhenish, and to be honest they are transitional between Low Franconian (Dutch) and Low Saxon (German).

We run into a situation like what one finds in Alsace-Lorraine, where curious travelers said that, “Some people speak German, some people speak French, and others seem to speak languages that are neither French nor German.” Substitute “Dutch” for “French” in the above situation and you have a pretty good portrayal of the confusing language situation on the Dutch-German border.

These languages are confusing because really they are transitional languages between Dutch and German.

Hans also raises some questions about Dutch Low Saxon. I have decided to throw all of Dutch Low Saxon into Macro-German, as this seems to be the consensus these days. Hans says that Veluws is Franconian, but I am not so sure. My professor said Veluws is regarded as marginally Low Saxon. I am going to stick to my guns and keep Gronings in Low Saxon. A friend of mine remarked today that in some ways, “Dutch” in terms of linguistics is almost a political construct.

A very tricky language to classify was East Frisian Low Saxon. It’s clearly not Low Franconian (Dutch) but neither is it Low Saxon (Low German). So what is it? It’s really in its own category, which is something like Friso-Saxon, a Low German language with a heavy Frisian base. I put it in Low German because that is where most folks seem to be tossing it.

There have also been criticisms that my treatment was overbroad in scope. If anything, it is conservative.

There may be up to 40 separate languages within Swiss German. The Ripuarian lects are so diverse that 150 of them are different enough that they have had separate dictionaries written for them. Of those 150, about 120 of those have serious differences in lexicon, phonology and morphology. Speakers of Ripuarian frequently refer to “150 Ripuarian languages.” There are probably a number of separate languages within Tyrolean South Bavarian.

However, barring solid documentation for these apparently separate languages, it’s not reasonable to split them off yet.

The question comes up about where you split a dialect chain. Indeed, this is one of the trying questions of linguistics. Once you get to the point where there are some dialects in Lect A that cannot talk to some dialects in Lect B, you have yourself a dialect chain, and a place to split it. Hence, Czech and Slovak are split even though Eastern Czech can understand Western Slovak, because Western Czech can’t understand Eastern Slovak.

A commenter points out that the problem of doing this is you are going to end up with separate languages that have communicable dialects. Indeed this is true, but it’s the case in many world languages that they have dialects that communicate with dialects of neighboring tongues.

There is a dialect chain running from Belgium to Austria where each village can talk to the next. There is another dialect chain running from Portugal to Sicily. There is yet another running from about Turkey way over to Siberia.

A greater problem with dialect chains is refusing to split them at some point into separate tongues, because then you have one language with noncommunicative dialects which makes less sense than separate languages with communicative dialects.
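The splitting rule described above can be sketched in a few lines of Python. This is only an illustration: the intelligibility scores are invented placeholders, not measurements, and the names `SCORES`, `should_split` and the `threshold` parameter are all hypothetical.

```python
from itertools import product

# Hypothetical intelligibility scores (0.0 to 1.0) between dialects of two
# neighboring lects. These numbers are invented for illustration only.
SCORES = {
    ("Western Czech", "Western Slovak"): 0.85,
    ("Western Czech", "Eastern Slovak"): 0.60,
    ("Eastern Czech", "Western Slovak"): 0.95,
    ("Eastern Czech", "Eastern Slovak"): 0.80,
}

CZECH = ["Western Czech", "Eastern Czech"]
SLOVAK = ["Western Slovak", "Eastern Slovak"]


def should_split(dialects_a, dialects_b, scores, threshold):
    """Return True if at least one dialect of lect A falls below the
    threshold with at least one dialect of lect B."""
    return any(
        scores[(a, b)] < threshold
        for a, b in product(dialects_a, dialects_b)
    )


# With an assumed cutoff of 0.70, the Western Czech / Eastern Slovak pair
# fails, so the chain is split even though other pairs communicate well.
print(should_split(CZECH, SLOVAK, SCORES, threshold=0.70))
```

Note that the rule looks for the worst pair, not the average: one pair of dialects that cannot talk is enough to split the lects, which is why neighboring dialects on either side of the split can still understand each other.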

I use the term “lect” to mean something that may be either a dialect or a language, or some speech form that we can’t figure out if it is a dialect or a language.

Finally, the perennial question of intelligibility came up. You often read that this or that lect can easily communicate with some other lect, that they are mutually intelligible, more or less mutually intelligible, etc.

It is commonly noted that, for instance, Dutch and Afrikaans are highly mutually intelligible. In fact, my investigation revealed that Dutch speakers say that they have ~80% intelligibility of Afrikaans. That’s probably about what it is.

80% is not mutually intelligible. It means separate languages. The problem with 80% intelligibility is that it is just enough lack of communication to cause what I would call “significant disruption in communication.” This gets more important as we discuss more high-level things. It is almost impossible to discuss complicated and important topics well with less than 90% intelligibility. That’s just enough disruption to throw a serious monkey wrench into things.

At the other end of the spectrum, when we are discussing, say, the weather, much lower levels of intelligibility may be tolerated and we are still able to get our point across.
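The 90% rule of thumb used in this post can be written down as a tiny Python helper. It is a minimal sketch, not a formal method: the function name `classify` and the constant `LANGUAGE_CUTOFF` are made up here, and the only numbers used are the 90% cutoff and the ~80% Dutch-Afrikaans figure quoted above.

```python
# Cutoff taken from the post: below 90% intelligibility, the two lects
# are treated as separate languages rather than dialects of one tongue.
LANGUAGE_CUTOFF = 0.90


def classify(intelligibility):
    """Classify a pair of lects from an intelligibility score in [0, 1]."""
    if not 0.0 <= intelligibility <= 1.0:
        raise ValueError("intelligibility must be between 0.0 and 1.0")
    if intelligibility >= LANGUAGE_CUTOFF:
        return "dialects of one language"
    return "separate languages"


# Dutch speakers report ~80% intelligibility of Afrikaans.
print(classify(0.80))
```

A single fixed cutoff is a simplification, since how much intelligibility is needed depends on what is being discussed.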

Some of these determinations were made simply by intuition. On this page, you can look at many different translations, often in Low German, of a single text. Looking at that text in different lects, it becomes clear which are dialects and which are so different that they may be languages.

Let us take a look here: Hamburgisch, Ollands and Oldenburg, three Low Saxon lects:




Quick observation shows us that Hamburgisch and Ollands obviously must be dialects of one tongue. Yet Oldenburg seems so different that it seems dubious that Oldenburg speakers can converse with the others at 90%+ intelligibility.

Conclusion by “simple observation” (I prefer to call it “direct observation”) was criticized as somehow unscientific. However, direct observation is a well-known scientific technique involved in the hypothesis – testing – conclusion dance of the empirical method.

Keep in mind that much of science is simply observation, hunches, intuition, etc. Francis Crick is said to have visualized the double helix structure of DNA via sheer intuition while tripping on LSD, though the story is unconfirmed.

Sir William Jones’s famous discovery of Indo-European certainly was simply observational and intuitive also.



What’s The Hardest Language To Learn?

It’s actually an interesting question. For English speakers anyway, results from the US Army School of Languages in Monterey, California, showed that Chinese, Japanese and Korean are the three hardest.

I believe that Chinese was the worst of all. The main problem with Chinese is that it has an ungodly and unwieldy writing system that is very difficult to learn.

Literate Chinese are supposed to know 3,000 characters. The government has set a goal of 80% of Chinese learning 3,000 characters or more. That’s never going to happen. At the bottom end, you are supposed to know 1,000 characters, but a lot of Chinese don’t even know that many. A very literate Chinese should know 10,000 characters, but I doubt if 1% know that many. There are about 30,000 Chinese characters, but I doubt if anyone knows all of them.

Many rural Chinese learned to read and write but then forgot it once they got to be about 40 or so. They are in the fields all day, and they don’t read and write much, so they just forget.

There is a good argument out there that the Chinese writing system is so unwieldy that it actually harms Chinese competitiveness and the economy. Economic calculations have actually been made of how much damage this orthographic system costs.

When the Communists took over, they introduced a pinyin system in addition to simplifying the Chinese characters. The simplification was a great idea, but the language was still very hard to read and write. The pinyin Romanization is only one of many that have been introduced over time. I don’t think that any of them have worked out well. It is said that the shared written language enabled speakers of ~3,500 Chinese dialects to all communicate with each other, but now that everyone is learning Putonghua anyway, why not just write Putonghua in pinyin?

There is a problem with writing the other major Chinese languages using the system designed for Mandarin. It is not so easy to write Cantonese, Min, Hakka, Wu, Xiang and Hui using the Mandarin character set. One problem is that Cantonese for instance has quite a few words that lack Chinese characters. To some extent this is true with the other languages also. Min has a Romanization scheme, but most Min speakers don’t know how to use it. The whole idea of writing something other than Putonghua using the traditional Chinese character set introduces all sorts of minefields.

I believe that alternate character sets or additions to the character sets have been introduced for some of the other languages. Due to tones, it gets hard to write Chinese using a Romanization scheme easily, since you have to put all sorts of diacritics on the letters in the isolating language of Chinese.

Something similar is happening with Japanese, which after all uses a character set borrowed from the Chinese.

After the war there were attempts to introduce a Romanization system to Japanese. However, by 1960 or so, Japanese nationalism had returned to Japan, at least to the extent that this Romanization system was seen as a Western imperial affront to the mystical Japanese super-culture. Since then, things have only been on the downswing. Japan actually uses three different kinds of symbol sets: Katakana, Hiragana and Kanji. The Kanji are Chinese characters, and they are mixed in with the other two sets.

Even beyond that, the Japanese language, though not tonal, is mind-bafflingly complex. There are rules, but then there are tons of exceptions to those rules, but the exceptions are not really taught. A native speaker just more or less unconsciously figures out the rules and the exceptions in the course of growing up Japanese.

Anyway, this rule-exception mix is so chaotic and senseless that it’s almost impossible to codify it somehow and then teach it to non-native speakers. The Japanese also have strange concepts like using different counting systems when counting different types of things.

As if speaking it alone were not complicated enough, there is that writing system. Once again, reasonable people have figured out that the convoluted logographics cost the Japanese economy quite a bit per year. But the logographics are seen by the hyperethnocentric Japanese now as a mystical part of their Super-race and Super-culture, and they will not allow anyone to lay hands on them, especially not pesky Western imperialists who occupied their land and tried to shove the West down their throats.

The Korean writing system, Hangul, is actually excellent, and is one of the most logical alphabets ever devised, or so say scholars. It is not logographic at all: it uses a limited character set like English. However, in some way that I am not familiar with, the Korean language is not easy at all for English speakers to learn. Whatever it is, it is not the writing system.

In terms of European languages, Finnish and Hungarian are said to be Godawful languages to learn. British diplomats who were placed all over Europe were notorious for refusing posts in Hungary due to the difficulty in learning the language. Finnish of course is one of the most case-marked major languages on Earth, with 14-15 different cases. Coming from a language like English that does not mark case very much, that must be awfully hard.

In a recent discussion on the Internet, the following languages were thought to be the hardest to learn:

- Navajo
- Tsez
- !Kung (language family)
- Pirahã
- Basque
- Comanche
- Archi
- Etruscan
- Northwest/Northeast Caucasian (language family)
- Aboriginal Australian (language family)

Navajo is a US Amerindian language that is the most widely spoken of the Indian languages, with 120,000 speakers. However, there are reportedly over 900 different ways to conjugate a verb! Not to mention 7 different verbal modes, 4 classifier prefixes (whatever those are), 18 kinds of aspect, 25 different kinds of verbal pronominal prefixes that mark subject and object at once and verbs that can take up to 11 different verbal prefixes at once. Don’t forget hardly any nouns and a universe of verbs – most things we use a noun to describe, Navajo uses a verb – go figure.

The famous US code-talkers of WW2 spoke Navajo, but they also threw in non-Navajo code in there in case the Japanese broke the Navajo language of the code. Anyway, by the end of the war, the Japanese had still not figured out the Navajo behind the code language.

Nowadays with computers, I think any linguistically based code could be broken pretty easily by an advanced society with access to linguistic texts and high powered computers.

!Kung is a Southwest African Khoisan language family spoken by Bushmen. It is a click-based language, with many of the sounds being made by clicking the tongue against the mouth in various ways. Although I think that click languages are some of our first languages (think about it, that’s probably the first step to make in making up a language), that doesn’t mean that they are easier to learn.

Those have to be some of the hardest languages around to learn. As the Khoisan are said to have the lowest IQ’s on Earth, IQ cannot possibly be related to complexity of language.

Pirahã, an Amerindian language getting a lot of fame nowadays due to its grammatical paradoxes, is also said to be very hard to learn. Daniel Everett, the linguist who has written the most about it, is one of the first outsiders to have actually mastered the language.

Several other anthropologists went to live with them for a while, but they never seemed to master the language very well. Spanish priests made several attempts to learn the language in the past 200 years, but they never got much of anywhere either. Much of it is spoken in strange whistle-like sounds; in fact, it can be whistled, put into music, or hummed, and often is. So much for those stupid primitives.

Basque is a notoriously difficult language to learn. It is said to be a language isolate, but I think it is related to languages spoken in the Caucasus.

Etruscan is extinct, but what we know of it suggests that if it were alive today, it would be very hard to learn. So much for those stupid ancients, huh?

Northwest and Northeast Caucasian languages are spoken in the Caucasus and are indeed some of the most complex languages on Earth. Fortunately, few people bother to try to learn them. Archi (1,000 speakers) and Tsez (15,000 speakers) are two of these Caucasian languages.

The alphabet of Archi looks daunting enough. Archi supposedly has something like 1.5 million possible noun declensions. Someone tell me how you make a spellchecker for a language like this?

Tsez, with 64 (!) cases, ergative typology, 20 thematic suffixes that are often difficult for even native speakers to use, 4 different, often non-transparent, noun classes, no 3rd person pronouns, 6 different kinds of aspect, 4 different kinds of mood, 18 different kinds of coverbs (Whatever those are), 2 different numeral forms, many different ways of conjugating a verb, many different ways of making up new nouns and verbs, many clitics that can be attached to any form of speech, on and on, is so crazy that it makes you almost glad to hear the language is endangered.

Three of the noun classes cover inanimate objects, so you can see why it is non-transparent.

Comanche is an Amerindian language that was considered by the US military as a code language in WW2 due to its mad complexity before Navajo was chosen for use in the Pacific. However, 17 young Comanche men were chosen as code-talkers on the Western front, and the code was never broken by the master-race brains of the Germans.

Aboriginal Australian languages are also said to be insanely complex, and next to the Khoisan, Aborigines have the second lowest IQ’s on Earth. What this shows us is that language is an essential part of the human tapestry, and you certainly do not need a high IQ to create a wildly complex language that would baffle even many linguists.

It also suggests that Khoisan and Aborigines, even with IQ’s from 54-62, are not “retarded” in the same way that Westerners with 54-62 IQ’s would be. Complicating matters further, scholars who have worked with the Khoisan have said that they did not get the impression that these people were unintelligent.

Even more mystifying is that the ancestors of the Khoisan, the Strandwalkers who lived on the beaches of SW Africa some 3,000 years ago, had the largest brains ever recorded in modern people. All the same, they never created Rome either. This suggests that brain size may not be particularly relevant to intelligence and capability of civilization as the White racists insist.

In fact, there seems to be a disconnect between the complexity of a language and the level of civilization of its speakers. The less developed a people are, often the more complex of a language they have. We linguists think that many primitive peoples do not have complicated or busy lives, and they have a lot of time on their hands. They don’t have computers or cell phones to mess around with, so they substitute language.

Humans are inherently highly intelligent – even Aborigines with 62 IQ’s – and they make up insanely complex languages so they can play games with language as a form of creativity and a way to exercise their brains.

As society gets more complex, a complicated language gets more and more in the way of doing things efficiently and even starts to hurt the economy, as the logographic issues of the NE Asians described above suggest.

